Similar Articles
1.
Recently, it has been shown that the regret of the Follow the Regularized Leader (FTRL) algorithm for online linear optimization can be bounded by the total variation of the cost vectors rather than by the number of rounds. In this paper, we extend this result to general online convex optimization. In particular, this resolves an open problem posed in a number of recent papers. We first analyze the limitations of the FTRL algorithm as proposed by Hazan and Kale (in Machine Learning 80(2–3), 165–188, 2010) when applied to online convex optimization, and extend the definition of variation to a gradual variation, which is shown to be a lower bound of the total variation. We then present two novel algorithms that bound the regret by the gradual variation of the cost functions. Unlike previous approaches that maintain a single sequence of solutions, the proposed algorithms maintain two sequences of solutions, which makes it possible to achieve a variation-based regret bound for online convex optimization. To establish the main results, we discuss a lower bound for FTRL that maintains only one sequence of solutions, and a necessary condition on the smoothness of the cost functions for obtaining a gradual variation bound. We extend the main results in three ways: (i) we present a general method to obtain a gradual variation bound measured by a general norm; (ii) we extend the algorithms to a class of online non-smooth optimization problems with a gradual variation bound; and (iii) we develop a deterministic algorithm for online bandit optimization in the multipoint bandit setting.
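To make the two-sequence idea concrete, here is a minimal Python sketch, assuming a Euclidean unit ball as the feasible set, a fixed step size `eta`, and an extragradient-style pair of updates: the played point x is a step from an auxiliary point z along the previous round's gradient, and z is then updated with the gradient observed at x. The helper names (`project_ball`, `gradual_variation`, `two_sequence_ogd`) are hypothetical, and this illustrates the general technique rather than the authors' exact algorithm. For linear costs, `gradual_variation` computes the sum of squared differences between consecutive cost vectors.

```python
import math


def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def gradual_variation(costs):
    """Gradual variation of linear costs: sum over t of ||c_{t+1} - c_t||^2."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(costs[t + 1], costs[t]))
        for t in range(len(costs) - 1)
    )


def two_sequence_ogd(grads, dim, eta=0.1):
    """Extragradient-style online gradient descent with two sequences (sketch).

    grads[t](x) returns the gradient of the round-t cost at x; it is only
    queried at the point x_t that was actually played in round t.
    """
    z = [0.0] * dim     # auxiliary sequence z_t
    hint = [0.0] * dim  # last observed gradient (zero before round 1)
    played = []
    for grad in grads:
        # Played point: step from z along the previous round's gradient.
        x = project_ball([zi - eta * gi for zi, gi in zip(z, hint)])
        played.append(x)
        g = grad(x)
        # Auxiliary update: step from z along the gradient observed at x.
        z = project_ball([zi - eta * gi for zi, gi in zip(z, g)])
        hint = g
    return played
```

When consecutive gradients are close (small gradual variation), the hint used to choose x is nearly the gradient about to be revealed, which is the intuition for why a second sequence helps.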

2.
Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient, and effective. This paper considers a variant of the basic algorithm for the stochastic multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental work, such algorithms were found to outperform competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over alternatives that do not use such estimates, provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all upper confidence bound based algorithms and for all bandit problems except those special ones where, with probability one, the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second-best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations.
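A minimal sketch of a variance-aware upper confidence index in Python, assuming rewards in [0, b] and one commonly cited form of the index (empirical mean, plus a term driven by the empirical variance, plus a range term), with exploration function `zeta * log(t)`. The constants and the names `ucbv_index` and `choose_arm` are assumptions for illustration, not necessarily the exact variant analyzed in the paper.

```python
import math


def ucbv_index(rewards, t, b=1.0, zeta=1.2):
    """Variance-aware upper confidence index for one arm (UCB-V-style sketch).

    rewards: observed payoffs of the arm, assumed to lie in [0, b].
    t: current round (>= 1); zeta scales the exploration term log t.
    """
    s = len(rewards)
    if s == 0:
        return math.inf  # force each arm to be tried once
    mean = sum(rewards) / s
    var = sum((r - mean) ** 2 for r in rewards) / s  # empirical variance
    eps = zeta * math.log(t)
    return mean + math.sqrt(2 * var * eps / s) + 3 * b * eps / s


def choose_arm(history, t):
    """Pick the arm with the largest index; history[k] lists arm k's rewards."""
    return max(range(len(history)), key=lambda k: ucbv_index(history[k], t))
```

With equal means and counts, the low-variance arm receives a tighter index, so a suboptimal arm with low payoff variance stops being pulled sooner.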

3.
Algorithm selection can be performed using a model of runtime distribution, learned during a preliminary training phase. There is a trade-off between the performance of model-based algorithm selection and the cost of learning the model. In this paper, we treat this trade-off in the context of bandit problems. We propose a fully dynamic and online algorithm selection technique with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to propose machine time shares for the algorithms. A bandit problem solver mixes the model-based shares with a uniform share, gradually increasing the impact of the best time allocators as the model improves. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark, and with a set of solvers for the Auction Winner Determination problem. This work was supported by SNF grant 200020-107590/1.
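The mixing step can be sketched as follows, assuming the bandit solver is Exp3-like (the abstract does not name one). Each hypothetical time allocator proposes a share vector over the candidate algorithms, the solver's probabilities blend those proposals, and a weight `gamma` is reserved for the uniform share. All names and the update rule are illustrative assumptions.

```python
import math


def mix_shares(allocator_shares, allocator_probs, gamma):
    """Blend time allocators' proposals, then mix in a uniform share.

    allocator_shares[a][k]: share of machine time allocator a gives algorithm k.
    allocator_probs[a]: current probability the bandit solver assigns to allocator a.
    gamma: weight of the uniform share, in [0, 1].
    """
    n_algos = len(allocator_shares[0])
    blended = [0.0] * n_algos
    for p, shares in zip(allocator_probs, allocator_shares):
        for k, share in enumerate(shares):
            blended[k] += p * share
    return [(1 - gamma) * b + gamma / n_algos for b in blended]


def exp3_update(weights, probs, chosen, reward, eta):
    """Exp3-style update using an importance-weighted reward estimate."""
    estimate = reward / probs[chosen]
    new = list(weights)
    new[chosen] *= math.exp(eta * estimate)
    return new


def normalise(weights):
    """Turn positive weights into probabilities."""
    total = sum(weights)
    return [w / total for w in weights]
```

As an allocator keeps earning reward, its weight grows, so its proposed shares gradually dominate the blend while `gamma` keeps every algorithm running.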

4.
We present algorithms for solving multi-armed and linear-contextual bandit tasks in the face of adversarial corruptions in the arm responses. Traditional algorithms for solving these problems assume that nothing but mild noise (e.g., i.i.d. sub-Gaussian noise) disrupts an otherwise clean estimate of the utility of the arm. This assumption, and the approaches built on it, can fail catastrophically if an observant adversary corrupts even a small fraction of the responses generated when arms are pulled. To rectify this, we propose algorithms that use recent advances in robust statistical estimation to perform arm selection in polynomial time. Our algorithms are easy to implement and vastly outperform several existing UCB and EXP-style algorithms for stochastic and adversarial multi-armed and linear-contextual bandit problems in a wide variety of experimental settings. Our algorithms enjoy minimax-optimal regret bounds and can tolerate an adversary that is allowed to corrupt up to a universally constant fraction of the arms pulled by the algorithm.
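As an illustration of plugging a robust estimator into arm selection, the sketch below swaps the empirical mean in a UCB-style index for a trimmed mean, assuming the corrupted fraction is below `trim_frac`. The trimmed mean is a stand-in chosen for simplicity; it is not claimed to be the estimator used in the paper, and the bonus `sqrt(2 log t / n)` is the standard UCB1-style choice.

```python
import math


def trimmed_mean(samples, trim_frac):
    """Mean after discarding the trim_frac lowest and highest samples."""
    xs = sorted(samples)
    k = int(trim_frac * len(xs))
    kept = xs[k:len(xs) - k] if len(xs) > 2 * k else xs
    return sum(kept) / len(kept)


def robust_ucb_arm(history, t, trim_frac=0.1):
    """UCB-style choice using a trimmed mean in place of the empirical mean."""
    def index(rewards):
        if not rewards:
            return math.inf  # try every arm at least once
        bonus = math.sqrt(2 * math.log(t) / len(rewards))
        return trimmed_mean(rewards, trim_frac) + bonus
    return max(range(len(history)), key=lambda a: index(history[a]))
```

For example, two corrupted responses of -100 drag the plain mean of a 0.6-valued arm below 0.5, while the trimmed mean still recovers 0.6.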

5.
Intuitively, it is clear that trust or shared taste enables a community of users to make better decisions over time, by learning cooperatively and avoiding one another's mistakes. However, it is also clear that the presence of malicious, dishonest users in the community threatens the usefulness of such collaborative learning processes. We investigate this issue by developing algorithms for a multi-user online learning problem in which each user makes a sequence of decisions about selecting products or resources. Our model, which generalizes the adversarial multi-armed bandit problem, is characterized by two key features:
(1) The quality of the products or resources may vary over time.
(2) Some of the users in the system may be dishonest, Byzantine agents.
Decision problems with these features underlie applications such as reputation and recommendation systems in e-commerce, and resource location systems in peer-to-peer networks. Assuming the number of honest users is at least a constant fraction of the number of resources, and that the honest users can be partitioned into groups such that individuals in a group make identical assessments of resources, we present an algorithm whose expected regret per user is linear in the number of groups and only logarithmic in the number of resources. This bound compares favorably with the naïve approach in which each user ignores feedback from peers and chooses resources using a multi-armed bandit algorithm; in this case the expected regret per user would be polynomial in the number of resources.

6.
This paper studies an online linear optimization problem generalizing the multi-armed bandit problem. Motivated primarily by the task of designing adaptive routing algorithms for overlay networks, we present two randomized online algorithms for selecting a sequence of routing paths in a network with unknown edge delays varying adversarially over time. In contrast with earlier work on this problem, we assume that the only feedback after choosing such a path is the total end-to-end delay of the selected path. We present two algorithms whose regret is sublinear in the number of trials and polynomial in the size of the network. The first of these algorithms generalizes to solve any online linear optimization problem, given an oracle for optimizing linear functions over the set of strategies; our work may thus be interpreted as a general-purpose reduction from offline to online linear optimization. A key element of this algorithm is the notion of a barycentric spanner, a special type of basis for the vector space of strategies that allows any feasible strategy to be expressed as a linear combination of basis vectors using bounded coefficients. We also present a second algorithm for the online shortest path problem, which solves the problem using a chain of online decision oracles, one at each node of the graph. This has several advantages over the online linear optimization approach. First, it is effective against an adaptive adversary, whereas our linear optimization algorithm assumes an oblivious adversary. Second, even in the case of an oblivious adversary, the second algorithm performs slightly better than the first, as measured by their additive regret.
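A barycentric spanner of a finite strategy set can be approximated with a two-phase determinant-maximization procedure. The sketch below assumes the points span R^d, uses an approximation factor `C > 1`, and relies on NumPy; the function name `barycentric_spanner` is hypothetical. Once no single swap raises |det| by more than a factor C, Cramer's rule implies every point has coordinates in [-C, C] with respect to the returned basis.

```python
import numpy as np


def barycentric_spanner(points, C=2.0):
    """Return a d x d matrix whose columns form a C-approximate barycentric spanner."""
    S = [np.asarray(p, dtype=float) for p in points]
    d = S[0].shape[0]
    basis = np.eye(d)  # columns are the current basis vectors

    def det_with(i, x):
        """|det| of the basis with column i replaced by x."""
        M = basis.copy()
        M[:, i] = x
        return abs(np.linalg.det(M))

    # Phase 1: fill each slot with the point that maximises |det|.
    for i in range(d):
        basis[:, i] = max(S, key=lambda x: det_with(i, x))

    # Phase 2: swap while some replacement grows |det| by more than a factor C.
    improved = True
    while improved:
        improved = False
        current = abs(np.linalg.det(basis))
        for i in range(d):
            for x in S:
                if det_with(i, x) > C * current:
                    basis[:, i] = x
                    improved = True
                    break
            if improved:
                break
    return basis
```

The coefficients of any strategy x in this basis are `np.linalg.solve(basis, x)`; bounded coefficients are what keep the exploration estimates in the online algorithm well conditioned.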

7.
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an online exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case where the cumulative regret is considered and exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations where the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.
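The simple-regret criterion separates exploring from recommending. A minimal sketch, assuming Bernoulli arms and uniform (round-robin) allocation as an example forecaster, pulls the arms in turn and then recommends the empirically best one; the names `uniform_explore_recommend`, `simple_regret`, and `bernoulli_arm` are illustrative.

```python
import random


def uniform_explore_recommend(arms, n_rounds, rng):
    """Pull arms round-robin for n_rounds, then recommend the empirical best.

    arms: list of callables, each taking an RNG and returning a payoff.
    """
    k = len(arms)
    totals = [0.0] * k
    counts = [0] * k
    for t in range(n_rounds):
        a = t % k  # uniform allocation across arms
        totals[a] += arms[a](rng)
        counts[a] += 1
    means = [totals[a] / counts[a] if counts[a] else float("-inf") for a in range(k)]
    return max(range(k), key=lambda a: means[a])


def simple_regret(true_means, recommended):
    """Gap between the best mean and the mean of the recommended arm."""
    return max(true_means) - true_means[recommended]


def bernoulli_arm(p):
    """Arm paying 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0
```

Note that only the final recommendation is scored: the payoffs collected while exploring do not enter the simple regret at all.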

8.
Noisy optimization is the optimization of objective functions corrupted by noise. A portfolio of solvers is a set of solvers equipped with an algorithm selection tool for distributing the computational power among them. Portfolios are widely and successfully used in combinatorial optimization. In this work, we study portfolios of noisy optimization solvers. We prove performance guarantees (in the sense that the portfolio performs nearly as well as the best of its solvers) for an ad hoc portfolio algorithm dedicated to noisy optimization. A somewhat surprising result is that it is better to compare solvers with some lag, i.e., to base the current recommendation of the best solver on the solvers' performance earlier in the run. An additional finding is a principled method for distributing the computational power among solvers in the portfolio.

9.
In the multiarmed bandit problem the dilemma between exploration and exploitation in reinforcement learning is expressed as a model of a gambler playing a slot machine with multiple arms. A policy chooses an arm in each round so as to minimize the number of times that arms with suboptimal expected rewards are pulled. We propose the minimum empirical divergence (MED) policy and derive an upper bound on the finite-time regret which meets the asymptotic bound for the case of finite support models. In a setting similar to ours, Burnetas and Katehakis have already proposed an asymptotically optimal policy. However, we do not assume any knowledge of the support except for its upper and lower bounds. Furthermore, the criterion for choosing an arm, minimum empirical divergence, can be computed easily by a convex optimization technique. We confirm by simulations that the MED policy demonstrates good performance in finite time in comparison to other currently popular policies.
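A simplified sketch of the MED selection rule, assuming Bernoulli rewards so that the empirical divergence D_i of arm i reduces to the Bernoulli KL divergence kl(mean_i, best mean); the general finite-support case needs the convex optimization mentioned in the abstract, which is not shown. Under this assumption, arm i is drawn with probability proportional to exp(-N_i * D_i), where N_i is its pull count. The function names are hypothetical.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def med_probabilities(means, counts):
    """Selection probabilities proportional to exp(-N_i * D_i)."""
    best = max(means)
    scores = [math.exp(-n * bernoulli_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(scores)
    return [s / total for s in scores]


def med_choose(means, counts, rng):
    """Sample an arm according to the MED-style probabilities."""
    probs = med_probabilities(means, counts)
    return rng.choices(range(len(means)), weights=probs, k=1)[0]
```

Arms that look clearly worse after many pulls get exponentially small probability, while arms with few pulls keep a real chance of being explored.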

10.
A memetic approach that combines a genetic algorithm (GA) and quadratic programming is used to address the problem of optimal portfolio selection with cardinality constraints and piecewise linear transaction costs. The framework used is an extension of the standard Markowitz mean–variance model that incorporates realistic constraints, such as upper and lower bounds for investment in individual assets and/or groups of assets, and minimum trading restrictions. The inclusion of constraints that limit the number of assets in the final portfolio and piecewise linear transaction costs transforms the selection of optimal portfolios into a mixed-integer quadratic problem, which cannot be solved by standard optimization techniques. We propose to use a genetic algorithm in which the candidate portfolios are encoded using a set representation to handle the combinatorial aspect of the optimization problem. Besides specifying which assets are included in the portfolio, this representation includes attributes that encode the trading operation (sell/hold/buy) performed when the portfolio is rebalanced. The results of this hybrid method are benchmarked against a range of investment strategies (passive management, the equally weighted portfolio, the minimum variance portfolio, optimal portfolios without cardinality constraints, ignoring transaction costs or obtained with L1 regularization) using publicly available data. The transaction costs and the cardinality constraints provide regularization mechanisms that generally improve the out-of-sample performance of the selected portfolios.

11.
We study online decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings, we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
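To make the best-ordering benchmark concrete, the sketch below scores a fixed ranking of actions by playing, in each round, the highest-ranked action that is available, and then finds the best ranking by brute force. The data layout (a list of `(available, payoffs)` pairs) and the function names are assumptions, and enumerating all permutations is only feasible for a handful of actions.

```python
from itertools import permutations


def ordering_payoff(order, rounds):
    """Total payoff of always playing the highest-ranked available action.

    rounds: list of (available_actions, payoffs), payoffs mapping action -> reward.
    """
    rank = {a: i for i, a in enumerate(order)}
    total = 0.0
    for available, payoffs in rounds:
        if available:
            choice = min(available, key=lambda a: rank[a])
            total += payoffs[choice]
    return total


def best_ordering(actions, rounds):
    """Brute-force benchmark: the ordering with the largest total payoff."""
    return max(permutations(actions), key=lambda order: ordering_payoff(order, rounds))
```

Because the set of awake actions changes, no single best action need exist; a ranking is the natural fixed policy to compare against.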

12.
The Feedback Vertex Set problem on unweighted, undirected graphs is considered. Improving upon a result by Burrage et al. (Proceedings 2nd International Workshop on Parameterized and Exact Computation, pp. 192–202, 2006), we show that this problem has a kernel with $O(k^3)$ vertices; i.e., there is a polynomial-time algorithm that, given a graph G and an integer k, finds a graph G′ with $O(k^3)$ vertices and an integer k′ ≤ k such that G has a feedback vertex set of size at most k if and only if G′ has a feedback vertex set of size at most k′. Moreover, the algorithm can be made constructive: if the reduced instance G′ has a feedback vertex set of size k′, then we can easily transform a minimum-size feedback vertex set of G′ into a minimum-size feedback vertex set of G. This kernelization algorithm can be used as the first step of an FPT algorithm for Feedback Vertex Set, but also as a preprocessing heuristic for Feedback Vertex Set.
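Kernelization for Feedback Vertex Set starts from simple reduction rules. One standard rule, sketched below, deletes vertices of degree at most 1, since they lie on no cycle; the full kernel needs further rules that are not shown. The adjacency-dict representation and the function name are assumptions.

```python
def remove_low_degree(adj):
    """Repeatedly delete vertices of degree <= 1 (they lie on no cycle).

    adj: dict mapping vertex -> set of neighbours (simple undirected graph).
    Returns a reduced copy; the minimum feedback vertex set size is unchanged.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    stack = [v for v in g if len(g[v]) <= 1]
    while stack:
        v = stack.pop()
        if v not in g or len(g[v]) > 1:
            continue  # already deleted, or not reducible
        for u in g[v]:
            g[u].discard(v)
            if len(g[u]) <= 1:
                stack.append(u)  # neighbour may now be reducible too
        del g[v]
    return g
```

On a forest the rule deletes everything, which matches the fact that a forest needs no feedback vertices.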

13.
This paper considers a sparse portfolio rebalancing problem in which rebalancing portfolios with the minimum number of assets are sought. The problem is motivated by the need to decide whether the initial portfolio is worth adjusting at all, and by inducing sparsity in the selected rebalancing portfolio in order to reduce transaction costs (TCs), improve out-of-sample performance, and keep changes to the portfolio small. We propose a sparse portfolio rebalancing model by adding an l1 penalty term to the objective function of a general portfolio rebalancing model. In this way, the model yields sparse adjustments with low TCs and can decide whether, and which, assets to adjust based on inverse optimization. Numerical tests on four typical data sets show that the optimal adjustment given by the proposed sparse portfolio rebalancing model is sparser and has better out-of-sample performance than that of the general portfolio rebalancing model.
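The effect of an l1 penalty on trades can be seen in a stripped-down model: minimize 0.5 * ||w0 + d - w_target||^2 + lam * ||d||_1 over the adjustment d, where `w_target` is a hypothetical target allocation. The problem is separable, and each coordinate's solution is the soft-thresholding of w_target_i - w0_i at lam, so small desired changes become exactly zero. This sketch ignores the budget and other constraints of a full rebalancing model and does not implement the inverse-optimization step.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * |.|: shrink v toward zero by lam."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def sparse_adjustment(w0, w_target, lam):
    """Minimiser of 0.5*||w0 + d - w_target||^2 + lam*||d||_1 over d."""
    return [soft_threshold(t - w, lam) for w, t in zip(w0, w_target)]
```

For example, with `w0 = [0.3, 0.3, 0.4]`, `w_target = [0.32, 0.2, 0.48]`, and `lam = 0.05`, the first asset is left untouched and only the other two are traded.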

14.
This paper considers the distributed online optimization (DOO) problem with privacy-preserving properties over multiple agents, where the network is modeled by a strongly connected directed graph. To solve this problem, a stochastic bandit DOO algorithm based on differential privacy is proposed. The algorithm uses row- and column-stochastic matrices as weighting matrices, which removes the requirement for a doubly stochastic weighting matrix. Because the objective functions are unknown, a one-point bandit estimator is used to approximate the true gradient, and the estimated gradient is used to update the decision variables. Unlike existing DOO algorithms that ignore privacy issues, this algorithm protects the privacy of nodes through a differential privacy mechanism. Theoretical results show that the algorithm achieves sublinear regret bounds while also protecting the privacy of nodes. Finally, simulation results verify the effectiveness of the algorithm.
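The one-point bandit gradient estimate can be sketched as g = (d / delta) * f(x + delta * u) * u, with u drawn uniformly from the unit sphere in R^d; its expectation is the gradient of a delta-smoothed version of f. The sketch below covers only this estimator, not the distributed updates or the differential-privacy noise, and the function names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform direction on the unit sphere via a normalised Gaussian sample."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0:
            return [v / norm for v in g]


def one_point_gradient(f, x, delta, rng):
    """One-point estimate (d / delta) * f(x + delta * u) * u from a single query."""
    d = len(x)
    u = random_unit_vector(d, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])  # the only function call
    return [d / delta * value * ui for ui in u]
```

A single estimate is very noisy (its scale grows like 1/delta), which is why such algorithms average implicitly over many rounds and tune delta against the step size.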

15.
This work addresses the problem of finding the maximum number of unweighted vertex-disjoint triangles in an undirected graph G, known as maximum triangle packing (MTP). It is a challenging NP-hard combinatorial problem and is well known to be APX-hard. A branch-and-bound algorithm that uses a lower bound based on neighborhood degree is presented. A naive upper bound is proposed, as well as another one based on a surrogate relaxation of the related integer linear program, which is analogous to a multidimensional knapsack problem. Further, a greedy search algorithm and a genetic algorithm are described to improve the lower bound. A computational comparison of the lower bounds, the branch-and-bound algorithm, and the CPLEX solver is provided using randomly generated benchmarks and well-known DIMACS implementation challenge graphs. The empirical study shows that the branch-and-bound algorithm finds the optimal triangle packing for small randomly generated MTP instances (up to 100 vertices and 200 triangles) and some DIMACS graphs. For some larger instances and DIMACS challenge graphs, our lower bound outperforms the CPLEX solver in both solution quality and computation time.
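A simple greedy lower bound for maximum triangle packing, sketched below, visits vertices in order of increasing degree and takes the first triangle it finds among still-unused neighbors. The degree ordering is an assumption made for illustration; it is not the paper's neighborhood-degree bound, greedy search, or genetic algorithm.

```python
from itertools import combinations


def greedy_triangle_packing(adj):
    """Greedily pick vertex-disjoint triangles (a lower bound for MTP).

    adj: dict mapping vertex -> set of neighbours (simple undirected graph).
    """
    used = set()
    packing = []
    for v in sorted(adj, key=lambda u: len(adj[u])):  # low-degree vertices first
        if v in used:
            continue
        free = sorted(u for u in adj[v] if u not in used)
        for a, b in combinations(free, 2):
            if b in adj[a]:  # v, a, b form a triangle
                packing.append((v, a, b))
                used.update((v, a, b))
                break
    return packing
```

Any packing returned this way is feasible, so its size can seed a branch-and-bound search as an initial lower bound.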

16.
We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.

17.
We study online regret minimization algorithms in an experts setting. In this setting, the algorithm chooses a distribution over experts at each time step and receives a gain that is a weighted average of the experts’ instantaneous gains. We consider a bicriteria setting, examining not only the standard notion of regret to the best expert, but also the regret to the average of all experts, the regret to any given fixed mixture of experts, or the regret to the worst expert. This study leads both to a new understanding of the limitations of existing no-regret algorithms and to new algorithms with novel performance guarantees. More specifically, we show that any algorithm that achieves only $O(\sqrt{T})$ cumulative regret to the best expert on a sequence of T trials must, in the worst case, suffer regret $\varOmega(\sqrt{T})$ to the average, and that for a wide class of update rules that includes many existing no-regret algorithms (such as Exponential Weights and Follow the Perturbed Leader), the product of the regret to the best and the regret to the average is, in the worst case, $\varOmega(T)$. We then describe and analyze two new alternative algorithms that both achieve cumulative regret of only $O(\sqrt{T}\log T)$ to the best expert and have only constant regret to any given fixed distribution over experts (that is, with no dependence on either T or the number of experts N). The key to the first algorithm is the gradual increase in the “aggressiveness” of updates in response to observed divergences in expert performance. The second algorithm is a simple twist on standard exponential-update algorithms.
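A minimal Exponential Weights sketch that reports both regret notions discussed above, regret to the best expert and regret to the uniform average of experts, assuming a full-information gain matrix with entries in [0, 1] and a fixed learning rate `eta`. It implements the standard baseline, not the two new algorithms from the paper.

```python
import math


def exponential_weights(gains, eta):
    """Run Exponential Weights on a T x N gain matrix; return the total gain."""
    n = len(gains[0])
    log_w = [0.0] * n
    total = 0.0
    for row in gains:
        m = max(log_w)  # shift for numerical stability
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        total += sum(wi / s * g for wi, g in zip(w, row))
        log_w = [lw + eta * g for lw, g in zip(log_w, row)]
    return total


def regrets(gains, alg_gain):
    """Return (regret to the best expert, regret to the average expert)."""
    n = len(gains[0])
    expert_totals = [sum(row[i] for row in gains) for i in range(n)]
    to_best = max(expert_totals) - alg_gain
    to_average = sum(expert_totals) / n - alg_gain
    return to_best, to_average
```

On a sequence where one expert always wins, the regret to the best expert stays small and the regret to the average is negative; the worst-case trade-off described above arises on adversarial sequences instead.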

18.
A graph G is said to be a bicluster graph if G is a disjoint union of bicliques (complete bipartite subgraphs), and a cluster graph if G is a disjoint union of cliques (complete subgraphs). In this work, we study the parameterized versions of the NP-hard Bicluster Graph Editing and Cluster Graph Editing problems. The former consists of obtaining a bicluster graph by making the minimum number of modifications to the edge set of an input bipartite graph. When at most k modifications are allowed (the Bicluster(k) Graph Editing problem), this problem is FPT and can be solved in $O(4^k nm)$ time by a standard search tree algorithm. We develop an algorithm with time complexity $O(4^k + n + m)$, which uses a strategy based on modular decomposition techniques; we slightly generalize the original problem, since the input graph is not necessarily bipartite. The algorithm first builds a problem kernel with $O(k^2)$ vertices in $O(n+m)$ time and then applies a bounded search tree. We also show how this strategy based on modular decomposition leads to a new way of obtaining a problem kernel with $O(k^2)$ vertices for the Cluster(k) Graph Editing problem in $O(n+m)$ time. This problem consists of obtaining a cluster graph by modifying at most k edges in an input graph. A previous FPT algorithm of time $O(1.92^k + n^3)$ for this problem was presented by Gramm et al. (Theory Comput. Syst. 38(4), 373–392, 2005; Algorithmica 39(4), 321–347, 2004). In their solution, a problem kernel with $O(k^2)$ vertices is built in $O(n^3)$ time.
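Cluster Editing search trees branch on conflict triples: vertices u, v, w with edges uv and vw but no edge uw. The sketch below finds such a triple and runs the simple three-way branching (delete uv, delete vw, or add uw), which yields a search tree with at most 3^k leaves. It illustrates the standard baseline idea only; it is not the modular-decomposition kernel or the faster algorithm from the paper, and the function names are hypothetical.

```python
from itertools import combinations


def find_conflict_triple(adj):
    """Return (u, v, w) with edges uv, vw but no edge uw, or None.

    A graph is a cluster graph iff it has no such induced path on 3 vertices.
    """
    for v in adj:
        for u, w in combinations(sorted(adj[v]), 2):
            if w not in adj[u]:
                return (u, v, w)
    return None


def cluster_editing(adj, k):
    """Decide whether at most k edge edits turn the graph into a cluster graph."""
    triple = find_conflict_triple(adj)
    if triple is None:
        return True
    if k == 0:
        return False
    u, v, w = triple
    # Any solution must apply one of these three edits to resolve the triple.
    for a, b, add in ((u, v, False), (v, w, False), (u, w, True)):
        g = {x: set(nbrs) for x, nbrs in adj.items()}
        if add:
            g[a].add(b)
            g[b].add(a)
        else:
            g[a].discard(b)
            g[b].discard(a)
        if cluster_editing(g, k - 1):
            return True
    return False
```

For instance, a path on four vertices becomes two disjoint edges after deleting its middle edge, so one edit suffices.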

19.
Consider an n-vertex planar graph G. The depth of an embedding Γ of G is the maximum distance of its internal faces from the external one. Several researchers have pointed out that the quality of a planar embedding can be measured in terms of its depth. We present an $O(n^4)$-time algorithm for computing an embedding of G with minimum depth. This bound improves on the best previous bound by an $O(n\log n)$ factor. As a side effect, our algorithm improves the bounds of several algorithms that require the computation of a minimum-depth embedding.
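For a fixed embedding, depth can be computed by breadth-first search in the dual graph, starting from the external face. The sketch below assumes, as one common convention, that two faces are adjacent when they share an edge, and that the face adjacency is already given; it only evaluates one embedding and does not search for a minimum-depth one, which is the hard part addressed by the paper.

```python
from collections import deque


def embedding_depth(face_adj, outer):
    """Maximum BFS distance from the outer face in the dual graph.

    face_adj: dict mapping face -> set of adjacent faces (assumed connected).
    """
    dist = {outer: 0}
    queue = deque([outer])
    while queue:
        f = queue.popleft()
        for g in face_adj[f]:
            if g not in dist:
                dist[g] = dist[f] + 1
                queue.append(g)
    return max(dist.values())
```

Nested faces (a face reachable from the outer face only through another internal face) are what drive the depth up; choosing the embedding to flatten such nesting is the optimization problem.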

20.
A multi-armed bandit episode consists of n trials, each allowing selection of one of K arms, resulting in a payoff from a distribution over [0,1] associated with that arm. We assume contextual side information is available at the start of the episode. This context enables an arm predictor to identify possibly favorable arms, but predictions may be imperfect, so they need to be combined with further exploration during the episode. Our setting is an alternative to classical multi-armed bandits, which provide no contextual side information, and also an alternative to contextual bandits, which provide new context on each individual trial. Multi-armed bandits with episode context can arise naturally, for example in computer Go, where context is used to bias move decisions made by a multi-armed bandit algorithm. The UCB1 algorithm for multi-armed bandits achieves worst-case regret bounded by \(O\left(\sqrt{Kn\log(n)}\right)\). We seek to improve this using episode context, particularly in the case where K is large. Using a predictor that places weight \(M_i > 0\) on arm \(i\), with weights summing to 1, we present the PUCB algorithm, which achieves regret \(O\left(\frac{1}{M_{\ast}}\sqrt{n\log(n)}\right)\), where \(M_{\ast}\) is the weight on the optimal arm. We illustrate the behavior of PUCB with small simulation experiments, present extensions that provide additional capabilities for PUCB, and describe methods for obtaining suitable predictors for use with PUCB.
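The role of the predictor weights M_i can be illustrated with a predictor-weighted selection rule. The sketch below uses the PUCT-style rule popularized in computer Go (value estimate plus c * prior * sqrt(total + 1) / (1 + count)), which is a related variant and not the PUCB index or constants from the paper; the `+ 1` inside the square root is an assumption so that priors break ties before any pulls, and the name `puct_select` is hypothetical.

```python
import math


def puct_select(means, counts, priors, c=1.0):
    """Choose the arm maximising mean + c * prior * sqrt(total + 1) / (1 + count).

    priors play the role of the predictor weights M_i (summing to 1).
    """
    total = sum(counts)

    def score(i):
        q = means[i] if counts[i] > 0 else 0.0  # unpulled arms start at 0
        return q + c * priors[i] * math.sqrt(total + 1) / (1 + counts[i])

    return max(range(len(means)), key=score)
```

Early in an episode the prior dominates, so arms favored by the predictor are tried first; as counts grow the bonus shrinks and observed payoffs take over, which is how imperfect predictions get corrected by exploration.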


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417