期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Special Issue on New Theoretical Challenges in Machine Learning

Avrim Blum Philip M. Long 《Algorithmica》2015,72(1):191-192

相似文献

2.

Reducing mechanism design to algorithm design via machine learning

Maria-Florina Balcan Avrim Blum Jason D. Hartline 《Journal of Computer and System Sciences》2008,74(8):1245-1270

We use techniques from sample-complexity in machine learning to reduce problems of incentive-compatible mechanism design to standard algorithmic questions, for a broad class of revenue-maximizing pricing problems. Our reductions imply that for these problems, given an optimal (or β-approximation) algorithm for an algorithmic pricing problem, we can convert it into a (1+?)-approximation (or β(1+?)-approximation) for the incentive-compatible mechanism design problem, so long as the number of bidders is sufficiently large as a function of an appropriate measure of complexity of the class of allowable pricings. We apply these results to the problem of auctioning a digital good, to the attribute auction problem which includes a wide variety of discriminatory pricing problems, and to the problem of item-pricing in unlimited-supply combinatorial auctions. From a machine learning perspective, these settings present several challenges: in particular, the “loss function” is discontinuous, is asymmetric, and has a large range. We address these issues in part by introducing a new form of covering-number bound that is especially well-suited to these problems and may be of independent interest. 相似文献

3.

Universal Portfolios With and Without Transaction Costs 总被引：1，自引：0，他引：1

Blum Avrim Kalai Adam 《Machine Learning》1999,35(3):193-205

A constant rebalanced portfolio is an investment strategy which keeps the same distribution of wealth among a set of stocks from period to period. Recently there has been work on on-line investment strategies that are competitive with the best constant rebalanced portfolio determined in hindsight (Cover, 1991, 1996; Helmbold et al., 1996; Cover & Ordentlich, 1996a, 1996b; Ordentlich & Cover, 1996). For the universal algorithm of Cover (Cover, 1991),we provide a simple analysis which naturallyextends to the case of a fixed percentage transaction cost (commission ), answering a question raised in (Cover, 1991; Helmbold et al., 1996; Cover & Ordentlich, 1996a, 1996b; Ordentlich & Cover, 1996; Cover, 1996). In addition, we present a simple randomized implementation that is significantly faster in practice. We conclude by explaining how these algorithms can be applied to other problems, such as combining the predictions of statistical language models, where the resulting guarantees are more striking. 相似文献

4.

Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain 总被引：1，自引：0，他引：1

Blum Avrim 《Machine Learning》1997,26(1):5-23

This paper describes experimental results on using Winnow and Weighted-Majority based algorithms on a real-world calendar scheduling domain. These two algorithms have been highly studied in the theoretical machine learning literature. We show here that these algorithms can be quite competitive practically, outperforming the decision-tree approach currently in use in the Calendar Apprentice system in terms of both accuracy and speed. One of the contributions of this paper is a new variant on the Winnow algorithm (used in the experiments) that is especially suited to conditions with string-valued classifications, and we give a theoretical analysis of its performance. In addition we show how Winnow can be applied to achieve a good accuracy/coverage tradeoff and explore issues that arise such as concept drift. We also provide an analysis of a policy for discarding predictors in Weighted-Majority that allows it to speed up as it learns. 相似文献

5.

Kernels as features: On kernels, margins, and low-dimensional mappings

Maria-Florina Balcan Avrim Blum Santosh Vempala 《Machine Learning》2006,65(1):79-94

Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly-separable by a large margin γ. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only $\tilde{O}(1/\gamma^2)Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly-separable by a large margin γ. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only . In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings, using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x,y of our choosing, can we efficiently construct an explicit (small) set of features that effectively capture the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound, showing that if we do not have access to the distribution, then this is not possible for an arbitrary black-box kernel function; we leave as an open problem, however, whether this can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of features, such that if the data was linearly separable with margin γ under the kernel, then it is approximately separable in this new feature space. Editor: Philip Long A preliminary version of this paper appeared in Proceedings of the 15th International Conference on Algorithmic Learning Theory. Springer LNAI 3244, pp. 194–205, 2004. 相似文献

6.

A Note on Learning from Multiple-Instance Examples 总被引：7，自引：0，他引：7

Blum Avrim Kalai Adam 《Machine Learning》1998,30(1):23-29

We describe a simple reduction from the problem of PAC-learning from multiple-instance examples to that of PAC-learning with one-sided random classification noise. Thus, all concept classes learnable with one-sided noise, which includes all concepts learnable in the usual 2-sided random noise model plus others such as the parity function, are learnable from multiple-instance examples. We also describe a more efficient (and somewhat technically more involved) reduction to the Statistical-Query model that results in a polynomial-time algorithm for learning axis-parallel rectangles with sample complexity Õ(d²r/²) , saving roughly a factor of r over the results of Auer et al. (1997). 相似文献

7.

A theory of learning with similarity functions

Maria-Florina Balcan Avrim Blum Nathan Srebro 《Machine Learning》2008,72(1-2):89-112

Kernel functions have become an extremely popular tool in machine learning, with an attractive theory as well. This theory views a kernel as implicitly mapping data points into a possibly very high dimensional space, and describes a kernel function as being good for a given learning problem if data is separable by a large margin in that implicit space. However, while quite elegant, this theory does not necessarily correspond to the intuition of a good kernel as a good measure of similarity, and the underlying margin in the implicit space usually is not apparent in “natural” representations of the data. Therefore, it may be difficult for a domain expert to use the theory to help design an appropriate kernel for the learning task at hand. Moreover, the requirement of positive semi-definiteness may rule out the most natural pairwise similarity functions for the given problem domain. In this work we develop an alternative, more general theory of learning with similarity functions (i.e., sufficient conditions for a similarity function to allow one to learn well) that does not require reference to implicit spaces, and does not require the function to be positive semi-definite (or even symmetric). Instead, our theory talks in terms of more direct properties of how the function behaves as a similarity measure. Our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be a good similarity function under our definition (though with some loss in the parameters). In this way, we provide the first steps towards a theory of kernels and more general similarity functions that describes the effectiveness of a given function in terms of natural similarity-based properties. 相似文献

8.

Correlation Clustering

Nikhil Bansal Avrim Blum Shuchi Chawla 《Machine Learning》2004,56(1-3):89-113

We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or ? depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of ? edges between clusters (equivalently, minimizes the number of disagreements: the number of ? edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements we give a PTAS, building on ideas of Goldreich, Goldwasser, and Ron (1998) and de la Veg (1996). We also show how to extend some of these results to graphs with edge labels in [?1, +1], and give some results for the case of random noise. 相似文献

9.

Learning Boolean Functions in an Infinite Attribute Space 总被引：2，自引：1，他引：1

Avrim Blum 《Machine Learning》1992,9(4):373-386

This paper presents a theoretical model for learning Boolean functions in domains having a large, potentially infinite number of attributes. The model allows an algorithm to employ a rich vocabulary to describe the objects it encounters in the world without necessarily incurring time and space penalties so long as each individual object is relatively simple. We show that many of the basic Boolean functions learnable in standard theoretical models, such as conjunctions, disjunctions, K-CNF, and K-DNF, are still learnable in the new model, though by algorithms no longer quite so trivial as before. The new model forces algorithms for such classes to act in a manner that appears more natural for many learning scenarios. 相似文献

10.

Center-based clustering under perturbation stability

Pranjal Awasthi Avrim Blum Or Sheffet 《Information Processing Letters》2012,112(1-2):49-54

相似文献