20 similar documents found (search time: 15 ms)
1.
Minimax lower bounds for concept learning state, for example, that for each sample size n and learning rule g_n, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of g_n is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds say nothing about the rate of decrease of the error for a fixed distribution-concept pair. In this paper we investigate minimax lower bounds in such a (stronger) sense. We show that for several natural k-parameter concept classes, including the class of linear halfspaces, the class of balls, the class of polyhedra with a certain number of faces, and a class of neural networks, for any sequence of learning rules {g_n} there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times k/n for infinitely many n. We also obtain such strong minimax lower bounds for the tail distribution of the probability of error, which extend the corresponding minimax lower bounds.
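In symbols, the strong form of the claim reads as follows (a compact restatement only; the notation $P_X$ for the distribution of X, $\mathbf{1}_C$ for the indicator of C, and the constant $c > 0$ are mine, not the paper's):

    $\forall \{g_n\}_{n \ge 1}\ \exists\,(P_X, C):\quad \mathbf{E}\,P\{g_n(X) \ne \mathbf{1}_C(X)\} \ \ge\ c\,k/n$ for infinitely many n,

where the expectation is over the n-point training sample. The classical minimax bound only guarantees this when the pair $(P_X, C)$ may change with n.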
2.
Mark Braverman Ankit Garg Denis Pankratov Omri Weinstein 《Theory of Computing Systems》2016,58(2):377-379
We use self-reduction methods to prove strong information lower bounds on two of the most studied functions in the communication complexity literature: Gap Hamming Distance (GHD) and Inner Product (IP). In our first result we affirm the conjecture that the information cost of GHD is linear even under the uniform distribution, which strengthens the Ω(n) bound recently shown by Kerenidis et al. (2012), and answers an open problem from Chakrabarti et al. (2012). In our second result we prove that the information cost of IP_n is arbitrarily close to the trivial upper bound n as the permitted error tends to zero, again strengthening the Ω(n) lower bound recently proved by Braverman and Weinstein (ECCC, Report No. 164, 2011). Our proofs demonstrate that self-reducibility makes the connection between information complexity and communication complexity lower bounds go both ways. Whereas numerous results in the past (Chakrabarti et al. 2001; Bar-Yossef et al., J. Comput. Syst. Sci. 68(4):702–732, 2004; Barak et al. 2010) used information complexity techniques to derive new communication complexity lower bounds, we explore a generic way in which communication complexity lower bounds imply information complexity lower bounds in a black-box manner.
3.
Barış Aydınlıoğlu Dan Gutfreund John M. Hitchcock Akinori Kawachi 《Computational Complexity》2011,20(2):329-366
We show that if Arthur-Merlin protocols can be derandomized, then there is a language computable in deterministic exponential time with access to an NP oracle that requires circuits of exponential size. More formally, if every promise problem in prAM, the class of promise problems that have Arthur-Merlin protocols, can be computed by a deterministic polynomial-time algorithm with access to an NP oracle, then there is a language in E^NP that requires circuits of size Ω(2^n/n). The lower bound in the conclusion of our theorem suffices to construct pseudorandom generators with exponential stretch.
4.
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
5.
We develop a new technique for proving lower bounds in property testing, by showing a strong connection between testing and communication complexity. We give a simple scheme for reducing communication problems to testing problems, thus allowing us to use known lower bounds in communication complexity to prove lower bounds in testing. This scheme is general and implies a number of new testing bounds, as well as simpler proofs of several known bounds. For the problem of testing whether a Boolean function is k-linear (a parity function on k variables), we achieve a lower bound of Ω(k) queries, even for adaptive algorithms with two-sided error, thus confirming a conjecture of Goldreich (2010a). The same argument behind this lower bound also implies a new proof of known lower bounds for testing related classes such as k-juntas. For some classes, such as the class of monotone functions and the class of s-sparse GF(2) polynomials, we significantly strengthen the best known bounds.
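To illustrate the flavor of such a reduction (my reconstruction of the standard argument in this line of work, which may differ from the paper's exact presentation): to lower-bound k-linearity testing, give Alice $x \in \{0,1\}^n$ and Bob $y \in \{0,1\}^n$ and let them jointly simulate a tester on the combined function $h(z) = \langle x, z \rangle \oplus \langle y, z \rangle$. Each query h(z) costs two bits of communication (each party announces its own parity), and h is itself a parity on exactly the variables where x and y differ, so deciding whether h is k-linear decides a Hamming-distance question about $x \oplus y$, a problem with known communication lower bounds. Any q-query tester therefore yields a 2q-bit protocol, and an Ω(k)-bit communication lower bound forces q = Ω(k).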
6.
The complexity of on-line learning is investigated for the basic classes of geometrical objects over a discrete (digitized) domain. In particular, upper and lower bounds are derived for the complexity of learning algorithms for axis-parallel rectangles, rectangles in general position, balls, halfspaces, intersections of halfspaces, and semi-algebraic sets. The learning model considered is the standard model for on-line learning from counterexamples.
7.
k-Decision lists and decision trees play important roles in learning theory as well as in practical learning systems. k-Decision lists generalize classes such as monomials, k-DNF, and k-CNF, and like these subclasses they are polynomially PAC-learnable [R. Rivest, Mach. Learning 2 (1987), 229–246]. This leaves open the question of whether k-decision lists can be learned as efficiently as k-DNF. We answer this question negatively in a certain sense, thus disproving a claim in a popular textbook [M. Anthony and N. Biggs, “Computational Learning Theory,” Cambridge Univ. Press, Cambridge, UK, 1992]. Decision trees, on the other hand, are not even known to be polynomially PAC-learnable, despite their widespread practical application. We will show that decision trees are not likely to be efficiently PAC-learnable. We summarize our specific results. The following problems cannot be approximated in polynomial time within a factor of 2^{log^δ n} for any δ < 1, unless NP ⊆ DTIME[2^{polylog n}]: a generalized set cover, k-decision lists, k-decision lists by monotone decision lists, and decision trees. Decision lists cannot be approximated in polynomial time within a factor of n^δ, for some constant δ > 0, unless NP = P. Also, k-decision lists with l 0–1 alternations cannot be approximated within a factor of log^l n unless NP ⊆ DTIME[n^{O(log log n)}] (providing an interesting comparison to the upper bound obtained by A. Dhagat and L. Hellerstein [in “FOCS '94,” pp. 64–74]).
8.
9.
In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian processes. The analysis is carried out emphasising the effects induced on the bounds by the smoothness of the random process described by the Modified Bessel and the Squared Exponential covariance functions. We present an explanation of the early, linearly-decreasing behavior of the learning curves and the bounds, as well as a study of the asymptotic behavior of the curves. The effects of the noise level and the lengthscale on the tightness of the bounds are also discussed.
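For concreteness, a minimal sketch of the two covariance functions named above: the squared exponential, and a Matérn-type "modified Bessel" kernel built from the modified Bessel function of the second kind K_ν (the parameter values sigma2, ell, and nu are illustrative defaults, not values from the paper):

    import numpy as np
    from scipy.special import kv, gamma  # kv = modified Bessel function of the second kind

    def squared_exponential(r, sigma2=1.0, ell=1.0):
        """k(r) = sigma2 * exp(-r^2 / (2 ell^2)); infinitely smooth sample paths."""
        return sigma2 * np.exp(-0.5 * (np.asarray(r, dtype=float) / ell) ** 2)

    def modified_bessel_cov(r, sigma2=1.0, ell=1.0, nu=1.5):
        """Matern-type covariance of order nu; smoothness is controlled by nu."""
        r = np.atleast_1d(np.asarray(r, dtype=float))
        s = np.sqrt(2 * nu) * r / ell
        out = np.full_like(r, sigma2)               # k(0) = sigma2 (the r -> 0 limit)
        nz = s > 0
        out[nz] = sigma2 * (2 ** (1 - nu) / gamma(nu)) * s[nz] ** nu * kv(nu, s[nz])
        return out

The smoothness contrast the abstract studies is visible here: the squared exponential yields infinitely differentiable processes, while the modified Bessel family's differentiability is finite and governed by nu.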
10.
Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call successive sampling that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just O($k \log \frac{n}{k}$)) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in O(nk) time for a wide range of values of k and is guaranteed, with high probability, to return a solution with cost at most a constant factor times optimal. We also establish a lower bound of Ω(nk) on any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say $\frac{1}{100}$) probability. The best previous upper bound for the problem was Õ(nk), where the Õ-notation hides polylogarithmic factors in n and k. The best previous lower bound of Ω(nk) applied only to deterministic k-median algorithms. While we focus our presentation on the k-median objective, all our upper bounds are valid for the k-means objective as well. In this context our algorithm compares favorably to the widely used k-means heuristic, which requires O(nk) time for just one iteration and provides no useful approximation guarantees.
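A hedged sketch of the successive-sampling idea as described above (my reading of the technique: repeatedly draw a small random sample, discard the remaining points closest to it, and keep the union of the samples; the sample-size constant c and the discard fraction frac are illustrative, and the paper's actual procedure may differ in details):

    import random

    def successive_sample(points, k, dist, c=3, frac=0.5):
        """Return a summary of roughly O(k log(n/k)) points for clustering:
        each round samples c*k points and discards the fraction `frac`
        of the remaining points that are closest to the sample."""
        remaining = list(points)
        summary = []
        while len(remaining) > c * k:
            sample = random.sample(remaining, c * k)
            summary.extend(sample)
            # sort remaining points by distance to the nearest sampled point
            remaining.sort(key=lambda p: min(dist(p, s) for s in sample))
            remaining = remaining[int(frac * len(remaining)):]  # keep the farther part
        summary.extend(remaining)
        return summary

Each round halves the pool, so there are about log(n/k) rounds of c*k samples each, giving the O($k \log \frac{n}{k}$) summary size quoted above.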
11.
This paper presents a general information-theoretic approach for obtaining lower bounds on the number of examples required for Probably Approximately Correct (PAC) learning in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to several different models, illustrating its generality and power. The resulting bounds add logarithmic factors to (or improve the constants in) previously known lower bounds.
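As a pointer to the style of argument involved (an illustrative textbook inequality, not the paper's own bounds), information-theoretic lower bounds of this kind typically rest on Fano-type statements: if the target concept C is drawn uniformly from a finite class $\mathcal{C}$ and S is the noisy sample shown to the learner, then any hypothesis chosen from S errs with probability

    $P_{\mathrm{err}} \ \ge\ 1 - \dfrac{I(C; S) + 1}{\log_2 |\mathcal{C}|}$,

so the learner needs enough examples for the mutual information I(C; S) to approach $\log_2 |\mathcal{C}|$ bits, and noise slows that growth. The paper's approach manipulates such information quantities directly, rather than through a Bayesian construction like this uniform prior.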
12.
Shigeki Iwata 《Information and Computation》2001,168(2):187
A lower bound theorem is established for the number of comparators in a merging network. Let M(m, n) be the least number of comparators required in an (m, n)-merging network, and let C(m, n) be the number of comparators in Batcher's (m, n)-merging network. We prove for n ≥ 1 that M(4, n) = C(4, n) for n ≡ 0, 1, 3 mod 4, M(4, n) ≥ C(4, n) − 1 for n ≡ 2 mod 4, and M(5, n) = C(5, n) for n ≡ 0, 1, 5 mod 8. Furthermore, Batcher's (6, 8k+6)-, (7, 8k+7)-, and (8, 8k+8)-merging networks are optimal for k ≥ 0. Our lower bound for (m, n)-merging networks, m ≤ n, has the same terms as C(m, n) as far as n is concerned. Thus Batcher's (m, n)-merging network is optimal up to a constant number of comparators, where the constant depends only on m. An open problem posed by Yao and Yao (Lower bounds on merging networks, J. Assoc. Comput. Mach. 23, 566–571) is solved: $\lim_{n\to\infty} M(m, n)/n = \lceil \log m \rceil / 2 + m / 2^{\lceil \log m \rceil}$.
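For reference, C(m, n) can be computed from the standard recurrence for Batcher's odd-even merge (the recurrence itself is textbook material; equating it with the C(m, n) of this abstract is my assumption):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def batcher_comparators(m, n):
        """Comparators in Batcher's odd-even (m, n)-merging network: recursively
        merge the odd- and even-indexed subsequences, then combine the two
        results with floor((m + n - 1) / 2) comparators."""
        if m == 0 or n == 0:
            return 0
        if m == 1 and n == 1:
            return 1
        return (batcher_comparators((m + 1) // 2, (n + 1) // 2)  # odd subsequences
                + batcher_comparators(m // 2, n // 2)            # even subsequences
                + (m + n - 1) // 2)

    # Sanity checks: batcher_comparators(2, 2) == 3 and batcher_comparators(4, 4) == 9;
    # batcher_comparators(m, n) / n tends to ceil(log2(m))/2 + m / 2**ceil(log2(m)).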
13.
Gudmund Skovbjerg Frandsen Johan P. Hansen Peter Bro Miltersen 《Information and Computation》2001,171(2):333
We consider dynamic evaluation of algebraic functions (matrix multiplication, determinant, convolution, Fourier transform, etc.) in the model of Reif and Tate; i.e., if f(x_1, …, x_n) = (y_1, …, y_m) is an algebraic problem, we consider serving online requests of the form “change input x_i to value v” or “what is the value of output y_i?” We present techniques for showing lower bounds on the worst-case time complexity per operation for such problems. The first gives lower bounds in a wide range of rather powerful models (for instance, history-dependent algebraic computation trees over any infinite subset of a field, the integer RAM, and the generalized real RAM model of Ben-Amram and Galil). Using this technique, we show optimal Ω(n) bounds for dynamic matrix–vector product, dynamic matrix multiplication, and dynamic discriminant, and an Ω(√n) lower bound for dynamic polynomial multiplication (convolution), providing a good match with Reif and Tate's O(√(n log n)) upper bound. We also show linear lower bounds for dynamic determinant, matrix adjoint, and matrix inverse, and an Ω(√n) lower bound for the elementary symmetric functions. The second technique is the communication complexity technique of Miltersen, Nisan, Safra, and Wigderson, which we apply to the setting of dynamic algebraic problems, obtaining similar lower bounds in the word RAM model. The third technique gives lower bounds in the weaker straight-line program model. Using this technique, we show an Ω((log n)²/log log n) lower bound for dynamic discrete Fourier transform. Technical ingredients of our techniques are the incompressibility technique of Ben-Amram and Galil and the lower bound for depth-two superconcentrators of Radhakrishnan and Ta-Shma. The incompressibility technique is extended to arithmetic computation in arbitrary fields.
14.
P. D. MacKenzie 《Theory of Computing Systems》1997,30(6):599-626
In this paper we study the question: How useful is randomization in speeding up Exclusive Write PRAM computations? Our results give further evidence that randomization is of limited use in these types of computations. First we examine a compaction problem on both the CREW and EREW PRAM models, and we present randomized lower bounds which match the best deterministic lower bounds known. (For the CREW PRAM model, the lower bound is asymptotically optimal.) These are the first nontrivial randomized lower bounds known for the compaction problem on these models. We show that our lower bounds also apply to the problem of approximate compaction. Next we examine the problem of computing boolean functions on the CREW PRAM model, and we present a randomized lower bound which improves on the previous best randomized lower bound for many boolean functions, including the OR function. (The previous lower bounds for these functions were asymptotically optimal, but we improve the constant multiplicative factor.) We also give an alternate proof for the randomized lower bound on PARITY, which was already optimal to within a constant additive factor. Lastly, we give a randomized lower bound for integer merging on an EREW PRAM which matches the best deterministic lower bound known. In all our proofs, we use the Random Adversary method, which has previously only been used for proving lower bounds on models with Concurrent Write capabilities. Thus this paper also serves to illustrate the power and generality of this method for proving parallel randomized lower bounds.
15.
Sung-Soon Choi Kyomin Jung Byung-Ro Moon 《Evolutionary Computation, IEEE Transactions on》2009,13(2):201-216
For a real-valued function f defined on {0,1}^n, the linkage graph of f is a hypergraph that represents the interactions among the input variables with respect to f. In this paper, lower and upper bounds for the number of function evaluations required to discover the linkage graph are rigorously analyzed in the black-box scenario. First, a lower bound for discovering the linkage graph is presented. To the best of our knowledge, this is the first result on the lower bound for linkage discovery. The investigation of the lower bound is based on Yao's minimax principle. For the upper bounds, a simple randomized algorithm for linkage discovery is analyzed. Based on the Kruskal–Katona theorem, we present an upper bound for discovering the linkage graph. As a corollary, we rigorously prove that O(n² log n) function evaluations are enough for bounded functions when the number of hyperedges is O(n), which was suggested but not proven in previous works. To see the typical behavior of the algorithm for linkage discovery, three random models of fitness functions are considered. Using probabilistic methods, we prove that the number of function evaluations on the random models is generally smaller than the bound for the arbitrary case. Finally, from the relation between the linkage graph and the Walsh coefficients, it is shown that, for bounded functions, the proposed bounds are eventually the bounds for finding the Walsh coefficients.
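To make the evaluation counting concrete, a hedged sketch of the standard perturbation test that underlies linkage detection (a generic illustration of the idea, not the paper's exact algorithm): variables i and j interact under f exactly when the effect of flipping both differs from the sum of the individual flip effects.

    def interacts(f, x, i, j):
        """Pairwise linkage test at a point x (a list of 0/1 values), using
        4 function evaluations: i and j are linked at x iff the second-order
        finite difference of f in directions i and j is nonzero."""
        def flip(bits, *idxs):
            y = list(bits)
            for idx in idxs:
                y[idx] ^= 1
            return y
        d = f(flip(x, i, j)) - f(flip(x, i)) - f(flip(x, j)) + f(x)
        return d != 0

Order-k hyperedges generalize this to a k-th order difference over 2^k evaluations; since such differences are (up to scaling) sums of Walsh coefficients, this is the connection to the Walsh-coefficient bounds mentioned at the end of the abstract.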
16.
17.
We derive general bounds on the complexity of learning in the statistical query (SQ) model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the SQ model. This new model was introduced by Kearns to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Since all SQ algorithms can be simulated in the PAC model with classification noise, we also obtain general upper bounds on learning in the presence of classification noise for classes which can be learned in the SQ model.
18.
Foster and Vovk proved relative loss bounds for linear regression where the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning takes place in a sequence of trials where the learner tries to predict discounted sums of future reinforcement signals. The quality of the predictions is measured with the square loss and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again the difference of the losses is logarithmic in the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case when the instances are chosen from an unknown distribution. For linear regression a corresponding lower bound shows that this expected bound cannot be improved substantially.
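For orientation, a minimal sketch of the kind of on-line protocol these bounds concern: a linear predictor of discounted reward sums, updated after every trial. This is a generic TD(0)-style update under the square loss, not the specific algorithm whose relative loss bound is proved in the paper; eta and gamma are illustrative constants.

    import numpy as np

    def td_linear(trials, gamma=0.9, eta=0.05, dim=4):
        """Generic TD(0)-style learner: predict the discounted return with
        w . x and move w toward the bootstrapped target r + gamma*(w . x_next)."""
        w = np.zeros(dim)
        total_loss = 0.0
        for x, r, x_next in trials:             # x_next is None at the end of a sequence
            pred = w @ x
            target = r + (gamma * (w @ x_next) if x_next is not None else 0.0)
            total_loss += (target - pred) ** 2  # square loss on this trial
            w += eta * (target - pred) * x      # gradient-style correction
        return w, total_loss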
19.
In this paper we present a new technique for computing lower bounds for graph treewidth. Our technique is based on the fact that the treewidth of a graph G is the maximum order of a bramble of G minus one. We give two algorithms: one for general graphs, and one for planar graphs. The algorithm for planar graphs is shown to give a lower bound for both the treewidth and branchwidth that is at most a constant factor away from the optimum. For both algorithms, we report on extensive computational experiments that show that the algorithms often give excellent lower bounds, in particular when applied to (close to) planar graphs. This work was partially supported by the Netherlands Organisation for Scientific Research NWO (project Treewidth and Combinatorial Optimisation) and partially by the DFG research group “Algorithms, Structure, Randomness” (Grant number GR 883/9-3, GR 883/9-4).
20.