Similar Documents
20 similar documents found (search time: 31 ms)
1.
2.
Concept classes can canonically be represented by matrices with entries 1 and −1. We use the singular value decomposition of this matrix to determine the optimal margins of embeddings of the concept classes of singletons and of half intervals in homogeneous Euclidean half spaces. For these concept classes the singular value decomposition can be used to construct optimal embeddings and also to prove the corresponding best possible upper bounds on the margin. We determine the optimal margin for embedding n singletons and the optimal margin for half intervals over {1,...,n}. For the upper bounds on the margins we generalize a bound by Forster (2001). We also determine the optimal margin of some concept classes defined by circulant matrices up to a small constant factor, and we discuss the concept classes of monomials to point out limitations of our approach.
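To make the matrix view concrete, here is a minimal sketch (assuming NumPy is available) that builds the ±1 matrix of the singletons concept class over n points, where concept i labels only point i positively, and computes its singular values; the margin expressions themselves are given in the paper and are not reproduced here.

```python
import numpy as np

def singletons_matrix(n):
    """±1 matrix of the singletons concept class: concept i labels only point i with +1."""
    return 2 * np.eye(n) - np.ones((n, n))

n = 8
M = singletons_matrix(n)
singular_values = np.linalg.svd(M, compute_uv=False)
print("singular values:", np.round(singular_values, 4))
# Forster-style arguments relate the spectral norm (the largest singular value)
# to an upper bound on the achievable embedding margin; see the paper for the exact bound.
```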

3.
In constructing local Fourier bases and in solving differential equations with nonperiodic solutions through Fourier spectral algorithms, it is necessary to solve the Fourier Extension Problem. This is the task of extending a nonperiodic function, defined on an interval, to a function which is periodic on a larger interval. We derive the asymptotic Fourier coefficients for an infinitely differentiable function which is one on an inner interval, identically zero outside a larger interval, and varies smoothly in between. Such smoothed “top-hat” functions are “bells” in wavelet theory; our bell is defined explicitly (for x ≥ 0) in the text. By applying steepest descents to approximate the coefficient integrals in the limit of large degree j, we show that when the width L is fixed, the Fourier cosine coefficients a_j are proportional to an expression involving an oscillatory factor Λ(j) given in the text. We also show that to minimize error in a Fourier series truncated after the Nth term, the width should be chosen to increase with N at a rate derived in the paper. We derive similar asymptotics for the function f(x) = x as extended by a more sophisticated scheme with overlapping bells; this gives an even faster rate of Fourier convergence.
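As a rough illustration of the setup (not the paper's bell, whose exact formula is omitted above), the sketch below builds a hypothetical C-infinity "top-hat" from the standard exp(−1/t) bump and numerically evaluates a few Fourier cosine coefficients on the extended interval to show their decay; NumPy and SciPy are assumed, and the width L is an arbitrary choice.

```python
import numpy as np
from scipy.integrate import quad

L = 0.5  # width of the smoothing region (hypothetical choice)

def smooth_step(t):
    """C-infinity step: 0 for t <= 0, 1 for t >= 1."""
    g = lambda s: np.exp(-1.0 / s) if s > 0 else 0.0
    return g(t) / (g(t) + g(1.0 - t))

def bell(x):
    """One on [-1, 1], zero for |x| >= 1 + L, smooth in between (NOT the paper's bell)."""
    ax = abs(x)
    if ax <= 1.0:
        return 1.0
    if ax >= 1.0 + L:
        return 0.0
    return smooth_step((1.0 + L - ax) / L)

half_period = 1.0 + L  # the extended interval is [-(1 + L), 1 + L]
for j in (5, 10, 20, 40):
    integral, _ = quad(lambda x: bell(x) * np.cos(j * np.pi * x / half_period),
                       0.0, half_period, limit=200)
    a_j = 2.0 * integral / half_period
    print(f"a_{j} ≈ {a_j: .3e}")
```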

4.
5.
Gat, Yoram. Machine Learning, 2003, 53(1–2): 5–21.
Classifiers are often constructed iteratively by introducing changes sequentially to an initial classifier. Langford and Blum (COLT'99: Proceedings of the 12th Annual Conference on Computational Learning Theory, 1999, San Mateo, CA: Morgan Kaufmann, pp. 209–214) take advantage of this structure (the microchoice structure) to obtain bounds for the generalization ability of such algorithms. These bounds can be sharper than more general bounds. This paper extends the applicability of the microchoice approach to the more realistic case where the classifier space is continuous and the sequence of changes is not restricted to a pre-fixed finite set. Proving the microchoice bound in the continuous case relies on a conditioning technique that is often used in proving VC results. It is shown how this technique can be used to convert any learning algorithm over a continuous space into a family of algorithms over discrete spaces. The new continuous microchoice result is applied to obtain a bound for the generalization ability of the perceptron algorithm. The greedy nature of the perceptron algorithm, which generates new classifiers by introducing corrections based on misclassified points, is exploited to obtain a generalization bound whose asymptotic rate in the training set size n is given in the paper.
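For reference, a minimal sketch of the classical perceptron update that the bound above is applied to (standard textbook version using NumPy; the toy data are made up for illustration):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classical perceptron: greedily correct misclassified points, as exploited in the bound above."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi              # correction based on the misclassified point
                mistakes += 1
        if mistakes == 0:
            break
    return w

# Toy linearly separable data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] + 2 * X[:, 1] > 0, 1, -1)
print("learned weights:", perceptron(X, y))
```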

6.
In this paper, we continue our study of learning an optimal kernel in a prescribed convex set of kernels (Micchelli & Pontil, 2005). We present a reformulation of this problem within a feature space environment. This leads us to study regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space, equipped with a mixed norm. We also relate this problem in a special case to regularization. Editors: Olivier Bousquet and Andre Elisseeff. This work was supported by NSF Grant ITR-0312113, EPSRC Grant GR/T18707/01 and by the IST Programme of the European Community, under the PASCAL Network of Excellence IST-2002-506778.

7.
In statistical analysis of measurement results, it is often beneficial to compute the range V of the population variance when we only know the intervals of possible values of the x_i. In general, this problem is NP-hard; a polynomial-time algorithm is known for the case when the measurements are sufficiently accurate. In this paper, we show that we can efficiently compute V under a weaker (and more general) condition.
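A small illustration of the quantity being bounded, assuming NumPy and SciPy are available: since the variance is a convex function of the measurement vector, its maximum over a box of intervals is attained at interval endpoints, so a brute-force scan over endpoints (exponential in n, unlike the paper's efficient algorithm) gives the upper end of V, while a box-constrained numerical minimization approximates the lower end. The interval data below are made up.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def variance(x):
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 2)

# Interval data: each measurement x_i is only known to lie in [lo_i, hi_i].
intervals = [(2.0, 2.4), (1.8, 2.1), (2.3, 2.5), (1.9, 2.6)]

# Upper endpoint V_max: the variance is convex in x, so its maximum over the box
# is attained at a vertex -- brute force over all 2^n endpoint choices (illustration only).
v_max = max(variance(p) for p in itertools.product(*intervals))

# Lower endpoint V_min: solve a box-constrained minimization numerically.
x0 = np.array([(lo + hi) / 2 for lo, hi in intervals])
res = minimize(variance, x0, bounds=intervals)
print(f"V is contained in [{res.fun:.4f}, {v_max:.4f}]")
```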

8.
We prove two main results on how arbitrary linear threshold functions ${f(x) = {\rm sign}(w \cdot x - \theta)}$ over the n-dimensional Boolean hypercube can be approximated by simple threshold functions. Our first result shows that every n-variable threshold function f is ${\epsilon}$-close to a threshold function depending only on ${{\rm Inf}(f)^2 \cdot {\rm poly}(1/\epsilon)}$ many variables, where ${{\rm Inf}(f)}$ denotes the total influence or average sensitivity of f. This is an exponential sharpening of Friedgut’s well-known theorem (Friedgut in Combinatorica 18(1):474–483, 1998), which states that every Boolean function f is ${\epsilon}$-close to a function depending only on ${2^{O({\rm Inf}(f)/\epsilon)}}$ many variables, for the case of threshold functions. We complement this upper bound by showing that ${\Omega({\rm Inf}(f)^2 + 1/\epsilon^2)}$ many variables are required for ${\epsilon}$-approximating threshold functions. Our second result is a proof that every n-variable threshold function is ${\epsilon}$-close to a threshold function with integer weights at most ${{\rm poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}}$. This is an improvement, in the dependence on the error parameter ${\epsilon}$, on an earlier result of Servedio (Comput Complex 16(2):180–209, 2007) which gave a ${{\rm poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2})}}$ bound. Our improvement is obtained via a new proof technique that uses strong anti-concentration bounds from probability theory. The new technique also gives a simple and modular proof of the original result of Servedio (Comput Complex 16(2):180–209, 2007) and extends to give low-weight approximators for threshold functions under a range of probability distributions other than the uniform distribution.
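The total influence Inf(f) that drives the first result can be computed exactly by enumeration for small n. The sketch below does so for a hypothetical threshold function sign(w·x − θ) on {−1, 1}^n; the weights and threshold are made up for illustration.

```python
import itertools
import numpy as np

def influence_total(f, n):
    """Total influence: sum over coordinates i of Pr_x[f(x) != f(x with bit i flipped)]."""
    total = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        x = np.array(x)
        fx = f(x)
        for i in range(n):
            y = x.copy()
            y[i] = -y[i]
            if f(y) != fx:
                total += 1
    return total / 2 ** n

# Hypothetical threshold function sign(w.x - theta) on the Boolean hypercube {-1, 1}^n.
w = np.array([3.0, 2.0, 1.0, 1.0, 0.5])
theta = 0.5
f = lambda x: 1 if np.dot(w, x) - theta > 0 else -1

print("total influence:", influence_total(f, len(w)))
```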

9.
Auer, Peter; Warmuth, Manfred K. Machine Learning, 1998, 32(2): 127–150.
Littlestone developed a simple deterministic on-line learning algorithm for learning k-literal disjunctions. This algorithm (called Winnow) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of Winnow and prove bounds for an adaptation of the algorithm for the case when the disjunction may change over time. In this case a possible target disjunction schedule is a sequence of disjunctions (one per trial), and the shift size is the total number of literals that are added/removed from the disjunctions as one progresses through the sequence. We develop an algorithm that predicts nearly as well as the best disjunction schedule for an arbitrary sequence of examples. This algorithm, which allows us to track the predictions of the best disjunction, is hardly more complex than the original version. However, the amortized analysis needed for obtaining worst-case mistake bounds requires new techniques. In some cases our lower bounds show that the upper bounds of our algorithm have the right constant in front of the leading term in the mistake bound and almost the right constant in front of the second leading term. Computer experiments support our theoretical findings.
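For context, a minimal sketch of the basic (non-shifting) Winnow algorithm with its multiplicative updates, run on a toy stream of examples for the disjunction x1 ∨ x3; the promotion factor α = 2 and threshold θ = n are the usual textbook choices, not parameters from this paper.

```python
def winnow_predict(w, x, threshold):
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= threshold else 0

def winnow_update(w, x, y_true, y_pred, alpha=2.0):
    """Multiplicative update: promote active weights on false negatives, demote on false positives."""
    if y_pred == y_true:
        return w
    factor = alpha if y_true == 1 else 1.0 / alpha
    return [wi * factor if xi else wi for wi, xi in zip(w, x)]

# Learn the disjunction x1 OR x3 over n = 5 Boolean variables (toy stream).
n = 5
w = [1.0] * n
threshold = n
examples = [([1, 0, 0, 0, 0], 1), ([0, 1, 0, 1, 0], 0),
            ([0, 0, 1, 0, 1], 1), ([0, 0, 0, 0, 1], 0),
            ([1, 1, 1, 0, 0], 1)]
for x, y in examples * 5:
    y_hat = winnow_predict(w, x, threshold)
    w = winnow_update(w, x, y, y_hat)
print("final weights:", [round(wi, 2) for wi in w])
```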

10.
The present paper proposes a new learning model—called stochastic finite learning—and shows the whole class of pattern languages to be learnable within this model. This main result is achieved by providing a new and improved average-case analysis of the Lange–Wiehagen (New Generation Computing, 8, 361–370) algorithm, which learns the class of all pattern languages in the limit from positive data. The complexity measure chosen is the total learning time, i.e., the overall time taken by the algorithm until convergence. The expectation of the total learning time is carefully analyzed and exponentially shrinking tail bounds for it are established for a large class of probability distributions. For every pattern containing k different variables, it is shown that Lange and Wiehagen's algorithm has an expected total learning time bounded in terms of the expected example string length and two easily computable parameters arising naturally from the underlying probability distributions. Finally, assuming a bit of domain knowledge concerning the underlying class of probability distributions, it is shown how to convert learning in the limit into stochastic finite learning.

11.
In this article we give several new results on the complexity of algorithms that learn Boolean functions from quantum queries and quantum examples.
  Hunziker et al. [Quantum Information Processing, to appear] conjectured that for any class C of Boolean functions, the number of quantum black-box queries required to exactly identify an unknown function from C is governed by a combinatorial parameter of the class C. We essentially resolve this conjecture in the affirmative by giving a quantum algorithm that, for any class C, identifies any unknown function from C using a number of quantum black-box queries of essentially that order.
  We consider a range of natural problems intermediate between the exact learning problem (in which the learner must obtain all bits of information about the black-box function) and the usual problem of computing a predicate (in which the learner must obtain only one bit of information about the black-box function). We give positive and negative results on when the quantum and classical query complexities of these intermediate problems are polynomially related to each other.
  Finally, we improve the known lower bounds on the number of quantum examples (as opposed to quantum black-box queries) required for (ε, δ)-PAC learning any concept class of Vapnik–Chervonenkis dimension d. This new lower bound comes closer to matching known upper bounds for classical PAC learning.
PACS: 03.67.Lx, 89.80.+h, 02.70.-c

12.
We propose a general-purpose stochastic optimization algorithm, the so-called annealing stochastic approximation Monte Carlo (ASAMC) algorithm, for neural network training. ASAMC can be regarded as a space annealing version of the stochastic approximation Monte Carlo (SAMC) algorithm. Under mild conditions, we show that ASAMC converges weakly, at a rate quantified in the paper, toward a neighboring set (in the space of energy) of the global minimizers. ASAMC is compared with simulated annealing, SAMC, and the BFGS algorithm for training MLPs on a number of examples. The numerical results indicate that ASAMC outperforms the other algorithms in both training and test errors. Like other stochastic algorithms, ASAMC requires longer training time than do the gradient-based algorithms. It provides, however, an efficient approach to train MLPs for which the energy landscape is rugged. Action Editor: Risto Miikkulainen.
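The sketch below is not ASAMC; it is a plain simulated-annealing baseline (one of the algorithms ASAMC is compared against) applied to the same kind of task: minimizing the training error of a tiny one-hidden-layer MLP by a Metropolis random walk over its weights. NumPy is assumed, and all sizes, scales, and schedules are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data and a tiny one-hidden-layer MLP (5 hidden units).
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X[:, 0])
n_hidden = 5

def unpack(theta):
    w1 = theta[:n_hidden].reshape(1, n_hidden)
    b1 = theta[n_hidden:2 * n_hidden]
    w2 = theta[2 * n_hidden:3 * n_hidden]
    b2 = theta[-1]
    return w1, b1, w2, b2

def energy(theta):
    """Mean squared training error of the MLP -- the 'energy' landscape being annealed."""
    w1, b1, w2, b2 = unpack(theta)
    hidden = np.tanh(X @ w1 + b1)
    pred = hidden @ w2 + b2
    return np.mean((pred - y) ** 2)

# Plain simulated annealing (NOT the paper's ASAMC sampler): random-walk proposals,
# Metropolis acceptance, geometric temperature schedule.
theta = rng.normal(size=3 * n_hidden + 1)
best_e = energy(theta)
T = 1.0
for step in range(20000):
    proposal = theta + rng.normal(scale=0.1, size=theta.shape)
    dE = energy(proposal) - energy(theta)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        theta = proposal
        best_e = min(best_e, energy(theta))
    T *= 0.9997
print("best training MSE:", round(best_e, 4))
```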

13.
In this paper, we give subexponential size hitting sets for bounded depth multilinear arithmetic formulas. Using the known relation between black-box PIT and lower bounds, we obtain lower bounds for these models. For depth-3 multilinear formulas of size exp\({(n^\delta)}\), we give a hitting set of size exp\({\left(\tilde{O}\left(n^{2/3 + 2\delta/3}\right) \right)}\). This implies a lower bound of exp\({(\tilde{\Omega}(n^{1/2}))}\) for depth-3 multilinear formulas, for some explicit polynomial. For depth-4 multilinear formulas of size exp\({(n^\delta)}\), we give a hitting set of size exp\({\left(\tilde{O}\left(n^{2/3 + 4\delta/3}\right) \right)}\). This implies a lower bound of exp\({(\tilde{\Omega}(n^{1/4}))}\) for depth-4 multilinear formulas, for some explicit polynomial. A regular formula consists of alternating layers of \({+,\times}\) gates, where all gates at layer i have the same fan-in. We give a hitting set of size (roughly) exp\({\left(n^{1- \delta}\right)}\) for regular depth-d multilinear formulas with formal degree at most n and size exp\({(n^\delta)}\), where \({\delta = O(1/{\sqrt{5}^d})}\). This result implies a lower bound of roughly exp\({(\tilde{\Omega}(n^{1/{\sqrt{5}^d}}))}\) for such formulas. We note that better lower bounds are known for these models, but also that none of these bounds was achieved via construction of a hitting set. Moreover, no lower bound that implies such PIT results, even in the white-box model, is currently known. Our results are combinatorial in nature and rely on reducing the underlying formula, first to a depth-4 formula, and then to a read-once algebraic branching program (from depth-3 formulas, we go straight to read-once algebraic branching programs).

14.
This paper considers the problem of distributively constructing a minimum-weight spanning tree (MST) for graphs of constant diameter in the bounded-messages model, where each message can contain at most B bits for some parameter B. It is shown that the number of communication rounds necessary to compute an MST can be large even for graphs of diameter 4 or 3; explicit lower bounds for each case are established in the paper. The asymptotic lower bounds hold for randomized algorithms as well. On the other hand, we observe that O(log n) communication rounds always suffice to compute an MST deterministically for graphs with diameter 2, when B = O(log n). These results complement a previously known lower bound for graphs of diameter Ω(log n). An extended abstract of this work appears in Proceedings of the 20th ACM Symposium on Principles of Distributed Computing, August 2001.

15.
16.
17.
An outer-connected dominating set in a graph G = (V, E) is a set of vertices D ⊆ V satisfying the condition that, for each vertex v ∉ D, vertex v is adjacent to some vertex in D and the subgraph induced by V \ D is connected. The outer-connected dominating set problem is to find an outer-connected dominating set with the minimum number of vertices, which is denoted by \(\tilde {\gamma }_{c}(G)\). In this paper, we determine \(\tilde {\gamma }_{c}(S(n,k))\), \(\tilde {\gamma }_{c}(S^{+}(n,k))\), \(\tilde {\gamma }_{c}(S^{++}(n,k))\), and \(\tilde {\gamma }_{c}(S_{n})\), where S(n, k), S^{+}(n, k), S^{++}(n, k), and S_{n} are Sierpiński-like graphs.
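The definition above translates directly into a brute-force check, shown below for a small hand-built graph (pure Python, exponential in |V|, for illustration only): a candidate D is accepted if every vertex outside D has a neighbor in D and the subgraph induced on V \ D is connected.

```python
import itertools

def is_outer_connected_dominating(adj, D):
    """Check: every vertex outside D has a neighbor in D, and V \\ D induces a connected subgraph."""
    V = set(adj)
    outside = V - D
    if any(not (adj[v] & D) for v in outside):
        return False
    if not outside:
        return True
    # DFS inside the subgraph induced on V \ D.
    start = next(iter(outside))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & outside:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == outside

def min_ocd(adj):
    V = sorted(adj)
    for k in range(1, len(V) + 1):
        for D in itertools.combinations(V, k):
            if is_outer_connected_dominating(adj, set(D)):
                return set(D)

# Toy graph: a 5-cycle with one chord (0-2).
adj = {0: {1, 2, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {0, 3}}
print("minimum outer-connected dominating set:", min_ocd(adj))
```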

18.
We consider the problem of approximately integrating a Lipschitz function f (with a known Lipschitz constant) over an interval. The goal is to achieve an additive error of at most ε using as few samples of f as possible. We use the adaptive framework: on all problem instances an adaptive algorithm should perform almost as well as the best possible algorithm tuned for the particular problem instance. We distinguish between the performances of the best possible deterministic and randomized algorithms. We give a deterministic algorithm that uses samples and show that an asymptotically better algorithm is impossible. However, any deterministic algorithm requires samples on some problem instance. By combining a deterministic adaptive algorithm and Monte Carlo sampling with variance reduction, we give an algorithm that uses at most samples. We also show that any algorithm requires samples in expectation on some problem instance (f, ε), which proves that our algorithm is optimal.
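A simple deterministic adaptive scheme in the spirit described above (a sketch, not the paper's algorithm): approximate each subinterval by its midpoint value, bound the worst-case error of a piece of width w by L·w²/4 using the Lipschitz constant L, and keep splitting the piece with the largest bound until the total bound drops below ε.

```python
import heapq

def integrate_lipschitz(f, a, b, lip, eps):
    """Adaptive midpoint rule for a Lipschitz function: split the piece whose
    worst-case error bound lip*(width**2)/4 is largest until the total bound is below eps."""
    def piece(lo, hi):
        width = hi - lo
        return (-lip * width * width / 4.0, lo, hi, f((lo + hi) / 2.0) * width)

    heap = [piece(a, b)]
    total_bound = -heap[0][0]
    while total_bound > eps:
        neg_err, lo, hi, _ = heapq.heappop(heap)
        total_bound += neg_err          # remove the popped piece's bound
        mid = (lo + hi) / 2.0
        for p in (piece(lo, mid), piece(mid, hi)):
            heapq.heappush(heap, p)
            total_bound -= p[0]         # add each new piece's bound
    return sum(est for _, _, _, est in heap)

# Example: integrate f(x) = |x - 0.3| (Lipschitz constant 1) over [0, 1]; exact value is 0.29.
approx = integrate_lipschitz(lambda x: abs(x - 0.3), 0.0, 1.0, lip=1.0, eps=1e-3)
print(round(approx, 5))
```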

19.
Given a k-uniform hypergraph, the Maximum k-Set Packing problem is to find a maximum set of pairwise disjoint edges. We prove that this problem cannot be efficiently approximated to within a factor stated in the paper unless P = NP, improving the previous hardness-of-approximation factor due to Trevisan. This result extends to the problem of k-Dimensional Matching.
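On the algorithmic side of the same problem, the simple greedy below (pure Python, illustrative) keeps any edge disjoint from those already chosen; for a k-uniform hypergraph this is the standard factor-k approximation, and the hardness result above limits how much better one can hope to do.

```python
def greedy_set_packing(edges):
    """Greedy packing: scan the edges and keep any edge disjoint from everything chosen so far.
    For a k-uniform hypergraph, the optimum is at most k times the size of this packing."""
    chosen, used = [], set()
    for e in edges:
        if used.isdisjoint(e):
            chosen.append(e)
            used.update(e)
    return chosen

# 3-uniform hypergraph (k = 3): each edge is a frozenset of 3 vertices.
edges = [frozenset(s) for s in [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {2, 8, 9}, {4, 6, 9}]]
print("greedy packing:", greedy_set_packing(edges))
```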

20.
This work is concerned with online learning from expert advice. Extensive work on this problem has generated numerous expert advice algorithms whose total loss is provably bounded above in terms of the loss incurred by the best expert in hindsight. Such algorithms were devised for various problem variants corresponding to various loss functions. For some loss functions, such as the square, Hellinger and entropy losses, optimal algorithms are known. However, for two of the most widely used loss functions, namely the 0/1 and absolute loss, there are still gaps between the known lower and upper bounds. In this paper we present two new expert advice algorithms and prove for them the best known 0/1 and absolute loss bounds. Given an expert advice algorithm ALG, the goal is to form an upper bound on the regret L_ALG − L* of ALG, where L_ALG is the loss of ALG and L* is the loss of the best expert in hindsight. Typically, regret bounds of a canonical form, depending on the number of experts N and a leading constant C, are sought. So far, the best known constant for the absolute loss function is C = 2.83, which is achieved by the recent IAWM algorithm of Auer et al. (2002). For the 0/1 loss function no bounds of this canonical form are known, and the best known regret bound involves two constants C_1 = e − 2 and C_2; this bound is achieved by a P-norm algorithm of Gentile and Littlestone (1999). Our first algorithm is a randomized extension of the guess and double algorithm of Cesa-Bianchi et al. (1997). While the guess and double algorithm achieves a canonical regret bound with C = 3.32, the expected regret of our randomized algorithm is canonically bounded with C = 2.49 for the absolute loss function. The algorithm utilizes one random choice at the start of the game. Like the deterministic guess and double algorithm, a deficiency of our algorithm is that it occasionally restarts itself and therefore forgets what it learned. Our second algorithm does not forget and enjoys the best known asymptotic performance guarantees for both the absolute and 0/1 loss functions. Specifically, in the case of the absolute loss our algorithm is canonically bounded with a constant C approaching the value given in the paper, and in the case of the 0/1 loss with C approaching a second value given in the paper. In the 0/1 loss case the algorithm is randomized and the bound is on the expected regret.
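For concreteness, a minimal exponentially weighted average forecaster for the absolute loss (the classical baseline in this literature, not one of the paper's two new algorithms); the learning rate η and the toy data are illustrative.

```python
import math

def exponential_weights(expert_predictions, outcomes, eta=0.5):
    """Exponentially weighted average forecaster for the absolute loss |prediction - outcome|.
    Returns the algorithm's total loss L_ALG and the best expert's total loss L* in hindsight."""
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    expert_losses = [0.0] * n_experts
    alg_loss = 0.0
    for preds, y in zip(expert_predictions, outcomes):
        total = sum(weights)
        prediction = sum(w * p for w, p in zip(weights, preds)) / total
        alg_loss += abs(prediction - y)
        for i, p in enumerate(preds):
            loss = abs(p - y)
            expert_losses[i] += loss
            weights[i] *= math.exp(-eta * loss)   # multiplicative penalty per incurred loss
    return alg_loss, min(expert_losses)

# Toy sequence: three experts predicting values in [0, 1], binary outcomes.
expert_predictions = [(0.9, 0.1, 0.5), (0.8, 0.3, 0.5), (0.2, 0.7, 0.5), (0.9, 0.2, 0.5)]
outcomes = [1, 1, 0, 1]
L_alg, L_star = exponential_weights(expert_predictions, outcomes)
print(f"L_ALG = {L_alg:.3f}, L* = {L_star:.3f}, regret = {L_alg - L_star:.3f}")
```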
