20 similar documents found; search time: 967 ms
1.
O(h_n^{-1} n^{d-1}) instead of O(h_n^{-d}) grid points and unknowns are involved. Here d denotes the dimension of the feature space and h_n = 2^{-n} gives the mesh size. To be precise, we suggest using the sparse grid combination technique [42], where the classification
problem is discretized and solved on a certain sequence of conventional grids with uniform mesh sizes in each coordinate direction.
The sparse grid solution is then obtained from the solutions on these different grids by linear combination. In contrast to
other sparse grid techniques, the combination method is simpler to use and can be parallelized in a natural and straightforward
way. We describe the sparse grid combination technique for the classification problem in terms of the regularization network
approach. We then give implementation details and discuss the complexity of the algorithm. It turns out that the method
scales only linearly with the number of instances, i.e. the amount of data to be classified. Finally we report on the quality
of the classifier built by our new method. Here we consider standard test problems from the UCI repository and problems with
huge synthetic data sets in up to 9 dimensions. It turns out that our new method achieves correctness rates which are competitive
with those of the best existing methods.
Received April 25, 2001
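The combination technique named in this abstract builds the sparse grid solution as a signed linear combination of solutions on anisotropic full grids. A minimal sketch of that bookkeeping follows; the indexing convention is one common choice (not necessarily the paper's), and `combination_grids` is a hypothetical helper name:

```python
from itertools import product
from math import comb

def combination_grids(n, d):
    """Level vectors and coefficients for a level-n combination technique
    in d dimensions (one common convention):
        u_n^c = sum_{q=0}^{d-1} (-1)^q C(d-1, q) * sum_{|l|_1 = n+(d-1)-q} u_l
    Each returned pair is (level vector l, coefficient of the solution u_l)."""
    grids = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        level_sum = n + (d - 1) - q
        # all level vectors with components >= 1 summing to level_sum
        for l in product(range(1, level_sum + 1), repeat=d):
            if sum(l) == level_sum:
                grids.append((l, coeff))
    return grids
```

A quick consistency check: the coefficients of the component grids always sum to 1, so constant functions are reproduced exactly by the combination.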
2.
In this paper, we consider the problem of generating all maximal cliques in a sparse graph with polynomial delay. Given a graph G=(V,E) with n vertices and m edges, the latest and fastest polynomial-delay algorithm for sparse graphs enumerates all maximal cliques in O(Δ^4) time delay, where Δ is the maximum degree of vertices. However, it requires O(n·m) preprocessing time. We improve it in two aspects. First, our algorithm needs no preprocessing, so it is a truly polynomial-delay algorithm. Second, our algorithm enumerates all maximal cliques in O(Δ·H^3) time delay, where H is the so-called H-value of the graph: the smallest integer satisfying |{v∈V ∣ δ(v) ≥ H}| ≤ H, where δ(v) denotes the degree of vertex v. In real-world network data, H is usually small and much smaller than Δ.
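The H-value defined in this abstract is an h-index-style degree statistic and can be transcribed directly; `h_value` below is a hypothetical helper name, and the linear scan is written for clarity rather than speed:

```python
def h_value(degrees):
    """Smallest integer H such that |{v : deg(v) >= H}| <= H,
    per the definition in the abstract."""
    H = 0
    while sum(1 for deg in degrees if deg >= H) > H:
        H += 1
    return H
```

For a star K_{1,5} (degree sequence 5,1,1,1,1,1) the H-value is 2, far below the maximum degree Δ = 5, which illustrates why an O(Δ·H^3) delay can beat O(Δ^4) on real-world sparse graphs.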
3.
An O(|E| log^2 n) algorithm is presented to construct the visibility graph for a collection of n nonintersecting line segments, where |E| is the number of edges in the visibility graph. This algorithm is much faster than the O(n^2)-time and O(n^2)-space algorithms by Asano et al. and by Welzl on sparse visibility graphs. Thus we partially resolve an open problem raised by Welzl. Further, our algorithm uses only O(n) working storage.
4.
This paper presents several deterministic algorithms for selecting the k-th largest record from a set of n records on any n-node hypercubic network. All of the algorithms are based on the selection algorithm of Cole and Yap, as well as on various sorting algorithms for hypercubic networks. Our fastest algorithm runs in O(lg n lg* n) time, very nearly matching the trivial lower bound. Previously, the best upper bound known for selection was O(lg n lg lg n). A key subroutine in our O(lg n lg* n)-time selection algorithm is a sparse version of the Sharesort algorithm that sorts n records using p processors in O(lg n (lg lg p − lg lg(p/n))^2) time.
Received March 23, 1994; revised October 30, 1997.
5.
In this paper we treat the static dictionary problem, very well known in computer science. It consists in storing a set S of m elements in the range [1 . . . n] so that membership queries on S's elements can be handled in O(1) time. It can be approached as a table compression problem in which a size-n table has m ones and the other elements are zeros. We focus our attention on the sparse case (m ≪ n). We use a simple algorithm to solve the problem and make an average-case analysis of the total space required when the input derives from a uniform probability distribution. We also find some conditions able to minimize storage requirements. We then propose and analyze a new algorithm able to reduce storage requirements drastically to O(m^{4/3}).
Received December 1, 1997; revised March 1, 1998.
6.
Concept Decompositions for Large Sparse Text Data Using Clustering
Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as high-dimensional and sparse vectors: a few thousand dimensions and a sparsity of 95 to 99% is typical. In this paper, we study a certain spherical k-means algorithm for clustering such document vectors. The algorithm outputs k disjoint clusters, each with a concept vector that is the centroid of the cluster normalized to have unit Euclidean norm. As our first contribution, we empirically demonstrate that, owing to the high dimensionality and sparsity of the text data, the clusters produced by the algorithm have a certain fractal-like and self-similar behavior. As our second contribution, we introduce concept decompositions to approximate the matrix of document vectors; these decompositions are obtained by taking the least-squares approximation onto the linear subspace spanned by all the concept vectors. We empirically establish that the approximation errors of the concept decompositions are close to the best possible, namely, to truncated singular value decompositions. As our third contribution, we show that the concept vectors are localized in the word space, are sparse, and tend towards orthonormality. In contrast, the singular vectors are global in the word space and are dense. Nonetheless, we observe the surprising fact that the linear subspaces spanned by the concept vectors and the leading singular vectors are quite close in the sense of small principal angles between them. In conclusion, the concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized basis for text data sets.
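A toy transcription of the spherical k-means loop described above (plain Python on dense lists, not the authors' implementation; real document vectors would be sparse and very high-dimensional):

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster unit vectors by cosine similarity; each concept vector is the
    centroid of its cluster normalized to unit Euclidean norm."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    concepts = [list(d) for d in rng.sample(docs, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in docs:
            best = max(range(k),
                       key=lambda j: sum(a * b for a, b in zip(d, concepts[j])))
            clusters[best].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old concept vector for an empty cluster
                concepts[j] = normalize([sum(col) for col in zip(*members)])
    return concepts
```

On two well-separated 2-D clusters the returned concept vectors are unit-norm and nearly orthogonal, mirroring the tendency towards orthonormality the abstract reports.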
7.
Dissecting Euclidean d-space with the power diagram of n weighted point sites partitions a given m-point set into clusters, one cluster for each region of the diagram. In this manner, an assignment of points to sites is induced. We show the equivalence of such assignments to constrained Euclidean least-squares assignments. As a corollary, there always exists a power diagram whose regions partition a given d-dimensional m-point set into clusters of prescribed sizes, no matter where the sites are placed. Another consequence is that constrained least-squares assignments can be computed by finding suitable weights for the sites. In the plane, this takes roughly O(n^2 m) time and optimal space O(m), which improves on previous methods. We further show that a constrained least-squares assignment can be computed by solving a specially structured linear program in n+1 dimensions. This leads to an algorithm for iteratively improving the weights, based on the gradient-descent method. Besides having the obvious optimization property, least-squares assignments are shown to be useful in solving a certain transportation problem, and in finding a least-squares fitting of two point sets where translation and scaling are allowed. Finally, we extend the concept of a constrained least-squares assignment to continuous distributions of points, thereby obtaining existence results for power diagrams with prescribed region volumes. These results are related to Minkowski's theorem for convex polytopes. The aforementioned iterative method for approximating the desired power diagram applies to continuous distributions as well.
Received May 30, 1995; revised June 25, 1996.
8.
In this paper, an H^1-Galerkin mixed finite element method is proposed for the 1-D regularized long wave (RLW) equation u_t + u_x + u u_x − δ u_{xxt} = 0. The existence of unique solutions of the semi-discrete and fully discrete H^1-Galerkin mixed finite element methods is proved, and optimal error estimates are established. Our method can simultaneously approximate the scalar unknown and the vector flux effectively, without requiring the LBB consistency condition. Finally, some numerical results are provided to illustrate the efficacy of our method.
9.
Telikepalli Kavitha, Kurt Mehlhorn, Dimitrios Michail, Katarzyna E. Paluch, Algorithmica, 2008, 52(3):333–349
We consider the problem of computing a minimum cycle basis of an undirected non-negative edge-weighted graph G with m edges and n vertices. In this problem, a {0,1} incidence vector is associated with each cycle, and the vector space over GF(2) generated by these vectors is the cycle space of G. A set of cycles is called a cycle basis of G if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of G. Minimum cycle bases are useful in a number of contexts, e.g. the analysis of electrical networks and structural engineering.
The previous best algorithm for computing a minimum cycle basis has running time O(m^ω n), where ω is the best exponent of matrix multiplication. It is presently known that ω < 2.376. We exhibit an O(m^2 n + m n^2 log n) algorithm. When the edge weights are integers, we have an O(m^2 n) algorithm. For unweighted graphs which are reasonably dense, our algorithm runs in O(m^ω) time. For any ε > 0, we also design a (1+ε)-approximation algorithm. The running time of this algorithm is O((m^ω/ε) log(W/ε)) for reasonably dense graphs, where W is the largest edge weight.
A preliminary version of this paper appeared in Kavitha et al. (31st International Colloquium on Automata, Languages and Programming (ICALP), pp. 846–857, 2004).
T. Kavitha and K.E. Paluch were at the Max-Planck-Institut für Informatik, Saarbrücken, Germany, while this work was done.
10.
We report our development of a vision-based motion planning system for an autonomous motorcycle designed for desert terrain,
where uniform road surface and lane markings are not present. The motion planning is based on a vision vector space (V2-Space), which is a unitary vector set that represents local collision-free directions in the image coordinate system. The
V2-Space is constructed by extracting the vectors based on the similarity of adjacent pixels, which captures both the color
information and the directional information from prior vehicle tire tracks and pedestrian footsteps. We report how the V2-Space is constructed to reduce the impact of varying lighting conditions in outdoor environments. We also show how the V2-Space can be used to incorporate vehicle kinematic, dynamic, and time-delay constraints in motion planning to fit the highly
dynamic requirements of the motorcycle. The combined algorithm of the V2-Space construction and the motion planning runs in O(n) time, where n is the number of pixels in the captured image. Experiments show that our algorithm outputs correct robot motion commands
more than 90% of the time.
11.
Generalized linear models with L1 and L2 regularization are a widely used technique for solving classification, class-probability estimation, and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in a distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node, and uses line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow-node problem. For the important particular case of logistic regression, we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data-splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
12.
J. H. Reif, Algorithmica, 2001, 29(3):487–510
This paper is concerned with the problem of computing the characteristic polynomial of a matrix. In a large number of applications, the matrices are symmetric and sparse: with O(n) non-zero entries. The problem has an efficient sequential solution in this case, requiring O(n^2) work by use of the sparse Lanczos method. A major remaining open question is to find a polylog-time parallel algorithm with matching work bounds. Unfortunately, the sparse Lanczos method cannot be parallelized to run faster than time Ω(n) using n processors. Let M(n) be the processor bound to multiply two n × n matrices in O(log n) parallel time. Giesbrecht [G2] gave the best previous polylog-time parallel algorithms for the characteristic polynomial of a dense matrix with O(M(n)) processors. There is no known improvement to this processor bound in the case where the matrix is sparse. Often, in addition to being symmetric and sparse, the matrix has a sparsity graph (which has edges between indices of the matrix with non-zero entries) that has small separators. This paper gives a new algorithm for computing the characteristic polynomial of a sparse symmetric matrix, assuming that the sparsity graph is s(n)-separable and has a separator of size s(n) = O(n^γ), for some γ, 0 < γ < 1, that when deleted results in connected components of ≤ αn vertices, for some 0 < α < 1, with the same property. We derive an interesting algebraic version of Nested Dissection, which constructs a sparse factorization of the matrix A − λI_n, where A is the input matrix and I_n is the n × n identity matrix. While Nested Dissection is commonly used to minimize the fill-in in the solution of sparse linear systems, our innovation is to use the separator structure to bound also the work for manipulation of rational functions in the recursively factored matrices. The matrix elements are assumed to be over an arbitrary field. We compute the characteristic polynomial of a sparse symmetric matrix in polylog time using P(n)(n + M(s(n))) ≤ P(n)(n + s(n)^{2.376}) processors, where P(n) is the processor bound to multiply two degree-n polynomials in O(log n) parallel time using a PRAM (P(n) = O(n) if the field supports an FFT of size n, but is otherwise O(n log log n) [CK]). Our method requires only that a matrix be symmetric and non-singular (it need not be positive definite, as usual for Nested Dissection techniques). For the frequently occurring case where the matrix has small separator size, our polylog parallel algorithm has work bounds competitive with the best known sequential algorithms (i.e., the Ω(n^2) work of sparse Lanczos methods); for example, when the sparsity graph is a planar graph, s(n) = O(√n), and we require polylog time with only P(n)·n^{1.188} processors.
Received September 26, 1997; revised June 5, 1999.
13.
Standard support vector machine (SVM) training algorithms have O(l^3) computational and O(l^2) space complexity, where l is the training set size, and are thus computationally infeasible on very large data sets. To alleviate the computational burden in SVM training, we propose an algorithm to train SVMs on a set of bound vectors extracted via Fisher projection. For linearly separable problems, we use the linear Fisher discriminant to compute the projection line, while for nonlinearly separable problems, we use the kernel Fisher discriminant. In each case, we select as bound vectors a certain ratio of samples whose projections are adjacent to those of the other class. Theoretical analysis shows that the proposed algorithm has low computational and space complexity. Extensive experiments on several classification benchmarks demonstrate the effectiveness of our approach.
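A sketch of the linear case as I read the abstract: compute the Fisher direction w = S_w^{-1}(m1 − m0), then keep from each class the fraction of samples whose projections lie nearest the other class. The function names, the fixed 2-D dimension, and the default ratio are my own simplifications, not the authors' code:

```python
def fisher_direction(X0, X1):
    """Linear Fisher discriminant direction w = Sw^{-1} (m1 - m0), 2-D only."""
    mean = lambda X: [sum(c) / len(X) for c in zip(*X)]
    m0, m1 = mean(X0), mean(X1)
    S = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0] or 1e-12
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    # solve S w = dm via the explicit 2x2 inverse
    return [(S[1][1] * dm[0] - S[0][1] * dm[1]) / det,
            (S[0][0] * dm[1] - S[1][0] * dm[0]) / det]

def bound_vectors(X0, X1, ratio=0.5):
    """Keep from each class the samples projecting closest to the other class."""
    w = fisher_direction(X0, X1)
    proj = lambda x: x[0] * w[0] + x[1] * w[1]
    near1 = sorted(X0, key=proj, reverse=True)  # class-0 points nearest class 1
    near0 = sorted(X1, key=proj)                # class-1 points nearest class 0
    k0, k1 = max(1, int(ratio * len(X0))), max(1, int(ratio * len(X1)))
    return near1[:k0], near0[:k1]
```

Training an SVM on only these bound vectors shrinks l, which is where the claimed reduction in the O(l^3)/O(l^2) costs comes from.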
14.
Given a set P of polygons in three-dimensional space, two points p and q are said to be visible from each other with respect to P if the line segment joining them does not intersect any polygon in P. A point p is said to be completely visible from an area source S if p is visible from every point in S. The completely visible region CV(S, P) from S with respect to P is defined as the set of all points in three-dimensional space that are completely visible from S.
We present two algorithms for computing CV(S, P) for P with a total of n vertices and a convex polygonal source S with m vertices. Our first result is a divide-and-conquer algorithm which runs in O(m^2 n^2 α(mn)) time and space, where α(mn) is the inverse of Ackermann's function. We next give an incremental algorithm for computing CV(S, P) in O(m^2 n + m n^2 α(n)) time and O(mn + n^2) space. We also prove that CV(S, P) consists of Θ(mn + n^2) surface elements such as vertices, edges, and faces.
Received November 16, 1995; revised November 11, 1996.
15.
We give an O(log^4 n)-time, O(n^2)-processor CRCW PRAM algorithm to find a Hamiltonian cycle in a strong semicomplete bipartite digraph B, provided that a factor of B (i.e., a collection of vertex-disjoint cycles covering the vertex set of B) is computed in a preprocessing step. The factor is found (if it exists) using a bipartite matching algorithm, hence placing the whole algorithm in the class Random-NC.
We show that any parallel algorithm which can check the existence of a Hamiltonian cycle in a strong semicomplete bipartite digraph in time O(r(n)) using p(n) processors can be used to check the existence of a perfect matching in a bipartite graph in time O(r(n) + n^2/p(n)) using p(n) processors. Hence, our problem belongs to the class NC if and only if perfect matching in bipartite graphs belongs to NC.
We also consider the problem of finding a Hamiltonian path in a semicomplete bipartite digraph.
16.
Seth Pettie, Distributed Computing, 2010, 22(3):147–166
We present efficient algorithms for computing very sparse low-distortion spanners in distributed networks and prove some non-trivial lower bounds on the tradeoff between time, sparseness, and distortion. All of our algorithms assume a synchronized distributed network, where relatively short messages may be communicated in each time step. Our first result is a fast distributed algorithm for finding an O(2^{log* n} log n)-spanner with size O(n). Besides being nearly optimal in time and distortion, this algorithm appears to be the first that constructs an O(n)-size skeleton without requiring unbounded-length messages or time proportional to the diameter of the network. Our second result is a new class of efficiently constructible (α, β)-spanners called Fibonacci spanners, whose distortion improves with the distance being approximated. At their sparsest, Fibonacci spanners can have nearly linear size, namely O(n (log log n)^φ), where φ = (1 + √5)/2 is the golden ratio. As the distance increases, the multiplicative distortion of a Fibonacci spanner passes through four discrete
stages, moving from logarithmic to log-logarithmic, then into a period where it is constant, tending to 3, followed by another
period tending to 1. On the lower bound side we prove that many recent sequential spanner constructions have no efficient
counterparts in distributed networks, even if the desired distortion only needs to be achieved on the average or for a tiny
fraction of the vertices. In particular, any distance preservers, purely additive spanners, or spanners with sublinear additive
distortion must either be very dense, slow to construct, or have very weak guarantees on distortion.
17.
Dmitry Danilov, Computational Statistics & Data Analysis, 2008, 52(9):4203–4224
The Snaer program calculates the posterior mean and variance of variables on some of which we have data (with precisions), on some of which we have prior information (with precisions), and on some of which prior indicator ratios (with precisions) are available. The variables must satisfy a number of exact restrictions. The system is both large and sparse. Two aspects of the statistical and computational development are a practical procedure for solving a linear integer system, and a stable linearization routine for ratios. The numerical method for solving large sparse linear least-squares estimation problems is tested and found to perform well, even when the n×k design matrix is large (nk = O(10^8)).
18.
Prosenjit Bose, Paz Carmi, Mohammad Farshi, Anil Maheshwari, Michiel Smid, Algorithmica, 2010, 58(3):711–729
The greedy algorithm produces high-quality spanners and, therefore, is used in several applications. However, even for points in d-dimensional Euclidean space, the greedy algorithm has near-cubic running time. In this paper, we present an algorithm that computes the greedy spanner for a set of n points in a metric space with bounded doubling dimension in O(n^2 log n) time. Since computing the greedy spanner has an Ω(n^2) lower bound, the time complexity of our algorithm is optimal within a logarithmic factor.
19.
In this paper we give a fully dynamic approximation scheme for maintaining all-pairs shortest paths in planar networks. Given an error parameter ε, our algorithm maintains approximate all-pairs shortest paths in an undirected planar graph G with nonnegative edge lengths. The approximate paths are guaranteed to be accurate to within a factor of 1+ε. The time bound for both query and update for our algorithm is O(ε^{-1} n^{2/3} log^2 n log D), where n is the number of nodes in G and D is the sum of its edge lengths. The time bound for the queries is worst case, while that for the updates is amortized.
Our approximation algorithm is based upon a novel technique for approximately representing all-pairs shortest paths among a selected subset of the nodes by a sparse substitute graph.
Received January 1995; revised February 1997.
20.
C. C. McGeoch, Algorithmica, 1995, 13(5):426–441
The essential subgraph H of a weighted graph or digraph G contains an edge (v, w) if that edge is uniquely the least-cost path between its vertices. Let s denote the number of edges of H. This paper presents an algorithm for solving all-pairs shortest paths on G that requires O(ns + n^2 log n) worst-case running time. In general the time is equivalent to that of solving n single-source problems using only edges in H. For general models of random graphs and digraphs G, s = O(n log n) almost surely. The subgraph H is optimal in the sense that it is the smallest subgraph sufficient for solving shortest-path problems in G. Lower bounds on the largest-cost edge of H and on the diameter of H and G are obtained for general randomly weighted graphs. Experimental results produce some new conjectures about essential subgraphs and distances in graphs with uniform edge costs.
Much of this research was carried out while the author was a Visiting Fellow at the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS).
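The definition of H admits a direct, if naive, transcription: run an all-pairs computation with shortest-path counting and keep exactly the edges that are the unique shortest path between their endpoints. This O(n^3) Floyd-Warshall sketch is for illustration only, not the paper's O(ns + n^2 log n) algorithm, and it assumes an undirected graph with positive edge weights:

```python
import math

def essential_subgraph(n, edges):
    """Return the edges (u, v, w) of an undirected graph that are the UNIQUE
    least-cost path between their endpoints. Assumes positive weights."""
    INF = math.inf
    dist = [[INF] * n for _ in range(n)]
    cnt = [[0] * n for _ in range(n)]  # number of distinct shortest paths
    for i in range(n):
        dist[i][i], cnt[i][i] = 0.0, 1
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            cnt[u][v] = cnt[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d = dist[i][k] + dist[k][j]
                if d < dist[i][j]:
                    dist[i][j] = d
                    cnt[i][j] = cnt[i][k] * cnt[k][j]
                elif k != i and k != j and d < INF and d == dist[i][j]:
                    cnt[i][j] += cnt[i][k] * cnt[k][j]
    return [(u, v, w) for u, v, w in edges
            if dist[u][v] == w and cnt[u][v] == 1]
```

On a triangle with weights 1, 1, 2, the weight-2 edge is tied with the two-hop path and therefore drops out of H, matching the "uniquely the least-cost path" condition.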