首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
The memory intensive nature of object-oriented languages such as C++ and Java has created the need of a high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, high-performance memory manager is needed to cope with such applications. As today's VLSI technology advances, it becomes more and more attractive to map software algorithms such as malloc(), free(), realloc(), and garbage collection into hardware.

This paper presents hardware designs of realloc function and sweeping function (for mark and sweep garbage collection) that fully utilize the advantages of combinational logic. In our scheme, the reallocation on the original block can be done in constant time. If the reallocation on the original block is not possible, the previously proposed malloc() and free() can be used to move the contents to a new location. For the sweeping function, the bit-sweeper can detect and sweep the garbage in constant time. Since the sweeping phase often consumes more time than the marking phase, the garbage collection time can be substantially improved. The hardware complexity of proposed scheme (i.e. X-Unit, RS-unit, ESG-unit, and bit-sweeper) is O(n), where n represents the size of bit-map.  相似文献   


2.
In this paper, we study the dynamics of sand grains falling in sand piles. Usually sand piles are characterized by a decreasing integer partition and grain moves are described in terms of transitions between such partitions. We study here four main transition rules. The worst classical one, introduced by Brylawski (Discrete Math. 6 (1973) 201) induces a lattice structure LB(n) (called dominance ordering) between decreasing partitions of a given integer n. We prove that a more restrictive transition rule, called SPM rule, induces a natural partition of LB(n) in suborders, each one is associated to a fixed point for the SPM rule. In the second part, we extend the SPM rule in a natural way and obtain a model called Linear Chip Firing Game (Theoret. Comput. Sci. 115 (1993) 321). We prove that this new model has interesting properties: the induced order is a lattice, a natural greedoid can be associated to the model and it also defines a strongly convergent game. In the last section, we generalize the SPM rule in another way and obtain other lattice structure parametrized by some θ, denoted by L(n,θ), which form a decreasing sequence of lattices when θ varies in [−n+2,n]. For each θ, we characterize the fixed point of L(n,θ) and give the value of its maximal sized chain's length. We also note that L(n,−n+2) is the lattice of all compositions of n.  相似文献   

3.
4.
Given n points in the plane the planar dominance counting problem is to determine for each point the number of points dominated by it. Point p is said to dominate point q if x(q)x(p) and y(q)y(p), when x(p) and y(p) are the x− and y-coordinate of p, respectively. We present two CREW PRAM parallel algorithms for the problem, one running in O(log n loglog n) time and and the other in O(lognloglogn/logloglogn) time both using O(n) processors. Some applicationsare also given.  相似文献   

5.
We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n4) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n3 log2n) time. One of our new methods also takes O(n3 log2n) time, but the other one takes only O(n3) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.  相似文献   


6.
Two parallel algorithms for finding minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message-flow to between the neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O(m log m)/p) time using p processors for p ≤ (mlog m)/n(1+log(m/n)). The second algorithm, efficient for dense graphs, requires O(n2/p) time for pn/log n.  相似文献   

7.
Spatial regularity amidst a seemingly chaotic image is often meaningful. Many papers in computational geometry are concerned with detecting some type of regularity via exact solutions to problems in geometric pattern recognition. However, real-world applications often have data that is approximate, and may rely on calculations that are approximate. Thus, it is useful to develop solutions that have an error tolerance.

A solution has recently been presented by Robins et al. [Inform. Process. Lett. 69 (1999) 189–195] to the problem of finding all maximal subsets of an input set in the Euclidean plane that are approximately equally-spaced and approximately collinear. This is a problem that arises in computer vision, military applications, and other areas. The algorithm of Robins et al. is different in several important respects from the optimal algorithm given by Kahng and Robins [Patter Recognition Lett. 12 (1991) 757–764] for the exact version of the problem. The algorithm of Robins et al. seems inherently sequential and runs in O(n5/2) time, where n is the size of the input set. In this paper, we give parallel solutions to this problem.  相似文献   


8.
A parallel two-list algorithm for the knapsack problem   总被引:10,自引:0,他引:10  
An n-element knapsack problem has 2n possible solutions to search over, so a task which can be accomplished in 2″ trials if an exhaustive search is used. Due to the exponential time in solving the knapsack problem, the problem is considered to be very hard. In the past decade, much effort has been done in order to find techniques which could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2n/8) processors to solve the knapsack problem in O(2n/2) time; that is, the cost of Chang et al.'s parallel algorithm is O(25n/8). In this paper, we propose a parallel algorithm to improve Chang et al.'s parallel algorithm by reducing the time complexity to be O(23n/8) under the same O(2n/8) processors available. Thus, the proposed parallel algorithm has a cost of O(2n/2). It is an improvement over previous literature. We believe that the proposed parallel algorithm is pragmatically feasible at the moment when multiprocessor systems become more and more popular.  相似文献   

9.
We describe algorithms for constructing optimal binary search trees, in which the access cost of a key depends on the k preceding keys which were reached in the path to it. This problem has applications to searching on secondary memory and robotics. Two kinds of optimal trees are considered, namely optimal worst case trees and weighted average case trees. The time and space complexities of both algorithms are O(nk+2) and O(nk+1), respectively. The algorithms are based on a convenient decomposition and characterizations of sequences of keys which are paths of special kinds in binary search trees. Finally, using generating functions, we present an exact analysis of the number of steps performed by the algorithms.  相似文献   

10.
11.
We substantially improve the known algorithms for approximating all the complex zeros of an nth degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc x : |x| ≤ 1, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2b, by using order of n arithmetic operations and order of (b + n)n2 Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n2 Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).  相似文献   

12.
Yi Pan  Keqin Li 《Information Sciences》1999,120(1-4):209-221
The computation of Euclidean distance maps (EDM), also called Euclidean distance transform, is a basic operation in computer vision, pattern recognition, and robotics. Fast computation of the EDM is needed since most of the applications using the EDM require real-time computation. It is shown in L. Chen and H.Y.H. Chuang [Information Processing Letters, 51, pp. 25–29 (1994)] that a lower bound Ω(n2) is required for any sequential EDM algorithm due to the fact that in any EDM algorithm each of the n2 pixels has to be scanned at least once. Recently, many parallel EDM algorithms have been proposed to speedup its computation. Chen and Chuang proposed an algorithm for computing the EDM on an n×n mesh in O(n) time [L. Chen and H.Y.H. Chuang Parallel Computing, 21, pp. 841–852 (1995)]. Clearly, the VLSI complexities of both the sequential and the mesh algorithm described in L. Chen and H.Y.H. Chuang [Parallel Computing, 21, pp. 841–852 (1995)] are AT2=O(n4), where A is the VLSI layout area of the design and T is the computation time using area A when implemented in VLSI. In this paper, we propose a new and faster parallel algorithm for computing the EDM problem on the reconfigurable VLSI mesh model. For the same problem, our algorithm runs in O(1) time on a two-dimensional n2×n2 reconfigurable mesh. We show that the VLSI complexity of our algorithm is the same as those of the above sequential algorithm and the mesh algorithm, while it uses much less time. To our best knowledge, this is the first constant-time EDM algorithm on any parallel computational model.  相似文献   

13.
The backtrack search problem involves visiting all the nodes of an arbitrary binary tree given a pointer to its root subject to the constraint that the children of a node are revealed only after their parent is visited. We present a fast, deterministic backtrack search algorithm for a p-processor COMMON CRCW-PRAM, which visits any n-node tree of height h in time O((n/p+h)(logloglogp)2). This upper bound compares favourably with a natural Ω(n/p+h) lower bound for this problem. Our approach embodies novel, efficient techniques for dynamically assigning tree-nodes to processors to ensure that the work is shared equitably among them.  相似文献   

14.
An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.

The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n2) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n2) processors.  相似文献   


15.
Parallel clustering algorithms   总被引:3,自引:0,他引:3  
Clustering techniques play an important role in exploratory pattern analysis, unsupervised learning and image segmentation applications. Many clustering algorithms, both partitional clustering and hierarchical clustering, require intensive computation, even for a modest number of patterns. This paper presents two parallel clustering algorithms. For a clustering problem with N = 2n patterns and M = 2m features, the time complexity of the traditional partitional clustering algorithm on a single processor computer is O(MNK), where K is the number of clusters. The proposed algorithm on anSIMD computer with MN processors has a time complexity O(K(n + m)). The time complexity of the proposed single-link hierarchical clustering algorithm is reduced from O(MN2) of the uniprocessor algorithm to O(nN) with MN processors.  相似文献   

16.
We derive asymptotic approximations for the sequence f(n) defined recursively by f(n)=min1j<n {f(j)+f(nj)}+g(n), when the asymptotic behavior of g(n) is known. Our tools are general enough and applicable to another sequence F(n)=max1j<n {F(j)+F(nj)+min{g(j),g(nj)}}, also frequently encountered in divide-and-conquer problems. Applications of our results to algorithms, group testing, dichotomous search, etc. are discussed.  相似文献   

17.
In this paper, a new approach, which is based on function decomposition, is proposed for deriving algorithms on processor arrays with reconfigurable bus systems. The effectiveness of this approach is shown through some important applications. They include computing the logical exclusive-OR of n bits, summing n bits, summing n m-bit binary integers, and multiplying two n-bit binary integers. All these applications are solved in O(1) time.  相似文献   

18.
In this paper, we derive time-minimal systolic arrays for Gaussian elimination and the Algebraic Path Problem (APP) that use a minimal number of processors. For a problem of size n, we obtain an execution time T(n) = 3n −1 using A(n) = n2/4+O(n) processors for Gaussian elimination, and T(n) = 5n −2 and A(n) = n3/+O(n) for the APP.  相似文献   

19.
This paper presents an optimal bound on the Shannon function L(n,m,) that gives the worstcase circuit-size complexity to approximate, within an approximation degree at least , partial boolean functions having n inputs and domain size m. That is . Our bound applies to any partial boolean function and any approximation degree, and thus completes the study of boolean function approximation introduced by Pippenger (1977).

Our results give an upper bound for the hardness function h(ƒ), introduced by Nisan and Wigderson (1994), which denotes the minimum value l for which there exists a circuit of size at most l that approximates a boolean function ƒ with degree at least 1/l. Indeed, if H(n) denotes the maximum hardness value achieved by boolean functions with n inputs, we prove that for almost every nH(n)n/3 + n2 + O(1). The exponent n/3 in the above inequality implies that no family of boolean functions exists which has ‘full’ hardness. This fact establishes connections with Allender and Strauss' (1994) work that explores the structure of BPP.

Finally, we show that for almost every n and for almost every boolean function ƒ of n inputs we have h(ƒ)2n/3−2 log n. The contribution in the proof of the upper bound for L(n, m, ) can be viewed as a set of technical results that globally show how boolean linear operators are ‘well’ distributed on the class of 4-regular domains. This property is then applied to approximate partial boolean functions on general domains using a suitable composition of boolean linear operators.  相似文献   


20.
A new parallel algorithm for transforming an arithmetic infix expression into a par se tree is presented. The technique is based on a result due to Fischer (1980) which enables the construction of the parse tree, by appropriately scanning the vector of precedence values associated with the elements of the expression. The algorithm presented here is suitable for execution on a shared memory model of an SIMD machine with no read/write conflicts permitted. It uses O(n) processors and has a time complexity of O(log2n) where n is the expression length. Parallel algorithms for generating code for an SIMD machine are also presented.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号