首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We present three explicit schemes for distributingM variables amongN memory modules, whereM=Θ(N 1.5),M = Θ(N 2), andM=Θ(N 3), respectively. Each variable is replicated into a constant number of copies stored in distinct modules. We show thatN processors, directly accessing the memories through a complete interconnection, can read/write any set ofN variables in worst-case timeO (N 1/3),O(N 1/2), andO(N 2/3), respectively for the three schemes. The access times for the last two schemes are optimal with respect to the particular redundancy values used by such schemes. The address computation can be carried out efficiently by each processor without recourse to a complete memory map and requiring onlyO(1) internal storage. This paper was partially supported by NFS Grants CCR-91-96152 and CCR-94-00232, by ONR Contract N00014-91-J-4052, ARPA Order 8225, and by the ESPRIT III Basic Research Programme of the EC under Contract No. 9072 (Project GEPPCOM). Results reported here were presented in preliminary form at the 10th Symposium on Theoretical Aspects of Computer Science (Würzburg, Germany, 1993), and at the 5th ACM Symposium on Parallel Algorithms and Architectures (Velen, Germany, 1993).  相似文献   

2.
1 Introduction and related work In recent years, peer-to-peer computing has attracted significant attention from both industry field and academic field[1-3]. The core component of many proposed peer-to- peer systems is the distributed hash table (DHT) schemes[4,5] that use a hash table-like interface to publish and look up data objects. Many proposed DHT schemes[6-15] are based on some traditional interconnection to- pology: Chord[6], Tapestry[7,8], Pastry[9] are based on hypercube topolog…  相似文献   

3.
The image template matching problem is one of the fundamental problems of and has many practical applications in image processing, pattern recognition, and computer vision. It is a useful operation for filtering, edge detection, image registration, and object detection [13]. In this paper, we first design twoO[(M2/p2)log logM] andO[(M2/p2)+(M/p)log logp] time parallel image template matching algorithms on a 3-D processor array with a reconfigurable bus system usingp2N2processors with each processor containingO(1) andO(M/p) restricted memory for 1 ≤pMN, respectively, for anN×Ndigital image and anM×Mtemplate. By increasing the number of processors, these two proposed algorithms can be run inO(M2/p2) time for speeding up the time complexity usingp2M1/cN2andp2+1/cN2processors, respectively, wherecis a constant andc≥1. Furthermore, anO(1) time can be also obtained from these two proposed algorithms by usingM2+1/cN2processors. These results improve the best known bounds and achieve both optimal and optimal speed-up in their time and processor complexities.  相似文献   

4.
We give an algorithm to compute the subset partial order (called the subset graph) for a family F of sets containing k sets with N elements in total and domain size n. Our algorithm requires O(nk2/logk) time and space on a Pointer Machine. When F is dense, i.e. N=Θ(nk), the algorithm requires O(N2/log2N) time and space. We give a construction for a dense family whose subset graph is of size Θ(N2/log2N), indicating the optimality of our algorithm for dense families. The subset graph can be dynamically maintained when F undergoes set insertions and deletions in O(nk/logk) time per update (that is sub-linear in N for the case of dense families). If we assume words of b?k bits, allow bits to be packed in words, and use bitwise operations, the above running time and space requirements can be reduced by a factor of blog(k/b+1)/logk and b2log(k/b+1)/logk respectively.  相似文献   

5.
The problem of on-line labelling is one of assigning integer labels in the range 1 to N to an input stream of at most N distinct items, drawn from a linearly ordered set, so that at each step the labels respect the ordering on the items. To maintain this constraint, items may have to be relabelled to accommodate new ones. With T(M,N) denoting the total number of relabellings that have to be performed for the first M inputs, it is known that for any given constant c in the range 0<c<1 there are exact bounds T(Nc,N)=Θ(NlogN) and T(cN,N)=Θ(Nlog2N). However, in the case c=1, when the labelling is called minimal, is known only that T(N,N)=O(Nlog3N). Existing algorithms for minimal on-line labelling are complicated, and our aim in this paper is to give a simplified and self-contained account of the problem.  相似文献   

6.
We propose and analyze threading algorithms for hybrid MPI/OpenMP parallelization of a molecular-dynamics simulation, which are scalable on large multicore clusters. Two data-privatization thread scheduling algorithms via nucleation-growth allocation are introduced: (1) compact-volume allocation scheduling (CVAS); and (2) breadth-first allocation scheduling (BFAS). The algorithms combine fine-grain dynamic load balancing and minimal memory-footprint data privatization threading. We show that the computational costs of CVAS and BFAS are bounded by Θ(n 5/3 p ?2/3) and Θ(n), respectively, for p threads working on n particles on a multicore compute node. Memory consumption per node of both algorithms scales as O(n+n 2/3 p 1/3), but CVAS has smaller prefactors due to a geometric effect. Based on these analyses, we derive the selection criterion between the two algorithms in terms of the granularity, n/p. We observe that memory consumption is reduced by 75 % for p=16 and n=8,192 compared to a naïve data privatization, while maintaining thread imbalance below 5 %. We obtain a strong-scaling speedup of 14.4 with 16-way threading on a four quad-core AMD Opteron node. In addition, our MPI/OpenMP code achieves 2.58× and 2.16× speedups over the MPI-only implementation on 32,768 cores of BlueGene/P for 0.84 and 1.68 million particle systems, respectively.  相似文献   

7.
A difference method is presented for singularly perturbed convection-diffusion problems with discretization error estimates of high order (orderp), which hold uniformly in the singular perturbation parameterε. The method is based on the use of a defect-correction technique and special adaptively graded and patched meshes, with meshsizes varying betweenh andε 3/2 h whenp=2, whereh is the meshsize, used in the part of the domain where the solution is smooth, andε 3/2 h is the final meshsize in the boundary layer. Similar constructions hold for interior layers. The correction operator is a monotone operator, enabling the estimate of the error of optimal order in maximum norm. The total number of meshpoints used in ad-dimensional problem isO(ε ?s)h ?d+O(h ?d), wheres is 1/p or 1/2p, respectively in the case of boundary or interior layer.  相似文献   

8.
The restoration of digital images and patterns by the splitting-integrating method (SIM) proposed by Li (1993) and Liet al. (1992) is much simpler than other algorithms because no solutions of nonlinear algebraic equations are required. Let a pixel in 2D images be split intoN 2 subpixels; the convergence rates areO(1/N) andO/(1/N 2) for pixel greyness under image normalization by SIM. In this paper, the advanced SIM using spline functions can raise the convergence rates to (O(1/N 3) andO(1/N 4). Error bounds of pixel greyness obtained are derived from numerical analysis, and numerical experiments are carried out to confirm the high convergence rates ofO(1/N 3) andO(1/N 4).  相似文献   

9.
A simple code transformation is presented that reduces the space complexity of Yang and Anderson's local-spin mutual exclusion algorithm. In both the original and the transformed algorithm, only atomic read and write instructions are used; each process generates Θ(logN) remote memory references per lock request, where N is the number of processes. The transformed algorithm uses Θ(N) distinct variables, which is clearly optimal.  相似文献   

10.
We consider the problem of maintaining information about the rank of a matrix M under changes to its entries. For an n×n matrix M, we show an amortized upper bound of O(n ω?1) arithmetic operations per change for this problem, where ω<2.373 is the exponent for matrix multiplication, under the assumption that there is a lookahead of up to Θ(n) locations. That is, we know up to the next Θ(n) locations (i 1,j 1),(i 2,j 2),…?, whose entries are going to change, in advance; however we do not know the new entries in these locations in advance. We get the new entries in these locations in a dynamic manner. The dynamic matrix rank problem was first studied by Frandsen and Frandsen who showed an upper bound of O(n 1.575) and a lower bound of Ω(n) for this problem and later Sankowski showed an upper bound of O(n 1.495) for this problem when allowing randomization and a small probability of error. These algorithms do not assume any lookahead. For the dynamic matrix rank problem with lookahead, Sankowski and Mucha showed a randomized algorithm (with a small probability of error) that is more efficient than these algorithms.  相似文献   

11.
In this paper we describe scalable parallel algorithms for building the convex hull and a triangulation ofncoplanar points. These algorithms are designed for thecoarse grained multicomputermodel:pprocessors withO(n/p)⪢O(1) local memory each, connected to some arbitrary interconnection network. They scale over a large range of values ofnandp, assuming only thatnp1+ε(ε>0) and require timeO((Tsequential/p)+Ts(n, p)), whereTs(n, p) refers to the time of a global sort ofndata on approcessor machine. Furthermore, they involve only a constant number of global communication rounds. Since computing either 2D convex hull or triangulation requires timeTsequential=Θ(n log n) these algorithms either run in optimal time,Θ((n log n)/p), or in sort time,Ts(n, p), for the interconnection network in question. These results become optimal whenTsequential/pdominatesTs(n, p) or for interconnection networks like the mesh for which optimal sorting algorithms exist.  相似文献   

12.
Shellsort with a constant number of increments   总被引:2,自引:0,他引:2  
M. A. Weiss 《Algorithmica》1996,16(6):649-654
We consider the worst-case running time of Shellsort when only a constant number,c, of increments are allowed. Forc=3, we show that Shellsort can be implemented to run inO(N 5/3) time, which is optimal. Forc=4, we further improve the running time toO(N 11/7), and, forc=5, we obtain a bound ofO(N 23/15). We also show anO(N 1+1/k ) bound for generalc, wherek=[(1+8c+1)/4]. Forc=6, this isO(N 3/2).  相似文献   

13.
In this paper, we use crossing number and wire area arguments to find lower bounds on the layout area and maximum edge length of a variety of new and computationally useful networks. In particular, we describe
  1. anN-node planar graph which has layout area θ(NlogN) and maximum edge length θ(N 1/2/log1/2 N),
  2. anN-node graph with anO(x 1/2)-separator which has layout area θ(Nlog2 N) and maximum edge length θ(N 1/2logN/loglogN), and
  3. anN-node graph with anO(x 1?1/r )-separator which has maximum edge length θ(N 1?1/r ) for anyr ≥ 3.
  相似文献   

14.
《Parallel Computing》1997,23(13):1909-1936
This paper presents a global reduction algorithm for wormhole-routed 2D meshes. Well-known reduction algorithms that are optimized for short vectors have complexity O(M log N), where N = n × n is the number of nodes, and M the vector length. Algorithms suitable for long vectors have complexity O(√N + M). Previously known asymptotically optimal algorithms with complexity O(log N + M) incur inherent network contention among constituent messages. The proposed algorithm adapts to the given vector length, resulting in complexities O(M log N) for short vectors, O(log N + M) for medium-sized vectors, and O(√N + M) for sufficiently long vectors. The O(√N + M) version is preferred to the O(log N + M) version for long vectors, due to its small coefficient associated with M, the dominating factor for such vectors. The algorithm is contention-free in a synchronous environment. Under asynchronous execution models, depth contention (contention among message-passing steps) may occur. However, simulation studies show that the effect of depth contention on the actual performance is negligible.  相似文献   

15.
This paper considers the histogramming problem on hypercube.N-PE hypercube is used to process anN 12 × N12digitized image in which each pixel has a gray-level value between 0 andM − 1. In general,M, the range of gray-level values is much smaller thanN, the number of pixels being processed. Our algorithm generates the histogram of the image inO(logM * logN) time using radix sort and efficient data movement operations. This technique can be implemented on butterfly, shuffle-exchange and fat pyramid organizations.  相似文献   

16.
An integrative overview of the algorithmic characteristics of three well-known polynomialtime heuristics for the undirected Steiner minimum tree problem:shortest path heuristic (SPH),distance network heuristic (DNH), andaverage distance heuristic (ADH) is given. The performance of thesesingle-pass heuristics (and some variants) is compared and contrasted with several heuristics based onrepetitive applications of the SPH. It is shown that two of these repetitive SPH variants generate solutions that in general are better than solutions obtained by any single-pass heuristic. The worst-case time complexity of the two new variants isO(pn 3) andO(p 3 n 2), while the worst-case time complexity of the SPH, DNH, and ADH is respectivelyO(pn 2),O(m + n logn), andO(n 3) wherep is the number of vertices to be spanned,n is the total number of vertices, andm is the total number of edges. However, use of few simple tests is shown to provide large reductions of problem instances (both in terms of vertices and in term of edges). As a consequence, a substantial speed-up is obtained so that the repetitive variants are also competitive with respect to running times.  相似文献   

17.
《Real》1996,2(6):373-382
This paper presents the architecture and the implementation of template matching on a 3-D piece-wise regular processor space that forms a two-dimensional array of linear systolic arrays. Template matching can be considered as a 2-D convolution of an image of sizeN × Nwith a kernel of sizer× r. Conventional high-speed implementations use 2-D systolic arrays of sizeO(r2) which compute inO(N2) time. The drawback of this solution is that the size of the processor array follows on the size of the convolution kernel. This does not permit the allocation of more processors in order to meet the real-time requirements. With the approach used in this paper, the size of the processor array may be extended up toO(sr2), 1 ≤sN, thereby accomplishing the calculations inO(N2/s) time. In the case whens=r, ther × rmesh of 1-D systolic arrays of sizeO(r) is yielded. The piecewise regularity of the 3-D processor array allows also easy physical realization.  相似文献   

18.
Independent spanning trees (ISTs) on networks have applications to increase fault-tolerance, bandwidth, and security. Möbius cubes are a class of the important variants of hypercubes. A recursive algorithm to construct n ISTs on n-dimensional Möbius cube M n was proposed in the literature. However, there exists dependency relationship during the construction of ISTs and the time complexity of the algorithm is as high as O(NlogN), where N=2 n is the number of vertices in M n and n≥2. In this paper, we study the parallel construction and a diagnostic application of ISTs on Möbius cubes. First, based on a circular permutation n?1,n?2,…,0 and the definitions of dimension-backbone walk and dimension-backbone tree, we propose an O(N) parallel algorithm, called PMCIST, to construct n ISTs rooted at an arbitrary vertex on M n . Based on algorithm PMCIST, we further present an O(n) parallel algorithm. Then we provide a parallel diagnostic algorithm with high efficiency to diagnose all the vertices in M n by at most n+1 steps, provided the number of faulty vertices does not exceed n. Finally, we present simulation experiments of ISTs and an application of ISTs in diagnosis on 0-M 4.  相似文献   

19.
We study the problem of mapping theNnodes of a data structure onMmemory modules so that they can be accessed in parallel bytemplates, i.e., distinct sets of nodes. In literature several algorithms are available for arrays (accessed by rows, columns, diagonals, and subarrays) and trees (accessed by subtrees, root-to-leaf paths, levels, etc.). Although some mapping algorithms for arrays allow conflict-free access to several templates at once (for example rows and columns), no mapping algorithm is known for efficiently accessing subtree, path and level templates in complete binary trees. In our paper, we first prove that any mapping algorithm that is conflict-free for tree/level template has Ω(M/logM) conflicts when access is done according to path template and vice versa. Therefore, no mapping algorithm can be found that is conflict-free on both path and tree (or path and level) templates. Our main result is an algorithm for mapping complete binary trees withN= 2M− 1 nodes onMmemory modules in such a way that:
  • &#x02022;the number of conflicts for accessing an-node subtree,adjacent nodes in the same level, orconsecutive nodes of a root-to-leaf path is(),
  • &#x02022;the load (i.e., the ratio between the maximum and minimum number of data items mapped on each module) is 1 + o(1),
  • &#x02022;the time complexity for retrieving the module where a given data item is stored is(1), if a preprocessing phase of space and time complexity(log) is executed, or(log log), if no preprocessing is allowed.
The algorithm can be easily generalized to complete binary trees of any size.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号