Similar documents
1.
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As research in these fields addresses problems of unprecedented complexity, the demand for computing knn graphs over large high-dimensional data sets with arbitrary distance metrics increases, exceeding the resources available to a single machine. In this work we efficiently distribute the computation of knn graphs across clusters of processors that communicate by message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over 100 processors and indicate that similar speedup can be obtained with several hundred processors.
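A minimal sequential sketch in Python may help fix the knn graph definition. It is brute force, with O(n²) distance evaluations. It only simulates distribution by splitting the query points into independent blocks, one per hypothetical worker, so it is not the paper's message-passing framework. The names `knn_graph` and `knn_for_block` are illustrative. The metric can be any callable and defaults to Euclidean distance via `math.dist`.

```python
import heapq
import math

def knn_for_block(queries, points, k, dist):
    """Return {q: indices of the k points nearest to points[q]} for each index q in `queries`."""
    result = {}
    for q in queries:
        # Candidate pairs (distance, index); the query point itself is excluded.
        candidates = ((dist(points[q], points[j]), j) for j in range(len(points)) if j != q)
        # nsmallest keeps only the k best; ties are broken by the smaller index.
        result[q] = [j for _, j in heapq.nsmallest(k, candidates)]
    return result

def knn_graph(points, k, dist=math.dist, num_workers=4):
    """Directed knn graph as an adjacency dict, queries split into one block per (hypothetical) worker."""
    n = len(points)
    graph = {}
    for w in range(num_workers):
        # Contiguous block of query indices assigned to worker w.
        block = range(w * n // num_workers, (w + 1) * n // num_workers)
        # Blocks are independent: a real worker would only need read access to `points`.
        graph.update(knn_for_block(block, points, k, dist))
    return graph
```

Each block only reads the shared point set, so the blocks can be computed independently. That independence is what makes knn graph construction amenable to distribution.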

2.
We consider the following planar max-min length triangulation problem: given a set of n points in the Euclidean plane, find a triangulation such that the length of the shortest edge in the triangulation is maximized. In this paper, a linear-time algorithm is proposed for computing the max-min length triangulation of a set of points in convex position. In addition, an O(n log n) time algorithm is proposed for computing the max-min length k-set triangulation of a set of points in convex position, where we are to compute a set of k vertices such that the max-min length triangulation on them is minimized over all possible k-sets. We further show that the graph version of max-min length triangulation is NP-complete, and that common heuristics such as the greedy algorithm cannot, in general, guarantee a bounded approximation ratio for the max-min length triangulation.
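For points in convex position, the max-min length triangulation can also be found with a straightforward interval dynamic program. The Python sketch below is that cubic-time DP, offered only as a reference for checking results. It is not the linear-time algorithm from the abstract. It assumes the vertices are given in cyclic order around the convex hull, and the function name is illustrative.

```python
import math
from functools import lru_cache

def max_min_triangulation(poly):
    """Largest achievable shortest-edge length over all triangulations of a convex polygon.

    `poly` lists the vertices in convex position, in cyclic order. Every triangulation
    contains all polygon sides, so the answer never exceeds the shortest side.
    O(n^3) interval DP; a reference implementation, not the linear-time method.
    """
    n = len(poly)
    if n < 3:
        raise ValueError("need at least three vertices")

    def length(i, j):
        return math.dist(poly[i], poly[j])

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best shortest-edge value for the sub-polygon poly[i..j], including its closing edge (i, j).
        if j == i + 1:
            return length(i, j)
        # Triangle (i, m, j) splits the sub-polygon; keep the split with the largest minimum.
        split = max(min(best(i, m), best(m, j)) for m in range(i + 1, j))
        return min(length(i, j), split)

    return best(0, n - 1)
```

For a flat rhombus, the DP picks the long diagonal. The short diagonal would become the shortest edge.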

3.
We give an in-depth introduction to the design of our functional array programming language SaC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SaC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SaC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.

4.
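Parallelization for distributed-memory architectures transforms a sequential program into an SPMD parallel program that runs on every processing node. Each node program contains the computations that node performs and the communication operations that exchange information with other nodes. We discuss an algorithm that, given a known data decomposition and computation partitioning, generates message-passing parallel programs for distributed-memory architectures. Building on the linear-inequality framework proposed by Lam, we make three effective improvements over the Paraguin work. First, data distribution is incorporated into the code-generation algorithm. Second, the processor space is extended from one dimension to multiple dimensions. Third, the mapping from virtual processors to physical processors is incorporated into the code-generation algorithm. Together these changes reduce the amount of inter-node communication and improve the performance of the generated parallel code.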
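The "owner computes" idea can be made concrete with a small Python sketch of the one-dimensional case. It assumes a hypothetical loop `for i in range(lo, hi): A[i] = B[i + offset]`, with A and B both block-distributed over the same processors. Each processor intersects the loop bounds with the bounds of the block it owns, a one-dimensional instance of the linear-inequality view. The multidimensional processor spaces and virtual-to-physical mappings described above are not modeled. All function names are hypothetical.

```python
def block_size(n, nprocs):
    """Elements per processor in a contiguous block distribution (ceil(n / nprocs))."""
    return -(-n // nprocs)

def local_bounds(p, n, nprocs, lo, hi):
    """Iterations of `for i in range(lo, hi)` that processor p executes (it owns A[i])."""
    b = block_size(n, nprocs)
    # Intersect lo <= i < hi with the ownership constraints p*b <= i < (p+1)*b and i < n.
    start = max(lo, p * b)
    stop = min(hi, (p + 1) * b, n)
    return range(start, max(start, stop))

def remote_reads(p, n, nprocs, lo, hi, offset):
    """Indices of B that processor p reads for A[i] = B[i + offset] but does not own.

    These are the values that must arrive in messages from other nodes.
    """
    b = block_size(n, nprocs)
    owned = range(p * b, min((p + 1) * b, n))
    return [i + offset for i in local_bounds(p, n, nprocs, lo, hi) if i + offset not in owned]
```

As an example, take n = 10 elements on 3 processors with `A[i] = B[i - 1]` for i from 1 to 9. Processor 1 executes iterations 4 to 7 and must receive B[3] from processor 0.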

5.
Christine Rüb. Algorithmica, 1992, 8(1-6): 119-144.
In this paper we give a parallel algorithm for line-segment intersection reporting in the plane. It runs in time O(((n + k) log n log log n)/p) using p processors on a concurrent-read exclusive-write (CREW) PRAM, where n is the number of line segments, k is the number of intersections, and p ≤ n + k.
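A brute-force baseline with orientation tests can report all intersecting pairs when checking a parallel implementation. The Python sketch below runs in O(n²) time and is not Rüb's algorithm. It treats segments as closed, so touching endpoints and collinear overlaps count as intersections. Segments are assumed to be pairs of (x, y) tuples, and the function names are illustrative.

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, c):
    """For c collinear with a and b: True if c lies within the bounding box of segment ab."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def intersects(s, t):
    """True if closed segments s and t share at least one point."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    # Proper crossing: each segment's endpoints lie strictly on opposite sides of the other.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Degenerate cases: an endpoint lies on the other segment.
    return ((d1 == 0 and on_segment(p3, p4, p1)) or (d2 == 0 and on_segment(p3, p4, p2))
            or (d3 == 0 and on_segment(p1, p2, p3)) or (d4 == 0 and on_segment(p1, p2, p4)))

def report_intersections(segments):
    """All index pairs (i, j), i < j, of intersecting segments, in O(n^2) time."""
    return [(i, j) for i in range(len(segments)) for j in range(i + 1, len(segments))
            if intersects(segments[i], segments[j])]
```

With integer coordinates, all tests here are exact, which makes this a convenient oracle for randomized testing.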

6.
Electron Repulsion Integrals (ERIs) are a common bottleneck in ab initio computational chemistry. It is known that sorted/reordered execution of ERIs results in efficient SIMD/vector processing. This paper shows that reconfigurable computing and heterogeneous processor architectures can also benefit from a deliberate ordering of ERI tasks. However, realizing these benefits as net speedup requires a very rapid sorting mechanism. This paper presents two such mechanisms. The study uses analytical, simulation-based, and experimental benchmarking approaches to consider five use cases for ERI sorting, namely SIMD processing, reconfigurable computing, limited address spaces, instruction cache exploitation, and data cache exploitation. Specific consideration is given to existing cache-based processors, FPGAs, and the Cell Broadband Engine processor. The authors propose that the analyses in this work be built upon to aid the development of software autotuners that produce efficient ab initio computational chemistry codes for a variety of computer architectures.
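When the sort key takes only a few distinct values, a counting sort orders tasks very quickly. It runs in linear time without comparisons. The sketch below assumes, hypothetically, that each ERI task carries a small integer class label, for example an index for its angular-momentum combination. The abstract does not say which two mechanisms the paper uses, so this is only an illustration. The function name is hypothetical.

```python
def counting_sort_by_class(tasks, key, num_classes):
    """Stable counting sort of tasks by an integer class key in [0, num_classes).

    Runs in O(len(tasks) + num_classes) time, with no comparisons between tasks.
    """
    counts = [0] * num_classes
    for t in tasks:
        counts[key(t)] += 1
    # Exclusive prefix sum: the first output slot for each class.
    starts, total = [], 0
    for c in counts:
        starts.append(total)
        total += c
    out = [None] * len(tasks)
    for t in tasks:
        c = key(t)
        out[starts[c]] = t   # tasks of the same class keep their input order
        starts[c] += 1
    return out
```

Stability matters if tasks within a class were already arranged for cache locality.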

7.
Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We demonstrate a storage scheme for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution that does not waste any storage, and show that, under this storage scheme, the local memory access sequence of any processor for a computation involving the regular section A(ℓ:h:s) is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and we extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
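The local access sequence can be checked by brute force. The Python sketch below makes several assumptions: identity alignment (A is the template itself), 0-based indices, a positive stride s, and an inclusive upper bound h. It uses the usual cyclic(k) layout, in which processor m stores its blocks contiguously, so no storage is wasted. It enumerates addresses directly rather than building the paper's finite state machine. The function names are hypothetical.

```python
def owner_and_local(i, p, k):
    """Owner and local address of global index i under cyclic(k) on p processors.

    Identity alignment is assumed; each processor packs its blocks of k elements contiguously.
    """
    block, offset = divmod(i, k)
    return block % p, (block // p) * k + offset

def local_access_sequence(m, p, k, lo, hi, s):
    """Local addresses processor m touches for the regular section A(lo:hi:s), hi inclusive, s > 0."""
    addrs = []
    for i in range(lo, hi + 1, s):
        owner, local = owner_and_local(i, p, k)
        if owner == m:
            addrs.append(local)
    return addrs

def gaps(addrs):
    """Differences between consecutive local addresses (the sequence the state machine encodes)."""
    return [b - a for a, b in zip(addrs, addrs[1:])]
```

With p = 2, k = 4, and A(0:31:3), processor 0 touches local addresses 0, 3, 5, 10, 12, 15. The gaps are 3, 2, 5, 2, 3.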

8.
On parallel processors or in distributed computing environments, generating and sharing one stream of random numbers for all parallel processing elements is usually impractical. A more attractive method is to allow each processing element to generate random numbers independently. This paper investigates parallel use of multiplicative congruential generators. We analyze the leapfrog, the regular spacing, and the random spacing methods. Our results show the following: (1) The leapfrog method can result in multipliers of low spectral values. (2) In the random spacing method, the minimal distance between n substreams is on average only 1/n² of the cycle length. (3) The regular spacing method can result in strong correlation between substreams if the starting points α^j x_0 are poorly selected. We then suggest selecting the multiplier a and the factor α based on their k-dimensional spectral values and on the minimal distance between substreams of these generators.
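The Python sketch below shows the leapfrog and regular spacing methods for a multiplicative congruential generator x_{n+1} = a·x_n mod m. Both rely on the identity x_{n+j} = a^j·x_n mod m, so each processor can jump to its own starting point with modular exponentiation (`pow` with three arguments). The function names are hypothetical. The Park-Miller parameters in the usage note (a = 16807, m = 2^31 - 1) are an assumption for illustration, not values from the paper.

```python
def mcg_stream(x0, a, m, count):
    """First `count` values x0, x1, ... of the generator x_{n+1} = a * x_n mod m."""
    out, x = [], x0
    for _ in range(count):
        out.append(x)
        x = (a * x) % m
    return out

def leapfrog_substream(j, nprocs, x0, a, m, count):
    """Values x_j, x_{j+P}, x_{j+2P}, ... generated independently by processor j.

    The processor starts at a^j * x0 mod m and steps with multiplier A = a^P mod m;
    A is the multiplier whose spectral quality the abstract warns about.
    """
    start = (pow(a, j, m) * x0) % m
    big_a = pow(a, nprocs, m)
    return mcg_stream(start, big_a, m, count)

def regular_spacing_substream(j, spacing, x0, a, m, count):
    """Consecutive values starting at alpha^j * x0, with alpha = a^spacing mod m."""
    alpha = pow(a, spacing, m)
    start = (pow(alpha, j, m) * x0) % m
    return mcg_stream(start, a, m, count)
```

With P = 3 processors, `leapfrog_substream(j, 3, x0, a, m, count)` reproduces `mcg_stream(x0, a, m, 3 * count)[j::3]`. Regular spacing instead hands processor j a contiguous slice of the single stream.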

9.
Rectangles in the plane provide a very useful abstraction for a number of problems in diverse fields. In this paper we consider the problem of computing geometric properties of a set of rectangles in the plane. We give parallel algorithms for a number of problems using n processors, where n is the number of upright rectangles. Specifically, we present algorithms for computing the area, perimeter, eccentricity, and moment of inertia of the region covered by the rectangles in O(log n) time. We also present algorithms for computing the maximum clique and connected components of the rectangles in O(log n) time. Finally, we give algorithms for finding the entire contour of the rectangles and the medial axis representation of a given n × n binary image in O(n) time. Our results are faster than previous results and optimal (to within a constant factor).
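Coordinate compression gives a simple sequential reference for the covered area. The distinct rectangle edges cut the plane into grid cells, and each cell is either fully covered or not. The Python sketch below runs in O(n³) time and is meant for checking results, not as the O(log n) parallel algorithm. Rectangles are assumed to be tuples (x1, y1, x2, y2) with x1 < x2 and y1 < y2. The function name is illustrative.

```python
def union_area(rects):
    """Area of the region covered by upright rectangles (x1, y1, x2, y2)."""
    # Distinct x and y edges define a grid of cells; each cell is fully in or out of the union.
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            # A cell is covered if some rectangle contains it entirely.
            if any(r[0] <= xa and xb <= r[2] and r[1] <= ya and yb <= r[3] for r in rects):
                area += (xb - xa) * (yb - ya)
    return area
```

Two 2 × 2 squares that overlap in a unit square cover 4 + 4 - 1 = 7 units of area.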

10.
As a generalization of the precise and pessimistic diagnosis strategies of system-level diagnosis of multicomputers, the t/k diagnosis strategy can significantly improve the self-diagnosing capability of a system at the expense of no more than k fault-free processors (nodes) being mistakenly diagnosed as faulty. For k ≥ 2, to our knowledge, no t/k diagnosis algorithm is known for general diagnosable systems or for any specific system. The hypercube is a popular topology for interconnecting the processors of multicomputers. It is known that an n-dimensional cube is (4n − 9)/3-diagnosable. This paper addresses the (4n − 9)/3 diagnosis of the n-dimensional cube. By exploring the relationship between a largest connected component of the 0-test subgraph of a faulty hypercube and the distribution of the faulty nodes over the network, the fault diagnosis of an n-dimensional cube can be reduced to that of its two constituent (n − 1)-dimensional cubes. On this basis, a diagnosis algorithm is presented. Provided there are at most 4n − 9 faulty nodes, this algorithm isolates all faulty nodes to within a set in which at most three nodes are fault-free. The proposed algorithm runs in O(N log2 N) time, where N = 2^n is the total number of nodes of the hypercube. This work provides insight into developing efficient t/k diagnosis algorithms for larger values of k and for other types of interconnection networks.

11.
Let P and Q be two convex n-vertex polygons. We consider the problem of computing, in parallel, some functions of P and Q when P and Q are disjoint. The model of parallel computation we consider is the CREW-PRAM, i.e., the synchronous shared-memory model where concurrent reads are allowed but no two processors can simultaneously attempt to write in the same memory location (even if they are trying to write the same thing). We show that a CREW-PRAM having n^{1/k} processors can compute the following functions in O(k^{1+ε}) time: (i) the common tangents between P and Q, and (ii) the distance between P and Q (and hence a straight line separating them). The positive constant ε can be made arbitrarily close to zero. Even with a linear number of processors, it was not previously known how to achieve constant-time performance for computing these functions. The algorithm for problem (ii) is easily modified to detect the case of zero distance as well.
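Problem (ii) has an easy brute-force sequential reference. Because P and Q are disjoint, no edge of one crosses an edge of the other. The distance between two non-crossing segments is attained at an endpoint of one of them. The Python sketch below uses this fact in O(n²) time. It assumes the polygons are disjoint, so it does not handle one polygon containing the other. It is not the paper's CREW-PRAM algorithm, and the function names are illustrative.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Projection parameter of p onto line ab, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def edges(poly):
    """Consecutive vertex pairs, closing the polygon."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]

def polygon_distance(P, Q):
    """Distance between two disjoint convex polygons, brute force over all edge pairs."""
    best = math.inf
    for a, b in edges(P):
        for c, d in edges(Q):
            # Non-crossing segments: the closest pair involves an endpoint of one segment.
            best = min(best,
                       point_segment_distance(a, c, d), point_segment_distance(b, c, d),
                       point_segment_distance(c, a, b), point_segment_distance(d, a, b))
    return best
```

The sketch also returns the closest pair's distance when the nearest features are a vertex and an edge interior, as with a triangle facing a square's side.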

12.
Parallel Computing, 1986, 3(2): 153-166.
We present a parallel method to solve the generalized eigenvalue problem on a linear array of processors, each connected to its nearest neighbors and operating synchronously. We also include a wrap-around connection from end to end. Our method is based on the well-known QZ algorithm of Moler and Stewart, which simultaneously reduces two n × n matrices to upper triangular form by orthogonal or unitary transformations. We show how this algorithm may be partitioned and distributed over n + 1 processors, achieving a speed-up of O(n) over the serial algorithm. We use the concept of windows to describe the action of each processor at each step. We show how to incorporate single shifts, and how to apply orthogonal plane rotations on either side of a matrix without the need to transpose the matrix itself.
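The last point can be illustrated on a small dense matrix. A plane rotation applied on the left updates two rows. The same rotation applied on the right, as its transpose, updates two columns. Neither operation needs an explicit transpose of the matrix. The Python sketch below is a sequential illustration under that reading of the abstract, not the paper's distributed scheme. The function names are hypothetical, and matrices are plain lists of row lists.

```python
import math

def givens(a, b):
    """c, s such that [[c, s], [-s, c]] applied to the vector (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0:
        return 1.0, 0.0
    return a / r, b / r

def rotate_rows(M, i, j, c, s):
    """In place: M <- G M, where G rotates rows i and j (left-side transformation)."""
    for col in range(len(M[0])):
        x, y = M[i][col], M[j][col]
        M[i][col] = c * x + s * y
        M[j][col] = -s * x + c * y

def rotate_cols(M, i, j, c, s):
    """In place: M <- M G^T, acting directly on columns i and j instead of transposing M."""
    for row in M:
        x, y = row[i], row[j]
        row[i] = c * x + s * y
        row[j] = -s * x + c * y
```

Applying `rotate_rows` and then `rotate_cols` with the same (c, s) forms G M G^T, an orthogonal similarity. In QZ the left and right rotations are chosen independently, but each side is still updated in place in the same way.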

13.
Inspired by applications in parallel computing, we analyze the setting of work stealing in multithreaded computations. We obtain tight upper bounds on the number of steals when the computation can be modeled by rooted trees. In particular, we show that if the computation with n processors starts with one processor having a complete k-ary tree of height h (and the remaining n − 1 processors having nothing), the maximum possible number of steals is \(\sum_{i=1}^{n}(k-1)^{i}\binom{h}{i}\).
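The bound is easy to evaluate directly. The Python sketch below simply computes the stated sum with `math.comb`; it does not simulate work stealing. `math.comb(h, i)` is 0 for i > h, so for k = 2 and n ≥ h the sum reduces to 2^h - 1. The function name is illustrative.

```python
from math import comb

def max_steals(n, k, h):
    """Evaluate sum_{i=1}^{n} (k-1)^i * C(h, i), the bound stated above for
    n processors and a complete k-ary tree of height h."""
    return sum((k - 1) ** i * comb(h, i) for i in range(1, n + 1))
```

For example, with 2 processors and a binary tree of height 3, the sum is 1·3 + 1·3 = 6.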

14.
We show that quantum query complexity satisfies a strong direct product theorem. This means that computing k copies of a function with fewer than k times the quantum queries needed to compute one copy of the function implies that the overall success probability will be exponentially small in k. For a boolean function f, we also show an XOR lemma: computing the parity of k copies of f with fewer than k times the queries needed for one copy implies that the advantage over random guessing will be exponentially small. We do this by showing that the multiplicative adversary method, which inherently satisfies a strong direct product theorem, characterizes bounded-error quantum query complexity. In particular, we show that the multiplicative adversary bound is always at least as large as the additive adversary bound, which is known to characterize bounded-error quantum query complexity.

15.
The reconfigurable array with slotted optical buses (RASOB) has recently received a lot of attention from the research community. In this paper, we first discuss the reconfiguration methods and communication capabilities of the RASOB architecture. Then, we use this architecture for the implementation of efficient sorting algorithms on the 1D RASOB and the 2D RASOB. Our parallel sorting algorithm on the 1D RASOB is based on an efficient divide-and-conquer scheme. It sorts N data items using N processors in O(k) communication cycles, where k is the size of the data items to be sorted in bits. We further develop a parallel sorting algorithm on the 2D RASOB based on the sorting algorithm on the 1D RASOB in conjunction with the well-known Rotatesort algorithm. Similarly, this algorithm sorts N data items on a 2D RASOB of size N in O(k) communication cycles. These sorting algorithms are much more efficient than state-of-the-art sorting algorithms on reconfigurable arrays of processors with electronic buses using the same number of processors.
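A running time that depends on the key size k rather than on N suggests a bit-wise sort. As a stand-in, the Python sketch below sorts by splitting on the most significant bit and recursing, which takes k levels. Every group at a level can be split independently, and that is the step a parallel machine would perform for all groups at once. It assumes non-negative integer keys below 2^k. The abstract does not state that the paper's divide-and-conquer scheme works exactly this way, and the function name is hypothetical.

```python
def msd_radix_sort(values, k):
    """Sort non-negative integers below 2**k by splitting on the top bit and recursing.

    The recursion is k levels deep; within a level the groups are independent.
    """
    if k == 0 or len(values) <= 1:
        return list(values)   # k == 0: all remaining keys are equal
    bit = k - 1
    # Stable split on the current most significant bit.
    zeros = [v for v in values if not (v >> bit) & 1]
    ones = [v for v in values if (v >> bit) & 1]
    return msd_radix_sort(zeros, bit) + msd_radix_sort(ones, bit)
```

The function returns a new list and leaves the input unchanged.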

16.
In a planar geometric network, vertices are located in the plane, and edges are straight line segments connecting pairs of vertices, such that no two of them intersect. In this paper we study distributed computing in asynchronous, failure-free planar geometric networks, where each vertex is associated with a processor, and each edge with a bidirectional message communication link. Processors are aware of their locations in the plane. We consider fundamental computational geometry problems from the distributed computing point of view, such as finding the convex hull of a geometric network and identifying the external face. We also study the classic distributed computing problem of leader election, to understand the impact that geometric information has on the message complexity of solving it. We obtain an O(n log2 n) message complexity algorithm to find the convex hull, and an O(n log n) message complexity algorithm to identify the external face of a geometric network of n processors. We present a matching lower bound for the external face problem. We prove that the message complexity of leader election in a geometric ring is Ω(n log n); hence geometric information does not help reduce the message complexity of this problem.
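Andrew's monotone chain algorithm is a common sequential reference for checking a distributed convex hull. The Python sketch below runs in O(n log n) time. It assumes points are (x, y) tuples and drops collinear points on the hull boundary. It is not a distributed algorithm, and the function names are illustrative.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order, starting at the lexicographically smallest point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Pop points that would make a clockwise (or straight) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

In a geometric network, the hull vertices are a subset of the processors. A distributed algorithm must identify exactly this subset, which the sketch computes centrally.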

17.
We discuss the problem of scheduling a set of independent tasks T, each t_i ∈ T of length ℓ_i ∈ Z+, on m identical processors. We allow preemption but assume a communication delay of k ∈ N time units: whenever a task is preempted on one processor and resumed on another, there must be a delay of at least k time units. We show that if k = 1, an optimal schedule can be found in polynomial time, but if k ≥ 2, the corresponding decision problem is NP-complete.

18.
This paper describes a parallel algorithm for computing the visible portion of a simple planar polygon with N vertices from a given point on or inside the polygon. The algorithm accomplishes this in O(k log N) time using O(N/log N) processors, where k is the link diameter of the polygon. The link diameter of a polygon is the maximum, over all pairs of points in the polygon, of the minimum number of straight line segments in a path that connects them and lies entirely inside the polygon. The algorithm can also be used to compute the visible portion of the plane from a point outside the polygon; in that case, however, the parameter k in the asymptotic bounds is the link diameter of a different polygon. The algorithm is optimal for families of polygons with constant link diameter. It is a rather simple algorithm with a very small constant factor in its running time, which makes it fast and practical to implement. The interprocessor communication it needs involves only local neighbor communication and scan operations (i.e., parallel prefix operations). Thus the algorithm can be implemented not only on an EREW PRAM but also on a variety of other, more practical machine architectures, such as hypercubes, trees, butterflies, and shuffle-exchange networks. The algorithm was implemented on the Connection Machine as well as the MasPar MP-1, and various performance tests were conducted.
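Scan (parallel prefix) is the main collective operation this algorithm relies on. The Python sketch below simulates the Hillis-Steele inclusive scan. It takes ⌈log2 n⌉ rounds, and in each round all positions update simultaneously. It is a generic illustration rather than code from the paper, and it assumes an associative operator. The function name is illustrative.

```python
import operator

def inclusive_scan(values, op=operator.add):
    """Hillis-Steele inclusive scan, simulated sequentially.

    In each round, every position i >= step combines with position i - step at once,
    so a parallel machine needs only ceil(log2 n) rounds (op must be associative).
    """
    x = list(values)
    step = 1
    while step < len(x):
        # Build the next round only from the previous one, as simultaneous updates would.
        x = [x[i] if i < step else op(x[i - step], x[i]) for i in range(len(x))]
        step *= 2
    return x
```

Any associative operator works, for example `max` to compute running maxima.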

19.
The image analogy framework is especially useful for synthesizing appealing images from non-homogeneous input, and it gives users creative control over the synthesized results. However, the traditional framework does not adapt its search strategy to the differing textural content of each neighborhood. Moreover, synthesis is slow because of the intensive computation involved in neighborhood matching. In this paper we present a CUDA-based neighborhood matching algorithm for image analogy. Our algorithm adaptively applies either a global search for the exact L2 nearest neighbor or a k-coherence search during synthesis, according to the textural features of the image, which is especially useful for non-homogeneous textures. To implement both search strategies consistently on the GPU, we adopt a fast CUDA-based k nearest neighbor search algorithm. This acceleration greatly reduces the time of the k-coherence preprocessing step and of the global-search synthesis step, which makes it practical to adjust important synthesis parameters. We further adopt synthesis magnification to obtain the final high-resolution synthesized image efficiently. Experimental results show that our algorithm suits various applications of the image analogy framework and takes full advantage of the GPU's parallel processing capability to improve synthesis speed while producing satisfactory results.

20.
When processes execute on parallel computer systems, a major bottleneck is interprocessor communication. One way to address this problem is to minimize the communication between processes that are mapped to different processors. This translates to the k-partitioning problem for the corresponding process graph, where k is the number of processors. The classical spectral lower bound of \(\frac{|V|}{2k}\sum_{i=1}^{k}\lambda_i\) for the k-section width of a graph is well known. We show new relations between the structure and the eigenvalues of a graph and present a new method to obtain tighter lower bounds on the k-section width. This method makes use of the level structure defined by the k-section. We define a global expansion property and prove that for graphs with the same k-section width, the spectral lower bound increases with this global expansion. We also present examples of graphs for which our new bounds are tight up to a constant factor.
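The classical bound can be computed directly from the graph Laplacian. The Python sketch below assumes, as is usual for this bound, that λ_1 ≤ λ_2 ≤ ... are the Laplacian eigenvalues in ascending order. It uses NumPy's `numpy.linalg.eigvalsh` for symmetric matrices. It computes only the classical bound, not the paper's tighter level-structure bounds. The function names are illustrative.

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected simple graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def spectral_k_section_bound(n, edges, k):
    """Classical lower bound (|V| / (2k)) * (sum of the k smallest Laplacian eigenvalues)."""
    eig = np.linalg.eigvalsh(laplacian(n, edges))  # ascending order for symmetric input
    return n / (2 * k) * eig[:k].sum()
```

For the 4-cycle with k = 2, the Laplacian eigenvalues are 0, 2, 2, 4, so the bound is (4/4)(0 + 2) = 2. That equals the cycle's bisection width.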
