Similar Documents: 20 similar documents found (search time: 46 ms)
1.
We investigate lattice-based array partitioning based on the theory of the Smith Normal Form and present two elegant techniques for partitioning arrays in parallel DoAll loops for message-passing parallel machines: (1) DoAll loops with constant dependencies, for communication-free partitioning: a general solution covering all possible communication-free partitionings is derived for the case where the dependencies among array references are described by constant distance vectors. (2) DoAll loops with non-constant dependencies, for block-communication partitioning: here the dependencies among array references are described by non-constant distance vectors, and we derive partitioning equations that allocate all remote data to a unique processor, so that a single block communication obtains all the remote data needed for the computation. By using the Smith Normal Form decomposition, we are also able to verify our partitioning results.
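A minimal sketch of the communication-free condition for constant distance vectors (the example loop, its two distance vectors, and the use of SymPy's nullspace and smith_normal_form helpers are assumptions for illustration, not details from the paper): a partitioning hyperplane with normal h induces no communication exactly when h is orthogonal to every dependence distance vector, i.e. when h lies in the null space of the matrix whose rows are the distance vectors; the Smith Normal Form of that matrix exposes the dependence lattice used to enumerate and verify such partitionings.

    from sympy import Matrix
    from sympy.matrices.normalforms import smith_normal_form

    # Hypothetical constant distance vectors of a doubly nested DoAll loop,
    # one per row; all dependences point along the same lattice direction.
    D = Matrix([[1, 1],
                [2, 2]])

    # Communication-free hyperplane normals h satisfy D * h = 0.
    print(D.nullspace())            # one basis vector, spanning (-1, 1)

    # Smith Normal Form of the distance matrix: the zero diagonal entry
    # signals a free direction along which iterations and data can be cut
    # without crossing any dependence.
    print(smith_normal_form(D))     # diag(1, 0) up to unimodular factors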

2.
This paper addresses the problem of partitioning the iterations of nested loops, and data arrays accessed by the loops. Hyperplane partitions of disjoint subsets of data arrays and loop iterations that result in the elimination of communication are sought. A characterization of necessary and sufficient conditions for communication-free hyperplane partitioning is provided.

3.
Redundant Computation Partitioning for Communication Optimization on Distributed Memory Systems
For sequences of parallel loop nests, we propose a communication-optimization method based on redundant computation partitioning. Building on data-flow analysis, a general method is given for determining the amount of redundant computation for each loop nest, and on this basis the implementation and decision procedure of redundant computation partitioning are presented. For programs with regular dependences, an efficient implementation of redundant computation partitioning is also proposed. The technique has been implemented in a parallelizing compiler, and experimental results show that it is clearly superior to traditional communication-optimization techniques.

4.
Due to the significant communication overhead of sending and receiving data, loop partitioning approaches on distributed memory systems must guarantee not just computation load balance but computation+communication load balance. Previous approaches to loop partitioning have achieved a communication-free, computation load balanced iteration space partitioning solution for a limited subset of DOALL loops. But a large category of DOALL loops inevitably result in communication, and the trade-offs between computation and communication must be carefully analyzed for these loops in order to balance out the combined computation time and communication overheads. In this work, we describe a partitioning approach based on the above motivation for the general case of DOALL loops. Our goal is to achieve a computation+communication load balanced partitioning through static data and iteration space distribution. Our approach first partitions the iteration and data spaces of a loop nest by analyzing communication and parallelism; it then performs architecture-dependent analysis to adjust the granularity of partitions, load balance each partition with respect to total computation+communication, and map the partitions onto the available number of processors. This multiphase partitioning method works as follows. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions for reducing a larger degree of communication by trading a lesser degree of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors such that the data references are maximally localized and reused, eliminating a larger communication volume than parallelism. We then perform data space partitioning based on a new "larger partition owns" rule to minimize the communication overhead for a compute-intensive partition by localizing its references relatively more than a smaller, non-compute-intensive partition. A partition interaction graph is then constructed, which is used by the architecture-dependent analysis phase to merge partitions to achieve granularity adjustment, computation+communication load balance, and mapping onto the actual number of available processors. Relevant theory and algorithms are developed, along with a performance evaluation on the Cray T3D.

5.
This paper addresses the problem of communication-free partitioning of iteration spaces and data spaces along hyperplanes. To find more communication-free hyperplane partitions, we treat statements within a loop body as separate schedulable units. Instead of using information about data dependence distance or direction vectors, our technique explicitly formulates array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, necessary and sufficient conditions for a communication-free partition along hyperplanes to be feasible are proposed. This approach can be applied to all programs with an imperfectly nested loop or sequences of imperfectly nested loops whose array references are affine functions of outer loop indices or loop-invariant variables. The proposed approach is more practical than existing methods in finding the data and computation distribution patterns that allow the processors to execute fully in parallel on multicomputers without any interprocessor communication.

6.
Presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expressions has remained open. Our paper solves this open problem by presenting a method for deriving an optimal hyperparallelepiped tiling of iteration spaces for minimal communication in multiprocessors with caches. We show that the same theoretical framework can also be used to determine optimal tiling parameters for both data and loop partitioning in distributed memory multicomputers. Our framework uses matrices to represent iteration and data space mappings and the notion of uniformly intersecting references to capture temporal locality in array references. We introduce the notion of data footprints to estimate the communication traffic between processors and use linear algebraic methods and lattice theory to compute precisely the size of data footprints. We have implemented this framework in a compiler for Alewife, a distributed shared-memory multiprocessor.
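To make the data-footprint notion concrete (the reference matrices, offsets, and tile bounds below are made-up examples, and the brute-force enumeration is only a sanity check, not the paper's closed-form lattice computation): the footprint of an affine reference A[G i + g] over a tile is the set of distinct array elements it touches, and uniformly intersecting references (same G, different g) have footprints whose union is only slightly larger than a single footprint.

    import numpy as np
    from itertools import product

    def footprint(G, g, lower, upper):
        """Set of elements of A touched by the reference A[G @ i + g] as the
        iteration vector i ranges over the tile lower <= i < upper
        (brute force; intended only for small tiles)."""
        G, g = np.asarray(G), np.asarray(g)
        pts = product(*(range(lo, up) for lo, up in zip(lower, upper)))
        return {tuple(G @ np.asarray(i) + g) for i in pts}

    # Two uniformly intersecting references A[i, j] and A[i, j-1] on a 16x16 tile:
    f1 = footprint([[1, 0], [0, 1]], [0, 0],  [0, 0], [16, 16])
    f2 = footprint([[1, 0], [0, 1]], [0, -1], [0, 0], [16, 16])
    print(len(f1), len(f2), len(f1 | f2))   # 256 256 272: one extra column of traffic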

7.
A nonlinear programming approach to the problem of learning in nonparametric and unsupervised environments is put forth in this study. The approach exploits the concept of optimal partitioning of the given data set as the objective of such nonparametric unsupervised learning. This partitioning or clustering is achieved by the application of the recently developed improved flexible polyhedron method (IFPM) to the associated optimization problem. The optimization problem is defined here in terms of a new, conceptually innovative optimality criterion based on intergroup and intragroup distinct scatters. Details of this new approach, a procedure for deriving an initial partition, and application of the algorithm to two numerical examples are presented.
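A rough sketch of this optimization view of clustering, with two loudly flagged substitutions: SciPy's standard Nelder-Mead simplex stands in for the paper's improved flexible polyhedron method (IFPM), and a simple within-scatter over between-scatter ratio stands in for the paper's intergroup/intragroup criterion; the cluster count k, the empty-cluster penalty, and the initialization are likewise assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def scatter_criterion(flat_centers, X, k):
        """Simplified optimality criterion: within-group scatter divided by
        between-group scatter for the partition induced by nearest centers."""
        centers = flat_centers.reshape(k, X.shape[1])
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        sizes = np.array([(labels == c).sum() for c in range(k)])
        if (sizes == 0).any():                        # penalize empty groups
            return 1e9
        within = sum(((X[labels == c] - centers[c]) ** 2).sum() for c in range(k))
        overall = X.mean(axis=0)
        between = sum(sizes[c] * ((centers[c] - overall) ** 2).sum() for c in range(k))
        return within / (between + 1e-12)

    def cluster(X, k, seed=0):
        """Derive an initial partition from k random points, then let the
        simplex search move the centers to minimize the criterion."""
        rng = np.random.default_rng(seed)
        x0 = X[rng.choice(len(X), size=k, replace=False)].ravel()
        res = minimize(scatter_criterion, x0, args=(X, k), method='Nelder-Mead')
        centers = res.x.reshape(k, X.shape[1])
        return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)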

8.
This paper introduces results on placement and communication minimization for systems of affine recurrence equations. We show how to classify the dependences according to the number and nature of the communications they may result in. We give both communication-free conditions and conditions for an efficient use of broadcast or neighbor-to-neighbor communication primitives. Since in general not all dependences of a problem can be made communication-free, we finally introduce a heuristic, based on this classification of dependences, to globally minimize the communications.

9.
10.
Data co-clustering refers to the problem of simultaneously clustering two data types. Typically, the data is stored in a contingency or co-occurrence matrix C whose rows and columns represent the data types to be co-clustered. An entry C_ij of the matrix signifies the relation between the data type represented by row i and column j. Co-clustering is the problem of deriving sub-matrices from the larger data matrix by simultaneously clustering rows and columns of the data matrix. In this paper, we present a novel graph-theoretic approach to data co-clustering. The two data types are modeled as the two sets of vertices of a weighted bipartite graph. We then propose the Isoperimetric Co-clustering Algorithm (ICA), a new method for partitioning the bipartite graph. ICA requires a simple solution to a sparse system of linear equations instead of the eigenvalue or SVD problem in the popular spectral co-clustering approach. Our theoretical analysis and extensive experiments performed on publicly available datasets demonstrate the advantages of ICA over other approaches in terms of quality, efficiency and stability in partitioning the bipartite graph.
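A compact sketch of the linear-system step, assuming the usual isoperimetric partitioning recipe applied to the bipartite graph (grounding the highest-degree vertex, thresholding the solution at its median, and the SciPy sparse solver are implementation assumptions, not details quoted from the paper):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def ica_bipartition(C):
        """Split the rows and columns of a contingency matrix C into two
        co-clusters by solving one sparse linear system on the bipartite graph."""
        C = sp.csr_matrix(C)
        m, n = C.shape
        A = sp.bmat([[None, C], [C.T, None]]).tocsr()   # bipartite adjacency
        d = np.asarray(A.sum(axis=1)).ravel()           # vertex degrees
        L = (sp.diags(d) - A).tocsr()                   # graph Laplacian
        ground = int(np.argmax(d))                      # ground one vertex
        keep = np.delete(np.arange(m + n), ground)
        x = np.zeros(m + n)
        x[keep] = spsolve(L[keep, :][:, keep].tocsc(), d[keep])
        side = x <= np.median(x)                        # threshold the solution
        return side[:m], side[m:]                       # row / column memberships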

11.
In object-oriented databases (OODBs), a method encapsulated in a class typically accesses a few, but not all, of the instance variables defined in the class. It may thus be preferable to vertically partition the class to reduce the irrelevant data (instance variables) accessed by the methods. Our prior work has shown that vertical class partitioning can result in a substantial decrease in the total number of disk accesses incurred for executing a set of applications, but coming up with an optimal vertical class partitioning scheme is a hard problem. In this paper, we present two algorithms for deriving optimal and near-optimal vertical class partitioning schemes. The cost-driven algorithm provides the optimal vertical class partitioning schemes by exhaustively enumerating all the schemes and calculating the number of disk accesses required to execute a given set of applications. For this, a cost model for executing a set of methods in an OODB system is developed. Since exhaustive enumeration is costly and only works for classes with a small number of instance variables, a hill-climbing heuristic algorithm (HCHA) is developed, which takes the solution provided by the affinity-based algorithm and improves it, thereby further reducing the total number of disk accesses incurred. We show that the HCHA algorithm provides a reasonable near-optimal vertical class partitioning scheme for executing a given set of applications.
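To give the cost-driven enumeration a concrete flavor (this is not the paper's cost model: the restriction to two fragments, the access-frequency weights, and the "pay for every variable stored in a touched fragment" proxy are all assumptions), a brute-force scorer over binary vertical splits could look like:

    from itertools import product

    def enumerate_splits(variables, method_uses, freq):
        """Exhaustively score every assignment of instance variables to two
        fragments. A method reads every fragment containing a variable it uses
        and pays for all variables stored in those fragments, weighted by how
        often the method runs (a crude proxy for disk accesses)."""
        best = None
        for bits in product((0, 1), repeat=len(variables)):
            frag_of = dict(zip(variables, bits))
            sizes = [sum(1 for v in variables if frag_of[v] == f) for f in (0, 1)]
            cost = 0
            for m, used in method_uses.items():
                touched = {frag_of[v] for v in used}
                cost += freq[m] * sum(sizes[f] for f in touched)
            if best is None or cost < best[0]:
                best = (cost, frag_of)
        return best

    # e.g. enumerate_splits(['x', 'y', 'z'],
    #                       {'getX': {'x'}, 'area': {'x', 'y'}, 'full': {'x', 'y', 'z'}},
    #                       {'getX': 10, 'area': 5, 'full': 1})
    # prefers the split {'x'} | {'y', 'z'} under this proxy.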

12.
In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. To reduce or even eliminate the interprocessor communication, the array elements in programs must be carefully distributed to the local memories of processors for parallel execution. We devote our efforts to techniques for allocating the array elements of nested loops onto multicomputers in a communication-free fashion for parallelizing compilers. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks without interblock communication. The arrays can be partitioned under the communication-free criteria with nonduplicate or duplicate data. Finally, a heuristic method for mapping the partitioned array elements and iterations onto fixed-size multicomputers under the consideration of load balancing is proposed. Based on these methods, nested loops can execute without any communication overhead on distributed memory multicomputers. Moreover, the performance of the strategies with nonduplicate and duplicate data for matrix multiplication is studied.

13.
Dr. G. Levi 《Computing》1974,12(4):291-313
A new algorithm for testing isomorphism on directed and undirected graphs is described. It consists of an O(n^5) algorithm for partitioning node and edge sets, plus a heuristic procedure for deriving all the isomorphisms. The partitioning is described by a “connectivity graph” on which a sufficient condition for isomorphism can be tested.
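A closely related (but simpler) partitioning idea is iterative colour refinement of the node set, sketched below in Python; Levi's O(n^5) procedure additionally partitions the edge set and builds the connectivity graph, neither of which is reproduced here.

    def refine(adj):
        """Iteratively split vertex classes by the multiset of neighbour classes.
        adj: dict vertex -> iterable of neighbours. Isomorphic graphs must yield
        the same multiset of stable colours (a necessary condition only)."""
        colour = {v: 0 for v in adj}
        while True:
            sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
                   for v in adj}
            relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            new = {v: relabel[sig[v]] for v in adj}
            if new == colour:
                return colour                 # stable partition reached
            colour = new

    # e.g. refine({0: [1], 1: [0, 2], 2: [1]}) gives the two path end-points
    # one colour and the middle vertex another.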

14.
Array partitioning is an important research problem in the array management area, since partitioning strategies have an important influence on storage, query evaluation, and other components of array management systems. Meanwhile, compression is highly needed for array data due to its growing volume. Observing that array partitioning can affect compression performance significantly, this paper aims to design an efficient partitioning method for array data that optimizes compression performance. As far as we know, there has been little research effort on this problem. In this paper, the problem of array partitioning for optimizing the compression performance (PPCP for short) is first formulated. We adopt a popular compression technique which allows queries to be processed on the compressed data without decompression. Secondly, because the above problem is NP-hard, two essential principles for exploring the partitioning solution are introduced, which capture the core idea of the proposed partitioning algorithms. The first principle shows that compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy which supports the heuristic selection of partitioning positions. Supported by the two principles, two greedy-strategy-based array partitioning algorithms are designed for the independent case and the dependent case, respectively. Observing the expensive cost of the algorithm for the dependent case, a further optimization based on random sampling and dimension grouping is proposed to achieve linear time cost. Finally, experiments are conducted on both synthetic and real-life data, and the results show that the two proposed partitioning algorithms achieve better performance on both compression and query evaluation.
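A tiny sketch of the first principle (the sparsity-gap score below is only a proxy for actual compressed size, and the restriction to a one-dimensional split is an assumption): scan the candidate split positions along one dimension and pick the one that separates the array into a dense part and a sparse part as cleanly as possible.

    import numpy as np

    def best_split(a):
        """Greedy split position for a 1-D array slice: maximize the difference
        in density (fraction of non-zeros) between the two resulting parts."""
        nz = np.cumsum(a != 0)
        best_gap, best_pos = -1.0, None
        for p in range(1, len(a)):
            left = nz[p - 1] / p
            right = (nz[-1] - nz[p - 1]) / (len(a) - p)
            gap = abs(left - right)
            if gap > best_gap:
                best_gap, best_pos = gap, p
        return best_pos, best_gap

    # e.g. best_split(np.array([5, 3, 8, 1, 0, 0, 0, 0])) returns (4, 1.0):
    # a fully dense block and a fully sparse block, each of which compresses well.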

15.
Learning from Cluster Examples
Learning from cluster examples (LCE) is a hybrid task combining features of two common grouping tasks: learning from examples and clustering. In LCE, each training example is a partition of objects. The task is then to learn, from a training set, a rule for partitioning unseen object sets. A general method for learning such partitioning rules is useful in any situation where explicit algorithms for deriving partitions are hard to formalize, while individual examples of correct partitions are easy to specify. In the past, clustering techniques have been applied to such problems, despite being essentially unsuited to the task. We present a technique that has qualitative advantages over standard clustering approaches. We demonstrate these advantages by applying our method to problems in two domains: one with dot patterns and one with more realistic vector-data images.

16.
Many approaches have been proposed for deriving tests from finite state machine (FSM) specifications with respect to some established coverage criteria. A fundamental core problem in FSM-based testing is the derivation of input sequences that can distinguish the states of an FSM specification, a.k.a. distinguishing sequences. A major effort in the construction of these sequences is the derivation of a successor search tree labeled by sets of pairs of states of the given machine. We aim at reducing the time associated with such constructions through the use of state-of-the-art parallel technologies. Namely, we propose a parallel algorithm that we implement and evaluate on multicore CPUs and on many-core GPUs. We evaluate two alternative GPU implementations that use the CUDA and Thrust software platforms, and a network-of-workstations-based solution. The latter uses a workload partitioning based on Divisible Load Theory. A rigorous set of experiments highlights the differences between the proposed implementations in terms of execution time and speedup.
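To ground the notion of distinguishing inputs, here is only the pairwise building block, a shortest separating sequence for one pair of states of a Mealy machine, not the full successor-tree construction or its parallelization; the dictionary-based machine encoding is an assumption.

    from collections import deque

    def separating_sequence(delta, lam, s, t):
        """Shortest input sequence on which states s and t produce different
        outputs. delta[state][inp] -> next state, lam[state][inp] -> output."""
        seen = {(s, t)}
        queue = deque([((s, t), [])])
        while queue:
            (a, b), seq = queue.popleft()
            for inp in delta[a]:
                if lam[a][inp] != lam[b][inp]:
                    return seq + [inp]            # outputs differ: done
                nxt = (delta[a][inp], delta[b][inp])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, seq + [inp]))
        return None                               # s and t are equivalent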

17.
丁如江  李国强 《软件学报》2019,30(7):1939-1952
In recent years, verification techniques based on Petri net coverability have been successfully applied to the verification and analysis of concurrent programs. However, because the coverability problem for general Petri nets has very high complexity, such techniques face significant limitations in practice and often time out on large inputs. For communication-free Petri nets, a subclass of Petri nets, both coverability and reachability are NP-complete, while the class remains expressive enough to serve as a verification model for certain concurrent programs. This paper designs and implements CFPCV, a tool that efficiently verifies coverability of communication-free Petri nets. It adopts a constraint-based approach: constraints are extracted from the model and solved with the Z3 SMT solver, and candidate solutions are then validated with a subnet-markability check so that every returned solution is guaranteed to be correct. Experiments analyzing the tool's success rate, number of iterations, and running efficiency show that the algorithm achieves a high verification success rate and excellent performance.
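As an illustration of the constraint-based step only (the encoding below is the standard state-equation relaxation of coverability, written against the Z3 Python API; it is not CFPCV itself and omits the subnet-markability validation mentioned in the abstract):

    from z3 import Int, Solver, sat

    def state_equation_coverable(pre, post, m0, target):
        """Check whether the Petri-net state equation m = m0 + (post - pre) * x
        admits a non-negative integer firing-count vector x with m >= target.
        pre/post: place-by-transition incidence lists; m0/target: markings."""
        places, transitions = len(pre), len(pre[0])
        x = [Int('x_%d' % t) for t in range(transitions)]
        s = Solver()
        for t in range(transitions):
            s.add(x[t] >= 0)
        for p in range(places):
            m_p = m0[p] + sum((post[p][t] - pre[p][t]) * x[t]
                              for t in range(transitions))
            s.add(m_p >= target[p])               # cover the target marking
        return s.check() == sat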

18.
Distributed storage is an effective way to handle large-scale data, and data partitioning is a prerequisite for it. To cope with ever-growing RDF data, an RDF Graph Partitioning algorithm based on Double Objective Optimization (RGPDOO) is proposed. RGPDOO fuses two graph-partitioning metrics, edge cut and partition balance, into a single objective function, and uses this objective to realize both static and dynamic partitioning of RDF graphs. For static partitioning, the graph is first given an initial division that classifies its vertices into core, cross, and free vertices; the cross and free vertices are then assigned by computing the gain of the objective function. For dynamic partitioning, solutions are given for the insertion and deletion of RDF triples, and to keep meeting the partitioning objective the algorithm periodically (every interval [T]) performs a dynamic adjustment according to the balance and cohesion of the subgraphs. Experiments on both synthetic and real-world datasets, compared against several general-purpose static and dynamic graph-partitioning algorithms, show that the proposed algorithm effectively realizes static and dynamic partitioning of RDF graphs.
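A small sketch of the gain-driven assignment of cross and free vertices (the abstract does not give the exact fused objective, so the edge-cut term, the balance penalty, and the alpha/beta weights below are assumptions):

    def assignment_gain(v, k, parts, adj, alpha=1.0, beta=1.0):
        """Gain of placing vertex v into partition k: edges that become internal
        (edge-cut reduction) minus a penalty for growing an oversized partition."""
        internal = sum(1 for u in adj[v] if u in parts[k])
        avg_size = sum(len(p) for p in parts) / len(parts)
        imbalance = max(0.0, len(parts[k]) + 1 - avg_size)
        return alpha * internal - beta * imbalance

    def assign_vertex(v, parts, adj):
        """Greedily place a cross/free vertex where the gain is largest."""
        best = max(range(len(parts)), key=lambda k: assignment_gain(v, k, parts, adj))
        parts[best].add(v)
        return best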

19.
We consider a resource-constrained project scheduling problem originating in particle therapy for cancer treatment, in which the scheduling has to be done at high resolution. Traditional mixed integer linear programming techniques such as time-indexed formulations or discrete-event formulations are known to have severe limitations in such cases, that is, they either grow too fast or have weak linear programming relaxations. We suggest a relaxation based on partitioning time into so-called time buckets. This relaxation is iteratively solved and serves as a basis for deriving feasible solutions using heuristics. Based on these primal and dual solutions and bounds, the time buckets are successively refined. Combining these parts, we obtain an algorithm that provides good approximate solutions early on and eventually converges to an optimal solution. Diverse strategies for performing the time-bucket refinement are investigated. The approach shows excellent performance in comparison to the traditional formulations and a metaheuristic.
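As a hedged sketch of what such a time-bucket relaxation can look like (not necessarily the paper's formulation; the work-distribution variables y_{jb} and the aggregated capacity constraints are assumptions), partition the horizon into buckets b of length |b| and let y_{jb} >= 0 be the amount of activity j processed in bucket b:

\[
\sum_{b} y_{jb} = p_j \quad \forall j,
\qquad
\sum_{j} r_{jk}\, y_{jb} \le R_k\, |b| \quad \forall k, b,
\]

where p_j is the processing time of activity j, r_{jk} its demand for resource k, and R_k the capacity of resource k. Any feasible high-resolution schedule satisfies these aggregated constraints, so the bucket model is a valid relaxation; refining, that is, splitting buckets guided by the primal and dual information, tightens it toward the exact time-indexed formulation.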

20.
The three-mode partitioning model is a clustering model for three-way three-mode data sets that implies a simultaneous partitioning of all three modes involved in the data. In the associated data analysis, a data array is approximated by a model array that can be represented by a three-mode partitioning model of a prespecified rank, minimizing a least-squares loss function in terms of the differences between data and model. Algorithms have been proposed for this minimization, but their performance is not yet clear. To address this performance question, a framework for alternating least-squares methods is described, and a number of both existing and novel algorithms are discussed within this framework. An extensive simulation study is reported in which these algorithms are evaluated and compared according to their sensitivity to local optima, and the recovery of the truth underlying the data is investigated in order to assess the optimal estimates. When a collection of four empirical data sets is used, the ordering of the algorithms with respect to their success in finding the optimal solution appears to change compared to the results obtained in the simulation study. This finding is attributed to violations of the implicit stochastic model underlying both the least-squares loss function and the simulation study; support for this attribution is found in a second simulation study.
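For concreteness, a common way to write the three-mode partitioning least-squares loss (the notation here is assumed, not quoted from the paper) is

\[
\min_{A,B,C,G}\ \sum_{i,j,k}\Bigl(x_{ijk}-\sum_{p,q,r}a_{ip}\,b_{jq}\,c_{kr}\,g_{pqr}\Bigr)^{2},
\]

where A, B and C are binary membership matrices that partition the three modes (each row contains a single 1) and the core array G holds the cluster means; one alternating least-squares step updates one of A, B, C or G while the others are held fixed.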
