Similar literature
 20 similar documents found (search time: 15 ms)
1.
We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor.
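To make the setting concrete: in a d-dimensional hypercube, nodes can be labelled with d-bit integers, and two nodes are linked when their labels differ in exactly one bit. The sketch below simulates a simple dimension-by-dimension exchange for the simultaneous broadcast problem. It is a textbook scheme shown for illustration only, not the optimal algorithm the abstract refers to, and the function names are hypothetical.

```python
def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Neighbors of `node` in a d-cube: flip each of the d address bits."""
    return [node ^ (1 << k) for k in range(d)]


def all_to_all_broadcast(d: int) -> list[set[int]]:
    """Simulate d exchange rounds; returns the packets known to each node."""
    n = 1 << d
    known = [{v} for v in range(n)]  # each node starts with its own packet
    for k in range(d):
        # In round k every node swaps everything it knows with its
        # neighbor across dimension k (the right-hand side reads the
        # previous round's state before `known` is rebound).
        known = [known[v] | known[v ^ (1 << k)] for v in range(n)]
    return known
```

After d rounds every node holds all 2^d packets. Note that the messages double in size each round, so this simulation does not follow a one-packet-per-transmission cost model.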

2.
Oblivious permutation routing in binary d-cubes has been well studied in the literature. In a permutation routing, each node initially contains a packet with a destination such that all the 2^d destinations are distinct. Kaklamanis et al. (Math. Syst. Theory 24 (1991) 223–232) used the decomposability of hypercubes into Hamiltonian circuits to give an asymptotically optimal routing algorithm. The notion of “destination graph” was first introduced by Borodin and Hopcroft to derive lower bounds on routing algorithms. This idea was recently used by Grammatikakis et al. (Proceedings of the Advancement in Parallel Computing, Elsevier, Amsterdam, 1993) to construct many–one routing algorithms for the binary 2-cube and 3-cube. In the present paper, further theoretical development is made along this line. It is then applied to obtain algorithms for binary d-cubes with d up to 12, which compare favorably with the above-mentioned “Hamiltonian circuit” algorithm. Some results on t-nary cubes with t ≥ 3 are also obtained.

3.
Orthogonal packing problems are natural multidimensional generalizations of the classical bin packing problem and knapsack problem and occur in many different settings. The input consists of a set \(I=\{r_1,\ldots,r_n\}\) of \(d\)-dimensional rectangular items \(r_i=(a_{i,1},\ldots,a_{i,d})\) and a space \(Q\). The task is to pack the items in an orthogonal and non-overlapping manner without using rotations into the given space. In the strip packing setting the space \(Q\) is given by a strip of bounded basis and unlimited height. The objective is to pack all items into a strip of minimal height. In the knapsack packing setting the given space \(Q\) is a single, usually unit sized bin and the items have associated profits \(p_i\). The goal is to maximize the profit of a selection of items that can be packed into the bin.
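The feasibility conditions described above (orthogonal, non-overlapping, no rotation, inside the given space) can be sketched as a small checker. This is only an illustrative validity test for a candidate placement, not a packing algorithm; the function name and the convention that each position is the item's lower corner are assumptions.

```python
from itertools import combinations


def is_valid_packing(items, positions, space):
    """Check that axis-aligned items fit in `space` without overlapping.

    items[i]     -- side lengths of item i (no rotation allowed)
    positions[i] -- coordinates of item i's lower corner (assumed convention)
    space        -- side lengths of the container Q
    """
    # Every item must lie inside the container in every dimension.
    for size, pos in zip(items, positions):
        for a, p, q in zip(size, pos, space):
            if p < 0 or p + a > q:
                return False
    # Two boxes have disjoint interiors exactly when they are separated
    # along at least one axis.
    for i, j in combinations(range(len(items)), 2):
        separated = any(
            positions[i][k] + items[i][k] <= positions[j][k]
            or positions[j][k] + items[j][k] <= positions[i][k]
            for k in range(len(space))
        )
        if not separated:
            return False
    return True
```

For the strip packing setting, the unlimited height can be modelled by passing `float("inf")` as the last entry of `space`.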

4.
Dynamic redistribution of arrays is required very often in programs on distributed-memory parallel computers. This paper presents efficient algorithms for redistribution between different cyclic(k) distributions, as defined in High Performance Fortran. We first propose special optimized algorithms for a cyclic(x) to cyclic(y) redistribution when x is a multiple of y, or y is a multiple of x. We then propose two algorithms, called the GCD method and the LCM method, for the general cyclic(x) to cyclic(y) redistribution when there is no particular relation between x and y. We have implemented these algorithms on the Intel Touchstone Delta, and find that they perform well for different array sizes and numbers of processors.
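As a rough illustration of what a cyclic(x) to cyclic(y) redistribution has to compute: under cyclic(k), the array is cut into blocks of k consecutive elements and the blocks are dealt round-robin to the processors. The sketch below enumerates, by brute force, which indices each processor must send to which other processor. It is not the GCD or LCM method from the abstract; the 0-based indexing and the function names are assumptions made for this example.

```python
from collections import defaultdict


def owner(i: int, k: int, nprocs: int) -> int:
    """Processor owning global index i (0-based) under cyclic(k)."""
    return (i // k) % nprocs


def redistribution_sets(n: int, x: int, y: int, nprocs: int):
    """Map (source, destination) -> indices to move for cyclic(x) -> cyclic(y)."""
    sends = defaultdict(list)
    for i in range(n):
        sends[(owner(i, x, nprocs), owner(i, y, nprocs))].append(i)
    return dict(sends)
```

For example, `redistribution_sets(8, 2, 1, 2)` groups the eight indices into four (source, destination) pairs; entries with equal source and destination stay local and need no communication. This enumeration costs time linear in the array size, which is the kind of overhead specialized algorithms aim to avoid.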

5.
Consider an n-dimensional SIMD hypercube Hn with 3n/2-1 faulty nodes. With , and n+19 steps, this paper presents some one-to-all broadcasting algorithms on the faulty SIMD Hn. The sequence of dimensions used for broadcasting in each algorithm is the same regardless of which node is the source. The proposed one-to-all broadcasting algorithms can tolerate n/2 more faulty nodes than Raghavendra and Sridhar's algorithms (J. Parallel Distrib. Comput. 35 (1996) 57) although 8 extra steps are needed. The fault-tolerance improvement of this paper is about 50%.

6.
Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes O(max(P, Q)) time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation.

7.
Branch-and-Combine (BaC) clock distribution has recently been introduced. The most interesting aspect of the new scheme is its ability to bound skew by a constant irrespective of network size. In this paper, we introduce algorithms for systematic synthesis of BaC networks for clocking meshes, tori, and hypercubes of different dimensionalities. For meshes our approach relies on tiling techniques. We start with the identification of basic proper tiles satisfying certain criteria. We define a set of valid transformations on tiles. By appropriately applying a sequence of transformations on a basic proper tile, one could synthesize a valid BaC network. We formally introduce methods and procedures for applying the above steps to systematically construct different valid BaC network designs for 2D and 3D meshes. To construct BaC networks for clocking hypercubes of any dimensionality we describe a formal methodology. In this case, we utilize an approach called replication which is based on constructing larger hypercube clocking networks from smaller ones. We combine the techniques for 2D, 3D meshes with replication techniques to formulate a methodology applicable to meshes and tori of dimensionality greater than three. We provide proofs of correctness for the algorithms we introduce. Besides, we formally define an optimality criterion based on link costs which is utilized to check the optimality of the synthesized network designs. In the case of meshes, we show that the majority of synthesized networks are optimal with respect to our defined criterion. For those suboptimal networks, we describe a procedure for identifying and removing unnecessary (redundant) links. The procedure is guaranteed to optimize the network without changing its behavioral parameters.

8.
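A set index structure and its join operations   Total citations: 1, self-citations: 0, citations by others: 1
汪卫  谢闽峰  陶春  施伯乐, Journal of Software (《软件学报》), 2004, 15(11): 1661-1670
The set type is an important data type in object-oriented and object-relational databases. This paper proposes Set_struc, an index structure for set-valued data, together with a set join algorithm based on Set_struc. Set_struc organizes data by merging the common prefixes of the set data. This approach reduces the storage space taken up by duplicate data and repeated patterns, and a tree-based join algorithm improves the performance of join operations on set data. Its performance is better than that of existing algorithms such as PSJ (partition based join).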
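The idea of merging common prefixes can be illustrated with an ordinary prefix tree over sorted set elements. The abstract does not give the actual layout of Set_struc, so the nested-dict representation, the `"$ids"` marker key, and the function names below are all assumptions made for this sketch; elements are assumed to be integers.

```python
def build_prefix_tree(sets):
    """Insert each set as a sorted element sequence into a nested-dict trie."""
    root = {}
    for set_id, elements in sets.items():
        node = root
        for e in sorted(elements):
            # Sets sharing a sorted prefix reuse the same chain of nodes.
            node = node.setdefault(e, {})
        # "$ids" is a hypothetical marker key listing the sets ending here.
        node.setdefault("$ids", []).append(set_id)
    return root


def count_nodes(tree):
    """Number of element nodes stored (the marker key is not counted)."""
    return sum(1 + count_nodes(child)
               for key, child in tree.items() if key != "$ids")
```

With the sets {1, 2, 3}, {1, 2, 4} and {1, 2}, the tree stores four element nodes instead of the eight elements held by the sets individually, because the shared prefix 1, 2 is kept only once.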

9.
Data distribution in memory or on disks is an important factor influencing the performance of parallel applications. On the other hand, programs or systems, like a parallel file system, frequently redistribute data between memory and disks. This paper presents a generalization of previous approaches to the redistribution problem. We introduce algorithms for mapping between two arbitrary distributions of a data set. The algorithms are optimized for multidimensional array partitions. We motivate our approach and present potential utilizations. The paper also presents a case study, the employment of mapping functions, and redistribution algorithms in a parallel file system.
Walter F. Tichy

10.
11.
In recent investigations into reducing the complexity of the relational join operation, several hash-based partitioned-join strategies have been introduced. All of these strategies depend upon the costly operation of data space partitioning before the join can be carried out. We had previously introduced a partitioned-join based on a dynamic and order preserving multidimensional data organization called DYOP. The present study extends the earlier research on DYOP and constructs a simulation model. The simulation studies on DYOP and subsequent comparisons of all the partitioned-join methodologies including DYOP have shown that space utilization of DYOP improves with the increasing number of attributes. Furthermore, the DYOP based join outperforms all the hash-based methodologies by greatly reducing the total I/O bandwidth required for the entire partitioned-join operation. The comparison model is independent of architectural issues such as multiprocessing, multiple disk usage, and large memory availability, all of which help to further increase the efficiency of the operation.

12.
Let \(G = (V,E)\) be a connected graph. The conditional edge connectivity \(\lambda _\delta ^k(G)\) is the cardinality of a minimum edge cut \(F\), if any, whose deletion disconnects \(G\) so that each component of \(G - F\) has minimum degree \(\delta \ge k\). An edge set \(F \subseteq E\) is called an edge extra-cut if \(G - F\) is not connected and each component of \(G - F\) has more than \(k\) vertices. The edge extraconnectivity \(\lambda _\mathrm{e}^k(G)\) is the cardinality of a minimum edge extra-cut. In this paper, we study the conditional edge connectivity and edge extraconnectivity of hypercubes and folded hypercubes.

13.
Array redistribution is usually needed for more efficiently executing a data-parallel program on distributed memory multicomputers. To minimize the redistribution data transfer cost, processor mapping techniques were proposed to reduce the amount of redistributed data elements. These techniques demand that the beginning data elements on a processor not be redistributed in the redistribution. On the other hand, for satisfying practical computation needs, a programmer may require other data elements to be un-redistributed (localized) in the redistribution. In this paper, we propose a flexible processor mapping technique for the Block-Cyclic redistribution to allow the programmer to localize the required data elements in the redistribution. We also present an efficient redistribution method for the redistribution employing our proposed technique. The data transfer cost reduction and system performance improvement for the redistributions with data localization are analyzed and presented in our experimental results.

14.
15.
Tick data are used in several applications that need to keep track of values changing over time, like prices on the stock market or meteorological measurements. Due to the possibly very frequent changes, the size of tick data tends to increase rapidly. Therefore, it becomes of paramount importance to reduce the storage space of tick data while, at the same time, allowing queries to be executed efficiently. In this paper, we propose an approach to decompose the original tick data matrix by clustering their attributes using a new clustering algorithm called Storage-Optimizing Hierarchical Agglomerative Clustering (SOHAC). We additionally propose a method for speeding up SOHAC based on a new lower bounding technique that allows SOHAC to be applied to high-dimensional tick data. Our experimental evaluation shows that the proposed approach compares favorably to several baselines in terms of compression. Additionally, it can lead to significant speedup in terms of running time.

16.
17.
Efficient aggregation algorithms for compressed data warehouses   Total citations: 9, self-citations: 0, citations by others: 9
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on compressed data sets, which are compressed by the mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, sizes of outputs and main memory availability. The algorithms are described and the I/O and CPU cost functions are presented in this paper. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms have better performance on sparse data than the previous aggregation algorithms.

18.
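刘金岭, Journal of Computer Applications (《计算机应用》), 2008, 28(7): 1689-1691
Complex queries over spatial multidimensional data are a key and difficult topic in multidimensional data research, and relatively few results have been reported so far. Building on traditional algorithms, this work makes improvements in several respects: partitioning the data into blocks by the values of the grouping attributes; sorting the grouped data effectively; and optimizing the application of aggregate functions. Experiments on simulated data show that the improved algorithm considerably increases query efficiency.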

19.
In distributed systems, data may be correlated due to accesses from clients, and this correlation has some impact on data placement, whereas existing research focuses on independent data objects. In this paper, we address both the scalability and the stability of data placement solutions in an Internet environment. We first show that replica allocation decisions can be made locally for each replica site in a tree network, with data access knowledge of its neighbors. We then develop a new replication cost model for correlated data objects in an Internet environment. Based on the cost model and the algorithms in previous research, we develop a distributed optimal replica allocation algorithm (DOPR) for correlated data in an Internet environment. A distributed heuristic algorithm (DHPR) is then developed to efficiently make replica placement decisions. The algorithm obtains sub-optimal solutions for the correlated data model and yields significant performance gains. Experimental studies show that the distributed heuristic allocation algorithm significantly outperforms the general frequency-based replication schemes (in which the replication decision of each data object is made based on the number of accesses on that data object).

20.
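A survey of clustering methods for high-dimensional data*   Total citations: 10, self-citations: 2, citations by others: 10
This survey summarizes the state of research on clustering algorithms for high-dimensional data, analyzes and compares the main differences in algorithm performance, and points out the future trend, namely incorporating ideas from other traditional clustering methods into the subspace clustering process in order to improve clustering performance.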
