
1.

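张李明 魏彩屏《计算机研究与发展》1996,33(6):461-464

This paper designs and analyzes parallel algorithms for the matrix transformation problem in the CBS image reconstruction principle. It discusses the time complexity and communication overhead of the parallel algorithms under different design approaches, implements a multithreaded parallel version on the Dawning-1 (曙光一号) parallel machine, and measures its speedup and parallel efficiency.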
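The speedup and parallel efficiency mentioned in this abstract are conventionally defined as S = T1 / Tp and E = S / p, where T1 is the serial run time and Tp the run time on p processors. A minimal Python sketch of these two definitions follows. The function names and timings are hypothetical illustrative values, not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = S / p for p processors."""
    if p <= 0:
        raise ValueError("processor count must be positive")
    return speedup(t_serial, t_parallel) / p


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
t1 = timings[1]
for p, tp in timings.items():
    print(f"p={p}: speedup={speedup(t1, tp):.2f}, "
          f"efficiency={efficiency(t1, tp, p):.2f}")
```

Efficiency below 1 in such a table usually reflects communication and synchronization overhead, which is what the abstract's communication-cost discussion is about.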

2.

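More Practical Parallel Computation Models  Total citations: 7, self-citations: 0, citations by others: 7

陈国良《小型微型计算机系统》1995,16(2):1-9

Many previously reported parallel algorithms run well on small-scale parallel machines but perform poorly when ported to large-scale ones. One reason is that parallel computation models such as PRAM are too abstract and omit factors that cannot be ignored at run time, such as communication and synchronization. This paper introduces several more practical parallel computation models proposed so far that better reflect the performance of modern parallel machines, including asynchronous PRAM, BSP, LogP, and C3. How closely these models match real parallel machines, how usable they are, and how tractable they are for analyzing more complex algorithms all remain disputed. Nevertheless, they have opened a new avenue for research on parallel computation models and have become one of the hot topics in current parallel algorithm research.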
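To illustrate how such models account for communication and synchronization, the standard BSP cost of one superstep is commonly written as w + h·g + l: the maximum local work w, the largest number of words h sent or received by any processor times the per-word gap g, plus the barrier latency l. The sketch below assumes that textbook formula. The function names and parameter values are hypothetical and do not come from the paper.

```python
def bsp_superstep_cost(work, h, g, l):
    """Cost of one BSP superstep: max local work + h * g + l.

    work: list of local computation costs, one per processor
    h:    maximum words sent or received by any processor
    g:    communication gap (cost per word)
    l:    barrier synchronization latency
    """
    return max(work) + h * g + l


def bsp_total_cost(supersteps, g, l):
    """Sum the cost of a sequence of (work, h) supersteps."""
    return sum(bsp_superstep_cost(work, h, g, l) for work, h in supersteps)


# Hypothetical program with two supersteps on three processors.
program = [([10, 12, 8], 5), ([4, 4, 4], 1)]
print(bsp_total_cost(program, g=2, l=20))
```

Setting g and l to zero recovers a PRAM-like cost that counts only computation, which is the abstraction the abstract criticizes.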

3.

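Scalability of Parallel Algorithms Combined with Parallel Machines  Total citations: 6, self-citations: 1, citations by others: 5

迟利华 刘杰 李晓梅 胡庆丰《计算机研究与发展》1999,36(1):47-51

Scalability is an important consideration in designing parallel algorithms and high-performance parallel machines. The paper first analyzes two scalability evaluation criteria, isoefficiency and isospeed, and points out their strengths and weaknesses. Then, based on an analysis of parallel computing time, it proposes a new criterion (scalability under an equal ratio of parallel overhead to computation) that can be used to evaluate the scalability of a parallel algorithm combined with a parallel machine. Finally, the new criterion is used to analyze the scalability of two parallel algorithms combined with the YH03 high-performance parallel machine.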

4.

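E-Derivative Analysis of the Scalability of Parallel Algorithms

林洪 陈国良《计算机科学》1995,22(5):1-5

1. Introduction. Scalability analysis of parallel algorithms on parallel architectures is currently one of the central problems in research on massively parallel processing (MPP). As the main performance metric of parallel algorithms on massively parallel machines, scalability reveals…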

5.

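Parallel Computing Time Models and the Performance of Parallel Machine Systems  Total citations: 4, self-citations: 0, citations by others: 4

乔香珍《计算机学报》1998,21(5):413-418

Focusing on new technologies in shared-memory parallel machine architectures and new features of parallel software systems, this paper analyzes the factors that affect the performance of parallel algorithms and applications, proposes an improved parallel computing time model, and gives principles and examples for improving the performance of parallel algorithms and software.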

6.

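A Parallel Algorithm for Real-Time Job-shop Multi-Machine Scheduling

康一梅 郑应平《控制与决策》1994,9(2):131-135

Targeting MIMD parallel machines, this paper proposes a parallel algorithm for real-time scheduling of the general Job-shop problem. Complexity and speedup analysis, together with examples, show the advantage of the parallel algorithm in finding optimal schedules for large batches of jobs processed on multiple machines.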

7.

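A Parallel Algorithm for Real-Time Job-shop Multi-Machine Scheduling  Total citations: 1, self-citations: 0, citations by others: 1

康一梅 郑应平《控制与决策》1994,(2)

Targeting MIMD parallel machines, this paper proposes a parallel algorithm for real-time scheduling of the general Job-shop problem. Complexity and speedup analysis, together with examples, show the advantage of the parallel algorithm in finding optimal schedules for large batches of jobs processed on multiple machines.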

8.

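Near-Optimal Scalability: A Practical Scalability Metric  Total citations: 2, self-citations: 0, citations by others: 2

陈军 李晓梅《计算机学报》2001,24(2):179-182

Good scalability is an important performance goal for designers of parallel algorithms and parallel machines. Previous scalability models each consider only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing the whole. These models serve computer researchers, who care about higher efficiency and utilization, but application scientists place more emphasis on short execution time. The near-optimal scalability model proposed in this paper considers both the efficiency and the execution time of a parallel system. Case studies of two algorithms on a typical MPP show that the model describes the scalability of a parallel algorithm and that, when the system is scaled along an appropriate scalability curve, execution time stays close to the minimum while efficiency remains reasonable. This provides guidance for optimally matching algorithms to parallel machines and benefits the design and improvement of parallel algorithms.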

9.

Parallel Programming Experiments on the Jiangnan III Parallel Machine

迟学斌《计算机系统应用》1994,(10)

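1. Introduction. The Jiangnan III parallel machine is a multiprocessor system with local memory and shared main memory, recently released jointly by the Jiangnan Institute of Computing Technology and the Institute of Computing Technology, Chinese Academy of Sciences. Each processing unit is built around an Intel i860. There are currently ten processing units, each with 16 MB of memory, and the machine has 64 MB of shared main memory. In terms of memory capacity, it is well suited to solving large-scale problems. The system is still being improved; planned additions include direct parallel support for FORTRAN, synchronization control between processes, and other software tools for user convenience. (Figure: structure of the Jiangnan III parallel machine.) Because the Jiangnan III is a parallel computer with both local memory and shared main memory, algorithm design must take this feature into account and produce parallel algorithms suited to the machine. We give…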

10.

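Performance Analysis and Comparison of Two Parallel Algorithms for a Limited-Area Grid-Point Model

朱政慧 薛纪善《计算机应用》2002,22(9):36-39

Parallel algorithm design is crucial in developing numerical weather prediction models. With the development of multi-node, multi-processor distributed/shared-memory parallel computers in high-performance computing, designing optimal parallel algorithms for meteorological models has become a research focus. A new parallel limited-area assimilation and forecasting system was developed on an IBM SP parallel machine. The paper introduces the design principles of a pure MPI scheme and a hybrid OpenMP/MPI scheme for the parallel model, and analyzes and compares the parallel performance of the two schemes.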

11.

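Parallel Scalability Analysis of the Multigrid Solvers in HYPRE

徐小文 莫则尧 曹小林《软件学报》2009,20(Z1):8-14

The scalability of SMG and BoomerAMG, the multigrid solvers in the high-performance preconditioner library HYPRE, is tested and analyzed on thousands of processors of a domestically built large-scale parallel machine in China. The study yields several conclusions that are instructive for research on linear solver algorithms and for the development of parallel implementation techniques. These conclusions offer some guidance for the application and development of linear solvers in numerical simulations of real, complex physical systems.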

12.

Highly scalable parallel algorithms for sparse matrix factorization

Gupta A. Karypis G. Kumar V. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):502-520

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.

13.

Parallel multilevel algorithms for hypergraph partitioning

Aleksandar Trifunović William J. Knottenbelt 《Journal of Parallel and Distributed Computing》2008

In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe parallel coarsening, parallel greedy k-way refinement and parallel multi-phase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multi-phase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.

14.

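A Two-Level Compact Concatenation Algorithm for Multi-Physics Parallel Numerical Simulation

莫则尧《计算机学报》2004,27(10):1311-1319

Complex physical phenomena usually consist of several tightly coupled complex physical processes, and their numerical simulation is usually carried out by tightly coupling several parallel application codes, each suited to a different physical process. How to design the concatenation algorithms between these processes is a topic worth studying: they must keep data transfer between codes efficient, keep each code's own execution and the overall simulation efficient, and keep the codes independently developable. Based on multi-physics parallel numerical simulation of radiation hydrodynamics and neutron transport, which is widely used in high-temperature, high-pressure multi-physics research, this paper proposes two concatenation algorithms on unstructured grids: the full loose concatenation algorithm and the two-level compact concatenation algorithm. The former focuses on efficient execution and independent development of each code; the latter additionally balances efficient data transfer against efficient overall simulation. Communication complexity analysis and numerical experiments on hundreds of processors of two parallel machines show that both algorithms are effective and can be extended to multi-physics parallel simulations that couple radiation or neutron transport with other hydrodynamics. In particular, the two-level compact concatenation algorithm is efficient and scalable and achieves near-optimal parallel performance.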

15.

Fast and scalable selection algorithms with applications to median filtering

Chin-Hsiung Wu Shi-Jinn Horng 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(10):983-992

The main contributions of this paper are in designing fast and scalable parallel algorithms for selection and median filtering. Based on the radix-ω representation of data and the prune-and-search approach, we first design a fast and scalable selection algorithm on the arrays with reconfigurable optical buses (AROB). To the authors' knowledge, this is the most time efficient algorithm yet published, especially compared to the algorithms proposed by Han et al. (2002) and Pan (1994). Then, given an N × N image and a W × W window, based on the proposed selection algorithm, several scalable median filtering algorithms are developed on the AROB model with various numbers of processors. In the sense of the product of time and the number of processors used, most of the proposed algorithms are time or cost optimal.
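For readers unfamiliar with the operation being parallelized, the following is a plain sequential Python sketch of a W × W median filter, where each output pixel is the median-rank element selected from its window. It is not the AROB algorithm from the paper. The function name and the border handling (clamping indices to the image edge) are assumptions made for illustration.

```python
def median_filter(image, w):
    """Sequential W x W median filter over a 2-D list of numbers.

    Border pixels are handled by clamping indices to the image edge,
    which is an assumption of this sketch.
    """
    if w % 2 == 0:
        raise ValueError("window size must be odd")
    n_rows, n_cols = len(image), len(image[0])
    r = w // 2
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            window = [
                image[min(max(i + di, 0), n_rows - 1)]
                     [min(max(j + dj, 0), n_cols - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            ]
            window.sort()
            # Selection step: pick the element of median rank.
            out[i][j] = window[len(window) // 2]
    return out


noisy = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
print(median_filter(noisy, 3))
```

The sort inside the loop stands in for the selection step; the selection algorithm in the abstract targets exactly this per-window rank query.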

16.

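A Parallel Pipelined Sn Sweeping Algorithm for Solving the Neutron Transport Equation on Unstructured Grids  Total citations: 11, self-citations: 4, citations by others: 7

莫则尧 傅连祥 阳述林《计算机学报》2004,27(5):587-595

The discontinuous finite element discrete ordinates (Sn) method is widely used to solve high-dimensional, time-dependent neutron transport equations numerically. It involves discretizing the geometric mesh space, the velocity phase space, and the neutron energy groups, so the computational cost is very high. Working on unstructured grids, this paper proposes a parallel pipelined Sn sweeping algorithm based on domain decomposition. By designing domain decomposition methods and queue insertion algorithms with different inherent parallelism and communication surface-to-volume ratios, it achieves speedups of more than 72 and 78 for two different physical models, using 92 and 256 CPUs of two parallel machines, respectively. Scalability analysis shows that the algorithm's performance depends heavily on the point-to-point communication latency of the parallel machine.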

17.

Concatenation Algorithms for Parallel Numerical Simulation of Radiation Hydrodynamics coupled with Neutron Transport

Mo Zeyao 《International journal of parallel programming》2005,33(1):57-71

Complex physical phenomena can usually be split into several interacting physical computational models and can be numerically simulated by coupling parallel codes individually designed for these models. Besides rational splitting and efficient numerical methods for different models, we must design scalable parallel algorithms to concatenate these parallel codes. Meanwhile, three objectives should be well balanced. The first is how to efficiently transfer data among multiple physical models, the second is how to inherit the original scalability of the parallel codes and then ensure good scalability of the full simulation, and the third is how to ensure independent or simultaneous development of codes by different research groups. This paper presents two concatenation algorithms for parallel numerical simulation of radiation hydrodynamics coupled with neutron transport on unstructured grids. The first, Full Loose Concatenation Algorithm, focuses on independent development and inheritance of original scalability, and the second, Two Level Compact Concatenation Algorithm, focuses on an optimal tradeoff among the above three objectives. Theoretical analysis of communication complexity and parallel numerical experiments using hundreds of processors on two parallel machines have shown that these two algorithms are efficient and can be generalized to other parallel numerical simulations for hydrodynamics coupled with radiation or neutron transport. In particular, the second algorithm is linearly scalable and has achieved theoretical optimal performance.

18.

Run-length chain coding and scalable computation of a shape's moments using reconfigurable optical buses

Chin-Hsiung Wu Shi-Jinn Horng 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(2):845-855

The main contribution of this paper is the design of several efficient algorithms for modified run-length chain coding and for computing a shape's moments on arrays with reconfigurable optical buses. The proposed algorithms are based on the boundary representation of an object. Instead of using chain code, the boundary can be represented by a modified run-length chain code, where each entity represents a line segment (two adjacent corner pixels). The sequential nature of the chain code makes it difficult to parallelize. We first propose two constant time algorithms for boundary extraction and run-length chain coding. To the authors' knowledge, these are the most time efficient algorithms yet published. Based on the modified run-length chain coding, and the advantages of both optical transmission and electronic computation, a constant time parallel algorithm for computing a shape's moments using N × N processors is proposed. Additionally, instead of using N × N processors, a scalable moment algorithm using r × r processors is also derived, where r < N. Based on the product of time and the number of processors used, both proposed parallel algorithms are time and cost optimal.
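As background for the quantity being computed, the raw geometric moment of a binary shape is commonly defined as m_pq = Σ x^p · y^q over the shape's pixels. The Python sketch below evaluates it by a direct region sum, which is a simple sequential stand-in and not the boundary-based optical-bus algorithm from the paper. The function names and the sample shape are hypothetical.

```python
def raw_moment(image, p, q):
    """Raw moment m_pq = sum of x**p * y**q over nonzero pixels.

    x is the column index and y the row index of each pixel.
    """
    return sum(
        (x ** p) * (y ** q)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value
    )


def centroid(image):
    """Centroid (m10 / m00, m01 / m00) of the shape."""
    m00 = raw_moment(image, 0, 0)
    if m00 == 0:
        raise ValueError("image contains no shape pixels")
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00


# Hypothetical 2 x 3 binary image containing a 2 x 2 block.
shape = [[0, 1, 1],
         [0, 1, 1]]
print(raw_moment(shape, 0, 0), centroid(shape))
```

Here m00 is the area of the shape, and the first-order moments divided by m00 give its centroid.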

19.

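A Method for Improving the Efficiency of Parallel Data Mining

佘春东 范植华 孙世新 车著明 唐剑《计算机科学》2004,31(2):132-134

Discovering association rules is an important task in data mining. This paper introduces several serial and parallel data mining algorithms. Among them, the IDD algorithm is an efficient and easily scalable parallel algorithm for discovering association rules; however, as the number of processors increases, load imbalance causes its efficiency to drop severely. This problem is solved by introducing approximation algorithms. We give two approximation algorithms with proofs of their performance: one is an online algorithm and the other is an offline algorithm. Finally, we analyze the complexity of the improved IDD algorithm.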

20.

What is ahead for parallel computing

Wen-mei Hwu 《Journal of Parallel and Distributed Computing》2014

With the industry-wide switch to multicore and manycore architectures, parallel computing has become the only venue in sight for continued growth in application performance. In order for the performance of an application to grow with future generations of hardware, a significant portion of its computation must be done with scalable parallel algorithms. It is therefore important to develop and deploy as many scalable parallel algorithms as possible. This paper takes a critical look at the major challenges involved in the development of scalable parallel algorithms and points to needs for compiler tool innovations to help address these challenges.