Similar literature
 Found 19 similar documents (search time: 187 ms)
1.
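95027 More Practical Parallel Computation Models. 小型微型计算机系统, 1995(2): 1-9. Introduces several more practical parallel computation models that better reflect the performance of modern parallel machines, including asynchronous PRAM, BSP, logP, and C3. Although these models, in how closely they match real parallel machines, in their usability, and in their tractability when analyzing more complex algorithms...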

2.
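The BSP model is independent of the parallel architecture and can serve both as a parallel computation model and as a parallel programming model. This paper proposes the BSP-based H-V transaction model, which suits long transactions, short transactions, and mixes of the two. It gives a process structure for parallel transaction processing on a shared-nothing architecture. The structure achieves both intra-transaction and inter-transaction parallelism and provides availability and scalability. The paper then gives a timestamp-based multiversion concurrency control protocol suited to the model, and finally describes how transactions run under the superstep structure. Performance tests show that transaction processing with this model achieves good transaction response times and speedup.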

3.
Near-Optimal Scalability: A Practical Scalability Metric   Total citations: 2, self-citations: 0, citations by others: 2
陈军  李晓梅 《计算机学报》2001,24(2):179-182
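Good scalability is an important performance goal for designers of parallel algorithms and parallel machines. Previous scalability models each considered only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing the trade-offs as a whole. Those models meet the needs of computer researchers, who focus on higher efficiency and utilization, but application scientists care more about short execution time. The near-optimal scalability model proposed here considers both the efficiency and the execution time of a parallel system. Analysis of two example algorithms on a typical MPP shows that the model not only describes the scalability of a parallel algorithm but also, when the system is scaled along a suitable scalability curve, keeps execution time close to the minimum without low efficiency. This helps guide the optimal matching of algorithms to parallel machines and benefits the design and improvement of parallel algorithms.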

4.
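Based on an analysis of the standard particle swarm optimization algorithm, this work combines it with the BSP parallel computation model to design and implement a BSP-based parallel particle swarm optimization algorithm. The parallel algorithm changes the structure of the standard algorithm and improves solution efficiency. Experimental results show that its performance is considerably better than that of the standard particle swarm optimization algorithm.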

5.
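This paper systematically summarizes and discusses parallel computation time models for shared-memory and distributed-memory environments. At the micro level, drawing on the structural features and communication mechanisms of parallel machines, it identifies the key factors that lengthen an algorithm's running time and derives optimization principles and efficiency evaluation criteria that help users modify parallel algorithms toward optimal performance. At the macro level, it gives the common types of communication primitives for basic message passing and empirical formulas for the operation time of some primitives, which help users choose the best communication primitives and problem granularity and correctly predict a program's running time and performance.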

6.
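In recent years heterogeneous parallel computing has been widely studied in high-performance scientific computing and general-purpose applications. Drawing on several representative parallel computation models, this paper presents the HBSP model for heterogeneous environments and a method for calculating program cost. A linear model based on message length makes the communication cost calculation more accurate and removes the original BSP model's restriction to h-relations, so programs and algorithms can be designed more flexibly in heterogeneous environments. When the processors making up the BSP computer all have the same speed and the original BSP algorithm is optimal (that is, the computation and communication assigned to each processor are perfectly balanced), the HBSP model is equivalent to the original model.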
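The cost idea in this abstract can be illustrated with a minimal sketch. It is not the paper's HBSP formula, which the abstract does not state. It assumes the usual BSP shape of a superstep cost (computation plus communication plus barrier) and replaces the h-relation term with a per-message linear model, startup time plus bytes times a per-byte time. The names `Processor`, `superstep_cost`, `startup`, `per_byte`, and `barrier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    speed: float       # operations per time unit (assumed unit)
    work: float        # operations assigned in this superstep
    bytes_sent: float  # total message bytes sent in this superstep
    messages: int      # number of messages sent in this superstep


def superstep_cost(procs, startup, per_byte, barrier):
    """Estimate one superstep's time under an assumed linear communication model."""
    # Heterogeneous computation: the slowest processor to finish sets the pace.
    compute = max(p.work / p.speed for p in procs)
    # Linear model in message length: a startup per message plus a per-byte cost.
    comm = max(p.messages * startup + p.bytes_sent * per_byte for p in procs)
    return compute + comm + barrier
```

With equal speeds and perfectly balanced work and traffic, every processor contributes the same terms, so the estimate reduces to a single homogeneous cost, which matches the equivalence the abstract describes.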

7.
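More Practical Parallel Computation Models   Total citations: 7, self-citations: 0, citations by others: 7
Many previously reported parallel algorithms run well on small-scale parallel machines but perform poorly when ported to large-scale ones. One reason is that parallel computation models such as PRAM are too abstract and omit factors that cannot be ignored at run time, such as communication and synchronization. This paper introduces several recently proposed, more practical parallel computation models that better reflect the performance of modern parallel machines, including asynchronous PRAM, BSP, logP, and C3. These models remain open to debate regarding how closely they match real parallel machines, their usability, and their tractability when analyzing more complex algorithms, but they have opened a new path for research on parallel computation models and have become one of the active topics in current parallel algorithm research.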

8.
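An Execution Center Model for Multitasking CAD Systems   Total citations: 1, self-citations: 1, citations by others: 0
To meet the needs of direct-manipulation multitasking CAD systems for real-time feedback and concurrent dialogues, this paper proposes EDHM, an event-driven hierarchical execution center model for CAD systems. Following the principle of dialogue independence, the model divides the execution center into 3 layers and 5 major components, supports continuous semantic feedback and asynchronous concurrent dialogues, and supports sequential and asynchronously interleaved interaction styles within a unified framework. The model has been applied in building the execution center of OpenDesign, a multitasking CAD support system. Practice shows that an execution center built on EDHM meets the needs of direct manipulation and multiple processes, is structurally independent, and is easy to port, encapsulate, and reuse.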

9.
张尉东  崔唱 《软件学报》2019,30(12):3622-3636
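This paper proposes a parallel computation model, the delta-stepping synchronous parallel (DSP) model, together with a formal representation. The bulk synchronous parallel (BSP) model needs many synchronizations and converges slowly; DSP effectively reduces the number of synchronizations and the communication overhead, and so speeds up convergence. The formal representation and a derivation of the iteration process show that DSP is a more general parallel computation model than BSP. Building on BSP, DSP changes the local computation that BSP executes once per superstep into one that executes multiple times. Theoretical analysis and validation experiments show that the added local computation steps further exploit the locality hidden in the data partitions. At the same time, adding local computation under the "trade computation for communication" principle is not always better the more of it there is. The final experimental results show that the DSP model effectively reduces the number of iteration rounds and the convergence time of algorithms, with speedups over BSP ranging from several times to tens of times.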
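The difference between one and several local computation steps per synchronization can be seen in a toy simulation. This is a sketch under stated assumptions, not the paper's formalism: it runs single-source shortest paths by repeated edge relaxation, each partition updates only the vertices it owns, and values owned by other partitions stay stale until the next synchronization. The function `sssp_supersteps` and its parameters are hypothetical.

```python
import math


def sssp_supersteps(n, owner, edges, source, local_steps):
    """Return (distances, number of supersteps that changed a distance)."""
    dist = [math.inf] * n
    dist[source] = 0.0
    supersteps = 0
    while True:
        new_dist = list(dist)
        for part in set(owner):
            view = list(dist)  # snapshot taken at the last synchronization
            for _ in range(local_steps):
                nxt = list(view)
                for u, v, w in edges:
                    # Only owned vertices are updated; remote values stay stale.
                    if owner[v] == part and view[u] + w < nxt[v]:
                        nxt[v] = view[u] + w
                view = nxt
            for v in range(n):
                if owner[v] == part:
                    new_dist[v] = view[v]
        if new_dist == dist:  # a full superstep changed nothing: converged
            return dist, supersteps
        dist = new_dist
        supersteps += 1


# An 8-vertex chain with unit weights, split into two partitions of four.
chain = [(i, i + 1, 1) for i in range(7)] + [(i + 1, i, 1) for i in range(7)]
owners = [0, 0, 0, 0, 1, 1, 1, 1]
bsp_dist, bsp_rounds = sssp_supersteps(8, owners, chain, 0, local_steps=1)
dsp_dist, dsp_rounds = sssp_supersteps(8, owners, chain, 0, local_steps=4)
```

Both runs reach the same distances, but the run with `local_steps=4` needs fewer synchronizations because each partition propagates values across its own vertices before exchanging. Some of those extra local rounds do no useful work, which is consistent with the abstract's caution that more local computation is not always better.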

10.
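A Design Model for Enterprise Application Systems Based on Web and Component Technology   Total citations: 40, self-citations: 2, citations by others: 38
This article introduces three patterns of the "Browse-WebServe-DBMS" architecture and the related programming techniques [1], proposes a software design model for enterprise application systems based on Web and component technology, develops an application example, and analyzes the related data statistically. The results show that the software design method based on Web and component technology has clear advantages in raising component reuse, shortening the development cycle, improving development efficiency, and improving system maintainability.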

11.
To achieve scalable parallel performance in molecular dynamics simulations, we have modeled and implemented several dynamic spatial domain decomposition algorithms. The modeling is based upon the bulk synchronous parallel architecture model (BSP), which describes supersteps of computation, communication, and synchronization. Using this model, we have developed prototypes that explore the differing costs of several spatial decomposition algorithms and then use this data to drive implementation of our molecular dynamics simulator, Sigma. The parallel implementation is not bound to the limitations of the BSP model, allowing us to extend the spatial decomposition algorithm. For an initial decomposition, we use one of the successful decomposition strategies from the BSP study and then subsequently use performance data to adjust the decomposition, dynamically improving the load balance. The motivating reason to use historical performance data is that the computation to predict a better decomposition increases in cost with the quality of prediction, while the measurement of past work often has hardware support, requiring only a slight amount of work to modify the decomposition for future simulation steps. In this paper, we present our adaptive spatial decomposition algorithms, the results of modeling them with the BSP, the enhanced spatial decomposition algorithm, and its performance results on computers available locally and at the national supercomputer centers.

12.
Given m ordered segments that form a partition of some universe (e.g., a two-dimensional strip), the multisearch problem consists of determining, for a set of n query points in the universe, the segments they belong to. We present the first nontrivial parallel deterministic scheme for performing multisearch on a distributed-memory machine when m=ω(n). The scheme is designed on the BSP* model of parallel computation, a variant of Valiant's BSP which rewards blockwise communication, and relies on a suitable redundant representation of the segments. The time needed to answer the queries is analyzed as a function of the redundancy and of the BSP* parameters. We show that optimal performance can be obtained using logarithmic redundancy. We also prove a lower bound on the communication requirements of any deterministic multisearch scheme realized on a distributed-memory machine. The lower bound exhibits a tradeoff between the redundancy used to represent the segments and the performance of the scheme. Received June 1, 1997; revised March 10, 1998.
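As a reference point for the problem statement only, the following is a minimal sequential sketch of multisearch in one dimension. It is not the BSP* scheme from the abstract. It assumes the segments are half-open intervals given by a sorted list of boundaries, and the name `multisearch` is hypothetical.

```python
from bisect import bisect_right


def multisearch(boundaries, queries):
    """Return, for each query, the index of the segment that contains it.

    Segment i is the half-open interval [boundaries[i], boundaries[i + 1]).
    """
    result = []
    for q in queries:
        if not boundaries[0] <= q < boundaries[-1]:
            raise ValueError(f"query {q} lies outside the universe")
        # bisect_right counts the boundaries <= q; the segment starts at the last one.
        result.append(bisect_right(boundaries, q) - 1)
    return result


segments = [0, 10, 20, 30]  # three segments: [0, 10), [10, 20), [20, 30)
found = multisearch(segments, [0, 9.5, 10, 29.5])
```

Each query costs one binary search over the m segments. The difficulty the abstract addresses is doing this for many queries at once when the segments are spread across the memories of a distributed machine.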

13.
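A Parallel Programming Model with Predictable, Scalable Parallel Performance   Total citations: 1, self-citations: 0, citations by others: 1
The BSP (Bulk-Synchronous Parallel) model is independent of the parallel architecture and can be regarded both as a parallel computation model and as a parallel programming model. The model lets programmers accurately analyze and predict the performance of a parallel program during algorithm design and during coding and debugging. BSP programs are highly portable and can be implemented on many parallel systems, such as PVM and MPI.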

14.
刘瑞祥 《计算机工程》2001,27(12):166-167
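The BSP model is independent of the parallel architecture and can serve both as a parallel computation model and as a parallel programming model. Parallel programming with this model is simple and convenient, and the resulting programs are highly portable and can be implemented on many parallel systems.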

15.
This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This allows useful insights to be made into the MPI communication performance of parallel computers, and in particular how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, which is difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times rather than sampling from probability distributions can give misleading results, particularly for programs running on a large number of processors.
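The last point, that averages can mislead at scale, can be illustrated with a short calculation. This sketch is not PEVPM or MPIBench. It assumes a synchronising step whose duration is the maximum of p independent communication times drawn from a made-up two-point distribution, and it computes the expected maximum exactly instead of sampling. The function `expected_max` and the numbers are hypothetical.

```python
def expected_max(values, probs, p):
    """Expected maximum of p independent draws from a discrete distribution."""
    total, cdf_prev = 0.0, 0.0
    for value, prob in sorted(zip(values, probs)):
        cdf = cdf_prev + prob
        # P(max == value) = P(all draws <= value) - P(all draws < value)
        total += value * (cdf ** p - cdf_prev ** p)
        cdf_prev = cdf
    return total


# Made-up distribution: usually 1 time unit, occasionally 10 under contention.
times, weights = [1.0, 10.0], [0.9, 0.1]
mean_time = expected_max(times, weights, 1)    # the plain average of one draw
step_time_64 = expected_max(times, weights, 64)
```

Under these assumptions the per-message average stays below 2 time units, while the expected duration of a step that waits for all 64 processors is close to 10, because with many processors at least one slow message becomes almost certain.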

16.
Many scientific applications require array redistribution when the programs run on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution, otherwise the performance of the programs will degrade considerably. The redistribution overheads consist of two parts: index computation and inter-processor communication. If there is no communication scheduling in a redistribution routine, the inter-processor communication will incur a larger communication idle time when there exists node contention and/or difference among message lengths during one particular communication step. In order to solve this problem, in this paper, we propose an efficient scheduling scheme that not only minimizes the number of communication steps and eliminates node contention, but also minimizes the difference of message lengths in each communication step. Thus, the communication idle time is reduced in redistribution routines.

17.
Based on the framework of BSP, a Hierarchical Bulk Synchronous Parallel (HBSP) performance model is introduced in this paper to capture the performance optimization problem for various stages in parallel program development and to accurately predict the performance of a parallel program by considering factors causing variance at local computation and global communication. The related methodology has been applied to several real applications and the results show that HBSP is a suitable model for optimizing parallel programs.

18.
A simple and efficient parallel FFT algorithm using the BSP model   Total citations: 1, self-citations: 0, citations by others: 1
We present a new parallel radix-4 FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of 3, in the case that the input/output vector is in the cyclic distribution. We also show how to reduce computation time on computers with a cache-based architecture. We present performance results on a Cray T3E with up to 64 processors, obtaining reasonable efficiency levels for local problem sizes as small as 256 and very good efficiency levels for local sizes larger than 2048.

19.