
1.

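Zhu Ziyu, Du Zhihui, Li Sanli 《小型微型计算机系统》2002,23(8):907-912

This paper formally describes and extends the communication microbenchmark technique. Using it, the authors systematically measure the LogP parameters that characterize communication performance for four parallel communication systems on a cluster: GM-API, MPICH over 100 Mb/s Ethernet, MPICH over Myrinet, and GM-MPI. Comparing the results, they identify the relative strengths and weaknesses of the four systems and, finally, indicate which kinds of parallel applications each system suits.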

2.

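Analysis of a Block-Parallel Algorithm for FIR Filters

Peng Wenqin, Sun Shixin 《计算机应用》2000,20(3):32-35

Using the LogP model [1], this paper analyzes a block-parallel algorithm for FIR filters in detail and discusses how block length affects the algorithm's efficiency in a networked parallel environment. It proposes a transmission mode in which data blocks are sent ahead of time, derives the parallel efficiency under that mode, and reports tests in a networked parallel environment.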

3.

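A Delayed Consistency Model for Continuous Media in Distributed Interactive Systems  Total citations: 1, self-citations: 0, citations by others: 1

Qin Xiao 《软件学报》2002,13(6):1029-1039

With the development of multimedia and network technology, distributed interactive systems are widely used. In such a system, multiple clients are interconnected over a local or wide area network. To keep response time short, an operation generated at the local node is executed locally at once and broadcast to the remote nodes for execution. Consistency maintenance is a key problem in these systems, yet almost all consistency problems studied in the literature concern discrete media. Using an example, this paper identifies an inconsistency problem in continuous media. The problem can be solved with an absolute consistency model, but applying that model over a wide area network leads to long response times. To address this, a delayed consistency model (DCM) is proposed. In DCM, if node i generates an operation on object x, the operation is forcibly delayed for a period after it reaches a remote node and must be executed at a uniformly specified time. In this way the state of object x on the remote nodes eventually becomes consistent. DCM is flexible because different objects can have different forced delays. If the distributed interactive system is built on a real-time network, the forced delay becomes an important parameter for scheduling real-time messages in real-time communication.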
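The delayed-execution idea in this abstract can be illustrated with a small simulation. This is only a sketch under assumptions that the abstract does not state: clocks are synchronized, the agreed execution time is the issue time plus a per-object forced delay, every operation arrives before that time, and ties are broken by origin node id. The class name `DelayedConsistencyNode` and the delay table are hypothetical.

```python
import heapq


class DelayedConsistencyNode:
    """One replica that holds each operation until its agreed execution time."""

    def __init__(self, forced_delay):
        # Hypothetical per-object forced delay, e.g. {"x": 5.0}.
        self.forced_delay = forced_delay
        self.pending = []   # min-heap ordered by (execute_at, origin, obj, action)
        self.executed = []  # operations in the order they were applied

    def receive(self, issued_at, origin, obj, action):
        # The execution time depends only on the issue time and the object,
        # not on when the message happened to arrive at this node.
        execute_at = issued_at + self.forced_delay[obj]
        heapq.heappush(self.pending, (execute_at, origin, obj, action))

    def advance(self, now):
        # Apply every pending operation whose agreed time has been reached.
        while self.pending and self.pending[0][0] <= now:
            self.executed.append(heapq.heappop(self.pending))


delays = {"x": 5.0, "y": 1.0}
node_a = DelayedConsistencyNode(delays)
node_b = DelayedConsistencyNode(delays)
ops = [(0.0, 1, "x", "move"), (2.0, 2, "y", "stop")]

# The two replicas receive the same operations in different arrival orders.
for op in ops:
    node_a.receive(*op)
for op in reversed(ops):
    node_b.receive(*op)

node_a.advance(10.0)
node_b.advance(10.0)
```

Because both replicas order operations by the agreed execution time rather than by arrival, they apply the same sequence even though messages arrived in different orders. Giving object `x` a longer delay than `y` mirrors the per-object flexibility the abstract mentions.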

4.

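Implementing Dial Backup for WAN Connections

Wang Jinsong 《微型机与应用》1999,(6)

A WAN connection is the key to linking remote networks and is usually implemented with a leased line, but a leased-line failure brings the WAN connection down. This paper describes a method of implementing dial backup for WAN connections using DDR: when the leased line fails, a temporary backup link is established automatically, keeping communication between the remote networks working.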

5.

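Research on GPU Parallel Methods for Deep Learning Image Classification

Han Yanling, Shen Siyang, Xu Lijun, Wang Jing, Zhang Yun, Zhou Ruyan 《计算机工程》2023,49(1):191-200

To address low transfer efficiency after multi-GPU parallelization in deep learning image classification, an improved Ring All Reduce algorithm with low time complexity is proposed. It optimizes the data transfer flow through an interval-pairing rule across nodes, mitigating the bandwidth loss of the traditional parameter-server parallel structure. Because data parallelism struggles to support large numbers of network parameters and its speedup stalls, a hybrid GPU parallel optimization algorithm is also proposed. It exploits the facts that the backbone of a deep learning network holds fewer weight parameters than the fully connected layers and has low synchronization overhead, whereas the fully connected layers have many weights and high gradient transfer overhead. The backbone is therefore trained with data parallelism and the fully connected layers with model parallelism, and the improved Ring All Reduce algorithm handles data communication among the nodes after parallelization, for image classification with deep learning models. Experiments on two public datasets, Cifar10 and mini ImageNet, show that the algorithm achieves better speedup with classification accuracy unchanged, an improvement of nearly 45% over the data-parallel method.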
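For readers unfamiliar with the baseline, the following sketch simulates the standard ring all-reduce (a reduce-scatter phase followed by an all-gather phase) on plain Python lists. It is not the improved interval-pairing variant from the paper, whose details the abstract does not give. The function name `ring_all_reduce` and the in-memory "workers" are illustrative assumptions.

```python
def ring_all_reduce(vectors):
    """Sum equal-length vectors held by n simulated workers arranged in a ring.

    Returns one result list per worker; every list equals the element-wise sum.
    """
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one worker")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same length")

    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]
    data = [list(v) for v in vectors]

    # Phase 1: reduce-scatter. In step s, worker r sends chunk (r - s) mod n
    # to worker r + 1, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r - s) % n
            lo, hi = bounds[c]
            sends.append(((r + 1) % n, c, data[r][lo:hi]))  # snapshot first
        for dst, c, payload in sends:
            lo, _ = bounds[c]
            for i, value in enumerate(payload):
                data[dst][lo + i] += value

    # Now worker r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. In step s, worker r forwards chunk (r + 1 - s) mod n
    # to worker r + 1, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - s) % n
            lo, hi = bounds[c]
            sends.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in sends:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload

    return data


result = ring_all_reduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
```

In this scheme each worker sends 2(n - 1) chunks of roughly size/n elements, so per-worker traffic stays close to twice the vector size regardless of n, which is why ring all-reduce is a common alternative to a central parameter server.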

6.

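Research on Grid-Based Parallel Algorithms  Total citations: 6, self-citations: 0, citations by others: 6

Luo Shouwen, Li Daiping, Zhang Xinyi, Fang Haixiang 《计算机工程与应用》2005,41(8):75-77

This paper analyzes problems in applying traditional distributed parallel computing and grid-based parallel computing. It then extends the LogP parallel computation model to the grid, presenting a two-level LogP model and a design strategy. The CG parallel algorithm is modified to suit the characteristics of the grid, with good results.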

7.

A Remote Parallel Debugger for the Dawning 3000 (曙光3000) Parallel Machine

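Chen Yong, Li Chunsheng, An Hong, Zheng Qilong, Chen Zhihui 《计算机科学》2004,31(3):179-182

Parallel debugging is very important to parallel program development, yet the traditional way to debug remotely is to log in with telnet and work through a command-line text interface, which is cumbersome. This paper presents RPB (Remote Parallel Debugger), a remote parallel debugger designed and implemented for the Dawning 3000 (曙光3000) system. RPB provides fully parallel debugging among other functions and has a graphical user interface, implemented in Java with the Swing toolkit, which makes it platform independent. RPB uses a client/server architecture, and the client and server communicate through CORBA middleware. RPB supports remote debugging of programs on the parallel machine over a LAN or WAN, hiding differences among client platforms and the geographic location of the parallel machine, and so greatly improves the machine's usability.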

8.

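Research on a Parallel Computation Model for Multi-Core Processors

Li Jingmei, Zhang Qi, Wang Junfeng 《电脑学习》2011,1(5)

To address the lack of a general computation model for parallel computer architectures, this paper analyzes several typical existing models, compares them in terms of synchronization, communication style, and parameters, and proposes an improved model, mzLogGP, based on the LogGP model. MPI parallel algorithms are used to analyze and test parallel programs under conditions where node computing resources are not exclusively held and the network is congested. Two parameters are added, the number of memory hierarchy levels and a network congestion index, and computation and communication costs are calculated from them. Comparing measured with predicted times shows that the system error decreases steadily as the number of nodes grows, indicating that the new model can improve the performance of parallel applications running on multi-core processor clusters and scales well.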
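For context, the base LogGP model that mzLogGP builds on extends LogP with a per-byte gap G for long messages. The sketch below computes the standard LogGP end-to-end time for a single k-byte message. It does not model the two parameters added in the paper (memory hierarchy levels and network congestion index), because the abstract gives no formulas for them. The function name and the sample values are illustrative assumptions.

```python
def loggp_long_message_time(L, o, G, k):
    """End-to-end time for one k-byte message under the base LogGP model.

    L: network latency, o: send/receive overhead per message,
    G: gap per byte for long messages, k: message length in bytes.
    All times share one unit (for example microseconds).
    """
    if k < 1:
        raise ValueError("k must be at least 1 byte")
    # Send overhead, then k - 1 further bytes spaced G apart,
    # then the network latency and the receive overhead.
    return o + (k - 1) * G + L + o


# Hypothetical parameters: L = 5, o = 2, G = 0.5 per byte, 11-byte message.
t = loggp_long_message_time(L=5.0, o=2.0, G=0.5, k=11)
```

With k = 1 the expression reduces to the LogP cost of a single small message, 2o + L, which is a quick sanity check on any fitted parameters.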

9.

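WAPM: A Parallel Programming Model for Wide-Area Distributed Computing  Total citations: 1, self-citations: 0, citations by others: 1

Fu Chongguo, Xu Shengchao 《计算机应用》2009,29(8)

Early programming models such as MPI and OpenMP are unsuited to large-scale, dynamic, wide-area Internet environments because of scalability limits or differences in parallel granularity. This paper proposes a parallel programming model for wide-area networks (WAPM), offering a new, feasible way to program distributed computing applications. WAPM consists of a communication library, a communication protocol, and an application programming interface, and features general-purpose programming, adaptive parallelism, and fault tolerance. Combined with a suitable programming language, it forms a wide-area parallel programming environment. Using the distributed computing platform P2HP as the working platform, the paper describes how distributed computing is carried out with WAPM. Experimental results show that WAPM is a general, feasible programming model with good performance.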

10.

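Research on a Parallel Computation Model for Multi-Core Processors

Li Jingmei, Zhang Qi, Wang Junfeng 《电脑学习》2011,(3):9-12,20

To address the lack of a general computation model for parallel computer architectures, this paper analyzes several typical existing models, compares them in terms of synchronization, communication style, and parameters, and proposes an improved model, mzLogGP, based on the LogGP model. MPI parallel algorithms are used to analyze and test parallel programs under conditions where node computing resources are not exclusively held and the network is congested. Two parameters are added, the number of memory hierarchy levels and a network congestion index, and computation and communication costs are calculated from them. Comparing measured with predicted times shows that the system error decreases steadily as the number of nodes grows, indicating that the new model can improve the performance of parallel applications running on multi-core processor clusters and scales well.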

11.

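Designing More Practical Parallel Algorithms

Shou Biao, Li Xiaofeng 《计算机研究与发展》1996,33(6):445-449

The emergence and growth of massively parallel computers urgently call for new theory and techniques of parallel processing design to guide the design of more practical parallel algorithms. This paper first briefly introduces the LogP and Barrier-LogP parallel computation models proposed for MPCs, and then uses the Barrier-LogP model to discuss general methods and techniques for designing more practical parallel algorithms from three angles: communication balance, data distribution, and overlapping communication with computation.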

12.

Fast parallel sorting under LogP: experience with the CM-5

Dusseau A.C. Culler D.E. Schauser K.E. Martin R.P. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(8):791-805

In this paper, we analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort) with the LogP model. LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes. We evaluate the robustness of the algorithms by varying the distribution and ordering of the key values. We also briefly examine the sensitivity of the algorithms to the communication parameters. We show that the LogP model is a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. With an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes. We find that communication performance is oblivious to the distribution of the key values, whereas the local processor performance is not; some communication phases are sensitive to the ordering of keys due to contention. Finally, our analysis shows that overhead is the most critical communication parameter in the sorting algorithms.
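As a rough illustration of how the four LogP parameters turn into time estimates, the sketch below computes the cost of one small message and of a burst of back-to-back small messages between two processors. It is a minimal sketch of the textbook LogP cost expressions, not the analysis used in the paper. The class and function names and the sample parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP machine parameters, all times in one unit (e.g. microseconds)."""
    L: float  # network latency for one small message
    o: float  # processor overhead to send or to receive one message
    g: float  # gap: minimum interval between consecutive sends per processor
    P: int    # number of processors


def one_message_time(m: LogP) -> float:
    """Send overhead, network latency, then receive overhead."""
    return m.o + m.L + m.o


def pipelined_send_time(m: LogP, n: int) -> float:
    """Time until the last of n back-to-back small messages is received.

    Successive sends are spaced by max(g, o), because the sender is limited
    both by the gap and by its own per-message overhead.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) * max(m.g, m.o) + one_message_time(m)


# Hypothetical machine: L = 6, o = 2, g = 4, P = 8.
machine = LogP(L=6.0, o=2.0, g=4.0, P=8)
single = one_message_time(machine)
burst = pipelined_send_time(machine, 3)
```

When o exceeds g, the overhead term sets the spacing between messages, which is one way to see why a communication-heavy algorithm can be more sensitive to overhead than to the gap.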

13.

Design and Performance of Parallel and Distributed Approximation Algorithms for Maxcut

《Journal of Parallel and Distributed Computing》1997,46(1):48-61

We develop and experiment with a new parallel algorithm to approximate the maximum weight cut in a weighted undirected graph. Our implementation starts with the recent (serial) algorithm of Goemans and Williamson for this problem. We consider several different versions of this algorithm, varying the interior-point part of the algorithm in order to optimize the parallel efficiency of our method. Our work aims for an efficient, practical formulation of the algorithm with close-to-optimal parallelization. We analyze our parallel algorithm in the LogP model and predict linear speedup for a wide range of the parameters. We have implemented the algorithm using the message passing interface (MPI) and run it on several parallel machines. In particular, we present performance measurements on the IBM SP2, the Connection Machine CM5, and a cluster of workstations. We observe that the measured speedups are predicted well by our analysis in the LogP model. Finally, we test our implementation on several large graphs (up to 13,000 vertices), particularly on large instances of the Ising model.

14.

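Parallel Program Design Based on Active Messages and the LogP Model

Li Xiaofeng, Shou Biao 《计算机研究与发展》1996,33(6):428-432

Active messages integrate communication and computation and form a highly effective and adaptable communication architecture. The LogP model is a practice-oriented model for parallel algorithm design that reflects the key performance parameters of current MPPs well. This paper discusses their characteristics and, on that basis, focuses on how the two complement each other in parallel program design.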

15.

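Improving the LogP Model and Optimizing the Design of the FFT Algorithm  Total citations: 8, self-citations: 0, citations by others: 8

Li Xiaofeng, Shou Biao 《计算机研究与发展》1996,33(6):438-444

As a parallel computation model for massively parallel machines, LogP gives us a system-independent basis for algorithm design. Although it can schedule communication and computation precisely, it gets mired in intricate design detail, so that actual results fall far short of design expectations. We improve it in two respects, algorithm design and model abstraction, by introducing barrier synchronization and long messages into the LogP model. This both exploits the efficiency of parallel machines better and brings actual results close to design expectations.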

16.

Assessing fast network interfaces

Culler D.E. Lok Tin Liu Martin R.P. Yoshikawa C.O. 《Micro, IEEE》1996,16(1):35-43

Assessing the performance of emerging high-speed networks is difficult. Our communication microbenchmark evaluates design changes in a communications system. Our microbenchmark generates a graphical signature from which we extract communication performance parameters for the hardware-software tandem. We use the LogP conceptual model for communication systems. (LogP stands for the parameters latency, overhead, gap, and processors.) We study three important platforms that represent diverse points in the network interface design space: the Intel Paragon, Meiko CS-2, and a cluster of Sparcstation-20s connected by Myrinet switches using LANai SBus cards. Our study views each machine as a gray box supporting Active Messages and conforming to the LogP framework. We devise a simple set of communication microbenchmarks and measure the performance on each platform to obtain the LogP parameters.

17.

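A Method for Converting PRAM-Based Algorithms into LogP-Based Algorithms

Jiao Jin, Zhang Xiaoyun 《小型微型计算机系统》1997,18(6):27-33

This paper proposes a method for converting parallel algorithms based on the PRAM model into parallel algorithms based on the LogP model, and optimizes the resulting LogP-based algorithms by linear clustering on the directed acyclic graph of the algorithm's communication structure.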

18.

An extended dominating node approach to broadcast and global combine in multiport wormhole-routed mesh networks

Yih-Jia Tsai McKinley P.K. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(1):41-58

A new approach to the design of collective communication operations in wormhole-routed mesh networks is described. The approach extends the concept of dominating sets in graph theory by accounting for the relative distance-insensitivity of the wormhole switching strategy and by taking advantage of a multiport communication architecture, which allows each node to simultaneously transmit messages on different outgoing channels. Collective communication operations are defined in terms of sets of extended dominating nodes (EDNs). The nodes in a set of EDNs can deliver (receive) messages to (from) a different, larger set of nodes in a single message-passing step under dimension-ordered wormhole routing and without channel contention among messages. The EDN model can be applied to different collective operations in 2D and 3D mesh networks. The authors focus on EDN-based broadcast and global combine operations. Performance evaluation results are presented that confirm the advantage of this approach over other methods.

19.

Research on a Parallel Matrix Inversion Algorithm Based on a Simplified LogP Model

Chen Tianqi, Zeng Qinghua, Sun Shixin 《计算机科学》2003,30(8):176-177

LogP is becoming a practical parallel computation model that meets the demands of parallel computers and parallel algorithms, so it is important to redesign parallel algorithms for the LogP model. This paper studies a parallel algorithm for matrix inversion on a simplified LogP model and reports simulation results.

20.

Competitive Implementation of Parallel Programs

X. Deng E. Koutsoupias P. MacKenzie 《Algorithmica》1999,23(1):14-30

We apply the methodology of competitive analysis of algorithms to the implementation of programs on parallel machines. We consider the problem of finding the best on-line distributed scheduling strategy that executes in parallel an unknown directed acyclic graph (dag) which represents the data dependency relation graph of a parallel program and which is revealed as execution proceeds. We study the competitive ratio of some important classes of dags assuming a fixed communication delay ratio τ that captures the average interprocessor communication measured in instruction cycles. We provide competitive algorithms for divide-and-conquer dags, trees, and general dags, when the number of processors depends on the size of the input dag and when the number of processors is fixed. Our major result is a lower bound Ω (τ / log τ ) of the competitive ratio for trees; it shows that it is impossible to design compilers that produce almost optimal execution code for all parallel programs. This fundamental result holds for almost any reasonable distributed memory parallel computation model, including the LogP and BSP model. Received March 5, 1996; revised March 11, 1997.