Similar Literature
1.
Because of the theoretical and practical importance of linear programming, parallel algorithms for solving large-scale programming problems have attracted the interest of many researchers. Building on the fact that the linear-time cube algorithm for linear programming proposed by Galperin is particularly well suited to parallelization, this paper presents a parallel linear programming algorithm based on the SPMD model and master-worker MPI, and analyzes its performance in depth. Theoretical analysis and experiments on the Dawning 3000 show that the algorithm offers coarse-grained parallelism, good scalability, and an ideal speedup model; it clearly outperforms other parallel algorithms proposed so far for the same class of asymmetric linear programming problems and can be used for high-performance solution of such large-scale linear programming problems.
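A minimal sketch of the kind of SPMD, master-worker MPI structure the abstract describes; the cube algorithm itself is not reproduced, and `do_subproblem` is a hypothetical placeholder for the per-process work.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical placeholder for each process's share of the computation. */
static double do_subproblem(int task) {
    return (double)task * task;
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int ntasks = 64;
    double local_best = 0.0, global_best = 0.0;

    /* SPMD: every process runs the same program on its own slice of tasks. */
    for (int t = rank; t < ntasks; t += size) {
        double v = do_subproblem(t);
        if (v > local_best) local_best = v;
    }

    /* Master (rank 0) collects and combines the partial results. */
    MPI_Reduce(&local_best, &global_best, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("best = %f\n", global_best);

    MPI_Finalize();
    return 0;
}
```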

2.
A parallel programming model with predictable and scalable parallel performance
The BSP (Bulk-Synchronous Parallel) model is independent of any particular parallel architecture: it can serve both as a parallel computation model and as a parallel programming model, and it lets programmers analyze and predict parallel program performance precisely during algorithm design and during coding and debugging. BSP programs are highly portable and can be implemented on many parallel systems, such as PVM and MPI.
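For reference, the standard BSP cost formula that makes this kind of performance prediction possible (a textbook property of the model, not a result specific to this paper) is

$$ T \;=\; \sum_{i=1}^{S} \bigl( w_i + g\,h_i + l \bigr), $$

where S is the number of supersteps, w_i the maximum local computation in superstep i, h_i the maximum number of message units any processor sends or receives in that superstep, g the network's cost per message unit, and l the barrier synchronization cost.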

3.
A parallel programming framework based on distributed objects
王晨, 周颖, 张德富. 《软件学报》, 1999, 10(3): 310-316
Computational performance and composability are both important for software running on networks of workstations, but for lack of suitable development environments such software currently falls short in both respects, and its composability in particular is weak. This paper proposes and implements a parallel programming framework based on distributed objects, aiming to let distributed objects provide high-performance parallel computing services while also giving parallel algorithms a clean encapsulation and reuse mechanism. Tests with several parallel algorithms show that the framework is of practical value.

4.
Parallel computers have not yet had the expected impact on mainstream computing. Parallelism adds a level of complexity to the programming task that makes it very error-prone. Moreover, a large variety of very different parallel architectures exists. Porting an implementation from one machine to another may require substantial changes. This paper addresses some of these problems by developing a formal basis for the design of parallel programs in the form of a refinement calculus. The calculus allows the stepwise formal derivation of an abstract, low-level implementation from a trusted, high-level specification. The calculus thus helps structuring and documenting the development process. Portability is increased, because the introduction of a machine-dependent feature can be located in the refinement tree. Development efforts above this point in the tree are independent of that feature and are thus reusable. Moreover, the discovery of new, possibly more efficient solutions is facilitated. Last but not least, programs are correct by construction, which obviates the need for difficult debugging. Our programming/specification notation supports fair parallelism, shared-variable and message-passing concurrency, local variables and channels. The calculus rests on a compositional trace semantics that treats shared-variable and message-passing concurrency uniformly. The refinement relation combines a context-sensitive notion of trace inclusion and assumption-commitment reasoning to achieve compositionality. The calculus straddles both concurrency paradigms, that is, a shared-variable program can be refined into a distributed, message-passing program and vice versa. Received July 2001 / Accepted in revised form May 2002

5.
6.
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all 1 ≤ p ≤ N^α/log N, multiplying two N×N matrices can be performed by a DMPC with p processors in O(N^α/p) time, i.e., linear speedup and cost optimality can be achieved in the range [1..N^α/log N]. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all 1 ≤ p ≤ N^α/log N, multiplying two N×N matrices can be performed by p processors connected by a hypercubic network in O(N^α/p + (N^2/p^{2/α})(log p)^{2(α−1)/α}) time, which implies that if p = O(N^α/(log N)^{2(α−1)/(α−2)}), linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
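For readers less familiar with the terminology, the standard definitions of cost optimality and linear speedup invoked above (textbook relations, not new claims from this paper) are

$$ T_p = O\!\left(\frac{N^{\alpha}}{p}\right), \qquad \text{cost} = p \cdot T_p = O(N^{\alpha}), \qquad S_p = \frac{T_1}{T_p} = \Theta(p), $$

i.e., the processor-time product matches the sequential work and the speedup grows proportionally with the number of processors p.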

7.
Parallel logic programming (PLP) systems are sophisticated examples of symbolic computing systems. PLP systems address problems such as allocating dynamic memory, scheduling irregular computations, and managing different types of implicit parallelism. Most PLP systems have been developed for bus-based architectures. However, the complexity of PLP systems and the large amount of data they process raise the question of whether logic programming systems can still achieve good performance on modern scalable architectures, such as distributed shared-memory (DSM) systems. In this work we use execution-driven simulation of a cache-coherent DSM architecture to investigate the performance of Andorra-I, a state-of-the-art PLP system, on a modern multiprocessor. The results of this simulation show that Andorra-I exhibits reasonable running time performance, but it does not scale well. Our detailed analysis of cache misses and their sources exposes several opportunities for improvements in Andorra-I. Based on this analysis, we modify Andorra-I using a set of simple techniques that led to significantly better running time and scalability. These results suggest that Andorra-I can and should perform well on modern multiprocessors. Furthermore, as Andorra-I shares its main data structures with several PLP systems, we conclude that the methodology and techniques used in our work can greatly benefit these other PLP systems.

8.
Paradigms for distributed parallel programming
This paper introduces several paradigms for distributed parallel programming: compute-aggregate-broadcast, divide and conquer, pipeline, processor farm, and others. These paradigms can serve as references when designing parallel programs.

9.
CC$ is a parallel programming language intended to ease the difficulty of programming distributed many-core parallel computers. The CC$ programming model is based on the Multi-BSP model and abstracts the hardware architecture of a distributed many-core parallel computer into three levels. Data are divided into five classes according to their storage level and sharing scope, so that sharing can be provided at different levels. CC$ also introduces a class of virtual instructions that handle data exchange between levels and give a logical description of data accesses. Parallel programs execute as three nested levels of Multi-BSP supersteps. CC$ has four main features: a unified programming style, built-in multi-level shared address spaces, expression-based description of data-access requests, and compile-time optimization of data transfers. Tests show that CC$ programs run efficiently and that the language is easy to learn and use, greatly shortening the development cycle.

10.
A hybrid parallel programming model for hierarchical NoCs
曹祥, 易伟, 潘红兵, 高明伦, 李丽. 《计算机工程》, 2010, 36(13): 278-280
To make better use of the hardware performance of multi-core processors, a hybrid MPI/OpenMP parallel programming model is proposed for hierarchical network-on-chip (NoC) architectures. An MPI-based task-level parallel model provides efficient communication between on-chip clusters, while the OpenMP model handles communication, synchronization, and data exchange among the four cores within a cluster. Experimental results show that, compared with a single parallel programming model, the hybrid model improves speedup by 20%~50%.
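A minimal sketch of this kind of hybrid structure, assuming a generic MPI+OpenMP environment rather than the paper's specific NoC platform: MPI carries the inter-cluster communication, while OpenMP parallelizes the work across the cores inside one cluster.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local_sum = 0.0;
    /* Intra-cluster: the four cores of one node work in parallel via OpenMP. */
    #pragma omp parallel for reduction(+:local_sum) num_threads(4)
    for (int i = rank; i < 1000000; i += size)
        local_sum += 1.0 / (i + 1.0);

    /* Inter-cluster: combine per-node results with message passing. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```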

11.
亢良伊, 王建飞, 刘杰, 叶丹. 《软件学报》, 2018, 29(1): 109-130
Machine learning problems are usually cast as an objective function to be optimized, and optimization algorithms are the main tool for solving for the parameters of that objective. In big-data settings, parallel and distributed optimization algorithms are needed so that multi-core and distributed computing can accelerate training. A large body of work has appeared in this area in recent years, and some of the algorithms are already widely used on machine learning platforms. This paper studies the five most common families of optimization methods: gradient descent, second-order methods, proximal gradient methods, coordinate descent, and the alternating direction method of multipliers (ADMM). For each family, related work is reviewed from the perspectives of single-machine parallelism and distributed parallelism, and the algorithms are compared in detail in terms of model characteristics, input-data characteristics, algorithm evaluation, and parallel computation model. The implementation and use of these optimization algorithms in representative scalable machine learning platforms are then compared. All of the optimization algorithms covered are also organized into a multi-level taxonomy, so that users can choose a suitable optimization algorithm for a given type of objective function, or use the taxonomy to explore how an optimization algorithm could be applied to a new type of objective. Finally, the paper analyzes open problems of existing optimization algorithms, suggests possible solutions, and outlines future research directions.
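For concreteness, the basic update rule behind the gradient-descent family mentioned above (a textbook formula, not one taken from this paper) is

$$ \theta_{t+1} = \theta_t - \eta\,\nabla f(\theta_t), $$

where θ_t are the model parameters at step t, η is the learning rate, and ∇f(θ_t) is the gradient of the objective; data-parallel variants compute partial gradients on different workers and aggregate them before applying the update.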

12.
Message Passing (MP) and Distributed Shared Memory (DSM) are the two most common approaches to distributed parallel computing. MP is difficult to use, whereas DSM is not scalable. Performance scalability and ease of programming can be achieved at the same time by using navigational programming (NavP). This approach combines the advantages of MP and DSM, and it balances convenience and flexibility. Similar to MP, NavP suggests to its programmers the principle of pivot-computes and hence is efficient and scalable. Like DSM, NavP supports incremental parallelization and shared variable programming and is therefore easy to use. The implementation and performance analysis of real-world algorithms, namely parallel Jacobi iteration and parallel Cholesky factorization, presented in this paper supports the claim that the NavP approach is better suited for general-purpose parallel distributed programming than either MP or DSM.
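As background on one of the test algorithms, the Jacobi iteration for solving Ax = b (standard formulation, not specific to the NavP implementation) updates each unknown as

$$ x_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl( b_i - \sum_{j \neq i} a_{ij}\, x_j^{(k)} \Bigr), $$

which parallelizes naturally because every x_i^{(k+1)} depends only on values from the previous sweep.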

13.
In scalable concurrent architectures, the performance of a parallel algorithm depends on the resource management policies used. Such policies determine, for example, how data is partitioned and distributed and how processes are scheduled. In particular, the performance of a parallel algorithm obtained by using a particular policy can be affected by increasing the size of the architecture or the input. In order to support scalability, we are developing a methodology for modular specification of partition and distribution strategies (PDSs). As a consequence, a PDS may be changed without modifying the code specifying the logic of a parallel algorithm. We illustrate our methodology for parallel algorithms that use dynamic data structures.

14.
This paper analyzes the parallel processing model of hybrid parallel computing systems built from CMP nodes and, following a hierarchical, top-down, stepwise-refinement approach, designs and implements a programming environment for such hybrid parallel computing systems. The environment greatly reduces the complexity that programmers face when writing parallel applications in this setting, reduces coding errors, and improves programmer productivity.

15.
A simple approach to parallel programming on distributed systems
迟学斌. 《软件学报》, 1998, 9(2): 107-110
Taking parallel matrix multiplication as an example, this paper presents parallel program implementations in the PVM, EXPRESS, and NX environments. From the analysis of this simple example, general strategies for parallel program design are summarized. The parallel programming method introduced here is especially useful for porting large application problems to parallel machines; its purpose is to serve as a reference for researchers who want to write parallel programs.
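A minimal sketch of the row-block parallel matrix multiplication strategy such examples typically follow, written here with MPI rather than PVM/EXPRESS/NX (an assumption made purely for illustration); N is kept divisible by the number of processes.

```c
#include <mpi.h>
#include <stdio.h>

#define N 4

int main(int argc, char **argv) {
    int rank, size;
    double A[N][N], B[N][N], C[N][N], Aloc[N][N], Cloc[N][N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int rows = N / size;              /* assume N % size == 0 */

    if (rank == 0)                    /* root initializes A and B */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { A[i][j] = i + j; B[i][j] = (i == j); }

    /* Distribute row blocks of A; broadcast all of B. */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, Aloc, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Each process computes the rows of C it owns. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            Cloc[i][j] = 0.0;
            for (int k = 0; k < N; k++) Cloc[i][j] += Aloc[i][k] * B[k][j];
        }

    MPI_Gather(Cloc, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("C[0][0] = %f\n", C[0][0]);
    MPI_Finalize();
    return 0;
}
```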

16.
This paper outlines the general concepts of distributed parallel systems and distributed shared memory, discusses parallel programming models that use shared objects and reliable broadcast, and finally presents our improved model.

17.
Much prior work in AI on various attempts to speed up rule-based systems by parallel processing has been reported. Unfortunately, many of these results indicate that there is limited parallelism to be found when rules are applied to relatively small amounts of data. Thus, one can predict that much greater parallelism can be extracted when rules are applied to large amounts of data. However, traditional compile-time parallelization strategies as developed for main-memory based systems do not scale to large databases. We propose a scalable strategy for the efficient parallel implementation of rule-based systems operating upon large databases. We concentrate on load balancing techniques in a synchronous model of rule execution, where the variance in runtime of the distributed sites is minimized per cycle of rule processing, thus increasing utilization and speedup. We demonstrate that static load balancing techniques are insufficient, and thus low overhead dynamic load balancing is the key to successful scaling. We present a form of dynamic load balancing that is based upon predicting future system loads, rather than conventional demand-driven approaches that monitor current system state. We analyze a number of possible predictive dynamic load balancing protocols by isoefficiency analysis to guide the design of a parallel database rule processing system.

18.
MapReduce: a new programming model for distributed parallel computing
MapReduce is a programming model for distributed parallel computing proposed by Google for the parallel processing of large-scale data. Inspired by functional programming languages, the MapReduce model splits a large-scale data processing job into a number of independently runnable Map tasks, assigns them to different machines for execution, and has them produce intermediate files in a certain format; a number of Reduce tasks then merge these intermediate files to obtain the final output files. When using the MapReduce model for large-scale data processing, users can concentrate on writing the Map and Reduce functions, while the other hard problems of parallel computing, such as the distributed file system, job scheduling, fault tolerance, and inter-machine communication, are handled by the MapReduce system, which greatly lowers the overall programming difficulty. MapReduce is increasingly becoming the mainstream programming model for cloud computing platforms. The open-source MapReduce system provided by the Apache Hadoop project still needs further improvement.
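A minimal, single-process sketch of the map/shuffle/reduce flow described above, using the classic word-count example; a real MapReduce system distributes the Map and Reduce tasks across machines and stores the intermediate files in a distributed file system.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *key; int value; } kv;

/* Map: emit the intermediate pair (word, 1) for each input word. */
static void map(const char *word, kv *out, int *n) {
    out[*n].key = word;
    out[*n].value = 1;
    (*n)++;
}

static int cmp_kv(const void *a, const void *b) {
    return strcmp(((const kv *)a)->key, ((const kv *)b)->key);
}

int main(void) {
    const char *words[] = { "map", "reduce", "map", "shuffle", "map" };
    int nwords = sizeof(words) / sizeof(words[0]);

    kv inter[16];
    int n = 0;
    for (int i = 0; i < nwords; i++)      /* map phase */
        map(words[i], inter, &n);

    qsort(inter, n, sizeof(kv), cmp_kv);  /* shuffle: group identical keys */

    for (int i = 0; i < n; ) {            /* reduce phase: sum the values of each key */
        int sum = 0, j = i;
        while (j < n && strcmp(inter[j].key, inter[i].key) == 0)
            sum += inter[j++].value;
        printf("%s\t%d\n", inter[i].key, sum);
        i = j;
    }
    return 0;
}
```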

19.
This paper describes dSTEP, a directive-based programming model for hybrid shared and distributed memory machines. The originality of our work is the definition and implementation of a unified high-level programming model addressing both data and computation distributions, providing particularly fine control of the computation. The goal is to improve programmer productivity while providing good performance in terms of execution time and memory usage. We define a generic compilation scheme for computation mapping and communication generation. We implement the solution in a source-to-source compiler together with a runtime library. We provide a series of optimizations to improve the performance of the generated code, with a special focus on reducing the communication time. We evaluate our solution on several scientific kernels as well as on the more challenging NAS BT benchmark, and compare our results with the hand-written Fortran MPI and UPC implementations. The results show, first, that our solution makes explicit the non-trivial parallel execution of the NAS BT benchmark using the dSTEP directives. Second, the results show that our generated MPI+OpenMP BT program runs with an 83.35 speedup over the original NAS OpenMP C benchmark on a hybrid cluster composed of 64 quad-core nodes (256 cores). Overall, our solution dramatically reduces the programming effort while providing good execution time and memory usage. This programming model is suitable for a large variety of machines, such as multi-core and accelerator clusters.

20.
International Journal of Parallel Programming - The Divide-and-conquer (D&C) pattern appears in a large number of problems and is highly suitable for exploiting parallelism. This has led to...
