期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

李庆华李肯立王多强熊家军《小型微型计算机系统》2003,24(9):1718-1721

由于线性规划在理论和实践中的重要性，对求解大规模规划问题并行算法的研究已引起许多学者的兴趣．本文根据Galperin提出的线性规划的一种线性时间的立方算法特别适合并行的特点，提出了一种基于SPMD模型和主从式MPI的线性规划并行算法，并对算法性能进行了深入分析，理论分析和在曙光3000上的实验结果表明：该算法具有粗粒度并行、良好的可扩展性和理想加速比模型等优点，明显优于目前为止求解同类不对称线性规划问题的其他并行算法，可用于求解此类大规模线性规划问题的高性能计算．相似文献

2.

可预测扩展并行性能的并行程序设计模型 总被引：1，自引：0，他引：1

陈昌生孙永强《计算机工程》1999,25(6):7-8

ＢＳＰ（Ｂｕｌｋ－Ｓｙｎｃｈｒｏｎｏｕｓ）模型是独立于并行体系结构的,即可作为并行计算模型又可看作并地程序设计模型,该模型使程序员在算法设计阶段和编程调试阶段可精确地分析和预测并行程序性能。ＢＳＰ程序可移植性强,可在多种并行系统发ＰＶＭ,ＭＰＩ等上实现。相似文献

3.

一种分布对象的并行程序设计框架 总被引：1，自引：0，他引：1

王晨周颖张德富《软件学报》1999,10(3):310-316

计算性能和合成性能对于基于工作站网的软件十分重要,但由于缺乏相应的开发环境,现在这类软件在这两方面还做得很不够,尤其是合成性能十分薄弱.该文提出并实现了一种基于分布对象的并行程序设计框架,力图使分布对象能提供高性能的并行计算服务,同时也使并行算法获得一种良好的封装和复用机制.经过一些并行算法的测试,表明该框架具有实用价值. 相似文献

4.

A Refinement Calculus for Shared-Variable Parallel and Distributed Programming 总被引：2，自引：1，他引：1

J. Dingel 《Formal Aspects of Computing》2002,14(2):123-197

Parallel computers have not yet had the expected impact on mainstream computing. Parallelism adds a level of complexity to the programming task that makes it very error-prone. Moreover, a large variety of very different parallel architectures exists. Porting an implementation from one machine to another may require substantial changes. This paper addresses some of these problems by developing a formal basis for the design of parallel programs in the form of a refinement calculus. The calculus allows the stepwise formal derivation of an abstract, low-level implementation from a trusted, high-level specification. The calculus thus helps structuring and documenting the development process. Portability is increased, because the introduction of a machine-dependent feature can be located in the refinement tree. Development efforts above this point in the tree are independent of that feature and are thus reusable. Moreover, the discovery of new, possibly more efficient solutions is facilitated. Last but not least, programs are correct by construction, which obviates the need for difficult debugging. Our programming/specification notation supports fair parallelism, shared-variable and message-passing concurrency, local variables and channels. The calculus rests on a compositional trace semantics that treats shared-variable and message-passing concurrency uniformly. The refinement relation combines a context-sensitive notion of trace inclusion and assumption-commitment reasoning to achieve compositionality. The calculus straddles both concurrency paradigms, that is, a shared-variable program can be refined into a distributed, message-passing program and vice versa. Received July 2001 / Accepted in revised form May 2002 相似文献

5.

Visual Methods for Parallel and Distributed Programming

《Journal of Visual Languages and Computing》2001,12(2):123-125

相似文献

6.

Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers 总被引：1，自引：0，他引：1

Keqin Li 《Journal of Parallel and Distributed Computing》2001,61(12):1709

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2<α3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all 1pN^α/log N, multiplying two N×N matrices can be performed by a DMPC with p processors in O(N^α/p) time, i.e., linear speedup and cost optimality can be achieved in the range [1..N^α/log N]. This unifies all known algorithms for matrix multiplication on DMPC, standard or non- standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all 1p N^α /log N, multiplying two N×N matrices can be performed by p processors connected by a hypercubic network in O(N^α/p+(N²/p^2/α)(log p)^2(α−1)/α) time, which implies that if p=O(N^α/(log N)^{2(α−1)/(α−2)}), linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically. 相似文献

7.

Parallel Logic Programming Systems on Scalable Architectures

《Journal of Parallel and Distributed Computing》2000,60(7):835-852

Parallel logic programming (PLP) systems are sophisticated examples of symbolic computing systems. PLP systems address problems such as allocating dynamic memory, scheduling irregular computations, and managing different types of implicit parallelism. Most PLP systems have been developed for bus-based architectures. However, the complexity of PLP systems and the large amount of data they process raise the question of whether logic programming systems can still achieve good performance on modern scalable architectures, such as distributed shared-memory (DSM) systems. In this work we use execution-driven simulation of a cache-coherent DSM architecture to investigate the performance of Andorra-I, a state-of-the-art PLP system, on a modern multiprocessor. The results of this simulation show that Andorra-I exhibits reasonable running time performance, but it does not scale well. Our detailed analysis of cache misses and their sources expose several opportunities for improvements in Andorra-I. Based on this analysis, we modify Andorra-I using a set of simple techniques that led to significantly better running time and scalability. These results suggest that Andorra-I can and should perform well on modern multiprocessors. Furthermore, as Andorra-I shares its main data structures with several PLP systems, we conclude that the methodology and techniques used in our work can greatly benefit these other PLP systems. 相似文献

8.

分布式并行程序设计范例 总被引：1，自引：0，他引：1

倪南吴巧泉《计算机研究与发展》1994,31(2):5-9

本文介绍了几种分布式并行程序设计范例：计算－汇集－广播，分而治之，流水线，处理机场等。人们可参考上述范例，进行并行程序设计。相似文献

9.

CC$:一种面向分布式众核平台的并行编程语言

吴峻峰许跃生张永东江颖叶纬材《计算机科学》2013,40(3):128-132

CC$是一种并行编程语言,目的是解决分布式众核并行计算机的编程困难。CC$的编程模型以Multi BSP 模型为基础,将分布式众核并行计算机的硬件架构抽象为3层。数据按照存储的层次和共享范围分为5类,以便在不同层次上提供共享。LL$还提出一类虚拟指令来解决不同层次之间的数据交换,实现数据访问的逻辑化描述。并行程序按照3层Multi BSP超步嵌套执行。CC$具有统一的编程风格、内建的多层会共地址空间、数据访问请求的表达式描述和数据传输编译优化4大特点。测试表明,CC$程序的运行效率高,易学易用,大幅地缩短了开发周期。相似文献

10.

面向层次化NoC的混合并行编程模型 总被引：1，自引：0，他引：1

下载免费PDF全文

曹祥易伟潘红兵高明伦李丽《计算机工程》2010,36(13):278-280

为更好发挥多核处理器的硬件性能,针对层次化的片上网络架构,提出MPI/OpenMP混合并行编程模型。运用基于MPI的任务级并行模型实现片内簇间的高效通信,采用OpenMP模型实现簇内四核的通信、同步和数据交换。实验结果表明,与单一并行编程模型相比,混合并行编程模型加速比提高了20%~50%。相似文献

11.

可扩展机器学习的并行与分布式优化算法综述

亢良伊王建飞刘杰叶丹《软件学报》2018,29(1):109-130

机器学习问题通常会转换成一个目标函数去求解,优化算法是求解目标函数中参数的重要工具.在大数据环境下,需要设计并行与分布式的优化算法,通过多核计算和分布式计算技术来加速训练过程.近年来,该领域涌现了大量研究工作,部分算法也在各机器学习平台得到广泛应用.本文针对梯度下降算法、二阶优化算法、邻近梯度算法、坐标下降算法、交替方向乘子算法五类最常见的优化方法展开研究,每一类算法分别从单机并行和分布式并行来分析相关研究成果,并从模型特性、输入数据特性、算法评价、并行计算模型等角度对每个算法进行详细对比.随后对有代表性的可扩展机器学习平台中优化算法的实现和应用情况进行对比分析.同时对本文中介绍的所有优化算法进行多层次分类,方便用户根据目标函数类型选择合适的优化算法,也可以通过该多层次分类图交叉探索如何将优化算法应用到新的目标函数类型.最后分析了现有优化算法存在的问题,提出可能的解决思路,并对未来研究方向进行展望. 相似文献

12.

Distributed Parallel Computing Using Navigational Programming

Lei Pan Ming Kin Lai Koji Noguchi Javid J. Huseynov Lubomir F. Bic Michael B. Dillencourt 《International journal of parallel programming》2004,32(1):1-37

Message Passing (MP) and Distributed Shared Memory (DSM) are the two most common approaches to distributed parallel computing. MP is difficult to use, whereas DSM is not scalable. Performance scalability and ease of programming can be achieved at the same time by using navigational programming (NavP). This approach combines the advantages of MP and DSM, and it balances convenience and flexibility. Similar to MP, NavP suggests to its programmers the principle of pivot-computes and hence is efficient and scalable. Like DSM, NavP supports incremental parallelization and shared variable programming and is therefore easy to use. The implementation and performance analysis of real-world algorithms, namely parallel Jacobi iteration and parallel Cholesky factorization, presented in this paper supports the claim that the NavP approach is better suited for general-purpose parallel distributed programming than either MP or DSM. 相似文献

13.

A Methodology for Programming Scalable Architectures

《Journal of Parallel and Distributed Computing》1994,22(3):479-487

In scalable concurrent architectures, the performance of a parallel algorithm depends on the resource management policies used. Such policies determine, for example, how data is partitioned and distributed and how processes are scheduled. In particular, the performance of a parallel algorithm obtained by using a particular policy can be affected by increasing the size of the architecture or the input. In order to support scalability, we are developing a methodology for modular specification of partition and distribution strategies (PDSs). As a consequence, a PDS may be changed without modifying the code specifying the logic of a parallel algorithm. We illustrate our methodology for parallel algorithms that use dynamic data structures. 相似文献

14.

面向混合并行计算系统编程环境的研究与实现

武华北孙济洲王文义《计算机科学》2010,37(4):143

分析了基于CMP节点的混合并行计算系统并行处理模式,基于分层次的自顶向下、逐步细化的思想,设计并实现了面向此类混合并行计算系统的编程环境,从而大大降低程了序设计人员在此类环境下编写并行应用程序的复杂度,减少了程序编码错误,提高了编程人员的效率。相似文献

15.

分布式系统并行编程的简单方法 总被引：1，自引：0，他引：1

迟学斌《软件学报》1998,9(2):107-110

本文以并行矩阵乘法为例子,给出在PVM, EXPRESS以及NX环境上的并行程序实现.通过对这个简单例子的分析,归纳出在一般情况下并行程序设计应采取的策略.这里介绍的并行程序设计方法,特别对大型应用问题的并行移植是非常有用的,其目的是为想写并行程序的科研人员提供参考. 相似文献

16.

基于分布式共享存储器的并行程序设计

袁道华《计算机研究与发展》1994,31(9):56-60

本文概述了分布式并行系统和分布式共享存储器的一般概念，讨论了使用共享对象和可靠广播的并行程序设计模型，最后给出了我们的改进模型。相似文献

17.

Scalable Parallel and Distributed Expert Database Systems with Predictive Load Balancing

《Journal of Parallel and Distributed Computing》1994,22(3):506-522

Much prior work in AI on various attempts to speed up rule-based systems by parallel processing has been reported. Unfortunately, many of these results indicate that there is limited parallelism to be found when rules are applied to relatively small amounts of data. Thus, one can predict that much greater parallelism can be extracted when rules are applied to large amounts of data. However, traditional compile-time parallelization strategies as developed for main-memory based systems do not scale to large databases. We propose a scalable strategy for the efficient parallel implementation of rule-based systems operating upon large databases. We concentrate on load balancing techniques in a synchronous model of rule execution, where the variance in runtime of the distributed sites is minimized per cycle of rule processing, thus increasing utilization and speedup. We demonstrate that static load balancing techniques are insufficient, and thus low overhead dynamic load balancing is the key to successful scaling. We present a form of dynamic load balancing that is based upon predicting future system loads, rather than conventional demand-driven approaches that monitor current system state. We analyze a number of possible predictive dynamic load balancing protocols by isoefficiency analysis to guide the design of a parallel database rule processing system. 相似文献

18.

MapReduce:新型的分布式并行计算编程模型 总被引：3，自引：0，他引：3

李成华张新访金海向文《计算机工程与科学》2011,33(3):129

MapReduce是Google提出的分布式并行计算编程模型,用于大规模数据的并行处理。Ma-pReduce模型受函数式编程语言的启发,将大规模数据处理作业拆分成若干个可独立运行的Map任务,分配到不同的机器上去执行,生成某种格式的中间文件,再由若干个Reduce任务合并这些中间文件获得最后的输出文件。用户在使用MapReduce模型进行大规模数据处理时,可以将主要精力放在如何编写Map和Reduce函数上,其它并行计算中的复杂问题诸如分布式文件系统、工作调度、容错、机器间通信等都交给MapReduce系统处理,在很大程度上降低了整个编程难度。MapReduce日益成为云计算平台的主流编程模型。Apache Hadoop项目提供开源的MapReduce系统还有待进一步完善。相似文献

19.

Combining Data and Computation Distribution Directives for Hybrid Parallel Programming : A Transformation System

Rachid Habel Frédérique Silber-Chaussumier François Irigoin Elisabeth Brunet François Trahay 《International journal of parallel programming》2016,44(6):1268-1295

This paper describes dSTEP, a directive-based programming model for hybrid shared and distributed memory machines. The originality of our work is the definition and an implementation of a unified high-level programming model addressing both data and computation distributions, providing a particularly fine control of the computation. The goal is to improve the programmer productivity while providing good performances in terms of execution time and memory usage. We define a generic compilation scheme for computation mapping and communication generation. We implement the solution in a source-to-source compiler together with a runtime library. We provide a series of optimizations to improve the performance of the generated code, with a special focus on reducing the communications time. We evaluate our solution on several scientific kernels as well as on the more challenging NAS BT benchmark, and compare our results with the hand written Fortran MPI and UPC implementations. The results show first that our solution allows to make explicit the non trivial parallel execution of the NAS BT benchmark using the dSTEP directives. Second, the results show that our generated MPI+OpenMP BT program runs with a 83.35 speedup over the original NAS OpenMP C benchmark on a hybrid cluster composed of 64 quadricores (256 cores). Overall, our solution dramatically reduces the programming effort while providing good time execution and memory usage performances. This programming model is suitable for a large variety of machines as multi-core and accelerator clusters. 相似文献

20.

A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems

Martínez Millán A. Fraguela Basilio B. Cabaleiro José C. 《International journal of parallel programming》2021,49(6):820-845

International Journal of Parallel Programming - The Divide-and-conquer (D&C) pattern appears in a large number of problems and is highly suitable to exploit parallelism. This has led to... 相似文献