
1.

Chi Lihua, Liu Jie, Li Xiaomei, Hu Qingfeng. Journal of Computer Research and Development, 1999, 36(1): 47-51

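Scalability is an important consideration in the design of parallel algorithms and high-performance parallel computers. This paper first analyzes two scalability metrics, isoefficiency and isospeed, and points out their strengths and weaknesses. Then, based on an analysis of parallel computation time, it proposes a new scalability metric, the iso-overhead-to-computation-ratio metric, which can evaluate the scalability of a parallel algorithm combined with a parallel machine. Finally, the metric is used to analyze the scalability of two parallel algorithms on the YH03 high-performance parallel computer.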
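To make the quantities behind these metrics concrete, the following Python sketch computes the standard speedup S = T1/Tp, efficiency E = S/p, and total parallel overhead T_o = p*Tp - T1 from measured times. The overhead-to-computation ratio is taken here as T_o/T1; this is an assumption, and the paper's exact definition may differ. Under that assumption E = 1/(1 + T_o/T1), so holding the ratio fixed as p grows is the same as holding efficiency fixed, which is the isoefficiency condition.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classical speedup S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Parallel efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / p


def overhead(t_serial: float, t_parallel: float, p: int) -> float:
    """Total overhead T_o = p * Tp - T1: processor time not spent on useful work."""
    return p * t_parallel - t_serial


def overhead_to_computation_ratio(t_serial: float, t_parallel: float, p: int) -> float:
    """Assumed form of the ratio, T_o / T1 (hypothetical; the paper may define it differently)."""
    return overhead(t_serial, t_parallel, p) / t_serial
```

For example, with T1 = 100, Tp = 30 and p = 4, the overhead is 20, the ratio is 0.2, and the efficiency is 1/1.2.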

2.

Parallel MAX-MIN Ant System Based on MPI


Liu Caiyun, Chen Zhong, Xiong Jie. Computer Engineering, 2010, 36(19): 200-202

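Existing ant colony systems need long computation times to solve large-scale combinatorial optimization problems. To address this, the paper proposes a coarse-grained, asynchronously cooperating parallel MAX-MIN Ant System based on the Message Passing Interface (MPI), which reduces communication overhead in the parallel computation while preserving solution quality. Numerical experiments on a Dawning 4000L parallel computer show that the system achieves good parallel speedup and efficiency and is well suited to large-scale TSP instances.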
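The MAX-MIN Ant System differs from the basic ant system mainly in its pheromone update: trails evaporate, only the best ant deposits pheromone, and every trail is clamped to [tau_min, tau_max]. A minimal serial sketch of that update for a symmetric TSP follows; the function and parameter names are illustrative, and the paper's coarse-grained asynchronous MPI exchange between colonies is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mmas_update(tau: Matrix, best_tour: Sequence[int], best_length: float,
                rho: float, tau_min: float, tau_max: float) -> Matrix:
    """One MAX-MIN Ant System pheromone update on a symmetric TSP matrix (in place)."""
    n = len(tau)
    # 1. Evaporation: every trail loses a fraction rho of its pheromone.
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    # 2. Deposit: only the best ant reinforces the edges of its closed tour.
    deposit = 1.0 / best_length
    for k, a in enumerate(best_tour):
        b = best_tour[(k + 1) % len(best_tour)]
        tau[a][b] += deposit
        tau[b][a] += deposit
    # 3. Clamp every trail into [tau_min, tau_max] to limit search stagnation.
    for i in range(n):
        for j in range(n):
            tau[i][j] = min(tau_max, max(tau_min, tau[i][j]))
    return tau
```

In a parallel version, each process would run this update on its own colony and periodically exchange best tours; how and when that exchange happens is the design question the paper addresses.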

3.

Parallel Programming Experiments on the Jiangnan III Parallel Computer


Chi Xuebin. Computer Systems & Applications, 1994, (10)

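1. Introduction. The Jiangnan III parallel computer, recently released through a collaboration between the Jiangnan Institute of Computing Technology and the Institute of Computing Technology of the Chinese Academy of Sciences, is a multiprocessor system with both local memory and shared main memory; each of its processing elements is an Intel i860. It currently has ten processing elements, each with 16 MB of memory, plus 64 MB of shared main memory. In terms of storage capacity, it is an ideal machine for solving large-scale problems. The system is still being improved; planned additions include a direct parallel implementation of FORTRAN and user-friendly software tools such as inter-process synchronization control. (Figure: Jiangnan III architecture.) Because the Jiangnan III is a parallel computer with both local memory and shared main memory, algorithm design must take this into account to produce parallel algorithms suited to the machine. We give…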

4.

Design and Implementation of a Parallel Finite Element Program (total citations: 1; self-citations: 0; by others: 1)

Yu Tiantang, Jiang Hongdao. Journal of Numerical Methods and Computer Applications, 2000, 21(2): 155-160

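1. Introduction. A main approach to parallel finite element computation is the substructure method [1]: static condensation is performed on each substructure in parallel, the interface equations are then solved in parallel, and finally back-substitution is carried out in parallel to obtain the interior displacements and compute strains and stresses. The design and efficient implementation of a parallel program depend strongly on the computational model of the parallel hardware. Network parallel computing offers huge computing potential, good price/performance, scalability, and a flexible architecture, and a number of message-passing programming platforms such as PVM, MPI, and EXPRESS [2,3] have appeared; together these have made scalable distributed network-parallel finite element analysis an important direction of parallel finite element computing. This paper describes in detail, in a PVM-based distributed network parallel environment, the parallel finite element…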
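The three substructure steps described above (static condensation, interface solve, back-substitution) follow from splitting each substructure's stiffness system into interior unknowns (i) and interface unknowns (b). The standard derivation is sketched below in our own notation, not the paper's; R_s, the matrix that picks substructure s's interface unknowns out of the global interface vector u_b, is an assumed symbol introduced for illustration.

```latex
\begin{aligned}
&\begin{pmatrix} K_{ii}^{(s)} & K_{ib}^{(s)} \\ K_{bi}^{(s)} & K_{bb}^{(s)} \end{pmatrix}
 \begin{pmatrix} u_i^{(s)} \\ u_b^{(s)} \end{pmatrix}
 = \begin{pmatrix} f_i^{(s)} \\ f_b^{(s)} \end{pmatrix},
 \qquad u_b^{(s)} = R_s\,u_b \\
&u_i^{(s)} = \bigl(K_{ii}^{(s)}\bigr)^{-1}\bigl(f_i^{(s)} - K_{ib}^{(s)}\,u_b^{(s)}\bigr) \\
&S^{(s)} = K_{bb}^{(s)} - K_{bi}^{(s)}\bigl(K_{ii}^{(s)}\bigr)^{-1}K_{ib}^{(s)},
 \qquad g^{(s)} = f_b^{(s)} - K_{bi}^{(s)}\bigl(K_{ii}^{(s)}\bigr)^{-1} f_i^{(s)} \\
&\Bigl(\sum_s R_s^{\mathsf T} S^{(s)} R_s\Bigr)\,u_b = \sum_s R_s^{\mathsf T} g^{(s)}
\end{aligned}
```

Each pair S^(s), g^(s) depends only on substructure s, so the condensation runs in parallel; after the assembled interface system is solved for u_b, the second line recovers each u_i^(s) independently, again in parallel.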

5.

A Practical Testing and Evaluation Method for High-Performance Computers Based on Parallel Benchmarks

Chi Lihua, Liu Jie, Hu Qingfeng, Li Xiaomei. Computer Engineering & Science, 2004, 26(4): 45-47

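This paper analyzes the shortcomings of traditional testing methods based on parallel benchmark programs. Targeting the large processor counts of current high-performance computers, it presents a practical testing method, generalizes the traditional notion of speedup, and proposes a scalability evaluation method. With this method, high-performance computers can be conveniently tested and evaluated, and test results from small-scale parallel machines can be used to predict the performance of large-scale ones. Finally, results of testing and evaluating NPB on a particular high-performance computer are presented.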

6.

A Parallel Algorithm for the Knapsack Problem Based on Sampling and the MIMD Architecture


Liu Xiaoling, Li Kenli, Zheng Guangyong. Computer Engineering & Science, 2006, 28(9): 100-102

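The knapsack problem is a well-known NP-complete problem with very important applications in cryptography and number theory. Based on an in-depth analysis of existing parallel algorithms for the knapsack problem, this paper proposes a parallel algorithm for it based on sampling and the MIMD architecture, and gives a theoretical performance analysis together with experimental results on an IBM p690 supercomputer. The results show that for knapsack instances of dimension n ≥ 40 the parallel efficiency exceeds 60%. The algorithm therefore scales well and can be used to solve the knapsack problem efficiently on MIMD parallel machines of all kinds.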

7.

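Parallel Flow Shop Scheduling and a Probability-Learning-Based Evolutionary Algorithm (total citations: 1; self-citations: 0; by others: 1)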

Pang Hali, Wan Shanshan. Control Theory & Applications, 2005, 22(1): 149-152

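The parallel flow shop scheduling problem combines features of parallel-machine and flow-shop scheduling and is a new class of scheduling problem. For the objective of minimizing the makespan (maximum completion time), an integer programming model of the general parallel flow shop problem is formulated. Given the problem's computational complexity, a solution algorithm based on probability learning is designed. Results on randomly generated test instances show the algorithm's good potential for solving parallel flow shop problems.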
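To make the makespan objective concrete, here is a small Python sketch that evaluates C_max for a given schedule using the standard flow-shop recurrence C(j,k) = max(C(j-1,k), C(j,k-1)) + p(j,k). It assumes, for illustration, several identical flow-shop lines, each processing its assigned jobs in a fixed order through all stages; it only evaluates a schedule and is not the paper's probability-learning algorithm.

```python
from typing import List, Sequence


def flowshop_makespan(proc: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Makespan of one permutation flow-shop line.
    proc[j][k] is the processing time of job j on stage k."""
    if not order:
        return 0.0
    stages = len(proc[order[0]])
    # finish[k] holds the completion time of the most recent job on stage k.
    finish = [0.0] * stages
    for j in order:
        for k in range(stages):
            # A job starts on stage k once stage k is free and the job left stage k-1.
            ready = finish[k - 1] if k > 0 else 0.0
            finish[k] = max(finish[k], ready) + proc[j][k]
    return finish[-1]


def parallel_flowshop_makespan(proc: Sequence[Sequence[float]],
                               lines: List[List[int]]) -> float:
    """C_max over several identical lines; lines[l] is the job order on line l."""
    return max(flowshop_makespan(proc, order) for order in lines)
```

With proc = [[3, 2], [1, 4], [2, 2]], running all three jobs on one line in order 0, 1, 2 gives a makespan of 11, while splitting them as [[0], [1, 2]] over two lines gives 7. A search algorithm such as the one in the paper would call an evaluator like this on every candidate assignment.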

8.

Parallel Implementation of the Finite Difference Method

Wang Wei, Pan Jianwei. Digital Community & Smart Home, 2008, (3): 1339-1342

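The finite difference method is an important numerical method for approximating solutions of partial differential equations. Serial algorithms cannot solve large, complex computational problems efficiently; parallel computation improves their efficiency and makes it practical to solve finite difference problems on parallel machines. The difference scheme for Laplace's equation on a two-dimensional field is well suited to parallel computation, and parallelizing the serial part to speed up large-scale computation is of real practical importance. MPI (Message Passing Interface) is one of the standards for parallel programming. Using the null process MPI_PROC_NULL simplifies the communication part of an MPI program, so the serial algorithm can be converted into a parallel one, yielding a parallel implementation of the finite difference method.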
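The role of MPI_PROC_NULL can be illustrated without an MPI installation. The following Python sketch is a serial simulation, not real MPI code: it runs Jacobi sweeps of the 5-point Laplace stencil on a grid split into row blocks, each with one ghost row per side, and uses `None` as a stand-in for MPI_PROC_NULL so that the edge blocks need no special-case communication. The function names and the row-block layout are our assumptions, not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def jacobi_serial(u: Grid, iters: int) -> Grid:
    """Reference Jacobi sweeps for the 2D Laplace equation (5-point stencil).
    Boundary rows and columns hold fixed Dirichlet values."""
    n, m = len(u), len(u[0])
    u = [row[:] for row in u]
    for _ in range(iters):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
        u = new
    return u


def jacobi_row_blocks(u: Grid, iters: int, nprocs: int) -> Grid:
    """Same iteration with rows split over nprocs simulated ranks (needs nprocs <= len(u))."""
    n, m = len(u), len(u[0])
    bounds = [r * n // nprocs for r in range(nprocs + 1)]
    # Local block = ghost row + owned rows + ghost row.
    local = [[[0.0] * m] + [row[:] for row in u[bounds[r]:bounds[r + 1]]] + [[0.0] * m]
             for r in range(nprocs)]
    # None plays the part of MPI_PROC_NULL for the first and last rank.
    up = [r - 1 if r > 0 else None for r in range(nprocs)]
    down = [r + 1 if r < nprocs - 1 else None for r in range(nprocs)]
    for _ in range(iters):
        # Halo exchange: copy neighbours' edge rows into ghost rows; None means no-op.
        for r in range(nprocs):
            if up[r] is not None:
                local[r][0] = local[up[r]][-2][:]
            if down[r] is not None:
                local[r][-1] = local[down[r]][1][:]
        # Each rank updates its owned rows, leaving physical boundary rows fixed.
        for r in range(nprocs):
            blk = local[r]
            new = [row[:] for row in blk]
            for li in range(1, len(blk) - 1):
                g = bounds[r] + li - 1  # global index of local row li
                if g == 0 or g == n - 1:
                    continue
                for j in range(1, m - 1):
                    new[li][j] = 0.25 * (blk[li - 1][j] + blk[li + 1][j]
                                         + blk[li][j - 1] + blk[li][j + 1])
            local[r] = new
    # Gather owned rows (ghosts dropped) back into one grid.
    return [row for r in range(nprocs) for row in local[r][1:-1]]
```

In a real MPI program the two `if` tests in the exchange loop would disappear: every rank would call MPI_Sendrecv with its edge neighbours set to MPI_PROC_NULL, and a send or receive involving MPI_PROC_NULL completes immediately without transferring data.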

9.

Hybrid Parallel Volume Rendering with Cooperating Distributed Graphics Hardware


Cao Yi, Mo Zeyao, Wang Hongkun, Yuan Bin. Journal of Image and Graphics, 2008, 13(7): 1379-1384

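Because typical shared-memory parallel computers lack graphics hardware, the 3D scientific data they produce cannot be visualized in place with hardware-accelerated parallel volume rendering. To address this, the paper presents a hybrid parallel rendering model built on a local parallel computer and distributed graphics workstations. The source data stays on the parallel computer, whose processors issue remote rendering command streams that drive the workstations' graphics hardware to do the rendering; final image compositing runs on the parallel computer to exploit its shared-memory communication. With load-balancing optimization, the parallel rendering pipeline effectively overlaps rendering, compositing, and display. Experiments show that the method can interactively render large-scale data fields residing on the parallel computer at an image resolution of 1024×1024.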
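The abstract leaves the compositing algorithm unspecified. Sort-last parallel volume rendering commonly merges the partial images of the data bricks with the "over" operator in visibility order; the Python sketch below shows that per-pixel step with premultiplied RGBA. The front-to-back ordering, the flat pixel-list layout, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Premultiplied-alpha pixel: (r, g, b, a), each in [0, 1].
RGBA = Tuple[float, float, float, float]


def over(front: RGBA, back: RGBA) -> RGBA:
    """Porter-Duff 'over' for premultiplied colours: C = C_f + (1 - a_f) * C_b."""
    t = 1.0 - front[3]
    return tuple(f + t * b for f, b in zip(front, back))


def composite(partials: List[List[RGBA]]) -> List[RGBA]:
    """Merge partial images (same size, ordered front to back) pixel by pixel."""
    result = partials[0]
    for layer in partials[1:]:
        result = [over(f, b) for f, b in zip(result, layer)]
    return result
```

Because "over" is associative, the partial images can also be merged pairwise in a tree, which is what lets compositing itself run in parallel.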

10.

Parallel Implementation of the Finite Difference Method

WANG Wei, PAN Jian-wei. Digital Community & Smart Home, 2008, (7)

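The finite difference method is an important numerical method for approximating solutions of partial differential equations. Serial algorithms cannot solve large, complex computational problems efficiently; parallel computation improves their efficiency and makes it practical to solve finite difference problems on parallel machines. The difference scheme for Laplace's equation on a two-dimensional field is well suited to parallel computation, and parallelizing the serial part to speed up large-scale computation is of real practical importance. MPI (Message Passing Interface) is one of the standards for parallel programming. Using the null process MPI_PROC_NULL simplifies the communication part of an MPI program, so the serial algorithm can be converted into a parallel one, yielding a parallel implementation of the finite difference method.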

11.

Analyzing composability of applications on MPSoC platforms

Akash, Bart, Bart, Henk, Yajun. Journal of Systems Architecture, 2008, 54(3-4): 369-383

Modern day applications require use of multi-processor systems for reasons of performance, scalability and power efficiency. As more and more applications are integrated in a single system, mapping and analyzing them on a multi-processor platform becomes a multi-dimensional problem. Each possible set of applications that can be concurrently active leads to a different use-case (also referred to as scenario) that the system has to be verified and tested for. Analyzing the feasibility and resource utilization of all possible use-cases becomes very demanding and often infeasible.

Therefore, in this paper, we highlight this issue of being able to analyze applications in isolation while still being able to reason about their overall behavior – also called composability. We make a number of novel observations about how arbitration plays an important role in system behavior. We compare two commonly used arbitration mechanisms, and highlight the properties that are important for such analysis. We conclude that neither of these arbitration mechanisms provides the necessary features for analysis. They either suffer from scalability problems, or provide unreasonable estimates about performance, leading to waste of resources and/or undesirable performance.

We further propose to use a Resource Manager (RM) to ensure applications meet their performance requirements. The basic functionalities of such a component are introduced. A high-level simulation model is developed to study the performance of RM, and a case study is performed for a system running an H.263 and a JPEG decoder. The case study illustrates at what granularity of control a resource manager can effectively regulate the progress of applications such that they meet their performance requirements.

12.

A Comparative Study of Measurement-Based Admission Control Schemes

Ma Hongwei, Qian Hualin. Journal of Chinese Computer Systems, 2004, 25(11): 1938-1942

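Measurement-based connection admission control (MBCAC) decides whether to accept a new connection request by measuring traffic flows in real time. MBCAC needs no prior knowledge of the flows' traffic models and can adapt dynamically to changes in network load using measured data, so it has recently attracted attention. This paper analyzes issues related to admission control, proposes a new classification of MBCAC methods, and experimentally compares several MBCAC schemes in terms of implementation complexity, scalability, and bandwidth utilization. The experiments also show that computing effective bandwidth from measurements of aggregate flows achieves high utilization of network resources (bandwidth) while guaranteeing quality of service (QoS).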
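The abstract does not name the schemes it compares. As an illustration of the general idea only, here is a Python sketch of one well-known measurement-based rule, Measured Sum: admit a new flow if the measured aggregate rate plus the new flow's declared peak rate stays within a target fraction of link capacity. Estimating the load as the maximum over a sliding window of samples, and the class and parameter names, are our assumptions.

```python
from collections import deque


class MeasuredSumAdmission:
    """Illustrative Measured Sum admission test (hypothetical class, not from the paper)."""

    def __init__(self, capacity: float, target_util: float = 0.9, window: int = 10):
        self.capacity = capacity
        self.target_util = target_util
        # Recent measurements of the aggregate traffic rate on the link.
        self.samples = deque(maxlen=window)

    def record(self, aggregate_rate: float) -> None:
        """Feed one periodic measurement of the aggregate rate."""
        self.samples.append(aggregate_rate)

    def estimated_load(self) -> float:
        """Conservative load estimate: the largest sample in the window."""
        return max(self.samples) if self.samples else 0.0

    def admit(self, peak_rate: float) -> bool:
        """Accept the flow if estimated load + its peak rate fits under the target."""
        load = self.estimated_load()
        if load + peak_rate <= self.target_util * self.capacity:
            # Count the new flow immediately, so back-to-back requests
            # see it before it appears in the measurements.
            self.samples.append(load + peak_rate)
            return True
        return False
```

The key property this illustrates is the one the abstract emphasises: the decision uses only measured aggregate traffic plus a per-flow bound, with no traffic model for the existing flows.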

13.

Dynamic core affinity for high-performance file upload on Hadoop Distributed File System

Parallel Computing, 2014, 40(10): 722-737

The MapReduce programming model, in which the data nodes perform both the data storing and the computation, was introduced for big-data processing. Thus, we need to understand the different resource requirements of data storing and computation tasks and schedule these efficiently over multi-core processors. In particular, the provision of high-performance data storing has become more critical because of the continuously increasing volume of data uploaded to distributed file systems and database servers. However, the analysis of the performance characteristics of the processes that store upstream data is very intricate, because both network and disk inputs/outputs (I/O) are heavily involved in their operations. In this paper, we analyze the impact of core affinity on both network and disk I/O performance and propose a novel approach for dynamic core affinity for high-throughput file upload. We consider the dynamic changes in the processor load and the intensiveness of the file upload at run-time, and accordingly decide the core affinity for service threads, with the objective of maximizing the parallelism, data locality, and resource efficiency. We apply the dynamic core affinity to Hadoop Distributed File System (HDFS). Measurement results show that our implementation can improve the file upload throughput of end applications by more than 30% as compared with the default HDFS, and provide better scalability.

14.

Towards scalability collapse behavior on multicores

Yan Cui, Yu Chen, Yuanchun Shi. Concurrency and Computation, 2014, 26(2): 336-359

Multicore processor systems have become mainstream. To release the full potential of multiple cores, applications are programmed to be parallel to keep every core busy. Unfortunately, lock contention within operating systems can limit scalability so seriously that using more cores reduces throughput (scalability collapse). To understand and characterize the collapse behavior easily, a discrete‐event simulation model, which considers both the sequential execution of critical sections and the overhead of hardware resource contention, is designed and implemented. Using the model, we observe that the percentage of time spent waiting for locks and the number of tasks requesting a lock have a significant correlation with the occurrence of scalability collapse. On the basis of these observations, two new techniques (lock contention aware scheduler and requester‐based adaptive lock) are proposed to remove the scalability collapse on multicores. The proposed methods are implemented in the Linux kernel 2.6.29.4 and evaluated on an AMD 32‐core system to verify their effectiveness. Using micro‐benchmarks and macro‐benchmarks, we find that these methods completely eliminate scalability collapse for four of the five workloads exhibiting the collapse behavior. For one workload that does not suffer scalability collapse, the proposed methods introduce only negligible overhead. Copyright © 2012 John Wiley & Sons, Ltd.

15.

Scalability of Large-Scale Parallel Application Programs (total citations: 3; self-citations: 0; by others: 3)

Chen Jun, Mo Zeyao, Li Xiaomei, Yuan Guoxing. Journal of Computer Research and Development, 2000, 37(11): 1382-1388

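Future ultra-large-scale parallel computing requires algorithms and application programs with good scalability. Previous scalability research emphasized the analysis of algorithms and rarely examined in depth why real programs scale poorly, so it could not give users targeted guidance for improving their programs. This paper introduces numerical scalability and parallel scalability to describe how the numerical performance and the parallel performance of a parallel system behave as it scales, discusses in depth the possible causes of low numerical and parallel scalability, and proposes a set of scalability evaluation criteria. Using these criteria and the near-optimal scalability method, a large-scale application, a parallel two-dimensional plasma particle-in-cell program, is analyzed. The results show that the criteria help locate the causes of poor scalability. They also show that, for real large-scale applications, given execution information for small problems, near-optimal scalability analysis offers a way to predict on how many processors a larger problem can reasonably be run. Here "reasonably" means that the run time is close to the minimum while the efficiency is substantially higher.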

16.

A Survey of Scalability Research on Distributed Systems (total citations: 1; self-citations: 0; by others: 1)

Chen Bin, Bai Xiaoying, Ma Bo, Huang Junfei. Computer Science, 2011, 38(8): 17-24

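Scalability reflects a system's ability to keep meeting its performance requirements as system demands and resources change. In different scenarios, the basic definition and measurement of scalability can be understood and expressed from different perspectives. The main way to achieve scalability is to adjust system performance dynamically by changing the amount of available resources and the task scheduling strategy according to system demands and runtime state. Key techniques for the scalable design of distributed resource management systems can be analyzed from two aspects: parallel task scheduling and distributed system frameworks. Scalability testing is the main basis for detecting and evaluating system performance; its two important components are the testing of parallel code and the main methods for designing scalability testing systems. As software paradigms evolve, software deployment and delivery are shifting toward online services built on open, shared, virtualized resource management platforms, and scalability has become an important performance indicator of software services in the cloud computing era. Further exploring the challenges that scalability faces under the new software paradigm is a new direction for scalability research.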

17.

Near-Optimal Scalability: A Practical Scalability Metric (total citations: 2; self-citations: 0; by others: 2)

Chen Jun, Li Xiaomei. Chinese Journal of Computers, 2001, 24(2): 179-182

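Good scalability is an important performance goal for designers of parallel algorithms and parallel computers. Previous scalability models each considered only one aspect of the problem in isolation, such as a particular performance measure or the maximum usable resources, without weighing them as a whole. Such models meet the needs of computer scientists, who care about higher efficiency and utilization, but application scientists care more about short execution times. The near-optimal scalability model proposed here considers both the efficiency and the execution time of a parallel system. Analysis of two algorithm examples on a typical MPP shows that the model not only describes the scalability of parallel algorithms but also, when scaling follows an appropriate scalability curve, keeps the execution time close to the minimum without sacrificing efficiency. This provides guidance for optimally matching algorithms to parallel machines and helps in designing and improving parallel algorithms.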
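One way to read the near-optimal idea: rather than the processor count that minimizes run time, pick the smallest count whose run time is within a tolerance of that minimum, which usually buys much higher efficiency. The Python sketch below applies that rule to a hypothetical time model T(p) = W/p + c*log2(p); both the model and the tolerance rule are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import Callable, Iterable


def model_time(p: int, work: float = 1e4, comm: float = 50.0) -> float:
    """Hypothetical run-time model: divisible work plus log2(p) communication cost."""
    return work / p + comm * math.log2(p)


def efficiency(time_of: Callable[[int], float], p: int) -> float:
    """E(p) = T(1) / (p * T(p))."""
    return time_of(1) / (p * time_of(p))


def near_optimal_p(time_of: Callable[[int], float], candidates: Iterable[int],
                   tolerance: float = 0.1) -> int:
    """Smallest processor count whose time is within (1 + tolerance) of the best."""
    times = {p: time_of(p) for p in candidates}
    t_best = min(times.values())
    return min(p for p, t in times.items() if t <= (1 + tolerance) * t_best)
```

Over p = 1, 2, 4, ..., 1024 the model's fastest point is p = 128 (T = 428.125). With tolerance 0.1 the sketch picks p = 64 instead, accepting about 6.6% more run time (456.25) for an efficiency of roughly 0.34 rather than 0.18.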

18.

A simulator for adaptive parallel applications

Basile Schaeli, Sebastian Gerlach, Roger D. Hersch. Journal of Computer and System Sciences, 2008, 74(6): 983-999

Dynamically allocating computing nodes to parallel applications is a promising technique for improving the utilization of cluster resources. Detailed simulations can help identify allocation strategies and problem decomposition parameters that increase the efficiency of parallel applications. We describe a simulation framework supporting dynamic node allocation which, given a simple cluster model, predicts the running time of parallel applications taking CPU and network sharing into account. Simulations can be carried out without needing to modify the application code. Thanks to partial direct execution, simulation times and memory requirements are reduced. In partial direct execution simulations, the application's parallel behavior is retrieved via direct execution, and the duration of individual operations is obtained from a performance prediction model or from prior measurements. Simulations may then vary cluster model parameters, operation durations and problem decomposition parameters to analyze their impact on the application performance and identify the limiting factors. We implemented the proposed techniques by adding direct execution simulation capabilities to the Dynamic Parallel Schedules parallelization framework. We introduce the concept of dynamic efficiency to express the resource utilization efficiency as a function of time. We verify the accuracy of our simulator by comparing the measured running time and dynamic efficiency of parallel program executions with the running time and dynamic efficiency predicted by the simulator under different parallelization and dynamic node allocation strategies.

19.

A cloudification methodology for multidimensional analysis: Implementation and application to a railway power simulator

Simulation Modelling Practice and Theory, 2015

Many scientific areas make extensive use of computer simulations to study complex real-world processes. These computations are typically very resource-intensive and present scalability issues as experiments get larger, even in dedicated clusters, since these are limited by their own hardware resources. Cloud computing emerges as an option to move toward the ideal of unlimited scalability by providing virtually infinite resources, yet applications must be adapted to this new paradigm. This process of converting and/or migrating an application and its data in order to make use of cloud computing is sometimes known as cloudifying the application. We propose a generalist cloudification method based on the MapReduce paradigm to migrate scientific simulations into the cloud to provide greater scalability. We analysed its viability by applying it to a real-world railway power consumption simulator and running the resulting implementation on Hadoop YARN over Amazon EC2. Our tests show that the cloudified application is highly scalable and that there is still a large margin to improve the theoretical model and its implementations, and also to extend it to a wider range of simulations. We also propose and evaluate a multidimensional analysis tool based on the cloudified application. It generates, executes and evaluates several experiments in parallel, for the same simulation kernel. The results we obtained indicate that our methodology is suitable for resource-intensive simulations and multidimensional analysis, as it improves the infrastructure's utilization, efficiency and scalability when running many complex experiments.

20.

Scalability Evaluation and Testing of Numerical Parallel Computing (total citations: 3; self-citations: 1; by others: 2)

Chi Lihua, Liu Jie, Hu Qingfeng. Journal of Computer Research and Development, 2005, 42(6): 1073-1078

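This paper analyzes the problems of several scalability evaluation models and, to meet the needs of practical evaluation and testing, proposes a scalability evaluation model for numerical parallel computing based on equal average load. The model redefines scalable speedup and scalability, and the paper gives methods for testing scalable speedup and scalability with it. Combined with curve fitting or a parallel computation time model, it can predict the scalability of a parallel system; scalability predictions are made for NPB BT, SP, and matrix multiplication.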
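The prediction step, fitting small-scale measurements and extrapolating, can be sketched as follows. In an equal-average-load experiment the work per processor is held fixed, so ideal scaling would keep the run time flat; here we assume, purely for illustration, that the measured time grows as T(p) = a + b*log2(p) and fit a and b by ordinary least squares. The model form and the function names are hypothetical, and the paper may fit a different curve.

```python
import math
from typing import Sequence, Tuple


def fit_log_model(ps: Sequence[int], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of T(p) = a + b * log2(p) to measured (p, time) pairs."""
    xs = [math.log2(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a: float, b: float, p: int) -> float:
    """Extrapolate the fitted model to p processors."""
    return a + b * math.log2(p)
```

For example, times of 10, 11, 12 and 13 seconds measured on 1, 2, 4 and 8 processors fit a = 10, b = 1, which predicts 20 seconds at 1024 processors. The ratio of predicted to single-processor time then indicates how far the system departs from ideal scaling.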