首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
刘雷  李晶  陈莉  冯晓兵 《计算机工程》2014,(3):99-102,112
投机并行化是解决遗留串行代码并行化的重要技术,但以往投机并行化运行时系统面临着诸多的性能问题,如任务分配不均衡、通信频繁、冲突代价高,以及进程启动,结柬频繁而导致开销过高等。为此,提出一种基于进程实现的投机并行化运行时系统。采用隐式单程序多数据的并行任务划分和执行模式。通过实现重甩进程的投机任务调度策略和委托正确性检查技术,降低投机进程启动/结束和通信的开销,提高投机进程的利用率,同时利用守护进程与投机进程协同执行的方式,确保在投机进程出现异常情况时程序也能正确执行。实验结果表明,该基于进程实现的投机运行时系统比同类型系统的性能提高231%。  相似文献   

2.
同时多线程(SMT)能在同一时钟周期执行不同线程的指令,同时开发了指令级并行(ILP)和线程级并行(TLP)。显式并行指令计算(EPIC)关注于编译器和硬件的相互协作。在本文中,我们设计和实现了一套并行环境,其中包括并行编译器OpenUH和基于IA-64的同时多线程体系结构EDSMT,并通过NAS并行测试程序作出了性能评测。  相似文献   

3.
同时多线程能在同一时钟周期执行不同线程的指令,并且指令级并行和线程级并行。显式并行指令计算关注于编译器和硬件的相互协作。寄存器文件的设计在高性能处理器设计中十分重要,寄存器栈和寄存器栈引擎是提高其性能的重要手段。该文设计和实现一套并行环境,其中包括并行编译器OpenUH和基于IA-64的同时多线程体系结构EDSMT,实验表明,该并行架构适用于大多数并行应用,针对NAS的并行测试程序,该架构相对于SMTSIM平均有12.48%的性能提升。  相似文献   

4.
主要对千兆通讯的网络处理芯片IXP1200网络处理器进行研究和分析,着重探讨和研究其先进的多级并行设计机制.主要从体系结构和并行设计技术两个角度对IXP1200网络处理器的数控分层和多层次并行等设计机制进行了介绍.突出了其利用多线程、多处理器的先进设计结构来优化设计、提高处理速度的设计理念和实现过程,并在最后进一步详细讨论了如何利用特定微码指令来实现IXP1200网络处理器的指令并行和多线程并行的程序调度方法和设计技术.  相似文献   

5.
为了充分利用多核CPU来实现动态二进制翻译的并行化,研究了用多线程将翻译阶段和执行阶段并行优化的方法,提供了并行化系统的程序流程。并根据翻译与执行的时序及相关性,设计实现了一种超前翻译算法,它能够有效预测程序的执行路径,为翻译过程提供导向作用。实验结果表明,该优化方法提高了翻译缓存中基本块的命中率,使执行阶段尽量不被中断,进而提升了执行效率。  相似文献   

6.
针对投机并行化中如何权衡策略并确定合适的执行模型来获取理想性能的问题,提出了一种基于交互信息的投机并行化方法,利用交互信息来确定投机并行化的执行模型,建立相关评价模型,并着重从线程抽取创建角度提出了相应的策略及对应的性能评价。通过实验表明,基于交互信息进行“按需”并行化,可以达到所需的性能要求。  相似文献   

7.
主要对千兆通讯的网络处理芯片IXP1200网络处理器进行研究和分析,着重探讨和研究其先进的多级并行设计机制。主要从体系结构和并行设计技术两个角度对IXP1200网络处理器的数控分层和多层次并行等设计机制进行了介绍。突出了其利用多线程、多处理器的先进设计结构来优化设计、提高处理速度的设计理念和实现过程,并在最后进一步详细讨论了如何利用特定微码指令来实现IXP1200网络处理器的指令并行和多线程并行的程序调度方法和设计技术。  相似文献   

8.
为了满足对汽车电子稳定控制系统(ESC)电子控制单元多个产品同步执行测试的需要,对并行测试技术进行了研究;设计开发了基于LabVIEW和TestStand的汽车ESC产品并行测试系统;通过配置NI SWITCH矩阵开关模块和测量仪器单元构建硬件平台,调用LabVIEW编写的测试测量功能子程序,并编制TestStand并行测试序列,采用TestStand内置的多线程并行测试管理技术,解决了多线程软件编程中遇到的竞争、资源冲突、死锁问题,实现了对四个被测单元同时执行测试的并行测试系统。  相似文献   

9.
并行程序设计由于需要考虑进程之间的同步等问题使得编码过程十分复杂。可视化的并行程序设计为程序员提供了图形化的编程模板和骨架来进行并行程序的设计工作,在一定程度上减小了并行程序的设计难度。首先研究软件事务性内存模型,它相对于传统的并行程序设计方法而言有着接口简单灵活,可扩展性强等特点,之后将STM模型运用到可视化程序设计中来,使得其编程接口以UML活动图的形式提供给编程人员使用,不用依赖特定的软件或硬件环境,提高了可视化并行程序设计的通用性与可扩展性。  相似文献   

10.
在异构多核机群系统上利用数据任务块的动态调度策略和全锁定技术,给出一种面向数据密集型应用的结点内主存和可用的共享二级缓存大小中动态调度数据块的多进程级和多线程级并行编程机制,给出了优化数据密集型应用并行程序性能的策略和技术。在多核计算机组成的异构机群上并行求解随机序列多关键字查找的实验结果表明,所给出的多核并行程序设计机制和性能优化方法可行和高效。  相似文献   

11.
Parallel simulation of parallel programs for large datasets has been shown to offer significant reduction in the execution time of many discrete event models. The paper describes the design and implementation of MPI-SIM, a library for the execution driven parallel simulation of task and data parallel programs. MPI-SIM can be used to predict the performance of existing programs written using MPI for message passing, or written in UC, a data parallel language, compiled to use message passing. The simulation models can be executed sequentially or in parallel. Parallel execution of the models are synchronized using a set of asynchronous conservative protocols. The paper demonstrates how protocol performance is improved by the use of application-level, runtime analysis. The analysis targets the communication patterns of the application. We show the application-level analysis for message passing and data parallel languages. We present the validation and performance results for the simulator for a set of applications that include the NAS Parallel Benchmark suite. The application-level optimization described in the paper yielded significant performance improvements in the simulation of parallel programs, and in some cases completely eliminated the synchronizations in the parallel execution of the simulation model  相似文献   

12.
本文在并行系统模拟环境中,采集了一个迭代类并行程序实例的运行时间数据,据此,分析了影响程序运行时间的主要因素,建立了一个并行程序运行时间推算模型,从而可以在迭代次数,输入数据规模,以及并行系统的配置等三个方向上对程序运行时间进行预测,实验数据表明,该模型是相当精确的,可以为我们节省大量的模拟时间。  相似文献   

13.
郑宇华  谢立 《计算机学报》1993,16(9):641-647
本文提出一种新型并行推理机制BTJ,它同时支持受限“与”并行和完全“或”并行,与其它“与/或”并行模型相比,BTJ具有高并行度和低运行时刻代价的优点,性能测试结果表明,BTJ对于“与”并行和“或”并行均可获得较好的并行加速比。  相似文献   

14.
P-GRADE provides a high-level graphical environment to develop parallel applications transparently both for parallel systems and the Grid. P-GRADE supports the interactive execution of parallel programs as well as the creation of a Condor, Condor-G or Globus job to execute parallel programs in the Grid. In P-GRADE, the user can generate either PVM or MPI code according to the underlying Grid where the parallel application should be executed. PVM applications generated by P-GRADE can migrate between different Grid sites and as a result P-GRADE guarantees reliable, fault-tolerant parallel program execution in the Grid. The GRM/PROVE performance monitoring and visualisation toolset has been extended towards the Grid and connected to a general Grid monitor (Mercury) developed in the EU GridLab project. Using the Mercury/GRM/PROVE Grid application monitoring infrastructure any parallel application launched by P-GRADE can be remotely monitored and analysed at run time even if the application migrates among Grid sites. P-GRADE supports workflow definition and co-ordinated multi-job execution for the Grid. Such workflow management can provide parallel execution at both inter-job and intra-job level. Automatic checkpoint mechanism for parallel programs supports the migration of parallel jobs inside the workflow providing a fault-tolerant workflow execution mechanism. The paper describes all of these features of P-GRADE and their implementation concepts.  相似文献   

15.
Aquarius-II is a cache coherent multiprocessor system designed for the parallel execution of Prolog programs. It contains two tiers of memory: synchronization memory and high bandwidth (HB) memory. The synchronization memory consists of snooping caches connected to a bus and is used to store rendezvous points, synchronization bits, synchronization variables such as locks and semaphores and most of the write shared data. The HB memory is used to store the bulk of the application program code and data. It contains caches and an inexpensive VLSI chip based crossbar interconnection network to memory. The caches connected to the crossbar do not have full snooping capability. The architecture is evaluated by a full simulation of parallel execution of Prolog programs on Aquarius-II. The design details of the components of the architecture and simulation results are presented. Simulation results indicate that the two tier memory system significantly reduces memory interference and speeds up synchronization when compared to a single bus multi. This shared memory multiprocesor architecture has the potential to support other parallel programming paradigms.  相似文献   

16.
基于群机系统的并行程序的最大加速比计算   总被引:1,自引:0,他引:1  
加速比是并行程序的重要指标之一。在大多数并行系统中,在数据规 模确定的情况下,程序的加速比随节点工作站的增加而增加,但是大多数群机 系统的节点工作站是共享物理传输介质的,这使得许多并行程序的加速比在节 点机数目超过某一个值之后会随着节,点机的增加而减少。本文通过对群机系统 上并行程序执行时间的分析,论述了在数据规模确定的情况下,程序能够获得 的最大加速比和最短的计算时间,以及获得这个加速比和计算时间的节点机个 数。  相似文献   

17.
Dynamical concurrent execution makes it possible to adapt programs for their execution on computing environments with parallel architecture. In the paper, a formal model of dynamical concurrent execution of programs written in functional style is presented. The model is proven to possess a feature that guarantees correctness of concurrent execution.  相似文献   

18.
A specification of the OR-parallel execution of Prolog programs, using CHOCS (calculus of higher order communicating systems) [24], is presented in the paper. A translation is defined from Prolog programs and goals to CHOCS processes: the execution of the CHOCS process corresponding to a goal mimics the OR-parallel execution of the original Prolog goal. In the translation, clauses and predicate definitions of a Prolog program correspond to processes. To model OR-parallelism, the processes , corresponding to clauses (having the same head predicate ) start their execution concurrently, but, in order to respect the depth-first search rule, each is guarded by the termination of the executions of processes 's, . The computational model is proved correct with respect to the semantics of Prolog, as given in [4, 5]. Our model, because of its algebraic specification, can be easily used to prove properties of the parallel execution of Prolog programs. Moreover, the model exploits the maximum degree of parallelism, by giving the Prolog solutions in parallel, without any order among them. However, this model, being close to the Prolog semantics definition, contains sources of inefficiency which make it unpractical as a guide for the implementation. To overcome these problems, a new computational model is defined. This model is obtained by modifications of the basic one and thus its correctness can be easily proved. Finally, we show how to obtain models of different real implementations of OR-parallel Prolog by slight modification of the new model. The relations among all these models, in terms of parallelism degree, are studied by using the concepts of bisimulation and simulation, developed for concurrent calculi. Received: 5 May 1995 / 28 May 1996  相似文献   

19.
D.A.  P.D. 《Performance Evaluation》2005,60(1-4):165-187
We present a new performance modeling system for message-passing parallel programs that is based around a Performance Evaluating Virtual Parallel Machine (PEVPM). We explain how to develop PEVPM models for message-passing programs using a performance directive language that describes a program’s serial segments of computation and message-passing events. This is a novel bottom-up approach to performance modeling, which aims to accurately model when processing and message-passing occur during program execution. The times at which these events occur are dynamic, because they are affected by network contention and data dependencies, so we use a virtual machine to simulate program execution. This simulation is done by executing models of the PEVPM performance directives rather than executing the code itself, so it is very fast. The simulation is still very accurate because enough information is stored by the PEVPM to dynamically create detailed models of processing and communication events. Another novel feature of our approach is that the communication times are sampled from probability distributions that describe the performance variability exhibited by communication subject to contention. These performance distributions can be empirically measured using a highly accurate message-passing benchmark that we have developed. This approach provides a Monte Carlo analysis that can give very accurate results for the average and the variance (or even the probability distribution) of program execution time. In this paper, we introduce the ideas underpinning the PEVPM technique, describe the syntax of the performance modeling language and the virtual machine that supports it, and present some results, for example, parallel programs to show the power and accuracy of the methodology.  相似文献   

20.

Powerlists are data structures that can be successfully used for defining parallel programs based on divide-and-conquer paradigm. These parallel recursive data structures and their algebraic theories offer both a methodology to design parallel algorithms and parallel programming abstractions to ease the development of parallel applications. The paper presents a technique for speeding up the parallel recursive programs defined based on powerlists. The improvements are achieved by applying transformation rules that introduce tuple functions and prefix operators, for which a more efficient execution model is defined. Together with the execution model, a cost model is also defined in order to allow a proper evaluation. The treated examples emphasise the fact that the transformation leads to important improvements of the programs. The speeding up is achieved by reducing the number of recursive calls, and also by enable the fusion of splitting/combining operations on different data structures. In addition, enhancing the function that has to be computed to other useful functions using a tuple, could improved the cost reduction even more.

  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号