Similar literature
 18 similar documents found
1.
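Research and Implementation of UPC Parallel Loop Optimization   Total citations: 2, self-citations: 0, citations by others: 2
UPC (Unified Parallel C) is a new parallel programming language based on Global Address Space (GAS) access that supports the SPMD (Single Program, Multiple Data) programming model. This paper studies the algorithms and implementation of compiler optimization techniques for a UPC prototype system built on the open-source Berkeley UPC compiler. The prototype currently implements upc_forall optimization and privatization of shared accesses, which noticeably improves the efficiency of some UPC parallel applications.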
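
The abstract does not say how the prototype rewrites upc_forall, so the following is only a minimal sketch of one common approach, written in Python rather than UPC. It assumes an integer affinity expression, for which the UPC rule is that iteration i runs on thread i % THREADS. The function names are hypothetical. The naive form has every thread test every iteration; the optimized form starts at MYTHREAD and strides by THREADS.

```python
def naive_forall(n, threads, mythread):
    """Unoptimized translation: the thread walks all n iterations and tests affinity."""
    executed, tests = [], 0
    for i in range(n):
        tests += 1                     # one affinity test per iteration
        if i % threads == mythread:    # integer affinity: iteration i belongs to thread i % THREADS
            executed.append(i)
    return executed, tests


def optimized_forall(n, threads, mythread):
    """Optimized translation: start at MYTHREAD and stride by THREADS, with no affinity test."""
    return list(range(mythread, n, threads))


if __name__ == "__main__":
    N, THREADS = 10, 4                 # illustrative sizes, not taken from the paper
    for t in range(THREADS):
        owned, tests = naive_forall(N, THREADS, t)
        assert owned == optimized_forall(N, THREADS, t)
        print(f"thread {t}: iterations {owned} (naive form ran {tests} affinity tests)")
```

Both forms execute the same iterations on each thread; the strided form simply removes the per-iteration test, which is where the saving would come from.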

2.
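Global Address Space Networking (GASNet) is a portable, high-performance communication system used by the Berkeley UPC language. After analyzing the system, the authors optimized its one-sided communication, improving performance by nearly 42%. Techniques for overlapping communication with computation are also discussed.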

3.
Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developers community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.

4.
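The EAN·UCC system (the global unified article identification system) originated in the United States and was created in 1973 by the Uniform Code Council (UCC). The UCC adopted the 12-digit numeric identification code UPC (Universal Product Code). In 1974, the identification codes and bar code symbols were first used in trade. Following the success of the UPC system, the European Article Numbering Association, now the International Article Numbering Association, developed in 1977 a system for use outside North America that is compatible with the UPC system: the EAN (European Article Numbering) system. The EAN system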

5.
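Usage/Network Parameter Control (UPC/NPC) for ABR Services   Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses the importance and the difficulties of applying usage/network parameter control to ABR services, analyzes the strengths and limitations of existing implementation methods, and proposes and analyzes a dynamic UPC/NPC algorithm. Simulation experiments show that dynamic UPC/NPC is a practical and effective method.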
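
The abstract gives no details of the proposed dynamic algorithm. For orientation only, the sketch below shows the standard static policing check that UPC/NPC functions in ATM are usually built on, the Generic Cell Rate Algorithm in its virtual-scheduling form. It is not the paper's method. The class name and the set_rate hook are hypothetical; set_rate merely hints at how a policer could follow a changing ABR allowed rate.

```python
class Gcra:
    """Virtual-scheduling form of the Generic Cell Rate Algorithm (static policer sketch)."""

    def __init__(self, increment, limit):
        self.increment = increment   # I: nominal spacing between cells (1 / policed rate)
        self.limit = limit           # L: tolerance for cells that arrive early
        self.tat = 0.0               # theoretical arrival time of the next cell

    def set_rate(self, cells_per_time_unit):
        """Hypothetical hook: update the policed rate, as a dynamic scheme would have to."""
        self.increment = 1.0 / cells_per_time_unit

    def conforms(self, arrival):
        """Return True if a cell arriving at time `arrival` conforms to the contract."""
        if arrival < self.tat - self.limit:
            return False             # too early: non-conforming, state is left unchanged
        self.tat = max(arrival, self.tat) + self.increment
        return True


if __name__ == "__main__":
    policer = Gcra(increment=10, limit=2)   # illustrative parameters
    for t in [0, 5, 10, 18, 19]:
        print(t, policer.conforms(t))
```

With a fixed increment the policer cannot follow an ABR source whose allowed rate changes in response to network feedback, which is presumably the difficulty the paper addresses.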

6.
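Designing a Bar Code Reader with High-Density Programmable Logic Devices (Part 1)   Total citations: 1, self-citations: 0, citations by others: 1
A single-chip bar code reader is implemented with one Lattice pLSI device. The reader can decode two types of standard Universal Product Code (UPC) symbols: Version A (12 digits) and Version E (6 digits).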
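
A reader that decodes a 12-digit Version A symbol would normally validate the final check digit. The abstract does not say whether or how the pLSI design does this, so the following is only a software sketch of the standard UPC-A check-digit rule; the function names are hypothetical.

```python
def upc_a_check_digit(first11):
    """Compute the UPC-A check digit from the first 11 digits (given as a string)."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 decimal digits")
    digits = [int(c) for c in first11]
    odd_sum = sum(digits[0::2])    # positions 1, 3, ..., 11 (counted from 1)
    even_sum = sum(digits[1::2])   # positions 2, 4, ..., 10
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upc_a(code):
    """Return True if `code` is 12 digits and its last digit matches the check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


if __name__ == "__main__":
    print(is_valid_upc_a("036000291452"))
    print(is_valid_upc_a("036000291453"))
```

Digits in odd positions are weighted by 3, digits in even positions by 1, and the check digit brings the total to a multiple of 10.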

7.
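Application of Hot-Standby-Based Active/Standby Switchover in High-End Routers   Total citations: 1, self-citations: 0, citations by others: 1
To improve the reliability of high-end routers and shorten fault duration, the key is to control the active/standby switchover time. This paper analyzes the switchover technique under hot standby. After a fault occurs or a switchover is triggered manually, the main control board of the router passes through a smoothing phase, a hot-standby initialization phase, a batch synchronization phase, and a real-time synchronization phase, while the standby UPC becomes the active UPC within a very short time and takes over all services of the former active UPC, saving a great deal of learning time. Experiments show that applying hot-standby-based switchover in high-end routers greatly improves system reliability and reduces fault duration.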

8.
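Zhang Yufeng (张玉峰), Sun Zhixin (孙知信). 《微机发展》 (Microcomputer Development), 2010, (3): 172-175, 179
To improve the reliability of high-end routers and shorten fault duration, the key is to control the active/standby switchover time. This paper analyzes the switchover technique under hot standby. After a fault occurs or a switchover is triggered manually, the main control board of the router passes through a smoothing phase, a hot-standby initialization phase, a batch synchronization phase, and a real-time synchronization phase, while the standby UPC becomes the active UPC within a very short time and takes over all services of the former active UPC, saving a great deal of learning time. Experiments show that applying hot-standby-based switchover in high-end routers greatly improves system reliability and reduces fault duration.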

9.
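The paper first introduces several current mainstream parallel languages (OpenMP, HPF, Co-array Fortran, ZPL, and UPC) and describes their characteristics and present status. It then analyzes the state of parallel compilation technology and finally forecasts development trends for parallel languages and compilation techniques.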

10.
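For a massive multiple-input multiple-output (massive MIMO) downlink system, energy efficiency is optimized without sacrificing user-side QoS. The dynamic transmit power of the signal is derived, and the problem is shown to have a hidden convex structure. Exploiting this convexity, an optimization algorithm is proposed that dynamically concentrates the allocated energy on the users while reducing losses. Under the same conditions and assumptions, its performance is analyzed and compared with the conventional zero-forcing (ZF) and maximum ratio transmission (MRT) precoding schemes. The simulation results agree with the theoretical results and show that, under the same conditions, the optimization algorithm outperforms ZF and ZF outperforms MRT.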

11.
For pt.I. see ibid., p. 170-80. In pt.I, we presented a binding environment for the AND and OR parallel execution of logic programs. This environment was instrumental in rendering a compiler for the AND and OR parallel execution of logic programs machine independent. In this paper, we describe a compiler based on the Reduce-OR process model (ROPM) for the parallel execution of Prolog programs, and report the performance of the compiler on five parallel machines: the Encore Multimax, the Sequent Symmetry, the NCUBE 2, the Intel i860 hypercube and a network of Sun workstations. The compiler is part of a machine independent parallel Prolog development system built on top of a run time environment for parallel programming called the Chare kernel, and runs unchanged on these multiprocessors. In keeping with the objectives behind the ROPM, the compiler supports both OR and independent AND parallelism in Prolog programs and is suitable for execution on both shared and nonshared memory machines. We discuss the performance of the Prolog compiler in some detail and describe how grain size can be used to deliver performance that is within 10% of the underlying sequential Prolog compiler on one processor, and scale linearly with increasing number of processors on problems exhibiting sufficient parallelism. The loose coupling between parallel and sequential components makes it possible to use the best available sequential compiler as the sequential component of our compiler.

12.
A Vectorizing Compiler for Multimedia Extensions   Total citations: 6, self-citations: 0, citations by others: 6
In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). This compiler identifies data parallel sections of the code using scalar and array dependence analysis. To enhance the scope for application of the subword semantics, our compiler performs several code transformations. These include strip mining, scalar expansion, grouping and reduction, and distribution. Thereafter inline assembly instructions corresponding to the data parallel sections are generated. We have used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler for a number of benchmarks. Initial performance results reveal that our compiler generated code produces a reasonable performance improvement (speedup of 2 to 6.5) over the code generated without the vectorizing transformations/inline assembly. In certain cases, the performance of the compiler generated code is within 85% of the hand-tuned code for MMX architecture.
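
Of the transformations listed in the abstract, strip mining is the easiest to picture. The sketch below is a plain Python illustration, not the compiler's output: it splits an element-wise add into strips of four elements, modelling the four 16-bit lanes of one 64-bit MMX register, and finishes with a scalar remainder loop. Function names are hypothetical, and lane overflow behaviour is deliberately ignored.

```python
def add_scalar(a, b):
    """Reference loop: one element per iteration."""
    return [x + y for x, y in zip(a, b)]


def add_strip_mined(a, b, width=4):
    """Strip-mined loop: process `width` elements per outer iteration, then a remainder loop."""
    n = len(a)
    out = [0] * n
    main = n - n % width                      # largest multiple of width that fits in n
    for start in range(0, main, width):
        # Each strip stands in for a single packed-add instruction over `width` lanes.
        out[start:start + width] = [a[start + k] + b[start + k] for k in range(width)]
    for i in range(main, n):                  # leftover elements handled one at a time
        out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    a = list(range(10))
    b = [10 * x for x in a]
    assert add_strip_mined(a, b) == add_scalar(a, b)
    print(add_strip_mined(a, b))
```

The inner strip is the part a vectorizing compiler would replace with subword instructions; the remainder loop stays scalar.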

13.
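New Advances in Automatic Parallelizing Compilation   Total citations: 2, self-citations: 0, citations by others: 2
Automatic parallelizing compilation is one of the main routes to producing parallel programs. This paper outlines why automatic parallelizing compilation needs to be developed and summarizes its main advances, then discusses the principal techniques currently in use and future trends.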

14.
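VLIW machines issue and execute multiple parallel operations in a single machine cycle and thereby achieve a high degree of instruction-level parallelism. Dependence analysis and scheduling among these operations are left entirely to the compiler, so whether the parallel performance of a VLIW machine can be fully realized depends on the quality of the compiler for that architecture. GCC, developed by GNU, is one of the most widely used compilation systems; it supports multiple languages and platforms, has an open structure, and can apply a range of mature conventional optimization techniques to generate efficient code. This paper analyzes the structural characteristics of VLIW and GCC and proposes a design for a GCC-based VLIW compilation system. GCC performs architecture-independent optimizations and a small number of architecture-dependent optimizations at the RTL intermediate-code level, and VLIW-specific architecture-dependent optimizations are performed at the assembly-code level, so that GCC's mature compilation technology can be used to rapidly develop an efficient multi-language VLIW compilation system.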

15.
Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has increased in recent years owing to its high programmability and reasonable performance through an efficient exploitation of data locality, especially on hierarchical architectures like multicore clusters. However, the performance issues that arise in this language due to the irregular structure of sparse matrix operations have not yet been studied. Among them, the selection of an adequate storage format for the sparse matrices can significantly improve the efficiency of the parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications.
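
The abstract does not name the storage formats it evaluates. As an assumed example, the sketch below uses compressed sparse row (CSR), a widely used format, to show why the choice of layout shapes the matrix-vector kernel. It is sequential Python with hypothetical function names, not the paper's UPC code.

```python
def dense_to_csr(matrix):
    """Convert a dense list-of-rows matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)       # nonzero value
                col_idx.append(j)      # its column
        row_ptr.append(len(values))    # where the next row starts in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):   # nonzeros of row i only
            acc += values[k] * x[col_idx[k]]          # indirect access into x
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[4, 0, 0],
         [0, 0, 5],
         [1, 2, 0]]
    values, col_idx, row_ptr = dense_to_csr(A)
    print(values, col_idx, row_ptr)
    print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))
```

Each row's result depends only on that row's nonzeros, so rows could be divided among threads, while the indirect accesses into x are the irregular part that makes data locality hard to exploit.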

16.
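Simultaneous multithreading (SMT) can execute instructions from different threads in the same clock cycle, exploiting both instruction-level parallelism (ILP) and thread-level parallelism (TLP). Explicitly parallel instruction computing (EPIC) focuses on cooperation between the compiler and the hardware. In this paper, we design and implement a parallel environment comprising the parallelizing compiler OpenUH and EDSMT, an IA-64-based simultaneous multithreading architecture, and evaluate its performance with the NAS Parallel Benchmarks.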

17.
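With quad-core PCs reaching the market and an 80-core processor successfully developed in the laboratory, multicore is driving fundamental changes in software development. Developers need to add threads to their code to use the multiple cores the system provides, and thereby improve the functionality and performance of PC applications. This paper explores techniques for implementing parallel computing on multicore PCs. It introduces the model, directives, and library functions of OpenMP, the parallel programming interface for shared-memory systems, along with the OpenMP support in the Intel C++ Compiler 9.1, Microsoft Visual Studio 2005, and other tools. It focuses on the design, implementation, and optimization of a parallel two-dimensional discrete fast Fourier transform algorithm, and looks ahead to the development of high-performance parallel computing software component libraries.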

18.
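This paper introduces Agassiz, an extensible automatic parallelizing compilation system, and examines its architecture and key features. The system converts serial programs into parallel programs and provides a good platform for research on compiler optimization techniques. Through object-oriented design and implementation, it can effectively integrate a variety of parallel optimization techniques. Experimental results show that the system is highly extensible.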
