1.

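何王全, 刘勇, 方燕飞, 魏迪, 漆锋滨. Journal of Software (《软件学报》), 2017, 28(4): 764-785

Heterogeneous many-core architectures offer a very high performance-to-power ratio and have become an important direction in supercomputer architecture. However, the more complex parallel and memory hierarchies of many-core systems pose great challenges for programming and optimization. Research on parallel programming techniques for many-core systems is therefore important for lowering the programming difficulty of parallel applications on domestic (Chinese) many-core systems and for improving the performance of parallel programs. This paper proposes a multi-mode parallel programming model on a unified architecture, comprising a heterogeneous-fusion accelerated computing model and an autonomous computing model that is programmed in a homogeneous manner. Based on this programming model, the Parallel C language is designed; it can effectively describe the heterogeneous parallelism of domestic many-core systems. Compared with the MPI+X usage pattern on other many-core systems, both programming and system optimization have a global view, and the language is distinctive in multi-level locality description, one-sided messaging, and compatibility with existing multi-core applications. A Parallel C compilation system is built on Open64 and fully supports both the accelerated and the autonomous computing models. Optimizations including data layout with automatic DMA, compiler-directed thread proxies, and topology-aware collective communication are proposed and implemented. Test data from micro-benchmarks and real applications on the Sunway TaihuLight (神威太湖之光) computer system show that the Parallel C language and compilation system have good performance and scalability and can effectively support large-scale applications.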

2.

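A Parallel Programming Model Based on Multi-core Processors (total citations: 3; self-citations: 3; citations by others: 0)

伊君翰. Computer Engineering (《计算机工程》), 2009, 35(8): 62-64

To resolve the mismatch between traditional programming models and parallel architectures, and taking into account the characteristics of multimedia and network applications, a parallel programming model based on multi-core processors is proposed. The model describes parallel programs in terms of nodes and partitions the parallel compiler to run on multiple cores. Experimental results show that the new parallel programming model effectively improves program execution efficiency.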

3.

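Transactional Memory Systems

彭林, 谢伦国, 张小强. Journal of Computer Research and Development (《计算机研究与发展》), 2009, 46(8)

The performance of multi-core processors depends on program parallelism. Most multi-core processors adopt the shared-memory parallel programming model, in which effectively synchronizing accesses by multiple threads to shared variables is both the key issue and the main difficulty. Borrowing the idea of transactions from databases, transactional memory was proposed to provide a synchronization mechanism that is simple to program and makes program correctness easy to reason about. This paper briefly introduces the origin of transactional memory and explains the concept of a transactional memory system. It discusses the programming interface and execution model of transactional memory, examines the main topics involved in transactional memory systems, and compares the various methods and strategies. Open problems in transactional memory are explored, and several open-source transactional memory research platforms are introduced.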
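
As a rough illustration of the transactional idea summarized in this abstract, the following Python sketch runs an update speculatively and commits it only if the shared variable has not changed in the meantime, retrying otherwise. It is a minimal sketch under stated assumptions: the names `TVar` and `atomically` are hypothetical, only a single shared variable is handled, and nothing here is taken from the surveyed systems or their programming interfaces.

```python
import threading

class TVar:
    """Hypothetical transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # protects snapshot and commit only

def atomically(tvar, update):
    """Apply update(value) as an optimistic transaction, retrying on conflict."""
    while True:
        # Take a consistent snapshot of the current state.
        with tvar._lock:
            seen_version, seen_value = tvar.version, tvar.value
        # Speculative work happens outside the lock.
        new_value = update(seen_value)
        # Commit only if no other transaction committed in between.
        with tvar._lock:
            if tvar.version == seen_version:
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # Conflict detected: discard new_value and retry.

def run_counter_demo(threads=4, increments=200):
    """Several threads increment one shared counter transactionally."""
    counter = TVar(0)
    def worker():
        for _ in range(increments):
            atomically(counter, lambda v: v + 1)
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

The programmer writes only the update function; conflict detection and retry are hidden in `atomically`, which is the kind of simplification the abstract attributes to transactional memory.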

4.

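An Asynchronous BSP Model and Its Program Optimization Techniques (total citations: 2; self-citations: 0; citations by others: 2)

刘方爱, 刘志勇, 乔香珍. Chinese Journal of Computers (《计算机学报》), 2002, 25(4): 373-380

Based on the BSP model, this paper proposes an asynchronous computation model, CSA-BSP. The model describes the performance parameters of parallel machines more accurately and guides users toward writing efficient parallel programs. Under CSA-BSP, the positions of two asynchronously executing processes differ by at most p-1 supersteps. Based on program execution time, the authors analyze the efficiency of BSP, A-BSP, and CSA-BSP programs and conclude that CSA-BSP programs are the most efficient. Verification on a Dawning (曙光) parallel machine with the red-black grid method and matrix multiplication shows that, compared with the BSP model, the efficiency of the two CSA-BSP programs improves by 20% and 37%, respectively; in addition, the sum of process execution times can be reduced by up to 8%. Programming according to the CSA-BSP model therefore benefits both program efficiency and system throughput.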

5.

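A Small-Capacity, Tightly Coupled Fast Shared Data Pool for Multi-core DSPs (total citations: 7; self-citations: 0; citations by others: 7)

陈书明, 汪东, 陈小文, 万江华. Chinese Journal of Computers (《计算机学报》), 2008, 31(10)

Drawing on the structural characteristics of on-chip scratchpad memory (SPM), this paper proposes a new small-capacity, tightly coupled shared memory structure for heterogeneous multi-core DSPs: the Fast Shared Data Pool (FSDP). FSDP sits at the same level of the memory hierarchy as the L1 cache and can be accessed directly by memory access instructions. It uses a multi-bank parallel structure, an interleaved access mode, and an automatic synchronization mechanism based on hardware semaphores, supporting parallel access by multiple DSP cores and fast inter-core data exchange; exchanging a single datum between two cores takes only 4 cycles. The paper builds a simulation model of FSDP and carries out RTL-level design, implementation, and analysis. Validation with a variety of typical test programs shows that FSDP is highly efficient for transferring fine-grained shared data between DSP cores: it improves program performance by 37% over the comparable VS-SPM structure and, when used together with a conventional shared data cache, improves the performance of a heterogeneous multi-core DSP by 13%.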

6.

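Key Techniques in Parallelizing Numerical Simulation of Chemical Flooding Reservoirs

常晓东, 胡长军, 李永红. Microcomputer Information (《微计算机信息》), 2007, 23(28): 249-251

This paper describes the key techniques for parallelizing UTCHEM, a numerical simulation program for chemical combination flooding, on SMP-CLUSTER, a distributed-memory multicomputer parallel system. The parallel model adopts the single program, multiple data (SPMD) programming model and uses domain decomposition to divide the whole solution domain into subdomains, so that multiple compute nodes simultaneously solve a single simulation problem. The compute nodes exchange the shared data in the overlapping regions by message passing in order to coordinate the computation among nodes. So far only the solution of the pressure equation system has been parallelized. Test results show good parallel efficiency.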
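
The domain decomposition with overlap-region exchange described in this abstract can be sketched on a toy problem. The Python example below is an illustrative assumption, not UTCHEM code: it splits a 1D Jacobi smoothing problem into subdomains, each padded with one overlap (halo) cell per side, and replaces real message passing with in-process copies. All function names are hypothetical.

```python
def jacobi_step(u):
    """One Jacobi sweep; the first and last entries are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new

def split_domain(u, parts):
    """Split the interior points into contiguous subdomains.
    Each piece carries one extra halo cell on each side.
    Assumes at least one interior point per part."""
    n = len(u) - 2  # number of interior points
    bounds = [1 + (n * k) // parts for k in range(parts + 1)]
    return [u[bounds[k] - 1: bounds[k + 1] + 1] for k in range(parts)]

def exchange_halos(subs):
    """Stand-in for message passing: neighbours swap their edge values."""
    for k in range(len(subs) - 1):
        left, right = subs[k], subs[k + 1]
        left[-1] = right[1]   # right neighbour's first owned point
        right[0] = left[-2]   # left neighbour's last owned point

def gather(subs, u_left, u_right):
    """Reassemble the global array from the owned points of each subdomain."""
    out = [u_left]
    for s in subs:
        out.extend(s[1:-1])
    out.append(u_right)
    return out

def decomposed_jacobi(u, parts, steps):
    """Every 'node' runs the same program on its own subdomain (SPMD)."""
    subs = split_domain(u, parts)
    for _ in range(steps):
        subs = [jacobi_step(s) for s in subs]  # local computation
        exchange_halos(subs)                   # communication phase
    return gather(subs, u[0], u[-1])
```

Because each halo cell is refreshed after every sweep, the decomposed run performs the same arithmetic as a serial run on the whole domain; on a real cluster the copies in `exchange_halos` would be send and receive operations between neighbouring nodes.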

7.

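PDSM: A Portable Distributed Shared Memory System (total citations: 1; self-citations: 0; citations by others: 1)

徐大杰, 章锋. Computer Science (《计算机科学》), 1999, 26(2): 18-22

1. Introduction. Scientific computing is a rapidly developing discipline. Traditionally, its problems have been solved on supercomputers or workstation clusters. Parallel programming across mutually independent computers is done in network parallel computing and distributed programming environments such as PVM, where nodes communicate by message passing. However, because programmers must understand the low-level details of message passing, PVM-based parallel programming is very difficult, and scientists have little energy to spare for detailed programming. DSM (distributed shared memory) reduces this programming complexity by building a shared-memory abstraction layer on top of a workstation cluster.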

8.

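An SMIM Implementation Mechanism Based on MSA

高志鹏, 孟洛明, 李文璟, 邱雪松. Computer Engineering (《计算机工程》), 2008, 34(2): 31-33

An implementation mechanism for a shared management information model (SMIM) based on metadata storage access (MSA) is proposed to solve the interoperability problem of heterogeneous systems and to ensure data consistency. Building on existing heterogeneous systems and existing data, the mechanism uses a layered logical partition that divides the implementation into five layers: the data application layer, the data access adaptation layer, the data access proxy layer, the metadata storage layer, and the data storage layer. The functions of each layer and the related algorithms are described.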

9.

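A Programming Model for Domestic High-Performance Many-core Processors

陈虎, 周鹏灵. Journal of Computer Applications (《计算机应用》), 2023, (11): 3517-3526

When programming domestic (Chinese) high-performance many-core processors, software must be developed directly against the lowest-level interfaces, which makes programming and debugging very difficult. Moreover, the high-performance software programming models of the individual platforms are rather basic, and computing software is not portable between them, which leads to duplicated development effort. To address these problems, a general programming model and its supporting library are implemented. On the one hand, thread-level parallelism on domestic high-performance many-core processors is exploited through a message queue mechanism; on the other hand, data-level parallelism on the slave cores is exploited through the single instruction, multiple data (SIMD) programming model. First, the architecture of domestic high-performance many-core processors is abstracted. Second, the message queue mechanism of the model is designed, and programmers are given a set of heterogeneous parallel programming interfaces, such as a system parameter interface, a slave-core thread control interface, a message queue interface, and a SIMD abstraction interface. Finally, on this basis, a new model and method for developing high-performance computing software is formed, making it easier for users to develop parallel computing software for domestic high-performance many-core processors. Transfer performance tests show that, on a domestic many-core processor, the transfer bandwidth of the proposed model generally reaches 90% of the peak direct memory access (DMA) bandwidth when few cores are started, and the transfer bandwidth of the message queue model generally reaches 70% of the peak DMA bandwidth when many cores are started. In the matrix multiplication experiment, the proposed model reaches 90% of the performance obtained by transferring and computing the matrices with system primitives; in a password guessing system, the performance of code written with the proposed model, relative to code developed directly against the lowest-level interfaces ...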

10.

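Research on an Object-Based Implicit Parallel Programming Many-core Architecture

谭海. Computer Engineering and Design (《计算机工程与设计》), 2013, 34(2): 623-626

Existing explicit programming models are costly to program, error-prone, and incompatible with existing development tools and programs. To address these shortcomings, an object-oriented implicit programming model at object granularity is proposed, together with a many-core architecture that supports it. With support from the underlying hardware and compiler techniques, the approach is compatible with existing serial program development models, techniques, and tools, which reduces the cost and risk of developing parallel programs. Decompilation and software reverse analysis are used to parallelize existing serial binary code, so that such code need not be abandoned in the many-core era. These key techniques can resolve bottlenecks in the development of many-core technology.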

11.

Hybrid address spaces: A methodology for implementing scalable high-level programming models on non-coherent many-core architectures

Journal of Systems and Software, 2014

This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-the-art implementation for systems with hardware-managed cache coherence, in terms of scalability and sustained-to-peak data processing bandwidth, where HyMR demonstrates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior to that of message passing.

12.

Master–worker model for MapReduce paradigm on the TILE64 many-core platform

Future Generation Computer Systems, 2014

MapReduce is a popular programming paradigm for processing big data. It uses the master–worker model, which is widely used on distributed and loosely coupled systems such as clusters, to solve large problems with task parallelism. With the ubiquity of many-core architectures in recent years and the foreseeable future, the many-core platform will be one of the main computing platforms to execute MapReduce programs. Therefore, it is essential to optimize MapReduce programs on many-core platforms. Optimization of parallel programs for a many-core platform is viewed as a multifaceted problem, where both system and architectural factors should be taken into account. In this paper, we look into the problem by constructing a master–worker model for the MapReduce paradigm on the TILE64 many-core platform. We investigate master share and worker share schemes for the implementation of a MapReduce library on the TILE64. The theoretical analysis shows that the worker share scheme is inherently better for implementing a MapReduce library on the TILE64 many-core platform.
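
The master–worker structure of MapReduce mentioned in this abstract can be illustrated with a small word-count sketch in Python. This is a generic example under stated assumptions: the function names are hypothetical, threads from the standard library stand in for cores, and the master share and worker share schemes studied in the paper are not modelled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_task(chunk):
    """Worker: count the words in one chunk of input lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def master(lines, workers=4):
    """Master: split the input, hand map tasks to workers, then reduce."""
    # Ceiling division so that every line lands in exactly one chunk.
    size = max(1, (len(lines) + workers - 1) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, chunks))
    # Reduce phase: merge the per-worker partial counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)
```

Here the master performs the reduce step itself; letting workers merge partial results instead is one of the design choices that a many-core implementation would have to weigh.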

13.

Cache-based high-level simulation of microthreaded many-core architectures

Journal of Systems Architecture, 2014, 60(7): 529-552

The accuracy of simulated cycles in high-level simulators is generally lower than in detailed simulators for single-core systems, because high-level simulators simulate the behaviour of components rather than the components themselves, as detailed simulators do. The simulation problem becomes more challenging when simulating many-core systems, where many cores execute instructions concurrently. In these systems data may be accessed from multiple caches, and the abstraction of instruction execution has to consider dynamic resource sharing across the whole chip. The problem becomes even more challenging in microthreaded many-core systems, because there may be concurrent hardware threads, which means that the latency of long-latency operations can be tolerated, shrinking from many cycles to just a few. We have previously presented a simulation technique to improve the accuracy of high-level simulation of microthreaded many-core systems, known as the signature-based high-level simulator, which adapts the throughput of the program based on the type of instructions, the number of instructions, and the number of active threads in the pipeline. However, it disregards accesses to the different levels of the cache hierarchy on the many-core system. Accessing the L1 cache has far lower latency than accessing off-chip memory, and if the core is not able to tolerate latency, different cache levels cannot be treated equally. The distributed cache network, along with the synchronization-aware coherency protocol in the Microgrid, is a complicated memory architecture whose behaviour is difficult to simulate at a high level. In this article we present a high-level cache model, which aims to improve the accuracy of high-level simulators for general-purpose many-core systems while adding little complexity to the simulator and without affecting simulation speed.

14.

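A P2P Network Security Management Mechanism Using Rough Sets and a Bayes Classifier

王海晟, 王海晨, 桂小林. Computer Science (《计算机科学》), 2012, 39(9): 28-32

A P2P network security management mechanism using rough sets and a Bayes classifier is proposed. The model abandons concepts such as local trust and global trust; instead, it classifies and counts unsatisfactory events and applies class-based control to trading nodes. Its contributions are as follows. 1) Unsatisfactory events arising from transactions between nodes are classified and quantified by the type of transaction failure, the severity of the damage, the size of the transaction, and similar factors, so that failed transactions are distinguished into types such as malicious attacks and large transactions of unsatisfactory quality. 2) A rough set classifier and a Bayes classifier divide the nodes of the peer-to-peer network into types such as trusted nodes, unfamiliar nodes, and malicious nodes; lists of trusted nodes and malicious nodes are maintained, and malicious nodes are excluded from transactions. 3) A feedback control mechanism is established in which the rough set and Bayes classifiers classify and evaluate a rated node according to the recommendations fed back by other nodes, while also monitoring whether the rating nodes themselves behave maliciously, dividing feedback into honest feedback, malicious feedback, and so on. Experiments show that, compared with existing security models, the proposed mechanism achieves a higher detection rate for malicious behavior, a more satisfactory transaction success rate, and a better ability to aggregate feedback information.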

15.

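Research on a Parallel Programming Method Based on Distributed Objects

龚向坚, 邹腊梅, 马淑萍. Modern Computer (《现代计算机》), 2011, (21): 9-11, 26

This paper studies the parallel implementation and optimization of distributed objects, proposes a parallel programming method based on distributed objects, builds a corresponding parallel programming model, and uses the method to design and implement a virtual computer network experiment system. Experimental results show that the system has good parallelism and moderate response speed, demonstrating that the method helps improve the parallelism of microcomputer systems.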

16.

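A Cache Data Placement Strategy Based on Distributed Proxies in Streaming Media Services

郭攀红, 杨扬, 李新友. Computer Science (《计算机科学》), 2009, 36(11): 56-60

With the development of high-speed broadband access, research on streaming media technology has advanced rapidly and has broad application prospects. Streaming media proxy technology, an important means of reducing server load and improving response time for users, has become one of the active topics in streaming media research. For distributed proxy server systems in streaming media services, an optimized cache data placement strategy is proposed. Its main idea is to place cached data in a particular proxy server such that the network transfer cost of future accesses to that data is minimized. Simulation experiments show that the proposed algorithm achieves lower transfer cost and better scalability than traditional cache data placement algorithms.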
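
The placement idea stated in this abstract (choose the proxy that minimizes the future transfer cost of accessing a cached object) can be sketched as a simple minimization. The cost model below is an assumption made for illustration: the abstract does not give the paper's actual algorithm, and the names `best_proxy`, `request_rates`, and `cost` are hypothetical.

```python
def best_proxy(request_rates, cost):
    """Pick the proxy that minimises expected transfer cost for one object.

    request_rates[j]: expected number of requests arriving at proxy j.
    cost[i][j]: assumed cost of serving one request at proxy j
                from a copy stored at proxy i (0 when i == j).
    """
    def total_cost(i):
        return sum(rate * cost[i][j] for j, rate in enumerate(request_rates))
    return min(range(len(cost)), key=total_cost)

# Three proxies on a line; most requests arrive at the middle one.
rates = [1, 10, 1]
hops = [[0, 2, 4],
        [2, 0, 2],
        [4, 2, 0]]
chosen = best_proxy(rates, hops)  # index of the cheapest placement
```

A practical strategy would also have to account for cache capacity and for request rates that change over time, which this sketch ignores.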

17.

An integrated, programming model-driven framework for NoC–QoS support in cluster-based embedded many-cores

Parallel Computing, 2013, 39(10): 549-566

Embedded SoC designs are embracing the many-core paradigm to deliver the required performance to run an ever-increasing number of applications in parallel. Networks-on-Chip (NoC) are considered a convenient technology for implementing many-core embedded platforms. The complex and non-uniform nature of the traffic flows generated when multiple parallel applications run simultaneously calls for Quality-of-Service (QoS) extensions in the NoC, but to exploit such services efficiently it is necessary to expose them to the software in an easy-to-use yet efficient manner. In this paper we present an integrated hardware/software approach for delivering QoS on top of a hybrid OpenMP-MPI parallel programming model. Our experimental results show the effectiveness of our proposal over a broad range of benchmarks and application mappings, demonstrating the ability to manage parallelism under QoS requirements effortlessly from the programming model.

18.

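Analysis of a Parallel Model for SSearch Based on Many-core Acceleration

张丹丹, 徐莹, 徐磊, 李根国. Computer Applications and Software (《计算机应用与软件》), 2012, 29(8): 78-81

This paper introduces the characteristics of the core SSearch algorithm, analyzes its parallelism, and uses GPUs and Cell-like processors as examples to analyze the algorithm's suitability for many-core systems. On this basis, a parallel model for SSearch on many-core systems is proposed.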

19.

Active Data: A programming model to manage data life cycle across heterogeneous systems and infrastructures

Future Generation Computer Systems, 2015

The Big Data challenge consists in managing, storing, analyzing and visualizing these huge and ever-growing data sets to extract sense and knowledge. As the volume of data grows exponentially, the management of these data becomes more complex in proportion. A key point is to handle the complexity of the data life cycle, i.e. the various operations performed on data: transfer, archiving, replication, deletion, etc. Indeed, data-intensive applications span a large variety of devices and e-infrastructures, which implies that many systems are involved in data management and processing. We propose Active Data, a programming model to automate and improve the expressiveness of data management applications. We first define the concept of data life cycle and introduce a formal model that allows exposing the data life cycle across heterogeneous systems and infrastructures. The Active Data programming model allows code execution at each stage of the data life cycle: routines provided by programmers are executed when one of a set of events (creation, replication, transfer, deletion) happens to any data. We implement and evaluate the model with four use cases: a storage cache to Amazon-S3, a cooperative sensor network, an incremental implementation of the MapReduce programming model and automated data provenance tracking across heterogeneous systems. Altogether, these scenarios illustrate the adequacy of the model for programming applications that manage distributed and dynamic data sets. We also show that applications that do not leverage the data life cycle can still benefit from Active Data to improve their performance.
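
The idea of running programmer-supplied routines when life-cycle events happen to data can be sketched with a small handler registry. This Python example is purely illustrative: the class `LifeCycle` and its methods `on` and `publish` are hypothetical names and do not reproduce the Active Data API or its formal model.

```python
from collections import defaultdict

class LifeCycle:
    """Hypothetical registry mapping life-cycle events to handlers."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        """Register a routine to run whenever `event` happens to a data item."""
        self._handlers[event].append(handler)

    def publish(self, event, data_id):
        """Report that `event` happened to `data_id`; run every handler."""
        return [handler(data_id) for handler in self._handlers[event]]

# Example: track provenance whenever a data item is replicated or deleted.
provenance = []
cycle = LifeCycle()
cycle.on("replication", lambda d: provenance.append(("replicated", d)))
cycle.on("deletion", lambda d: provenance.append(("deleted", d)))
cycle.publish("replication", "file-1")
cycle.publish("deletion", "file-1")
```

In a real deployment the events would be reported by the storage and transfer systems themselves, so that handlers fire regardless of which system touched the data.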