Similar Documents
20 similar documents found (search time: 93 ms)
1.
This paper introduces HPC++, a new parallel programming language. In a network environment composed of multiple interconnected nodes, HPC++ supports not only parallelism across nodes but also thread-level parallelism within a node. In addition, through CORBA's IDL technology, users can invoke member functions of remote objects. The paper also describes the HPC++ Parallel Standard Template Library.

2.
Parallel Implementations of the C++ Language on Cluster Systems
In a computer cluster environment, combining object-oriented programming techniques with parallel technology can effectively reduce the difficulty of parallel programming and improve the maintainability, portability, and reusability of parallel programs. This paper explores several approaches to parallelizing the C++ language on cluster systems, introducing the language models, programming interfaces, and implementations of MPC++ (based on message passing), SOC++ (based on shared objects), and CCPP (based on object-level parallelism), and presents evaluation results and analysis for these language systems.

3.
DPC-C++ is an object-oriented concurrent programming language that supports the design of distributed applications. This paper focuses on how concurrency is implemented in DPC-C++ and on its program model, and briefly presents the design of its runtime support system.

4.
Object-oriented languages have great potential for the design and development of large-scale parallel software. This paper introduces the object-oriented, large-grain dataflow parallel model and the overall design of OOCPCS, an object-oriented C++ parallel compilation system we designed for network environments, and discusses some of its key implementation techniques.

5.
This paper studies programming for cluster systems, aiming to establish a parallel programming model that supports a virtual shared memory space and multiple ways of expressing parallelism. We first propose the concept of an abstract structured shared-memory model, and on that basis build a hierarchical parallel model that simultaneously supports data parallelism, task parallelism, and object parallelism; together, these two models constitute the parallel programming model of the parallel language TipC++. The paper also gives a preliminary discussion of performance-optimization primitives, compiler optimization, and task scheduling under this programming model.

6.
SBM is an effective synchronization mechanism supporting operation-level parallelism. Based on SBM, this paper studies node scheduling and barrier insertion algorithms in depth and proposes an effective scheme for exploiting operation-level parallelism. The dependence relations between instructions are represented by a directed graph G(N,A), and the nodes are sorted in ascending order using each node's critical path as the key. Following the sorted order, the paper describes an algorithm that assigns nodes to the processors, together with an algorithm that inserts barriers between dependent nodes.
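A hedged sketch of the critical-path-keyed assignment idea described above, assuming per-node latencies; this is illustrative only, not the paper's exact algorithm, and barrier insertion is noted only in comments:

```cpp
// Illustrative list scheduling of a dependence DAG G(N,A), using each
// node's critical-path length as the sort key (not the paper's exact code).
#include <algorithm>
#include <vector>

struct Node {
    int latency = 1;
    std::vector<int> succs;   // outgoing dependence arcs
    int critPath = 0;         // longest latency path from this node to a sink
};

// Compute critical-path lengths; 'order' must be a reverse topological
// order, so successors are finalized before their predecessors.
void computeCriticalPaths(std::vector<Node>& g, const std::vector<int>& order) {
    for (int v : order) {                     // sinks first
        g[v].critPath = g[v].latency;
        for (int s : g[v].succs)
            g[v].critPath = std::max(g[v].critPath, g[v].latency + g[s].critPath);
    }
}

// Assign nodes to processors in ascending critical-path order; a full
// scheduler would also insert barriers between dependent nodes placed on
// different processors, as the paper describes.
std::vector<int> assign(const std::vector<Node>& g, int numProcs) {
    std::vector<int> idx(g.size()), proc(g.size());
    for (size_t i = 0; i < g.size(); ++i) idx[i] = static_cast<int>(i);
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return g[a].critPath < g[b].critPath; });
    for (size_t i = 0; i < idx.size(); ++i)
        proc[idx[i]] = static_cast<int>(i) % numProcs;
    return proc;
}
```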

7.
Design of a Message-Memory Network Interface for Network Parallel Computing Systems
Through a qualitative analysis of typical parallel applications, this paper proposes and defines a message-passing factor R, namely the proportion of heap-data transfer in overall message passing. Trace statistics then gathered for a set of typical parallel applications in a real NPC (network parallel computing) environment confirm the analysis that R is close to 1. Based on this qualitative analysis and the quantitative statistics, and drawing on advances in memory technology, a message memory is introduced into the network interface of the NPC, allowing each node in the NPC to directly access the message memory of other nodes. From this it is concluded that, in an NPC whose network interfaces are equipped with message memory, …

8.
This paper introduces the design ideas and implementation techniques of the parallel C (BPC) and parallel FORTRAN (BPF77) languages in the BJ-1 parallel computer system.

9.
System Software of the BJ-01 Parallel Computer
黄大海, 纪金龙. 《计算机学报》, 1993, 16(12): 903-910.
This paper introduces the design and implementation techniques of MOS, the operating system of the BJ-01 parallel computer, of the parallel C language PCL, and of the interface software. It also discusses the parallel execution environment and parallel program debugging tools of the BJ-01 parallel machine.

10.
A Parallel Routing Algorithm for Hypercube Networks with a Large Number of Faulty Nodes
This paper discusses parallel routing algorithms in hypercube networks containing a large number of faulty nodes. Assuming H_n is a locally k-subcube-connected n-dimensional hypercube network, the proposed parallel routing algorithm can find at least K = min(D_k(u), D_k(v)) parallel paths, where the length of each path is at most (d_H(U_k, V_k) + 3)·2^k. The time complexity of the algorithm is O(K·n·2^k). Here, D_k(u) and D_k(v) denote the numbers of fault-free neighbors of the source node u and the destination node v, respectively (not counting neighbors inside the k-dimensional subcubes containing u and v), and d_H(U_k, V_k) denotes the Hamming distance between U_k and V_k, the two k-dimensional subcubes containing u and v. The paper also examines the special case k = 3: when no more than 12.5% and 25% of the nodes are faulty, respectively, the time complexity of the algorithm is O(K·n), and the length of each path is within roughly 1.5 and 2 times, respectively, the Hamming distance between the source and destination nodes. The algorithm requires a node to know only the states of its neighbors rather than information about the whole network; that is, it is based on local information, which gives it strong practical significance.
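For concreteness, a small sketch of the Hamming-distance metric d_H in which the path-length bounds above are stated (illustrative only, not the paper's routing algorithm):

```cpp
// In an n-dimensional hypercube, nodes are labeled with n-bit strings and
// two nodes are adjacent iff their labels differ in exactly one bit, so the
// Hamming distance between labels is the fault-free routing distance.
#include <bitset>
#include <cstdint>

int hammingDistance(std::uint64_t u, std::uint64_t v) {
    // Number of differing bits = population count of the XOR.
    return static_cast<int>(std::bitset<64>(u ^ v).count());
}

// A fault-free greedy route repeatedly flips one differing bit; the paper's
// algorithm instead builds K node-disjoint paths that detour around faulty
// nodes using only each node's knowledge of its neighbors' states.
```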

11.
This paper addresses the growing need for mechanisms supporting intra-node application composition in high-performance computing (HPC) systems. It provides a novel shared memory interface that allows composite applications (two or more coupled applications) to share internal data structures without blocking. This allows independent progress of the applications, so that they can proceed in a parallel, overlapped fashion. Composite applications using in-node shared memory can reduce the amount of data to be communicated between nodes, allowing checkpointing and data reduction or analytics to be performed locally and in parallel. The approach is implemented in Linux and evaluated using benchmarks that represent typical composite applications on a large HPC testbed. The results show that the proposed approach significantly outperforms traditional ones (up to a 15-fold speed increase on a 200-node machine).
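The paper's interface itself is not reproduced in the abstract; as a generic illustration of in-node data sharing between co-located processes, here is a minimal POSIX shared-memory sketch (the region name and size are hypothetical):

```cpp
// Illustrative only: plain POSIX shared memory between two co-located
// processes; the paper's actual non-blocking interface is more elaborate.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    const char* name = "/composite_app_region";   // hypothetical region name
    const size_t size = 1 << 20;                  // 1 MiB shared buffer

    // Producer side: create and map the region (a consumer would shm_open
    // the same name with O_RDONLY and mmap it with PROT_READ).
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, size) != 0) { perror("shm"); return 1; }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    std::memcpy(p, "simulation step 0", 18);      // publish data in place

    munmap(p, size);
    close(fd);
    shm_unlink(name);                             // remove the region name
    return 0;
}
```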

12.
Power consumption has become an important issue in the design of high-performance computer systems. The parallel storage system is an important component of a high-performance computer system, so reducing its power consumption contributes significantly to reducing the power consumption of the whole parallel system. A parallel storage system is composed of storage nodes, and reducing storage-node power is a major part of reducing the storage system's power. This paper proposes a power-optimization method for the processors of storage nodes: processor voltage/frequency is scaled according to utilization information, and a metadata-server-guided frequency pre-adjustment algorithm alleviates the response-time lag caused by frequency scaling. Analysis shows that this method can effectively reduce storage-node power consumption and achieve power optimization for the parallel storage system.
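The paper's exact DVFS policy is not given in the abstract; a minimal sketch of utilization-driven frequency selection, with hypothetical frequency steps and a hypothetical utilization target, might look like:

```cpp
// Hedged sketch of utilization-driven DVFS (not the paper's exact policy):
// pick the lowest frequency that keeps projected utilization under a target.
#include <vector>

// Available frequency steps in MHz, lowest first (hypothetical values).
const std::vector<double> kFreqs = {800, 1200, 1600, 2000, 2400};

double chooseFrequency(double utilization, double currentFreq,
                       double target = 0.8) {
    // Work currently being done, expressed as "MHz of busy time".
    double demand = utilization * currentFreq;
    for (double f : kFreqs)
        if (demand / f <= target)   // projected utilization at frequency f
            return f;
    return kFreqs.back();           // saturated: run at the highest step
}
```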

13.
Current High Performance Computing (HPC) systems are typically built as interconnected clusters of shared-memory multicore computers. Several techniques have been proposed to automatically generate parallel programs from high-level parallel languages or sequential codes. To properly exploit the scalability of HPC clusters, these techniques should take into account the combination of data communication across distributed memory with the exploitation of shared-memory models. In this paper, we present a new communication calculation technique to be applied across different SPMD (Single Program Multiple Data) code blocks containing several uniform data access expressions. We have implemented this technique in Trasgo, a programming model and compilation framework that transforms parallel programs from a high-level parallel specification that deals with parallelism in a unified, abstract, and portable way. The proposed technique computes at runtime exact coarse-grained communications for distributed message-passing processes. Applying this technique at runtime has the advantage of being independent of compile-time decisions, such as the tile size chosen for each process. Our approach allows the automatic generation of pre-compiled multi-level parallel routines, libraries, or programs that can adapt their communication, synchronization, and optimization structures to the target system, even when computing nodes have different capabilities. Our experimental results show that, despite the runtime calculation, our approach can automatically produce efficient programs compared with MPI reference codes and with codes generated by auto-parallelizing compilers.
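As a much-simplified illustration of runtime communication calculation (not Trasgo's actual implementation), the following computes the exact interval one rank must receive from another for a 1-D block-distributed stencil, once block sizes are known at runtime:

```cpp
// Hedged sketch: for a 1-D block distribution, each process owns a
// contiguous block and needs a halo of 'radius' elements, so the elements
// to receive from any other rank are an interval intersection that can be
// computed exactly at runtime, independent of compile-time tiling choices.
#include <algorithm>

struct Range { long lo, hi; };   // half-open interval [lo, hi)

Range ownedBlock(long n, int rank, int nprocs) {
    long base = n / nprocs, rem = n % nprocs;
    long lo = rank * base + std::min<long>(rank, rem);
    return {lo, lo + base + (rank < rem ? 1 : 0)};
}

// Exact interval 'rank' must receive from 'other' for a stencil of the
// given radius; an empty interval (lo >= hi) means no message is needed.
Range recvFrom(long n, int rank, int other, int nprocs, long radius) {
    Range mine = ownedBlock(n, rank, nprocs);
    Range theirs = ownedBlock(n, other, nprocs);
    Range needed = {std::max(0L, mine.lo - radius),
                    std::min(n, mine.hi + radius)};
    return {std::max(needed.lo, theirs.lo), std::min(needed.hi, theirs.hi)};
}
```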

14.
A high performance computer (HPC) is a complex, huge system whose architecture design faces increasing difficulties and risks. Traditional methods, such as theoretical analysis, component-level simulation, and sequential simulation, are not applicable to system-level simulation of HPC systems, and even parallel simulation on large-scale parallel machines has many difficulties with scalability, reliability, generality, and efficiency. To meet the current needs of HPC architecture design, this paper proposes a system-level parallel simulation platform: ArchSim. We first introduce the architecture of the ArchSim simulation platform, which is composed of a global server (GS), local server agents (LSA), and entities. Secondly, we describe some key techniques of ArchSim, including the synchronization protocol, the communication mechanism, and the distributed checkpointing/restart mechanism. We then test the main performance indices of ArchSim with the phold benchmark and analyze the extra overhead introduced by ArchSim. Finally, based on ArchSim, we construct a parallel event-driven interconnection network simulator and a system-level simulator for a small-scale HPC system with 256 processors. The results of the performance tests and HPC system simulations demonstrate that ArchSim can achieve a high speedup ratio and high scalability on a parallel host machine and can support system-level simulation for the architecture design of HPC systems.

15.
This paper presents a convergence of distributed key-value storage systems in clouds and supercomputers. It specifically presents ZHT, a zero-hop distributed key-value store that has been tuned to the requirements of high-end computing systems. ZHT aims to be a building block for future distributed systems, such as parallel and distributed file systems, distributed job management systems, and parallel programming systems. ZHT has several important properties: it is lightweight, allows nodes to join and leave dynamically, is fault tolerant through replication, persistent, and scalable, and supports unconventional operations such as append, compare-and-swap, and callback in addition to the traditional insert/lookup/remove. We have evaluated ZHT's performance on a variety of systems, ranging from a Linux cluster with 64 nodes and an Amazon EC2 virtual cluster with up to 96 nodes to an IBM Blue Gene/P supercomputer with 8K nodes. We compared ZHT against other key-value stores and found that it offers superior performance for the features and portability it supports. This paper also presents several real systems that have adopted ZHT, namely FusionFS (a distributed file system), IStore (a storage system with erasure coding), MATRIX (distributed scheduling), Slurm++ (distributed HPC job launch), and Fabriq (distributed message queue management); all of these systems have been simplified because of key-value storage and have been shown to outperform other leading systems, in some cases by orders of magnitude. It is important to highlight that some of these systems are rooted in HPC supercomputers, while others are rooted in clouds and ad hoc distributed systems; through our work, we have shown how versatile key-value storage systems can be across such a variety of environments. Copyright © 2015 John Wiley & Sons, Ltd.
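ZHT's real API is not reproduced here; the following is a hedged, single-node sketch of the operation set the abstract lists (the traditional insert/lookup/remove plus append and compare-and-swap):

```cpp
// Hedged sketch (not ZHT's actual API): a minimal in-memory key-value store
// exposing insert/lookup/remove plus append and compare-and-swap.
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

class KVStore {
    std::unordered_map<std::string, std::string> map_;
    std::mutex mu_;   // one lock for simplicity; ZHT itself is distributed
public:
    void insert(const std::string& k, const std::string& v) {
        std::lock_guard<std::mutex> g(mu_); map_[k] = v;
    }
    std::optional<std::string> lookup(const std::string& k) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = map_.find(k);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }
    void remove(const std::string& k) {
        std::lock_guard<std::mutex> g(mu_); map_.erase(k);
    }
    void append(const std::string& k, const std::string& v) {
        std::lock_guard<std::mutex> g(mu_); map_[k] += v;
    }
    // Atomically replace the value only if it still equals 'expected'.
    bool compareAndSwap(const std::string& k, const std::string& expected,
                        const std::string& desired) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = map_.find(k);
        if (it == map_.end() || it->second != expected) return false;
        it->second = desired;
        return true;
    }
};
```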

16.
In modern information systems, storing data encrypted under a password is a basic method of maintaining data confidentiality. When encrypted data must be decrypted without knowing the key, millions or even hundreds of trillions of decryption operations are usually required. Although each decryption operation is computationally expensive, different decryption operations are mutually independent and therefore easy to parallelize, so a high-performance computer (HPC) can reduce the time needed for decryption. This paper designs P2RF, an MPI-based parallel password recovery framework, which distributes the data to be decrypted and the candidate keys at the task level across different compute nodes; each compute node then distributes the computation over its own resources according to its resource configuration. Experimental results show that P2RF scales linearly as the number of nodes increases.
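The abstract does not give P2RF's code; as a hedged sketch of the task-level distribution it describes, the following MPI program strides candidate keys across ranks (tryKey, the key-space size, and the matching key are hypothetical placeholders):

```cpp
// Hedged sketch of P2RF-style key distribution (not the framework's code):
// each MPI rank tries a strided slice of the candidate key space.
#include <mpi.h>
#include <cstdint>
#include <cstdio>

// Hypothetical placeholder: returns true if 'key' decrypts the target data.
bool tryKey(std::uint64_t key) { return key == 0xDEADBEEF; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const std::uint64_t keySpace = 1ULL << 32;  // illustrative search space
    std::uint64_t found = UINT64_MAX;
    for (std::uint64_t k = rank; k < keySpace; k += size)
        if (tryKey(k)) { found = k; break; }

    // Combine results: the smallest matching key (if any) wins.
    std::uint64_t global;
    MPI_Allreduce(&found, &global, 1, MPI_UINT64_T, MPI_MIN, MPI_COMM_WORLD);
    if (rank == 0 && global != UINT64_MAX)
        std::printf("recovered key: %llx\n", (unsigned long long)global);

    MPI_Finalize();
    return 0;
}
```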

17.
In this work, we propose new techniques to analyze the behavior, the performance, and especially the scalability of High Performance Computing (HPC) applications on different computing architectures. Our final objective is to test applications on a wide range of architectures (real or merely designed) and to scale them to any number of nodes or components. This paper presents a new simulation framework for HPC architectures, called SIMCAN. The main characteristic of the proposed framework is that it can be configured to simulate a wide range of possible architectures involving any number of components. SIMCAN is developed to simulate complete HPC architectures, with special emphasis on the storage and network subsystems. The SIMCAN framework can handle complete components (nodes, racks, switches, routers, etc.) as well as key elements of the storage and network subsystems (disks, caches, sockets, file systems, schedulers, etc.). We also propose several methods to implement the behavior of HPC applications, each with its own advantages and drawbacks. To evaluate the possibilities and accuracy of the SIMCAN framework, we tested it by executing an HPC application called BIPS3D both on a hardware-based computing cluster and in a modeled environment that represents the real cluster. We also checked the scalability of the application on this kind of architecture by simulating the same application with an increased number of computing nodes.

18.
In recent years, High Performance Computing (HPC) systems have been shifting from expensive massively parallel architectures to clusters of commodity PCs to take advantage of cost and performance benefits. Fault tolerance in such systems is a growing concern for long-running applications. In this paper, we briefly review the failure rates of HPC systems and survey fault tolerance approaches for HPC systems and the issues with these approaches. Rollback-recovery techniques are discussed in detail because they are the most widely used approach for long-running applications on HPC systems. Specifically, the feature requirements of rollback-recovery are discussed and a taxonomy is developed for over twenty popular checkpoint/restart solutions. The intent of this paper is to aid researchers in the domain as well as to facilitate the development of new checkpointing solutions.

19.
Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.
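As a concrete example of the kind of interface such a taxonomy classifies, here is a minimal OpenMP 3 tasking sketch (any OpenMP-3-capable compiler with -fopenmp or equivalent; the cutoff value is arbitrary):

```cpp
// Classic OpenMP 3 task example: each recursive call becomes a task.
#include <cstdio>

long fib(int n) {
    if (n < 2) return n;
    long a, b;
    #pragma omp task shared(a) if(n > 20)  // spawn child; cutoff keeps tasks coarse
    a = fib(n - 1);
    b = fib(n - 2);                        // compute the other half in place
    #pragma omp taskwait                   // join child task before combining
    return a + b;
}

int main() {
    long result = 0;
    #pragma omp parallel
    #pragma omp single                     // one thread seeds the task tree
    result = fib(30);
    std::printf("fib(30) = %ld\n", result);
    return 0;
}
```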

20.
The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. As most MMCA researchers are not also HPC experts, however, there is a demand for programming models and tools that are both efficient and easy to use. Existing user-transparent parallelization tools generally use a data parallel approach in which data structures (e.g. video frames) are scattered among the available nodes in a compute cluster. For certain MMCA applications, however, a data parallel approach induces intensive communication, which significantly decreases performance; in these situations, we can benefit from applying alternative approaches. We present Pyxis-DT, a user-transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is obtained by run-time construction and execution of a task graph consisting of strictly defined building-block operations. Results show that for realistic MMCA applications the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation. Extensions for GPU clusters are also presented.
