
1.

Reference implementation of scalable I/O low-level API on Intel Paragon (total citations: 1; self-citations: 0; citations by others: 1)

SUN Ninghui 《计算机科学技术学报》1999,14(3):206-223

The Scalable I/O (SIO) Initiative's Low-Level Application Programming Interface (SIO LLAPI) provides file system implementers with a simple low-level interface that supports high-level parallel I/O interfaces efficiently and effectively. This paper describes a reference implementation and an evaluation of the SIO LLAPI on the Intel Paragon multicomputer. The implementation provides a file system structure and striping algorithm compatible with the Parallel File System (PFS) of the Intel Paragon, and runs either inside the kernel or as a user-level library. Scatter-gather addressing for reads and writes, asynchronous I/O, client caching and prefetching, a file access hint mechanism, collective I/O, and highly efficient file copy have been implemented. Preliminary experience shows that the SIO LLAPI offers opportunities for significant performance improvement and is easy to implement. Some high-level file system interfaces and applications, such as PFS, ADIO, and a Hartree-Fock application, are also implemented on top of SIO. The performance of PFS on SIO is at least the same as that of Intel's native PFS, and in many cases, such as small sequential file accesses, huge I/O requests, and collective I/O, it is stable and much better. The SIO features help to support high-level interfaces easily, quickly, and more efficiently, and the cache, prefetching, and hints are useful for getting better performance under different access models. The scalability and performance of SIO are limited by network latency, scalable network bandwidth, memory copy bandwidth, memory size, and the pattern of I/O requests. The tradeoff between generality and efficiency should be considered in implementation.
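
The abstract mentions a striping algorithm but does not spell it out. As a rough illustration only, the sketch below shows generic round-robin striping, where a file is cut into fixed-size stripe units dealt out to I/O nodes in turn. The function name `locate` and the parameters `stripe_unit` and `num_io_nodes` are hypothetical, and the layout is an assumption rather than the actual PFS or SIO LLAPI layout.

```python
def locate(offset, stripe_unit, num_io_nodes):
    """Map a byte offset in a striped file to (I/O node, offset on that node).

    Assumes plain round-robin striping; this is an illustrative layout,
    not the one used by PFS or the SIO reference implementation.
    """
    if offset < 0 or stripe_unit <= 0 or num_io_nodes <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    # Which stripe unit the byte falls in, and where inside that unit.
    stripe_index, within = divmod(offset, stripe_unit)
    # Stripe units are dealt to nodes in turn; 'round_no' counts full passes.
    round_no, node = divmod(stripe_index, num_io_nodes)
    return node, round_no * stripe_unit + within


if __name__ == "__main__":
    # With 64-byte stripe units on 4 nodes, byte 300 lies in stripe unit 4,
    # which wraps around to node 0 as that node's second unit.
    print(locate(300, 64, 4))
```

Under this layout a large sequential request touches every I/O node, which is what lets striped reads and writes proceed in parallel.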

2.

Parallel Reservoir Integrated Simulation Platform for One Million Grid Blocks Cases

Feng Pan Jianwen Cao 《通讯和计算机》2005,2(11):29-33,42

This paper first provides a brief introduction to numerical reservoir simulation and to a parallel integrated numerical reservoir simulation platform developed by RDCPS (Research & Development Center for Parallel Software, Institute of Software, Chinese Academy of Sciences). The platform includes pre-processing, a simulator (for three-dimensional, three-phase black-oil models), and post-processing, all seamlessly integrated with parallel computers. We then present the key technologies of the simulator, such as nonlinear and linear solvers, communication among processors, and parallel I/O. Finally, we give results of using the platform to solve cases with one million grid blocks from Chinese oil fields, which show that the simulator is highly portable, fast enough to meet deadlines, and scales well for the tested cases. As application software, its objective is always to meet the deadlines of the oil industry. For a case with one million grid blocks and 20 to 30 years of production, the elapsed time with 16 processors is now less than 12 hours on parallel computers based on Myrinet or QsNet, that is, 'submit a case just before going off duty and get the result just before coming back on duty'. A curve of decreasing elapsed time is given for a case with one million grid blocks, from which the development of the simulator alongside parallel computers can also be inferred.

3.

Optimal Partitioning and Granularity of Uniform Task Graphs

Zhang Zhongyun Li Guojie 《计算机科学技术学报》1991,6(2):185-194

Task partitioning is an important technique in parallel processing. In this paper, we investigate the optimal partitioning strategies and granularities of tasks with communication, based on several models of parallel computer systems. Unlike the usual approach, we study the optimal partitioning strategies and granularities from the viewpoint of minimizing T as well as minimizing NT^2, where N is the number of processors used and T is the program execution time using N processors. Our results show that the optimal partitioning strategies for all cases discussed in this paper are the same: either assign all tasks to one processor or distribute them among the processors as equally as possible, depending only on functions of R/C, the ratio of running time to communication time.

4.

Predicting the behavior of large scale P2P systems by parallel discrete event simulation

ZHENG WeiMin YU HongLiang SHI GuangYu & CHEN Jian 《中国科学:信息科学(英文版)》2010,(6):1109-1121

P2P systems are coming to dominate the Internet. Such systems are typically composed of thousands to millions of physical computers, which makes it difficult to predict their behavior without a large-scale distributed system simulator. This paper is an attempt to predict the behavior of large-scale P2P systems by building a novel parallel simulator, AegeanSim, which provides parallel discrete event simulation of such systems on high performance server clusters. We abstract the execution of P2P applications wit...

5.

Implementation of GAMMA on a Massively Parallel Computer (total citations: 1; self-citations: 0; citations by others: 1)

Huang Linpeng Tong Weiqin Kam Wing Ng Sun Yongqiang 《计算机科学技术学报》1997,12(1):29-39

The GAMMA paradigm was recently proposed by Banatre and Metayer to describe the systematic construction of parallel programs without introducing artificial sequentiality. This paper presents two synchronous execution models for GAMMA and discusses how to implement them on the MasPar MP-1, a massively data-parallel computer. The results show that the GAMMA paradigm can be implemented very naturally on data-parallel machines, and that a very high-level language such as GAMMA, in which parallelism is left implicit, is suitable for specifying massively parallel applications.
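
For readers unfamiliar with GAMMA, a program is a multiset plus reaction rules: any elements that satisfy a rule's condition may be replaced by the rule's products, in no prescribed order, until no reaction applies. The sketch below is a minimal sequential interpreter for pairwise reactions, written only to illustrate the paradigm. It is not one of the synchronous execution models from the paper, and the names `gamma`, `condition`, and `action` are hypothetical.

```python
from itertools import permutations


def gamma(multiset, condition, action):
    """Apply a pairwise reaction until no pair of elements can react.

    condition(x, y) says whether the ordered pair reacts; action(x, y)
    returns the products that replace it. A sequential stand-in for a
    model in which independent reactions may fire in parallel.
    """
    items = list(multiset)
    reacted = True
    while reacted:
        reacted = False
        for i, j in permutations(range(len(items)), 2):
            x, y = items[i], items[j]
            if condition(x, y):
                # Remove the reacting pair and add its products.
                rest = [v for k, v in enumerate(items) if k not in (i, j)]
                items = rest + list(action(x, y))
                reacted = True
                break
    return sorted(items)


if __name__ == "__main__":
    # Prime sieve: y disappears whenever some other element x divides it.
    print(gamma(range(2, 21), lambda x, y: y % x == 0, lambda x, y: [x]))
    # Maximum: of any two elements, only the larger survives.
    print(gamma([4, 9, 2, 7], lambda x, y: x <= y, lambda x, y: [y]))
```

Neither rule says which pair reacts first, which is the sense in which the parallelism is left implicit.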

6.

Open-MP and Parallel Programming (total citations: 1; self-citations: 0; citations by others: 1)

陈《计算机科学》2003,30(11):133-135

The Open-MP application programming interface for shared-memory parallel computer systems and its characteristics are described. We also compare Open-MP with the parallel programming tool MPI. To overcome the large overhead of Open-MP programs, several optimization methods for Open-MP programming are presented that increase execution efficiency.

7.

Progress and Challenges in High Performance Computer Technology

Xue-Jun Yang Yong Dou and Qing-Feng Hu 《计算机科学技术学报》2006,21(5):674-681

High performance computers provide strategic computing power for building the national economy and defense, and have become one of the symbols of a country's overall strength. Over 30 years, with government support, high performance computer technology has developed rapidly: computing performance has increased nearly 3 million times and the number of processors has grown more than one million times. To solve the critical issues of parallel efficiency and scalability, researchers have pursued extensive theoretical studies and technical innovations. The paper briefly looks back at the course of building high performance computer systems both at home and abroad, and summarizes the significant breakthroughs in international high performance computer technology. We also give an overview of China's technological progress in parallel computer architecture, parallel operating systems and resource management, parallel compilers and performance optimization, and environments for parallel programming and network computing. Finally, we examine the challenging issues of the "memory wall", system scalability, and the "power wall", and discuss high productivity computers, which are the trend in building next-generation high performance computers.

8.

A Parallel Matrix Inversion Algorithm Based on a Simplified LogP Model

陈天麒 曾庆华 孙世新 《计算机科学》2003,30(8):176-177

LogP is becoming a practical parallel computation model that meets the demands of parallel computers and parallel algorithms, so it is important to redesign parallel algorithms for the LogP model. This paper studies a parallel algorithm for computing the inverse of a matrix on a simplified LogP model and presents simulation results.

9.

A Survey of Runtime Systems

张宏莉 胡铭曾 方滨兴 《计算机科学》1999,26(6):25-28

Runtime systems play an important role in parallel programming and parallel compilation. This paper presents the goals and key techniques of runtime systems, and closes with some experiences and trends.

10.

Shared Variable Oriented Parallel Precompiler for SPMD Model

Kang Jichang Zhu Yi'an Hong Yuanlin Ying Bishan 《计算机科学技术学报》1995,10(5):476-480

At present, commercial parallel computer systems with distributed memory architectures are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers extended with communication statements. Writing parallel programs with explicit communication statements is a burden on programmers. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper automatically generates appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model, which greatly eases parallel programming while keeping communication efficient. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its strong performance suggests that SVOPP may be a breakthrough in parallel programming technique.

11.

Software-Based Seamless Tiling for Large-Screen Projection Systems (total citations: 1; self-citations: 0; citations by others: 1)

俞凌云 王毅刚 王亢 《计算机仿真》2009,26(5)

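To overcome the limitations of most existing projection systems, which are expensive, restricted to a single projection form, and dependent on specialist maintenance, a software-based seamless tiling technique is proposed. Building on the open-source 3D rendering engine OpenSceneGraph, the work studies the key technologies of large-screen systems (cluster-based distributed graphics rendering with synchronization control, geometric calibration, and color and brightness calibration). Taking the application characteristics of large-screen projection into account, it proposes a 3D rendering software platform that achieves seamless large-screen tiling using ordinary projectors and PCs with ordinary graphics cards. Practice shows that the method gives good results, is inexpensive, and is easy to operate.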

12.

Designing More Practical Parallel Algorithms

寿标 李晓峰 《计算机研究与发展》1996,33(6):445-449

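The emergence and development of massively parallel computers (MPCs) urgently calls for new theories and techniques of parallel design to guide the design of more practical parallel algorithms. This paper first briefly introduces the LogP and Barrier-LogP parallel computation models proposed for MPCs. It then uses the Barrier-LogP model to discuss general methods and techniques for designing more practical parallel algorithms from three aspects: communication balance, data distribution, and overlapping communication with computation.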

13.

Parallelism in multigrid methods: How much is too much?

Lesley R. Matheson Robert E. Tarjan 《International journal of parallel programming》1996,24(5):397-432

Multigrid methods are powerful techniques to accelerate the solution of computationally intensive problems arising in a broad range of applications. Used in conjunction with iterative processes for solving partial differential equations, multigrid methods speed up iterative methods by moving the computation from the original mesh covering the problem domain through a series of coarser meshes. But this hierarchical structure leaves domain-parallel versions of the standard multigrid algorithms with a deficiency of parallelism on coarser grids. To compensate, several parallel multigrid strategies with more parallelism, but also more work, have been designed. We examine these parallel strategies and compare them to simpler standard algorithms to try to determine which techniques are more efficient and practical. We consider three parallel multigrid strategies: (1) domain-parallel versions of the standard V-cycle and F-cycle algorithms; (2) a multiple coarse grid algorithm, proposed by Fredrickson and McBryan, which generates several coarse grids for each fine grid; and (3) two Rosendale algorithms, which allow computation on all grids simultaneously. We study an elliptic model problem on simple domains, discretized with finite difference techniques on block-structured meshes in two or three dimensions with up to 10⁶ or 10⁹ points, respectively. We analyze performance using three models of parallel computation: the PRAM and two bridging models. The bridging models reflect the salient characteristics of two kinds of parallel computers: SIMD fine-grain computers, which contain a large number of small (bit-serial) processors, and SPMD medium-grain computers, which have a more modest number of powerful (single-chip) processors. Our analysis suggests that the standard algorithms are substantially more efficient than algorithms utilizing either parallel strategy. Both parallel strategies need too much extra work to compensate for their extra parallelism. They require a highly impractical number of processors to be competitive with simpler, standard algorithms. The analysis also suggests that the F-cycle, with the appropriate optimization techniques, is more efficient than the V-cycle under a broad range of problem, implementation, and machine characteristics, despite the fact that it exhibits even less parallelism than the V-cycle. Research at Princeton University partially supported by the National Science Foundation, Grant No. CCR-8920505, and the Office of Naval Research, Contract No. N0014-91-J-1463.

14.

Parallel models of computation: an introductory survey

M. Leoncini 《Calcolo》1989,26(2-4):209-236

The paper gives an overview of some models of computation which have proved successful in laying a foundation for a general theory of parallel computation. We present three models of parallel computation, namely boolean and arithmetic circuit families, and Parallel Random Access Machines. They represent different viewpoints on parallel computing: boolean circuit families are useful for in-depth theoretical studies on the power and limitations of parallel computers; Parallel Random Access Machines are the most general vehicles for designing highly parallel algorithms; arithmetic circuit families are an important tool for undertaking studies related to one of the most active areas in parallel computing, i.e. parallel algebraic complexity.

15.

Implementing Parallel Genetic Algorithms on Parallel Machines with Unevenly Loaded Nodes

陈前 李星 《计算机工程与应用》2000,36(9):55-57

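With the aim of balancing load and reducing communication overhead, two parallel genetic algorithms are proposed for parallel machines whose nodes are unevenly loaded: an island model with dynamic load balancing and a master-slave model. Both are compared with the basic island model, and both performed well in practical use.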

16.

More Practical Parallel Computation Models (total citations: 7; self-citations: 0; citations by others: 7)

陈国良《小型微型计算机系统》1995,16(2):1-9

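Many previously reported parallel algorithms run well on small-scale parallel machines but perform poorly when ported to massively parallel machines. One reason is that parallel computation models such as PRAM are too abstract and omit factors, such as communication and synchronization, that cannot be ignored when an algorithm actually runs. This paper introduces several more practical parallel computation models proposed so far that better reflect the performance of modern parallel machines, including asynchronous PRAM, BSP, LogP, and the C3 model. These models remain open to debate regarding how closely they match real parallel machines, how usable they are, and how tractable they are when analyzing more complex algorithms. Nevertheless, they have opened a new path for research on parallel computation models and have become one of the hot topics in current parallel algorithm research.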

17.

Parallel processing of the radiosity method

Derek Paddon Alan Chalmers 《Computer aided design》1994,26(12):917-927

The radiosity method gives one of the best solutions for synthesizing realistic images. However, the method is also the most computationally expensive. Using parallel computers will cut the time required to solve this problem, provided that care is taken in the design of the system. Various models of parallel computing are explored for both the gather radiosity method and the progressive refinement radiosity method.

18.

Models of parallel computation: a survey and classification (total citations: 5; self-citations: 1; citations by others: 5)

Zhang Yunquan Chen Guoliang Sun Guangzhong Miao Qiankun 《Frontiers of Computer Science in China》2007,1(2):156-165

In this paper, state-of-the-art research on parallel computational models is reviewed. We introduce various models that were developed during the past decades. According to the architectural features they target, especially memory organization, we classify these parallel computational models into three generations, and we discuss the models and their characteristics on the basis of this three-generation classification. We believe that with the ever-increasing speed gap between the CPU and memory systems, incorporating a non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms is becoming more and more complicated, and describing this hierarchy in future computational models is becoming more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the complexity of model analysis, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features to consider in future model design and research.

19.

Implementation of an ADI Method on parallel computers

Raad A. Fatoohi Chester E. Grosch 《Journal of scientific computing》1987,2(2):175-193

In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2, while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
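
Each ADI half-step reduces to many independent tridiagonal systems, one per grid line. The sketch below shows serial Gaussian elimination specialized to a tridiagonal matrix (often called the Thomas algorithm) as a plausible reading of the Gaussian elimination mentioned for the Flex/32 and Cray/2. The cyclic elimination variant used on the MPP is not shown. The function name and argument layout are assumptions for illustration, and no pivoting is done, so the system is assumed to be diagonally dominant.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Gaussian elimination without pivoting.

    All four sequences have length n; lower[0] and upper[n-1] are ignored.
    Assumes a diagonally dominant matrix, so no zero pivot appears.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination removes the sub-diagonal entry of each row.
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    # Back substitution recovers the unknowns from last to first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 1D Laplacian-like system (-1, 2, -1) whose exact solution is 1, 2, 3, 4.
    n = 4
    print(solve_tridiagonal([-1.0] * n, [2.0] * n, [-1.0] * n,
                            [0.0, 0.0, 0.0, 5.0]))
```

The forward and backward sweeps are inherently sequential along one line, so on MIMD or vector machines the parallelism comes from solving many lines at once.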

20.

Implementing Optimal Broadcast and Summation Algorithms on the LogP Model

寿标 陈国良 《软件学报》1997,8(1):22-28

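Compared with earlier parallel computation models, the LogP model reflects the characteristics of massively parallel computers (MPCs) more realistically and more completely. Because the LogP algorithms published so far only outline design ideas, this paper attempts to describe complete, portable algorithms on the LogP model in an algorithmic language. For two basic problems, single-item broadcast and summation, it implements optimal algorithms on the LogP model with arbitrary parameters and analyzes their time complexity. This research was supported by the National 863 High-Tech Program fund.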
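
To make single-item broadcast on LogP concrete, the sketch below computes the finishing time of the greedy schedule usually described for this model: every processor that holds the item keeps sending it to uninformed processors as fast as the gap allows. It assumes the standard LogP parameters (latency L, overhead o, gap g, processor count P) and is an illustration of that well-known schedule, not the algorithm text from the paper. The function name is hypothetical.

```python
import heapq


def logp_broadcast_time(P, L, o, g):
    """Time at which the last of P processors receives a single item.

    Assumed cost model: a message sent at time t is available at the
    receiver at t + o + L + o, and a processor can start its next send
    max(g, o) after the previous one.
    """
    if P < 1:
        raise ValueError("P must be at least 1")
    if P == 1:
        return 0
    hop = o + L + o   # send overhead + network latency + receive overhead
    gap = max(g, o)   # minimum spacing between sends from one processor
    # Heap of arrival times for the next message of each informed processor.
    arrivals = [hop]  # the root starts sending at time 0
    finish = 0
    for _ in range(P - 1):
        finish = heapq.heappop(arrivals)
        # The same sender's next message arrives one gap later.
        heapq.heappush(arrivals, finish + gap)
        # The newly informed processor starts sending immediately.
        heapq.heappush(arrivals, finish + hop)
    return finish


if __name__ == "__main__":
    print(logp_broadcast_time(P=8, L=6, o=2, g=4))
```

The heap always hands the item to the next processor at the earliest possible arrival time, so the resulting broadcast tree is unbalanced: processors informed early send more messages than those informed late.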