
1.

Zhang Canfeng (张灿峰), Zhou Haifang (周海芳) 《计算机工程与科学》 (Computer Engineering & Science) 2010,32(9):34-38

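This paper targets typical pixel-level fusion algorithms for remote sensing images, namely IHS, HPF, and DWT, and proposes and implements corresponding data-parallel fusion algorithms: P-IHS, P-HPF, and P-DWT. Communication and I/O optimizations are applied on the basis of an analysis of the algorithms' time and space complexity. Tests with IKONOS satellite remote sensing images on a cluster system show that the proposed parallel algorithms achieve good parallel speedup and high parallel efficiency. These three algorithms suit remote sensing applications with demanding real-time requirements.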

2.

An I/O-Bounded Parallel Speedup Model and a Scalable I/O Architecture

Li Qiong (李琼), Du Yunfei (杜云飞), Yang Xuejun (杨学军) 《计算机工程与科学》 (Computer Engineering & Science) 2011,33(3):28

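To alleviate the I/O bottleneck, research can proceed along six lines: applications, scalable algorithms, compilers and languages, runtime libraries, operating systems, and architecture. Of these, the I/O architecture is the key support for all the other approaches. Current parallel I/O performance analysis lacks a sound theoretical model to guide I/O architecture design. Addressing the scalability of parallel computer systems, this paper studies the effect of I/O load on system scalability, establishes an I/O-bounded parallel speedup performance model, and analyzes the scalability of three I/O architectures commonly used in today's large-scale parallel computer systems. On this theoretical basis, it proposes a scalable parallel I/O system architecture for high-performance computing. It also proposes several strategies that effectively reduce the service time of I/O operations, thereby improving system scalability and laying a foundation for follow-up research.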
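As a rough illustration of how an I/O term can cap parallel speedup, the sketch below uses an Amdahl-style formula in which compute time shrinks with the number of processors while I/O time shrinks only with the number of I/O channels. This is an assumed toy model, not the model established in the paper; the function name `io_bounded_speedup` and all of its parameters are hypothetical.

```python
def io_bounded_speedup(p, io_fraction, io_channels=1):
    """Toy Amdahl-style speedup with a separately scaling I/O term.

    p            -- number of processors (compute time scales as 1/p)
    io_fraction  -- fraction of single-processor run time spent in I/O
    io_channels  -- number of independent I/O channels (I/O time scales as 1/io_channels)
    All names are illustrative assumptions, not taken from the paper.
    """
    if p < 1 or io_channels < 1:
        raise ValueError("p and io_channels must be >= 1")
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be in [0, 1]")
    compute_time = (1.0 - io_fraction) / p
    io_time = io_fraction / io_channels
    return 1.0 / (compute_time + io_time)


if __name__ == "__main__":
    # With a single I/O channel, speedup approaches 1 / io_fraction as p grows.
    for p in (1, 16, 256, 4096):
        print(p, round(io_bounded_speedup(p, io_fraction=0.05), 2))
```

Under these assumptions, adding processors alone cannot push the speedup past `io_channels / io_fraction`, which is one way to see why the I/O architecture has to scale together with the compute side.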

3.

An Intelligent I/O Scheduling Algorithm Based on Reinforcement Learning

Li Qiong (李琼), Guo Yufeng (郭御风), Jiang Yanhuang (蒋艳凰) 《计算机工程与科学》 (Computer Engineering & Science) 2010,32(7):58-61

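Applying machine learning to hard technical problems in storage is currently an active research topic in the storage field. Reinforcement learning is a special kind of machine learning that takes environmental feedback as input and adapts to its environment: by observing changes in the environment state and evaluating the effect of control decisions on system performance, it selects the best control policy. Intelligent RAID control based on reinforcement learning is therefore of significant research value. Targeting the characteristics of high-performance computing applications, this paper introduces reinforcement learning into the RAID controller and proposes RL-scheduler, an intelligent I/O scheduling algorithm that uses a Q-learning strategy to implement an autonomous scheduling policy for parallel applications. RL-scheduler jointly considers scheduling fairness, disk seek time, and the I/O access efficiency of MPI applications, and it proposes a method of interleaving multiple Q-tables to improve Q-table update efficiency. Experimental results show that RL-scheduler shortens the average I/O service time of parallel applications and raises the I/O throughput of large-scale parallel computing systems.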
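For readers unfamiliar with Q-learning, the following is a minimal sketch of a generic tabular Q-learning agent with epsilon-greedy action selection. It is not RL-scheduler: the abstract does not give the state encoding, reward function, or the multi-Q-table layout, so the class name `QTable`, the default hyperparameters, and the idea of treating a scheduling choice as an "action" are assumptions made only for illustration.

```python
import random
from collections import defaultdict


class QTable:
    """Generic tabular Q-learning (illustrative; not the paper's RL-scheduler)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        """Pick a random action with probability epsilon, else the best known one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward reward + gamma * max_a Q(next_state, a)."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In a scheduling setting one might, hypothetically, let the state summarize the pending request queue and let the reward be the negative of the observed service time, so that the agent learns to prefer choices that shorten I/O service.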

4.

Airshed Pollution Modeling in an HPF Style Environment

《Journal of Parallel and Distributed Computing》2000,60(6):690-715

In this paper, we describe our experience with developing Airshed, a large pollution modeling application, in the Fx programming environment. We demonstrate that high level parallel programming languages like Fx and High Performance Fortran offer a simple and attractive model for developing portable and efficient parallel applications. Performance results are presented for the Airshed application executing on Intel Paragon and Cray T3D and T3E parallel computers. The results demonstrate that the application is “performance portable,” i.e., it achieves good and consistent performance across different architectures, and that the performance can be explained and predicted using a simple model for the communication and computation phases in the program. We also show how task parallelism was used to alleviate I/O related bottlenecks, an important consideration in many applications. Finally, we demonstrate how external parallel modules developed using different parallelization methods can be integrated in a relatively simple and flexible way with modules developed in the Fx compiler framework. Overall, our experience demonstrates that a high level parallel programming environment based on a language like HPF is suitable for developing complex multidisciplinary applications.

5.

Heuristics for scheduling I/O operations (cited 1 time in total: 0 self-citations, 1 by others)

Jain R. Somalwar K. Werth J. Browne J.C. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(3):310-320

The I/O bottleneck in parallel computer systems has recently begun receiving increasing interest. Most attention has focused on improving the performance of I/O devices using fairly low level parallelism in techniques such as disk striping and interleaving. Widely applicable solutions, however, will require an integrated approach which addresses the problem at multiple system levels, including applications, systems software, and architecture. We propose that within the context of such an integrated approach, scheduling parallel I/O operations will become increasingly attractive and can potentially provide substantial performance benefits. We describe a simple I/O scheduling problem and present approximate algorithms for its solution. The costs of using these algorithms in terms of execution time, and the benefits in terms of reduced time to complete a batch of I/O operations, are compared with the situations in which no scheduling is used, and in which an optimal scheduling algorithm is used. The comparison is performed both theoretically and experimentally. We have found that, in exchange for a small execution time overhead, the approximate scheduling algorithms can provide substantial improvements in I/O completion times.

6.

SPIFFI-a scalable parallel file system for the Intel Paragon

Freedman C.S. Burger J. DeWitt D.J. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(11):1185-1200

This paper presents the design and performance of SPIFFI, a scalable high-performance parallel file system intended for use by extremely I/O intensive applications including “Grand Challenge” scientific applications and multimedia systems. This paper contains experimental results from a SPIFFI prototype on a 64 node/64 disk Intel Paragon. The results show that SPIFFI provides high performance and linear scaleup on real hardware. The paper also explains how shared file pointers (i.e., file pointers that are shared by multiple processes) can simplify the design of a parallel application. By sequentializing I/O accesses and by providing dynamic I/O load balancing, a shared file pointer may even improve an application's performance. This paper also presents the predictions of a SPIFFI simulator that we validated using the prototype. The simulator results show that SPIFFI continues to provide high performance even when it is scaled to configurations with as many as 128 disks or 256 compute nodes.

7.

Accelerating big data analytics on HPC clusters using two-level storage

《Parallel Computing》2017

Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS or using parallel file systems available on HPC clusters to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former renders memory-speed high I/O performance and the latter renders consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both read and write. We expect this two-level storage approach to provide insights on system design for big data analytics on HPC clusters.

8.

Automated tuning of parallel I/O systems: an approach to portable I/O performance for scientific applications

Ying Chen Winslett M. 《IEEE Transactions on Software Engineering》2000,26(4):362-383


9.

Parallel Computing Paradigms Supported by the p-HPF Compiler in a Cluster Environment (cited 2 times in total: 0 self-citations, 2 by others)

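Hu Changjun (胡长军), Yu Huashan (余华山), Ding Wenkui (丁文魁), Xu Zhuoqun (许卓群) 《计算机研究与发展》 (Journal of Computer Research and Development) 2001,38(8):954-959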

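p-HPF is a parallel compilation system, developed by the authors, that conforms to the HPF (High Performance Fortran) specification. Multi-paradigm parallel computing built around HPF is the basis for developing large parallel application systems. The paper first discusses parallel execution paradigms in a cluster environment, including the farm parallel paradigm, pipeline parallelism, stream-loop parallelism, data parallelism, and composite data parallelism, and analyzes their performance in the abstract. It then shows how to implement several typical parallel paradigms using p-HPF's external procedure mechanism, its task parallelism mechanism, and typical parallel statements such as FORALL and INDEPENDENT DO. Example programs are given, run, and their results analyzed.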

10.

Inverted file partitioning schemes in multiple disk systems (cited 1 time in total: 0 self-citations, 1 by others)

Byeong-Soo Jeong Omiecinski E. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):142-153

Multiple-disk I/O systems (disk arrays) have been an attractive approach to meet high performance I/O demands in data intensive applications such as information retrieval systems. When we partition and distribute files across multiple disks to exploit the potential for I/O parallelism, a balanced I/O workload distribution becomes important for good performance. Naturally, the performance of a parallel information retrieval system using an inverted file structure is affected by the partitioning scheme of the inverted file. In this paper, we propose two different partitioning schemes for an inverted file system for a shared-everything multiprocessor machine with multiple disks. We study the performance of these schemes by simulation under a number of workloads where the term frequencies in the documents are varied, the term frequencies in the queries are varied, the number of disks is varied and the multiprogramming level is varied.
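The abstract does not say which two schemes are proposed. As a hedged illustration of the general design space, the sketch below shows two common ways to split an inverted file across disks: by term (each posting list lives whole on one disk) and by document (each disk holds the postings for its own subset of documents). The function names, the modulo placement rule, and the in-memory dictionary layout are all assumptions for illustration only.

```python
from collections import defaultdict


def partition_by_term(index, num_disks):
    """Place each term's whole posting list on disk (term_id mod num_disks)."""
    disks = [dict() for _ in range(num_disks)]
    for term_id, postings in index.items():
        disks[term_id % num_disks][term_id] = list(postings)
    return disks


def partition_by_document(index, num_disks):
    """Split every posting list so document d is stored on disk (d mod num_disks)."""
    disks = [defaultdict(list) for _ in range(num_disks)]
    for term_id, postings in index.items():
        for doc_id in postings:
            disks[doc_id % num_disks][term_id].append(doc_id)
    return [dict(d) for d in disks]


if __name__ == "__main__":
    # Hypothetical tiny index: term id -> sorted list of document ids.
    index = {0: [1, 2, 3], 1: [2], 2: [4, 5]}
    print(partition_by_term(index, 2))
    print(partition_by_document(index, 2))
```

With term partitioning a single-term query touches one disk, so load balance depends on which terms the queries contain; with document partitioning every disk does a share of each query. That trade-off is the kind of workload sensitivity the simulation study varies.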

11.

Models of parallel applications with large computation and I/O requirements

Rosti E. Serazzi G. Smirni E. Squillante M.S. 《IEEE Transactions on Software Engineering》2002,28(3):286-307

A fundamental understanding of the interplay between computation and I/O activities in parallel applications that manipulate huge amounts of data is critical to achieving good application performance, as well as correctly characterizing the workloads of large-scale high-performance parallel systems. We present a formal model of the behavior of CPU and I/O interactions in scientific applications, from which we derive various formulas that characterize application performance. Our model captures the I/O and CPU activity at different levels of granularity, where results from the model are shown to be in excellent agreement with measurement data from a set of I/O-intensive applications. Using the formulas from our model, which explicitly take I/O activity into account, we also present examples of possible applications of the model.

12.

Enabling dynamic file I/O path selection at runtime for parallel file system

Xiuqiao Li Limin Xiao Meikang Qiu Bin Dong Li Ruan 《The Journal of supercomputing》2014,68(2):996-1021

Parallel file systems are serving more and more applications from various fields. Different applications have different I/O workload characteristics and therefore diverse requirements for accessing storage resources. However, parallel file systems often adopt a “one-size-fits-all” solution, which fails to meet specific application needs and hinders the full exploitation of potential performance. This paper presents a framework to enable dynamic file I/O path selection with fine granularity at runtime. The framework adopts a file handle-rich scheme to allow file systems to choose corresponding optimizations to serve I/O requests. Consistency control algorithms are proposed to ensure data consistency while changing optimizations at runtime. One case study on our prototype shows that choosing proper optimizations can improve the I/O performance for small files and large files by up to 40% and 64.4%, respectively. Another case study shows that the data prefetch performance for real-world application traces can be improved by up to 193% by selecting correct prefetch patterns. Simulations in a large-scale environment also show that our method is scalable and that both the memory consumption and the consistency control overhead are negligible.

13.

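Implementation and Optimization Strategies for Out-of-Core Computation in the p-HPF Parallel Compilation System (cited 4 times in total: 0 self-citations, 4 by others)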

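Ding Wenkui (丁文魁), Wang Jianping (汪剑平), Xiang Hua (向华), Li Xiaoming (李晓明), Xu Zhuoqun (许卓群) 《计算机学报》 (Chinese Journal of Computers) 1999,22(10):1042-1049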

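This paper describes the support for out-of-core computation in the p-HPF compilation system and the optimization strategies adopted. By extending the programming model and constructing a parallel I/O model, the p-HPF compilation system can now handle out-of-core arrays effectively.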

14.

Parallel Application Templates Based on p-HPF Extrinsic Procedure Calls

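Hu Changjun (胡长军), Lu Aisheng (陆爱胜), Jiang Wei (姜伟), Xu Zhuoqun (许卓群) 《计算机工程与应用》 (Computer Engineering and Applications) 2001,37(3):30-34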

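Extrinsic is the HPF mechanism for calling procedures written in external languages. The HPF Extrinsic mechanism can be used to implement multi-paradigm parallel computing. The paper first presents how Extrinsic procedure calls are supported in the p-HPF parallel compiler, then gives several Extrinsic-based parallel application templates for a distributed-memory network environment: a parallel algorithm library template, a cooperative application template, an MPSD processing template, an asynchronous I/O template, and a pipeline template. It also analyzes their runtime efficiency and describes how they are implemented in p-HPF.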

15.

The Alloc Stream Facility: a redesign of application-level stream I/O

Krieger O. Stumm M. Unrau R. 《Computer》1994,27(3):75-82

The authors introduce an application-level I/O facility, the Alloc Stream Facility, that addresses three primary goals. First, ASF addresses recent computing substrate changes to improve performance, allowing applications to benefit from specific features such as mapped files. Second, it is designed for parallel systems, maximizing concurrency and reporting errors properly. Finally, its modular and object-oriented structure allows it to support a variety of popular I/O interfaces (including stdio and C++ stream I/O) and to be tuned to system behavior, exploiting a system's strengths while avoiding its weaknesses. On a number of standard Unix systems, I/O-intensive applications perform substantially better when linked to the Alloc facility. Also, modifying applications to use a new interface provided by the facility can improve performance by another factor of two. These performance improvements are achieved primarily by reducing data copying and the number of system calls. Not visible in these improvements is the extra degree of concurrency the facility brings to multithreaded and parallel applications.

16.

Integrated performance models for SPMD applications and MIMD architectures

Cremonesi P. Gennaro C. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1320-1332

This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces which capture the relative influence of processors and I/O parallelism and show the effects of different hardware and software components on the performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., number of processors, number of disks, I/O topology, etc.).

17.

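Parallel Compilation Techniques in p-HPF for Supporting Multi-Paradigm Parallel Computing (cited 1 time in total: 1 self-citation, 0 by others)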

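Hu Changjun (胡长军), Yu Huashan (余华山), Jiang Wei (姜伟), Lu Aisheng (陆爱胜), Xu Zhuoqun (许卓群) 《计算机学报》 (Chinese Journal of Computers) 2001,24(7):685-693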

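Multi-paradigm parallelism is an essential characteristic of large-scale parallel application systems. Compiler support in p-HPF for multi-paradigm parallel computing not only compensates for some shortcomings of the data-parallel paradigm itself but also improves the efficiency of parallel application systems. Building on a discussion of three typical parallel computation models in a cluster environment (Global, Local, and Serial), the paper presents the parallel compilation techniques that let p-HPF call F77+MPI and ScaLAPACK, typical representatives of these models. The techniques include argument redistribution, storage conversion, exchange of global and local information, and handling of the upper and lower bounds of local array arguments. Calling examples are given, and the correctness and effectiveness of the techniques are analyzed.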

18.

Design of a Scalable Distributed Shared Massively Parallel I/O System

Li Qiong (李琼), Guo Yufeng (郭御风), Pang Zhengbin (庞征斌), Liu Guangming (刘光明) 《计算机工程与科学》 (Computer Engineering & Science) 2006,28(1):135-138

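How to resolve the I/O bottleneck effectively has long been a key open problem for high-performance parallel computers. We propose a scalable distributed shared parallel I/O system design, developed our own node controller and router chips, and built the prototype system SDSP604. So that the system's computation, communication, and I/O performance scale in balance with system size, the system is based on the CC-NUMA architecture and adopts a suitable distributed shared parallel I/O structure.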

19.

Storage wall for exascale supercomputing

Wei Hu Guang-ming Liu Qiong Li Yan-huang Jiang Gui-lin Cai 《浙江大学学报:C卷英文版》 (Journal of Zhejiang University, Series C, English edition) 2016,17(11):1154-1175

The mismatch between compute performance and I/O performance has long been a stumbling block as supercomputers evolve from petaflops to exaflops. Currently, many parallel applications are I/O intensive, and their overall running times are typically limited by I/O performance. To quantify the I/O performance bottleneck and highlight the significance of achieving scalable performance in peta/exascale supercomputing, in this paper, we introduce for the first time a formal definition of the ‘storage wall’ from the perspective of parallel application scalability. We quantify the effects of the storage bottleneck by providing a storage-bounded speedup, defining the storage wall quantitatively, presenting existence theorems for the storage wall, and classifying the system architectures depending on I/O performance variation. We analyze and extrapolate the existence of the storage wall by experiments on Tianhe-1A and case studies on Jaguar. These results provide insights on how to alleviate the storage wall bottleneck in system design and achieve hardware/software optimizations in peta/exascale supercomputing.

20.

Performance Evaluation of a Parallel Pipeline Computational Model for Space-Time Adaptive Processing

Wei-Keng Liao Alok Choudhary Donald Weiner Pramod Varshney 《The Journal of supercomputing》2005,31(2):137-160

This paper presents further results on the design and implementation of various optimizations based on our earlier work of developing a parallel pipelined model for computationally intensive applications that have multiple processing tasks. Performance evaluation of this model was done by using a real-time airborne radar application that employs a Space-Time Adaptive Processing (STAP) algorithm. This paper focuses on the following four issues: (1) The tradeoffs between increasing the throughput and reducing the latency are examined in more detail when allocating processors among different processing tasks. (2) A multi-threaded design is incorporated into the pipeline model and implemented on a massively parallel computer with symmetric multi-processor nodes, which shows enhanced performance. (3) The disk I/O is incorporated into the parallel pipeline to study its effect on performance in which two I/O task designs have been implemented: embedding I/O in the pipeline or having a separate I/O task. By using a double buffering approach together with the asynchronous I/O, the overall pipeline performance scales well as the number of processors increases. (4) From the comparison of the two I/O implementations, it is discovered that the latency may be improved when merging multiple tasks into a single task. The effect of reorganizing the task structure of the pipeline is discussed in detail. All the performance results shown in this work demonstrate the linear scalability the parallel pipeline model can achieve using a production radar application. Although this paper focuses on the implementation of the parallel pipeline model and uses the results from a STAP application to support the claims of the discovered properties for this pipeline, this model is also applicable to many other types of applications with similar computational characteristics.
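The double buffering idea in item (3) can be sketched in a few lines: a reader thread prefetches the next block while the consumer processes the current one. This is a minimal single-node sketch using Python threads and a one-slot queue, not the paper's implementation; `process_stream`, `read_block`, and `compute` are hypothetical names, and the callables are supplied by the caller.

```python
import queue
import threading

_DONE = object()  # unique end-of-stream marker


def process_stream(read_block, compute, num_blocks):
    """Overlap block reads with computation via a one-slot prefetch queue.

    read_block(i) and compute(block) are caller-supplied, hypothetical callables.
    """
    prefetched = queue.Queue(maxsize=1)  # one block waits while another is processed
    errors = []

    def reader():
        try:
            for i in range(num_blocks):
                prefetched.put(read_block(i))  # blocks while the slot is occupied
        except Exception as exc:  # hand read failures to the consumer thread
            errors.append(exc)
        finally:
            prefetched.put(_DONE)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    results = []
    while True:
        block = prefetched.get()
        if block is _DONE:
            break
        results.append(compute(block))
    worker.join()
    if errors:
        raise errors[0]
    return results
```

For example, `process_stream(lambda i: list(range(i, i + 3)), sum, 4)` reads four small blocks and sums each one. The bounded queue keeps memory use fixed: the reader can run at most one block ahead of the consumer.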