1.

pRPL + pGTIOL: The marriage of a parallel processing library and a parallel I/O library for big raster data

《Environmental Modelling & Software》2017

Data I/O has become a major bottleneck for the computational performance of geospatial analysis and modeling. In this study, a parallel GeoTIFF I/O library (pGTIOL) was developed. Through storage mapping and data arrangement techniques, pGTIOL can operate on files in either strip or tile storage mode and can read or write any sub-domain of data within the raster dataset. pGTIOL enables asynchronous I/O, which means a process can read or write its own sub-domains of data when necessary without synchronizing with other processes. pGTIOL was integrated into the parallel raster processing library (pRPL). Several pGTIOL-based data I/O functions and options were added to pRPL, while the existing functions of pRPL stay intact. Experiments showed that the integration of pRPL and pGTIOL achieved higher performance than the original pRPL, which uses GDAL as the I/O interface. Therefore, pRPL + pGTIOL enables transparent parallelism for high-performance raster processing with the capability of true parallel I/O of massive raster datasets.
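As a rough illustration of what reading "any sub-domain" of a tiled or stripped raster involves, the sketch below works out which storage blocks a requested window overlaps. It is not pGTIOL's actual mapping code: the function name `tiles_for_window` and its parameters are hypothetical, and a strip is modeled here simply as a tile as wide as the whole raster.

```python
def tiles_for_window(row_off, col_off, n_rows, n_cols, tile_h, tile_w):
    """Return the (tile_row, tile_col) index of every block a window overlaps.

    Hypothetical helper for illustration only. All arguments are in pixels.
    For strip storage, pass the full raster width as tile_w.
    """
    if n_rows <= 0 or n_cols <= 0:
        return []  # an empty window touches no blocks

    # First and last block index along each axis (integer division).
    first_tr = row_off // tile_h
    last_tr = (row_off + n_rows - 1) // tile_h
    first_tc = col_off // tile_w
    last_tc = (col_off + n_cols - 1) // tile_w

    return [(tr, tc)
            for tr in range(first_tr, last_tr + 1)
            for tc in range(first_tc, last_tc + 1)]


# A 300 x 100 window starting at row 100, column 200, with 256 x 256 tiles.
blocks = tiles_for_window(100, 200, 300, 100, 256, 256)
```

Each process could compute such a block list for its own sub-domain independently, which is one way to see why no synchronization with other processes is needed just to locate the data.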

2.

Orthogonal striping and mirroring in distributed RAID for I/O-centric cluster computing

Kai Hwang Hai Jin Ho R.S.C. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(1):26-44

This paper presents a new distributed disk-array architecture for achieving high I/O performance in scalable cluster computing. In a serverless cluster of computers, all distributed local disks can be integrated as a distributed-software redundant array of independent disks (ds-RAID) with a single I/O space. We report the new RAID-x design and its benchmark performance results. The advantage of RAID-x comes mainly from its orthogonal striping and mirroring (OSM) architecture. The bandwidth is enhanced with distributed striping across local and remote disks, while the reliability comes from orthogonal mirroring on local disks in the background. Our RAID-x design is experimentally compared with the RAID-5, RAID-10, and chained-declustering RAID through benchmarking on a research Linux cluster at USC. Andrew and Bonnie benchmark results are reported on all four disk-array architectures. Cooperative disk drivers and Linux extensions are developed to enable not only the single I/O space, but also the shared virtual memory and global file hierarchy. We reveal the effects of traffic rate and stripe unit size on I/O performance. Through scalability and overhead analysis, we find the strength of RAID-x in three areas: 1) improved aggregate I/O bandwidth, especially for parallel writes, 2) orthogonal mirroring with low software overhead, and 3) enhanced scalability in cluster I/O processing. Architectural strengths and weaknesses of all four ds-RAID architectures are evaluated comparatively. The optimal choice among them depends on the parallel read/write performance desired, the level of fault tolerance required, and the cost-effectiveness in specific I/O processing applications.

3.

The performance of disk arrays in shared-memory database machines

Randy H. Katz Wei Hong 《Distributed and Parallel Databases》1993,1(2):167-198

Disk arrays and shared-memory multiprocessors are new technologies that are rapidly becoming pervasive. They are complementary because disk arrays naturally balance the I/O workload by interleaving data across all disks, while a shared-memory multiprocessor balances the processing workload across multiple processors. In this paper, we examine how disk arrays and shared-memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature (IO/s/Gbyte) as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
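The data temperature metric is given above only by its unit, IO/s/Gbyte, so the following is a minimal sketch of the arithmetic that unit implies: sustained I/O operations per second divided by the capacity they are spread over. The function name and all of the numbers are made up for illustration and are not taken from the paper.

```python
def data_temperature(ios_per_second, capacity_gbytes):
    """I/O operations per second per GByte of storage (IO/s/Gbyte)."""
    if capacity_gbytes <= 0:
        raise ValueError("capacity_gbytes must be positive")
    return ios_per_second / capacity_gbytes


# Hypothetical configurations with the same total capacity (4 GByte):
# eight small drives at 50 IO/s each versus one large drive at 60 IO/s.
array_temp = data_temperature(8 * 50, 4)   # 100.0 IO/s/Gbyte
single_temp = data_temperature(60, 4)      # 15.0 IO/s/Gbyte
```

Under these assumed figures, the array of small drives sustains a much "hotter" workload per GByte, which is the kind of comparison the metric is meant to support.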

4.

Dynamic algorithm selection for runtime concepts

Peter Pirkelbauer Sean Parent 《Science of Computer Programming》2010,75(9):773-786

A key benefit of generic programming is its support for producing modules with clean separation. In particular, generic algorithms are written to work with a wide variety of types without requiring modifications to them. The runtime concept idiom extends this support by allowing unmodified concrete types to behave in a runtime-polymorphic manner. In this paper, we describe one implementation of the runtime concept idiom, in the domain of the C++ standard template library (STL). We complement the runtime concept idiom with an algorithm library that considers both type and concept information to maximize performance when selecting algorithm implementations. We present two implementations, one in ISO C++ and one using an experimental language extension. We use our implementations to describe and measure the performance of runtime-polymorphic analogs of several STL algorithms. The tests demonstrate the effects of different compile-time vs. run-time algorithm selection choices.

5.

Improving multimedia systems performance using constant-density recording disks

Philip Kwok Chung Tse Clement H.C. Leung 《Multimedia Systems》2000,8(1):47-56

Multimedia systems store and retrieve large amounts of data that require extremely high disk bandwidth, and their performance depends critically on the efficiency of disk storage. However, existing magnetic disks are designed for the small data retrievals typical of traditional operations, with speed improvements mainly focused on reducing seek time and rotational latency. When the same mechanism is applied to multimedia systems, overheads in disk I/O can result in dramatic deterioration in system performance. In this paper, we present a mathematical model to evaluate the performance of constant-density recording disks, and use this model to analyze quantitatively the performance of multimedia data request streams. We show that high disk throughput may be achieved by suitably adjusting the relevant parameters. In addition to demonstrating quantitatively that constant-density recording disks perform significantly better than traditional disks for multimedia data storage, we present a novel disk-partitioning scheme that places data according to their bandwidths.

6.

A performance analysis of an object-based I/O architecture in a video server environment

Khoa D. Huynh Taghi M. Khoshgoftaar 《Multimedia Systems》1995,3(4):162-177

In this paper, we present a performance analysis of how effectively video server applications can be supported on personal computers (PCs) connected through a local area network (LAN). We considered both the standard 16-Mbit/s token ring and a 100-Mbit/s token ring, which follows closely the specifications for the Fiber Distributed Data Interface (FDDI). We examined three I/O architectures for a PC-based video server: an interrupt-driven I/O architecture, a peer-to-peer I/O architecture, and a concurrent, object-based I/O architecture that we proposed. The video server must deliver multiple MPEG-1 video streams simultaneously to multiple clients on the LAN. We found that the network protocol layers require a lot of processing power, and that an implementation of our proposed I/O architecture, which takes advantage of the available power of the host processor to off-load I/O adapters, can deliver much better performance, and is more cost-effective, than other I/O architectures in a video server environment.

7.

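Several I/O optimization methods for out-of-core computing (total citations: 1, self-citations: 0, citations by others: 1)

Tang Jianqi Fang Binxing Hu Mingzeng Wang Wei 《Journal of Computer Research and Development》2005,42(10):1820-1825

Applications with large data volumes call for an out-of-core computing model. Because access to data on disk is relatively slow, I/O becomes a major limiting factor for out-of-core computing performance. This paper proposes a method of I/O optimization based on a runtime library and presents three effective optimization strategies: regular region sieving, data prefetching, and boundary reuse. Programmers can use the corresponding optimization APIs for different application problems to shorten program execution time. Experimental results show that out-of-core computing performance is effectively improved by reducing the number of I/O operations and the volume of data exchanged between memory and disk, and by hiding part of the I/O latency.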

8.

Automated tuning of parallel I/O systems: an approach to portable I/O performance for scientific applications

Ying Chen Winslett M. 《IEEE Transactions on Software Engineering》2000,26(4):362-383

9.

Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform

Roberto R. Expósito Guillermo L. Taboada Sabela Ramos Jorge González-Domínguez Juan Touriño Ramón Doallo 《Journal of Grid Computing》2013,11(4):613-631

Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine its suitability for I/O-intensive applications. The evaluation has been carried out at different layers using representative benchmarks in order to evaluate the low-level cloud storage devices available in Amazon EC2, ephemeral disks and Elastic Block Store (EBS) volumes, both on local and distributed file systems. In addition, several I/O interfaces (POSIX, MPI-IO and HDF5) commonly used by scientific workloads have also been assessed. Furthermore, the scalability of a representative parallel I/O code has also been analyzed at the application level, taking into account both performance and cost metrics. The analysis of the experimental results has shown that available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists to significantly increase (by up to several times) the performance of I/O-intensive applications in the Amazon EC2 cloud. An example of an optimal configuration that can maximize I/O performance in this cloud is the use of a RAID 0 of 2 ephemeral disks, TCP with a 9,000-byte MTU, NFS async and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.

10.

A scalable bandwidth guaranteed distributed continuous media file system using network attached autonomous disks

Akinlar C. Mukherjee S. 《Multimedia, IEEE Transactions on》2003,5(1):71-96

A repository for continuous media data differs from one for traditional text-based data in both storage space and streaming bandwidth requirements. The file systems used for continuous media streams need to support large volumes and high bandwidth. We propose a scalable distributed continuous media file system built using autonomous disks. Autonomous disks are attached directly to the network and are able to perform lightweight processing. We discuss different ways to realize the autonomous disk, and describe a prototype implementation on a Linux platform using PC-based hardware. We present the basic requirements of the continuous media file system, the design methodology, and a prototype Linux-based implementation of the distributed file system that supports these requirements. We present experimental results on the performance of the proposed file system prototyped using autonomous disks. We show that the performance of the file system scales linearly with the number of disks and the number of clients. The file system performs far better than NFS running on the same hardware platform and can deliver higher raw disk bandwidth to the applications. We also present bandwidth- and time-sensitive read/write procedures for the file system and show that the file system can provide strict bandwidth guarantees for continuous media streams.

11.

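Software design and implementation of a traditional Chinese medicine pulse acquisition system

Chen Youbin Yang Jianhua Du Xinhu 《Measurement & Control Technology》2008,27(11)

This paper describes the software design and implementation of a USB-based traditional Chinese medicine pulse acquisition system. The software comprises device firmware, a device driver, a dynamic link library, and a user program. The data-transfer part of the device driver splits data into blocks for transmission and supports I/O requests of user-defined size, which gives the user program more flexible design options. The user program was developed on the LabWindows/CVI platform, and operation in practice shows that the software design meets the requirements of the pulse acquisition system.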

12.

Tuning a parallel database algorithm on a shared-memory multiprocessor

Goetz Graefe Shreekant S. Thakkar 《Software》1992,22(7):495-517

Database query processing can benefit significantly from parallelism. Parallel database algorithms combine substantial CPU and I/O activity, memory requirements, and massive data exchange between processes, all of which must be considered to obtain optimal performance. Since parallel external sorting is a very typical example, we have focused on sorting to tune Volcano, a new query processing system. The purpose of the Volcano project is to provide efficient, extensible tools for query and request processing in novel application domains, particularly in object-oriented and scientific database systems, and for experimental database performance research. It includes all query processing algorithms conventionally used in relational database systems as well as several new ones, and can execute all of them in parallel. In this article, we present Volcano's parallel external sorting algorithm and a sequence of enhancements to improve its performance. We obtained very good absolute performance, 84 seconds for 100 MB of data, as well as near-linear speedup with sixteen CPUs and disks. Furthermore, these results were achieved on a shared-memory machine despite the common belief that parallel query processing is best implemented on distributed-memory systems. We detail our tuning measures and report on their effectiveness.

13.

I/O issues in a multimedia system

Narasimha Reddy A.L. Wyllie J.C. 《Computer》1994,27(3):69-74

In future computer system design, I/O systems will have to support continuous media such as video and audio, whose system demands are different from those of data such as text. Multimedia computing requires us to focus on designing I/O systems that can handle real-time demands. Video- and audio-stream playback and teleconferencing are real-time applications with different I/O demands. We primarily consider playback applications, which require guaranteed real-time I/O throughput. In a multimedia server, the service phases of a real-time request are disk, small computer systems interface (SCSI) bus, and processor scheduling. Additional service might be needed if the request must be satisfied across a local area network. We restrict ourselves to the support provided at the server, with special emphasis on two service phases: disk scheduling and SCSI bus contention. When requests have to be satisfied within deadlines, traditional real-time systems use scheduling algorithms such as earliest deadline first (EDF) and least slack time first. However, EDF makes the assumption that disks are preemptable, and the seek-time overheads of its strict real-time scheduling result in poor disk utilization. We can provide the constant data rate necessary for real-time requests in various ways that require trade-offs. We analyze how trade-offs that involve buffer space affect the performance of scheduling policies. We also show that deferred deadlines, which increase buffer requirements, improve system performance significantly.
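For readers unfamiliar with earliest deadline first, the sketch below shows the ordering rule in its plainest form: always serve the pending request with the nearest deadline. It is a generic illustration, not the scheduler studied in the article, and the request tuples and the name `edf_order` are assumptions made for the example.

```python
import heapq


def edf_order(requests):
    """Return request ids in earliest-deadline-first service order.

    `requests` is an iterable of (deadline, request_id) pairs. Head position
    on the disk is deliberately ignored, as in strict EDF.
    """
    heap = list(requests)
    heapq.heapify(heap)  # min-heap keyed on deadline
    order = []
    while heap:
        _deadline, request_id = heapq.heappop(heap)
        order.append(request_id)
    return order


# Three pending reads with deadlines in milliseconds (made-up values).
service_order = edf_order([(30, "c"), (10, "a"), (20, "b")])
```

Because the order depends only on deadlines, consecutive requests may sit far apart on the disk, which is the seek-time overhead the abstract attributes to strict EDF.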

14.

A general interprocedural framework for placement of split-phase large latency operations

Agrawal G. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(4):394-413

Overlapping split-phase large latency operations with computations is a standard technique for improving performance on modern architectures. In this paper, we present a general interprocedural technique for overlapping such accesses with computation. We have developed an Interprocedural Balanced Code Placement (IBCP) framework, which performs analysis on arbitrary recursive procedures and arbitrary control flow and replaces synchronous operations with a balanced pair of asynchronous operations. We have evaluated this scheme in the context of overlapping I/O operations with computation. We demonstrate how this analysis is useful for applications that perform frequent and large accesses to disks, including out-of-core applications and applications that snapshot or checkpoint their computations.

15.

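An analysis of storage area networks in the High Performance Storage System

Xia Fang Chen Hong Song Lei Zhang Xia 《Computer Engineering and Design》2005,26(7):1740-1743

The High Performance Storage System (HPSS) is a hierarchical parallel storage system designed specifically for high-performance computing environments to manage and access very large volumes of data. It can migrate large data objects among high-performance computers, disks, networked disk arrays, and tape libraries, and it supports efficient serial and parallel I/O as well as parallel transfer of remote data; the data transfer rate is limited only by the underlying computers, network, and storage devices. This paper examines the network-centric architecture of HPSS in depth, describes how storage area network (SAN) technology is currently applied in HPSS, and draws observations and conclusions that help in building storage systems for high-performance computing environments.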

16.

SPIFFI-a scalable parallel file system for the Intel Paragon

Freedman C.S. Burger J. DeWitt D.J. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(11):1185-1200

This paper presents the design and performance of SPIFFI, a scalable high-performance parallel file system intended for use by extremely I/O-intensive applications, including “Grand Challenge” scientific applications and multimedia systems. This paper contains experimental results from a SPIFFI prototype on a 64 node/64 disk Intel Paragon. The results show that SPIFFI provides high performance and linear scaleup on real hardware. The paper also explains how shared file pointers (i.e., file pointers that are shared by multiple processes) can simplify the design of a parallel application. By sequentializing I/O accesses and by providing dynamic I/O load balancing, a shared file pointer may even improve an application's performance. This paper also presents the predictions of a SPIFFI simulator that we validated using the prototype. The simulator results show that SPIFFI continues to provide high performance even when it is scaled to configurations with as many as 128 disks or 256 compute nodes.

17.

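Development of a WebGL-based online virtual model library for engineering drawing

Li Xingtian Zhang Liping 《Journal of Graphics》2016,37(6):836

Using AutoCAD as the modeling tool, composite solids from engineering drawing were modeled. By analyzing the exported STL files, WebGL was used to parse and read the model data files, write the shader programs, and add mouse-interaction handlers, yielding an engineering drawing model library that runs in the browser. The library offers a rich set of models and friendly interaction; it is deployed once on the server and can be browsed from any client. It facilitates classroom teaching, develops students' spatial imagination, improves learning efficiency, and offers ideas for the development of virtual model libraries.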

18.

NVMRA: utilizing NVM to improve the random write operations for NAND‐flash‐based mobile devices

Renhai Chen Zhaoyan Shen Chenlin Ma Zili Shao Yong Guan 《Software》2016,46(9):1263-1284

NAND flash memory has become the major storage medium in mobile devices such as smartphones. However, random write operations on NAND flash memory heavily affect I/O performance and thus seriously degrade application performance on mobile devices. The main reason for slow random writes is the out‐of‐place update feature of NAND flash memory. Newly emerged non‐volatile memory (NVM), such as phase‐change memory and spin‐transfer torque memory, supports in‐place updates and offers much better I/O performance than flash memory. These features make NVM a promising solution for improving the random write performance of NAND flash memory. In this paper, we propose a non‐volatile memory for random access (NVMRA) scheme that uses NVM to improve I/O performance in mobile devices. NVMRA exploits the I/O behaviors of applications to improve the random write performance of each application. Based on different I/O behaviors, such as random write‐dominant I/O behavior, NVMRA adopts different storing decisions. The scheme is evaluated on a real Android 4.2 platform. The experimental results show that the proposed scheme can effectively improve I/O performance and reduce I/O energy consumption for mobile devices. Copyright © 2015 John Wiley & Sons, Ltd.

19.

Performance analysis of advanced I/O architectures for PC-based video servers

Khoa D. Huynh Taghi M. Khoshgoftaar 《Multimedia Systems》1994,2(1):36-50

In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM subsystem control block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the redundant array of independent disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we discuss and present a performance analysis of the SCB architecture and disk array technology in typical video server environments. In particular, we would like to see whether a disk array can outperform a group of disks (of the same type, the same data capacity, and same cost) operating independently (not in parallel as in a disk array) in a video server environment where most disk I/O operations are large sequential reads.

20.

Inverted file partitioning schemes in multiple disk systems (total citations: 1, self-citations: 0, citations by others: 1)

Byeong-Soo Jeong Omiecinski E. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):142-153

Multiple-disk I/O systems (disk arrays) have been an attractive approach to meeting high-performance I/O demands in data-intensive applications such as information retrieval systems. When we partition and distribute files across multiple disks to exploit the potential for I/O parallelism, a balanced I/O workload distribution becomes important for good performance. Naturally, the performance of a parallel information retrieval system using an inverted file structure is affected by the partitioning scheme of the inverted file. In this paper, we propose two different partitioning schemes for an inverted file system for a shared-everything multiprocessor machine with multiple disks. We study the performance of these schemes by simulation under a number of workloads in which the term frequencies in the documents, the term frequencies in the queries, the number of disks, and the multiprogramming level are varied.