Similar Documents
A total of 20 similar documents were found.
1.
Process migration provides many benefits for parallel environments, including dynamic load balancing, data access locality, and fault tolerance. This paper describes an in-memory, application-level, checkpoint-based migration solution for MPI codes that uses the Hierarchical Data Format 5 (HDF5) to write the checkpoint files. The main features of the proposed solution are transparency for the user, achieved through the use of CPPC (ComPiler for Portable Checkpointing); portability, as the application-level approach makes the solution suitable for any MPI implementation and operating system, while the HDF5 file format enables restart on different architectures; and high performance, obtained by saving the checkpoint files to memory instead of to disk through HDF5 in-memory files. Experimental results show that the in-memory approach significantly reduces the I/O cost of the migration process.
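
The checkpoint-to-memory idea can be illustrated with HDF5's "core" (in-memory) file driver. The sketch below uses h5py rather than the paper's CPPC/HDF5 C tooling, and the dataset names are invented for the example.

```python
# Minimal sketch of writing an application-level checkpoint into an in-memory
# HDF5 file via the "core" driver. With backing_store=False the image never
# touches disk; with backing_store=True it is flushed to `path` on close.
import numpy as np
import h5py

def write_checkpoint(path, arrays, flush_to_disk=False):
    with h5py.File(path, 'w', driver='core', backing_store=flush_to_disk) as f:
        for name, data in arrays.items():
            f.create_dataset(name, data=data)

# Hypothetical solver state of one MPI rank.
write_checkpoint('ckpt_rank0.h5', {'u': np.zeros((128, 128)), 'step': np.array(42)})
```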

2.
杨丽鹏  车永刚 《计算机应用》2013,33(9):2423-2427
Large-scale computational fluid dynamics (CFD) simulations place high demands on data I/O capability. The Hierarchical Data Format 5 (HDF5) can manage large-scale scientific data effectively and offers good support for parallel I/O. For a parallel structured-grid CFD program, an HDF5 storage layout was designed for its data files, and parallel I/O of those files was implemented with the HDF5 parallel I/O programming interface; performance was then tested and analyzed on a parallel computer system. The results show that, with 4 to 32 processes, the write performance of the HDF5 parallel I/O approach is 6.9 to 16.1 times higher than having each process independently write an ordinary file. The read performance of the HDF5 parallel approach is lower, at 20% to 70% of the latter, but since reading costs far less time than writing, the impact on overall performance is small.
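
As a hedged illustration of the collective-write pattern (assuming h5py built with parallel HDF5 support plus mpi4py, which the paper does not specify), each MPI rank below writes its own slab of a flow field into one shared file:

```python
# Every rank writes its contiguous block of a structured-grid field into a
# single shared HDF5 file through the MPI-IO ("mpio") driver.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

nx_local, ny = 64, 256                          # local slab size per rank
field = np.full((nx_local, ny), rank, dtype='f8')

with h5py.File('flow.h5', 'w', driver='mpio', comm=comm) as f:
    # Dataset creation is collective: all ranks call it with the same arguments.
    dset = f.create_dataset('density', shape=(nx_local * nprocs, ny), dtype='f8')
    dset[rank * nx_local:(rank + 1) * nx_local, :] = field
```

Run with, e.g., `mpiexec -n 8 python write_flow.py`; all ranks write into the same file instead of producing one ordinary file per process.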

3.
An Erasure-Code-Based Distributed File Storage System
A distributed file storage system based on erasure codes is implemented for a local area network environment. The system consists of a metadata server and multiple file storage nodes. Metadata and file data are stored separately to improve file access efficiency, erasure coding is applied to file encoding and decoding as an efficient redundant storage technique to enhance reliability, and MD5 message digests are used to guarantee file integrity. Tests on files of 30 MB to 600 MB show that the system achieves higher reliability, security, and resource utilization.
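
The abstract does not give the codec, so the sketch below uses the simplest possible stand-in: k data blocks plus one XOR parity block (a degenerate erasure code that survives the loss of any single block), together with an MD5 digest for integrity checking.

```python
# Simplified illustration, not the system's actual codec: split data into k
# blocks plus one XOR parity block and record an MD5 digest of the whole file.
import hashlib

def encode(data: bytes, k: int = 4):
    size = -(-len(data) // k)                   # ceiling division
    blocks = [data[i * size:(i + 1) * size].ljust(size, b'\0') for i in range(k)]
    parity = bytes(size)
    for blk in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, blk))
    digest = hashlib.md5(data).hexdigest()      # integrity check for the original file
    return blocks, parity, digest

def rebuild(blocks, parity, missing: int):
    """Recover the block at index `missing` from the surviving blocks and parity."""
    out = bytearray(parity)
    for i, blk in enumerate(blocks):
        if i != missing:
            out = bytearray(x ^ y for x, y in zip(out, blk))
    return bytes(out)

blocks, parity, digest = encode(b'hello erasure coded world!')
assert rebuild(blocks, parity, missing=2) == blocks[2]
```

A production system would use a Reed-Solomon style code that tolerates multiple lost blocks; only the data/parity split and the digest check are illustrated here.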

4.

With the fast increase of multimedia content, efficient forensic investigation methods for multimedia files are required. For multimedia files, similarity means that identical media (audio and video) data exist across files. This paper proposes an efficient multimedia file forensics system based on file similarity search of video contents. The proposed system relies on two key techniques. The first is a media-aware information detection technique: the critical first step for similarity search is to find the meaningful keyframes or key sequences in the shots of a multimedia file, in order to recognize altered files derived from the same source file. The second is a video fingerprint-based (VFB) technique for file similarity search. Byte-by-byte comparison is an inefficient way to search for similarity in large files such as multimedia files, whereas the VFB technique efficiently extracts video features from them. It also provides an independent media-aware identification method for detecting alterations to the source video file (e.g., changes in frame rate, resolution, or format). In this paper, we focus on two key challenges: generating robust video fingerprints by finding meaningful boundaries of a multimedia file, and measuring video similarity by fingerprint-based matching. Our evaluation shows that the proposed system can be applied to realistic multimedia file forensics tools.
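
As a rough sketch of fingerprint-based matching (not the paper's VFB algorithm; OpenCV, the sampling step, and the Hamming threshold are all assumptions), sampled frames are reduced to 64-bit difference hashes and two videos are compared by the fraction of near-matching hashes:

```python
# Frame-level fingerprinting sketch: a 64-bit difference hash per sampled frame,
# compared by Hamming distance so re-encoded or rescaled copies still match.
import cv2

def dhash(gray):
    small = cv2.resize(gray, (9, 8))            # 9x8 grid -> 64 horizontal comparisons
    bits = 0
    for y in range(8):
        for x in range(8):
            bits = (bits << 1) | int(small[y, x] > small[y, x + 1])
    return bits

def fingerprint(path, step=30):
    cap, hashes, i = cv2.VideoCapture(path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:                       # sample roughly one frame per second
            hashes.append(dhash(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)))
        i += 1
    cap.release()
    return hashes

def similarity(a, b, max_hamming=10):
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(bin(x ^ y).count('1') <= max_hamming for x, y in zip(a, b))
    return matches / n
```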


5.
KYLIN-2 is an advanced neutronics lattice (assembly) code developed independently by the Nuclear Power Institute of China. To address the storage and handling of the massive data produced by KYLIN-2, a data storage scheme based on the Hierarchical Data Format version 5 (HDF5) is proposed. First, the HDF5 file format was studied; then, according to the requirements of KYLIN-2, an HDF5-based assembly library, KYMRES, was designed; finally, KYMRES was implemented with a self-developed HDF5 read/write tool. Performance tests show that the HDF5-based KYMRES library achieves higher I/O efficiency than the conventional storage scheme, with read and write performance improved on average to 2.3 and 4.5 times that of the old approach, respectively. KYMRES has clear advantages for storing and handling massive data and provides KYLIN-2 with a new data storage and management scheme.
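
The abstract does not describe KYMRES's internal layout; the sketch below (h5py, with invented group and dataset names) shows the general pattern such an HDF5 assembly library can follow: one group per assembly, chunked and compressed datasets for the tables, and descriptive metadata kept in attributes.

```python
# Hypothetical layout only: one HDF5 group per assembly, a compressed chunked
# cross-section table per group, and metadata stored as attributes.
import numpy as np
import h5py

with h5py.File('assembly_library.h5', 'w') as f:
    grp = f.create_group('assembly_A01')
    grp.attrs['burnup_points'] = 12
    grp.attrs['generated_by'] = 'lattice-code'
    xs = np.random.rand(12, 47, 2)              # burnup x energy-group x reaction (example shape)
    grp.create_dataset('macro_xs', data=xs, chunks=True, compression='gzip')

with h5py.File('assembly_library.h5', 'r') as f:
    table = f['assembly_A01/macro_xs'][...]     # read the whole table back
```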

6.
Numerical simulations of physics problems such as plasma dynamics and electromagnetics produce large volumes of scientific data with complex structures. On one hand, the computational codes need to store these data with high I/O efficiency; on the other hand, the data must be easy to exchange and share among different programs. As data size and complexity keep growing, the limitations of traditional data management approaches become increasingly apparent. To address this, a data storage model for computational physics, the numerical-simulation mesh data model (JAD), was designed. It introduces a metadata management mechanism to abstract and encapsulate the data objects of simulation codes, and a high-level I/O library (JADLib) was implemented on top of the HDF5 library. JADLib integrates advanced data storage techniques and provides an intuitive, easy-to-use application programming interface (API), so that simulation data can be stored efficiently in a unified format. JADLib has been applied in several simulation codes in areas such as high-power microwaves and inertial confinement fusion, and has been coupled with the metadata management system (JADIS) and the parallel visualization system (JaVis), allowing users to browse, analyze, and visualize data directly with these systems and promoting data sharing among applications.
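
A tiny illustration of the high-level-I/O-on-HDF5 idea (not JADLib's actual API; the function and attribute names are invented): each field is written together with self-describing metadata stored as HDF5 attributes, so other tools can interpret the data without prior knowledge of the producing code.

```python
# Illustrative wrapper only: write a mesh field plus self-describing metadata.
import numpy as np
import h5py

def save_field(filename, mesh_name, field_name, values, **metadata):
    with h5py.File(filename, 'a') as f:
        dset = f.require_group(mesh_name).create_dataset(field_name, data=values)
        for key, val in metadata.items():
            dset.attrs[key] = val               # metadata travels with the data

save_field('simulation.h5', 'mesh0', 'Ez', np.zeros((100, 100)),
           units='V/m', time_step=250, centering='cell')
```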

7.
Three main techniques currently exist for webpage tamper-proofing and automatic recovery: time polling, event triggering combined with core embedding, and file filtering combined with event triggering. All three monitor the target files and restore the original files from backup copies once tampering is detected, but none of them protects the backup files themselves; if a backup file is damaged, the original webpage cannot be recovered properly. Building on the file-filtering plus event-triggering approach, this work therefore studies how to protect the backup files using three mechanisms: MD5 checksums, DES encryption, and file renaming.
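
A simplified sketch of the backup-protection step, using only the Python standard library (the encryption and renaming parts are omitted): the recorded MD5 digest lets the monitor both detect tampering of the page and refuse to restore from a backup copy that has itself been modified.

```python
# Verify-before-restore sketch: restore the page from backup only when the
# backup's own MD5 digest still matches the trusted record.
import hashlib, os, shutil

def md5_of(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def restore_if_tampered(page, backup, original_md5):
    """Restore `page` from `backup` if the page is missing or modified."""
    if not os.path.exists(page) or md5_of(page) != original_md5:
        if md5_of(backup) == original_md5:      # the backup itself is intact
            shutil.copy2(backup, page)
            return True
    return False
```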

8.
9.
In recent years data grids have been deployed and have grown in many scientific experiments and data centers. The deployment of such environments has allowed grid users to gain access to a large amount of distributed data. Data replication is a key issue in a data grid and should be applied intelligently because it reduces data access time and bandwidth consumption for each grid site. This area is therefore challenging and offers much scope for improvement. In this paper, we introduce a new dynamic data replication algorithm named Popular File Group Replication (PFGR), which is based on three assumptions: first, users in a grid site (Virtual Organization) have similar interests in files; second, file accesses exhibit temporal locality; and third, all files are read-only. Based on the file access history and the first assumption, PFGR builds a connectivity graph for a group of dependent files in each grid site and replicates the most popular group of files to the requesting grid site. After that, when a user of that grid site needs some of those files, they are available locally. The simulation results show that our algorithm improves performance by reducing the mean job execution time and bandwidth consumption while avoiding unnecessary replication.
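
An illustrative sketch of the grouping idea (not the PFGR algorithm itself; the threshold and data shapes are invented): build a weighted graph of files that appear together in a site's access history, then take the group connected to the most popular file as the replication candidate set.

```python
# Build a co-access graph from per-job file sets and expand the group around
# the most popular file over sufficiently heavy edges.
from collections import Counter, defaultdict
from itertools import combinations

def popular_file_group(job_histories, edge_threshold=2):
    popularity = Counter()
    edges = defaultdict(int)
    for files in job_histories:                 # each job = set of files it accessed
        popularity.update(files)
        for a, b in combinations(sorted(files), 2):
            edges[(a, b)] += 1
    seed = popularity.most_common(1)[0][0]
    group, frontier = {seed}, [seed]
    while frontier:                             # expand over strongly co-accessed files
        f = frontier.pop()
        for (a, b), w in edges.items():
            if w >= edge_threshold and f in (a, b):
                other = b if f == a else a
                if other not in group:
                    group.add(other)
                    frontier.append(other)
    return group                                # candidate set to replicate to the site

group = popular_file_group([{'f1', 'f2'}, {'f1', 'f2', 'f3'}, {'f2', 'f3'}, {'f4'}])
# -> {'f1', 'f2', 'f3'}
```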

10.
Several meetings of the Extremely Large Databases Community for large-scale scientific applications advocate the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes of the order of terabytes and petabytes. As such, the storage of such databases requires efficient dynamic storage schemes in which the array is allowed to arbitrarily extend the bounds of its dimensions. Conventional multidimensional array representations cannot extend or shrink their bounds without relocating elements of the dataset; in general, extendibility of the bounds is limited to only one dimension. This paper presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time for an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, where the bounds of the array dimensions can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5); however, extending the bound of a dimension in an HDF5 array file can be unusually expensive in time. Such extensions in our storage scheme for dense array files can be performed while still accessing elements of the array orders of magnitude faster than in HDF5 or conventional array files. We also present theoretical and experimental analysis of our scheme with respect to access time and storage overhead. Such a mapping scheme can be readily integrated into existing PGAS models for parallel processing in a cluster networked computing environment.
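
For reference, this is the HDF5 path the paper compares against (shown through h5py): a dimension is declared extendible with maxshape=None and grown with resize(). The paper's own chunk mapping function, which makes such extensions far cheaper, is not reproduced here.

```python
# Declaring and extending an unlimited dimension in an HDF5 dataset.
import numpy as np
import h5py

with h5py.File('extendible.h5', 'w') as f:
    # maxshape=(None, None) marks both dimensions as extendible; chunking is required.
    dset = f.create_dataset('grid', shape=(100, 100), maxshape=(None, None),
                            chunks=(50, 50), dtype='f8')
    dset[...] = np.arange(100 * 100, dtype='f8').reshape(100, 100)
    # Extend the first dimension by 100 rows and fill the new region.
    dset.resize((200, 100))
    dset[100:, :] = 0.0
```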

11.
A middleware is proposed to optimize the file fetch process in a transparent computing (TC) platform. A single TC server receives file requests for large-scale distributed operating systems, applications, and user data from multiple clients. Considering the limited size of the server's memory and the dependencies among files, this work proposes a middleware that provides a file fetch sequence satisfying two conditions: (1) each client, upon receiving any file, is able to load it directly without waiting for prerequisite files (i.e. "receive and load"); and (2) the server reduces the overall file fetch time cost. The paper first characterizes the problem of generating valid file fetch sequences in the middleware; the method also handles concurrency control when files are fetched for multiple clients. It then explores methods to determine the time cost of a file fetch sequence. Based on the established model, we propose a heuristic and greedy (HG) algorithm. According to the simulation results, the HG algorithm reduces the overall file fetch time by roughly 50% in the best cases compared with traditional approaches.
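
A much-simplified greedy sketch of the "receive and load" constraint (not the paper's HG algorithm): a file becomes sendable only once all of its prerequisites have been sent, and among sendable files the one most clients are waiting for is chosen next.

```python
# Greedy ordering under dependency constraints; inputs are invented examples.
def fetch_sequence(deps, demand):
    """deps: file -> set of prerequisite files; demand: file -> #waiting clients."""
    sent, order = set(), []
    remaining = set(deps) | {d for s in deps.values() for d in s} | set(demand)
    while remaining:
        ready = [f for f in remaining if deps.get(f, set()) <= sent]
        if not ready:
            raise ValueError('cyclic dependency among files')
        nxt = max(ready, key=lambda f: demand.get(f, 0))    # greedy choice
        sent.add(nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return order

seq = fetch_sequence({'app.bin': {'libc.so'}, 'data.img': {'app.bin'}},
                     {'libc.so': 3, 'app.bin': 2, 'data.img': 1})
# -> ['libc.so', 'app.bin', 'data.img']
```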

12.
13.
This paper discusses the collection, analysis, and interpretation of data pertaining to files in personal computer (PC) environments. We developed programs to collect and analyze data from PCs running the OS/2 operating system and using the High Performance File System (HPFS). The data collection program gathers information about file sizes and the times and dates of file creation, the last file access, and the last file update by scanning the contents of disk storage devices. The gathered information is used to analyze the distributions of file sizes, functional file lifetimes, and functional lifetimes of files' data. The analysis shows that most files are small (more than 60% of files on a system are smaller than 8 Kbytes), about 60% of files on a system have never been accessed again after being created, and very few files are ever modified.
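
A small sketch of the kind of collection pass described (standard library only; the OS/2/HPFS scanner itself is not reproduced): walk a directory tree and record each file's size and the days since its last access and last update.

```python
# Gather per-file size and age statistics by scanning a directory tree.
import os, time

def scan(root):
    now, records = time.time(), []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue                        # skip files that vanish or are unreadable
            records.append({'size': st.st_size,
                            'days_since_access': (now - st.st_atime) / 86400,
                            'days_since_update': (now - st.st_mtime) / 86400})
    return records

recs = scan('/home')
small = sum(r['size'] < 8 * 1024 for r in recs) / max(len(recs), 1)
print(f'{small:.0%} of files are under 8 KB')
```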

14.
Essential requirements for the use of computers in mineral exploration are: (1) an adaptable storage-retrieval system, (2) adequate data files, (3) a minimum establishment in terms of staff and organization, (4) integration of computer filing systems with paper filing systems so that they complement one another, and (5) understanding by users of the application of computer techniques to geological information. Data files applicable to mineral exploration fall into three categories: Field-Data files, Bibliographic-Index files, and Mineral-Deposit files. An increasing proportion of data files used by exploration companies will be acquired from government organizations and service agencies. Computer techniques must be used selectively. They become more applicable as: (1) projects last longer, (2) the amount of exploration work per unit area increases, and (3) the amount of previously generated information becomes greater. Limitations include: (1) nonapplicability of computer techniques to some types of exploration, (2) the expense of file creation, (3) the short time span of many exploration projects, and (4) the need to maintain a balance between a computer facility and the exploration department it is designed to serve. Causes of disappointment include preoccupation with statistical techniques at the expense of storage and retrieval, the nonselective application of computers, and failure to attain a minimum organizational establishment. Although individual computer-based approaches may lead directly to exploration targets which would not have been detected without computers, the essential advantage of a computer facility lies in its ability to improve the performance of each exploration geologist by increasing the quantity and quality of available data.

15.
Grid-based simulation usually involves large quantities of data at each stage of the simulation process. These data include simulation input and output files, intermediate results files, log and error files, associated metadata, and information capturing the processes that generate the data. How to effectively store and manage data files within a Grid computing environment is increasingly becoming an important issue. This paper illustrates how we built a lightweight e-Science infrastructure for data management within a Grid computing environment, including the integration of data curation activities into the entire Grid-based simulation process. Rather than focusing on specific implementation details, we aim to identify the key issues and research challenges, describing how various existing technologies and tools can be best integrated to address these requirements and challenges. Although the case of quantum mechanical simulation of materials properties is used in the paper, much of the discussion is kept as generic as possible so that the approaches, methods, and practice (e.g. the integrated approach, the workflow taxonomy and development approach, and the simple but useful semantic annotation approach) can be applied to wider domains and disciplines to facilitate digital research. A comparison between our approach and Cloud computing, and lessons learned in data management within the Grid computing environment, are also presented.

16.
This paper reviews the different types of data files and methods of storing and retrieving information using sequential-access and random-access files. A data management program that contains specifications for 300 robot models is used to show how to: (1) create a new data file, (2) add information to an existing file, (3) modify records within a file, (4) display information from a data file, and (5) analyze data from individual records.
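
A brief illustration of the random-access style the paper reviews (the record layout and field names are invented, not taken from the program described): with fixed-length records, the n-th record is reached by seeking to n times the record size, so single records can be read or updated in place.

```python
# Fixed-length records with direct (random) access via seek().
import struct

RECORD = struct.Struct('<32s f f')              # model name, payload (kg), reach (m)
                                                # field layout is invented for the example

def write_record(f, index, name, payload, reach):
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(name.encode().ljust(32, b'\0'), payload, reach))

def read_record(f, index):
    f.seek(index * RECORD.size)
    name, payload, reach = RECORD.unpack(f.read(RECORD.size))
    return name.rstrip(b'\0').decode(), payload, reach

with open('robots.dat', 'w+b') as f:
    write_record(f, 5, 'PUMA 560', 2.5, 0.86)
    print(read_record(f, 5))
```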

17.
Using the security class namespaces provided by the .NET Framework, a program was developed in VB.NET that computes the MD5 hash value of an arbitrary file. Tests show that it achieves good speed even on large files of more than 200 MB. The result can be saved to a text file with the same name, and the program can be applied to software copyright protection.
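
The same chunked-hashing idea, sketched in Python rather than the article's VB.NET (file names here are placeholders): reading the file in fixed-size blocks keeps memory use constant regardless of file size.

```python
# Hash a file of any size in fixed-size chunks and save the digest beside it.
import hashlib
from pathlib import Path

def file_md5(path, chunk_size=4 * 1024 * 1024):
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

target = Path('installer.iso')                  # works the same for multi-GB files
digest = file_md5(target)
with open(str(target) + '.md5', 'w') as out:
    out.write(digest + '\n')
```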

18.
With the development of the Internet, scenarios that store multimedia files are increasingly common, and cloud storage systems have become an industry focus; many cloud storage systems provide data storage, query, and computing services for applications. Many applications hold large numbers of duplicated, relatively small multimedia files, and traditional distributed file systems no longer meet the storage performance needs of such files: they usually split multimedia files into blocks stored on different storage servers, so every access must gather the data from multiple block nodes and reconstruct the file content for the application, which makes accessing these small multimedia files consume more resources. To store multimedia files for applications with highly redundant data at low storage cost, and to improve the efficiency of such applications, this paper first proposes a distributed storage directory model that describes the logical structure of the storage directories in a data center, and then implements a multimedia file cloud storage system, MFCSS, which supports data deduplication and logical expansion of the storage directory. Analysis and experimental results show that MFCSS performs well when storing multimedia files with high data redundancy, effectively improves disk storage efficiency, scales well, and simplifies the management of multimedia files in a distributed storage environment for applications.
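
A toy content-addressed store illustrating the deduplication side of such a system (the layout, hash choice, and in-memory index are all invented for the example): each unique file body is stored once under its content hash, and each logical path in the directory model simply references that hash.

```python
# Content-addressed deduplication sketch: identical content is stored only once.
import hashlib, os, shutil

STORE = 'store'          # physical blobs, one per unique content hash
INDEX = {}               # logical path -> content hash (the metadata catalogue)

def put(logical_path, source_file):
    h = hashlib.sha256()
    with open(source_file, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    digest = h.hexdigest()
    blob = os.path.join(STORE, digest)
    if not os.path.exists(blob):                # duplicate content costs no extra space
        os.makedirs(STORE, exist_ok=True)
        shutil.copyfile(source_file, blob)
    INDEX[logical_path] = digest
    return digest

def get(logical_path):
    return os.path.join(STORE, INDEX[logical_path])
```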

19.
File integrity analyzers serve as a component of an intrusion detection environment by performing filesystem inspections to verify the content of security-critical files in order to detect suspicious modification. Existing file integrity frameworks exhibit single point-of-failure exposures. The Collaborative Object Notification Framework for Insider Defense using Autonomous Network Transactions (CONFIDANT) aims at fail-safe and trusted detection of unauthorized modifications to executable, data, and configuration files. In this paper, an IDS architecture taxonomy is proposed to classify and compare CONFIDANT with existing frameworks. The CONFIDANT file integrity verification framework is then defined and evaluated. CONFIDANT utilizes three echelons of agent interaction and four autonomous behaviors. Sensor agents in the lowest echelon form the sensor level and generate assured reports of computed MD5 file digests for companion agents. At the control level, beacon agents verify file integrity based on the digests assembled from sensor-level agents over time. Upper-echelon transactions occur at the response level, where watchdog-behavior agents dispatch probe agents to implement the alarm signaling protocol. CONFIDANT has been implemented in the Concordia agent environment to evaluate and refine its agent behaviors. Evaluation shows that CONFIDANT mitigates the single point-of-failure exposures present in existing frameworks.
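
A minimal sketch of the sensor-level check (the agent messaging, echelons, and alarm protocol are omitted; only the digest comparison is shown): compute MD5 digests of monitored files and report any that differ from a trusted baseline.

```python
# Compare current MD5 digests of monitored files against a trusted baseline.
import hashlib

def digest(path):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

def check_integrity(baseline):
    """baseline: path -> expected MD5 hex digest."""
    alerts = []
    for path, expected in baseline.items():
        try:
            current = digest(path)
        except OSError:
            current = None                      # a missing file is also suspicious
        if current != expected:
            alerts.append((path, expected, current))
    return alerts
```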

20.
In most large computer installations files are moved between on-line disk and mass storage (tape or an integrated mass storage device) either automatically by the system or at the direction of the user. In this paper we present and analyze long-term file reference data in order to develop a basis for the construction of file migration algorithms. Specifically, we examine the use of the on-line user (primarily text editor) data sets at the Stanford Linear Accelerator Center (SLAC) computer installation through the analysis of 13 months of file reference data. We find that most files are used very few times. Of those that are used sufficiently frequently that their reference patterns may be examined, we find that: 1) about a third show declining rates of reference during their lifetime; 2) of the remainder, very few (about 5 percent) show correlated interreference intervals; and 3) interreference intervals (in days) appear to be more skewed than would occur with a Bernoulli process. Thus, about two-thirds of all sufficiently active files appear to be referenced as a renewal process with a skewed interreference distribution. A large number of other file reference statistics (file lifetimes, interreference distributions, moments, means, number of uses per file, file sizes, file rates of reference, etc.) are computed and presented. Throughout, the statistical tests used are described and explained. The results of our analysis of file reference patterns are applied in a companion paper to the development and comparative evaluation of file migration algorithms.
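
A short sketch of the interreference-interval computation described (the data and field names are illustrative): given each file's reference dates as day numbers, derive the gaps between successive references and a few of the summary statistics mentioned above.

```python
# Compute interreference intervals and basic per-file reference statistics.
from statistics import mean

def interreference_intervals(reference_days):
    days = sorted(reference_days)
    return [b - a for a, b in zip(days, days[1:])]

def summarize(file_refs):
    """file_refs: file -> list of day numbers on which it was referenced."""
    stats = {}
    for name, days in file_refs.items():
        gaps = interreference_intervals(days)
        stats[name] = {'uses': len(days),
                       'lifetime_days': max(days) - min(days),
                       'mean_gap': mean(gaps) if gaps else None}
    return stats

print(summarize({'edit.dat': [1, 2, 2, 10, 40], 'log.txt': [5]}))
```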
