1.
Scientific data analysis and visualization have become key components of today's large-scale simulations. Because of rapidly increasing data volumes and awkward I/O patterns in highly structured files, known serial methods and tools do not scale well and usually perform poorly on traditional architectures. In this paper, we propose a new framework, ParSA (parallel scientific data analysis), for high-throughput and scalable scientific analysis on a distributed file system. ParSA presents optimization strategies for grouping and splitting logical units to exploit the distributed I/O of the file system, for scheduling the distribution of block replicas to reduce network reads, and for maximizing the overlap of data reading, processing, and transferring during computation. In addition, ParSA provides interfaces similar to those of the NetCDF Operator (NCO), which is used in most climate data diagnostic packages, making the framework easy to adopt. We use ParSA to accelerate well-known analysis methods for climate models on the Hadoop Distributed File System (HDFS). Experimental results demonstrate the high efficiency and scalability of ParSA, which reaches a maximum throughput of 1.3 GB/s on a six-node Hadoop cluster with five disks per node, compared with only 392 MB/s on a RAID-6 storage node.
2.
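To address the low server utilization and complex energy management caused by random data block placement in distributed file systems, a data block access feature vector model is established to describe users' random access to data blocks. The K-means algorithm is used to cluster the data blocks, and according to the clustering results the data nodes are divided into multiple zones, each storing the data blocks of a different cluster. When the system load is low, the data blocks are dynamically reconfigured and unnecessary nodes are shut down to save energy. So that the strategy suits scenarios with different requirements for energy consumption and resource utilization, the cluster interval parameter of the algorithm can be set flexibly. Experiments comparing the approach with a hot/cold zone partitioning algorithm show that, after data blocks are reconfigured according to the clustering results, the energy savings exceed those of the hot/cold zone partitioning algorithm, reaching 35% to 38%.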
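The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors (access counts per time window), the block names, and the choice of initial centroids are all hypothetical, and plain Lloyd-style K-means stands in for whatever variant the paper uses.

```python
from math import dist

def kmeans(vectors, centroids, iterations=10):
    """Plain Lloyd's K-means; returns one cluster index per input vector."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each block joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def zones_from_labels(block_ids, labels):
    """Group block ids by cluster so that each cluster maps to one node zone."""
    zones = {}
    for block, lab in zip(block_ids, labels):
        zones.setdefault(lab, []).append(block)
    return zones

# Hypothetical access-count vectors (accesses per time window) for six blocks.
blocks = ["b0", "b1", "b2", "b3", "b4", "b5"]
access = [(90, 85), (88, 92), (5, 3), (4, 6), (91, 89), (2, 1)]
labels = kmeans(access, centroids=[access[0], access[2]])
zones = zones_from_labels(blocks, labels)
```

In this toy input the frequently accessed blocks end up in one zone and the rarely accessed blocks in the other, so the nodes holding the second zone would be the natural candidates for shutdown under low load.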
3.
Using a central file server is good for interactive access to files, because of the coherency implied by a centralized design. In fact, within local area networks, this is a common case. However, distributed environments in use today may exhibit round‐trip times on the order of 50 or 100 ms. This is a problem for interactive file access to a central file server because of the resulting access times. Although aggressive caching and loosely synchronized replicas may be used for distributed file access, there are cases where the better coherency provided by a central server is still desirable. In this paper, we present ZX, a distributed file system and protocol designed with latency in mind. It can use caching, but it does not require caching or batching to address latency issues. ZX relies on a novel channel‐based file system interface. It includes find requests and leverages streaming requests to work well under high‐latency conditions. Unlike other protocols designed for distributed access to a central server, ZX tolerates round‐trip times on the order of 50 or 100 ms to access a central file server for interactive usage such as compiling shared sources, running binaries, editing documents, and other similar workloads. It can be used on UNIX using a FUSE adaptor while permitting native ZX speakers to run faster.
4.
Rendering is a crucial process in the production of computer-generated animation movies. It executes a computer program to transform 3D models into a series of still images, which are eventually sequenced into a movie. Because of the size and complexity of 3D models, rendering is a tedious, time-consuming, and unproductive task on a single machine. Accordingly, animation rendering is commonly carried out in a distributed computing environment where numerous computers execute in parallel to speed up the rendering process. Alongside the distribution of computing, data dissemination to all computers needs mechanisms that allow large 3D models to be moved efficiently to those distributed computers, reducing the time and cost of animation production. This paper presents and evaluates the BitTorrent file system (BTFS) for improving the communication performance of distributed animation rendering. BTFS provides an efficient, secure, and transparent distributed file system that decouples applications from complicated communication mechanisms. By disseminating data in a peer-to-peer manner and using a local cache, rendering time can be reduced. A performance comparison using a production-grade 3D animation shows that BTFS outperforms traditional distributed file systems by more than 3 times in our test configuration.
5.
6.
This paper describes an implementation and performance evaluation of different deadlock prevention algorithms. A deadlock prevention algorithm ensures that deadlock will never happen. The algorithms for deadlock prevention are proposed and implemented in a locally distributed system. A number of experiments were executed in a distributed system for various lengths of file operation and different numbers of files. The performance of the system and of each algorithm is evaluated and discussed. Some general results are derived for a single-host and a distributed system.
7.
A. Calderón F. García-Carballeira L. M. Sánchez J. D. García J. Fernandez 《The Journal of supercomputing》2009,47(3):312-334
Parallelism in file systems is obtained by using several independent server nodes supporting one or more secondary storage
devices. This approach increases the performance and scalability of the system, but a fault in one single node can stop the
whole system. To avoid this problem, data must be stored using some kind of redundant technique, so any data stored in a faulty
element can be recovered. Fault tolerance can be provided in I/O systems by using replication or RAID based schemes. However,
most of the current systems apply the same technique for all files in the system.
This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. This
support can be applied to other parallel file systems with many benefits: fault tolerance at file level, flexible definition
of the fault tolerance scheme to be used, the possibility to change the fault tolerance support used for a file, etc.
8.
9.
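With the development of mobile data storage, embedded USB host systems have come into wide use, so implementing a file system in some embedded USB host systems has become both necessary and critical. This paper first analyzes the structure and principles of the FAT16 file system and the layered design method for an embedded file system based on a USB host interface. On this basis, it proposes optimizations for the file system implementation layer by layer. These optimizations markedly improve the efficiency and performance of the file system implementation, so that it can serve data acquisition applications with demanding real-time requirements.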
10.
《International Journal of Parallel, Emergent and Distributed Systems》2013,28(5):407-433
Due to the explosive growth in the size of scientific data-sets, data-intensive computing and analysis are an emerging trend in computational science. In these applications, data pre-processing is widely adopted because it can optimise the data layout or format beforehand to facilitate future data access. On the other hand, current research shows the increasing popularity of the MapReduce framework for large-scale data processing. However, the data access patterns generally applied to scientific data-sets are not directly supported by the current MapReduce framework. This gap motivates us to provide support for these scientific data access patterns in the MapReduce framework. In our work, we study the data access patterns in matrix files and propose a new concentric data layout solution to facilitate matrix data access and analysis in the MapReduce framework. Concentric data layout is a data layout which maintains the dimensional property at the chunk level. Contrary to the continuous data layout adopted in the current Hadoop framework, concentric data layout stores the data from the same sub-matrix in one chunk. This layout can guarantee that the average performance of data access is optimal regardless of the various access patterns. The concentric data layout requires reorganising the data before it is analysed or processed. Our experiments are run on a real-world halo-finding application; the results indicate that the concentric data layout improves the overall performance by up to 38%.
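To make the contrast between a continuous layout and a sub-matrix-per-chunk layout concrete, the sketch below counts how many chunks different access patterns touch. It is a simplified illustration, not the paper's concentric layout: a plain square tiling stands in for it, and the matrix size, chunk size, and function names are assumptions chosen for the example.

```python
def chunks_touched_row_major(n_cols, chunk_elems, r0, r1, c0, c1):
    """Chunks hit when the matrix is stored row by row and cut every chunk_elems elements."""
    touched = set()
    for r in range(r0, r1):
        for c in range(c0, c1):
            touched.add((r * n_cols + c) // chunk_elems)
    return len(touched)

def chunks_touched_tiled(tile, r0, r1, c0, c1):
    """Chunks hit when each tile x tile sub-matrix is stored as one chunk."""
    touched = set()
    for r in range(r0, r1):
        for c in range(c0, c1):
            touched.add((r // tile, c // tile))
    return len(touched)

# Hypothetical 64 x 64 matrix; both layouts hold 256 elements per chunk.
N, CHUNK, TILE = 64, 256, 16

# Three access patterns given as half-open ranges (r0, r1, c0, c1).
patterns = {
    "16x16 block": (0, 16, 0, 16),
    "one full row": (0, 1, 0, N),
    "one full column": (0, N, 0, 1),
}
row_major = {name: chunks_touched_row_major(N, CHUNK, *p) for name, p in patterns.items()}
tiled = {name: chunks_touched_tiled(TILE, *p) for name, p in patterns.items()}
```

Under these assumptions the row-major layout is ideal for a full row but poor for a full column, while the tiled layout touches a small and similar number of chunks for every pattern, which is the kind of balance the abstract attributes to keeping a sub-matrix inside one chunk.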
11.
Journaling file systems, which are widely used in modern operating systems, guarantee file system consistency and data integrity by logging file system updates to a journal, which is a reserved space on the storage, before the updates are written to the data storage. Such journal writes increase the write traffic to the storage and thus degrade the file system performance, especially in full data journaling, which logs both metadata and data updates. In this paper, a new journaling approach is proposed to eliminate journal writes in server virtualization environments, which are gaining in popularity in server platforms. Based on reliable hardware subsystems and virtual machine monitor (VMM), the proposed approach eliminates journal writes by retaining journal data (i.e. logged file system updates) in the memory of each virtual machine and ensuring the integrity of these journal data through cooperation between the journaling file systems and the VMM. We implement the proposed approach in Linux ext3 in the Xen virtualization environment. According to the performance results, a performance improvement of up to 50.9% is achieved over the full data journaling approach of ext3 due to journal write elimination. In metadata-write dominated workloads, this approach could even outperform the metadata journaling approaches of ext3, which do not guarantee data integrity. These results demonstrate that, on virtual servers with reliable VMM and hardware subsystems, the proposed approach is an effective alternative to traditional journaling approaches. Copyright © 2011 John Wiley & Sons, Ltd.
12.
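Traditional versioning file systems generally manage the history of individual files. In practice, however, the version information a user needs to manage is not limited to a single file. For example, many users work on designs within a large project and often need version management for all the files of the whole project. This calls for an application-aware versioning file system, AVFS. This paper studies AVFS, focusing on its system architecture and implementation techniques.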
13.
14.
Performance isolation is highly desirable in cloud platforms where the virtual disks of virtual machines are simply large files on the shared and networked storage servers. However, existing isolation techniques cannot deal with the implications of the file system used by the networked storage servers, such that underlying resource usage is unpredictable (e.g., the delayed write-back mechanism could postpone writes, and the journaling mechanism could amplify writes). The lack of visibility into underlying resource usage makes it impossible to meet isolation goals. In this paper, we present a software-defined file system (SDFS) that exploits the underlying file system to allocate resources at per-image-file granularity and provide tenants with guaranteed throughput. The SDFS comprises two components: a control plane and a data plane. At the control plane, we provide a set of system calls to record tenant performance requirements in the metadata of image files. At the data plane, we construct a file-based scheduler to manage memory and disk resources according to the tenant performance requirements. The SDFS design does not require modifications to guest operating systems, hypervisors, or file server protocols. Through a prototype implementation, we demonstrate that the SDFS can meet isolation goals and increase resource utilization with negligible overhead.
15.
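The I/O performance of new storage devices is typically an order of magnitude higher than that of traditional solid-state drives (SSDs). However, distributed file systems built on new storage devices show no significant performance improvement over those built on SSDs, which indicates that current distributed file systems cannot fully exploit the performance of the new devices. To address this problem, the data write flow and transfer process of the Hadoop Distributed File System (HDFS) are analyzed quantitatively. Measuring the time spent in each stage of the HDFS write path shows that, among all stages of writing data, data transfer between nodes accounts for a relatively large share of the time. An optimization is therefore proposed that uses asynchronous writes to parallelize data transfer and processing, so that the processing stages of different packets overlap. This reduces the overall packet processing time and thus improves HDFS write performance. Experimental results show that the proposed scheme increases HDFS write throughput by 15% to 24% and reduces the overall write execution time by 28% to 36%.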
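The benefit of overlapping packet transfer with packet processing can be illustrated with a simple timing model. This is a back-of-the-envelope sketch under stated assumptions, not a model of the actual HDFS write pipeline: it assumes only two stages per packet, fixed stage times, and made-up numbers that are unrelated to the measurements reported in the abstract.

```python
def sequential_time(n_packets, transfer, process):
    """Each packet is fully transferred and processed before the next one starts."""
    return n_packets * (transfer + process)

def pipelined_time(n_packets, transfer, process):
    """Transfer of the next packet overlaps processing of the current one."""
    if n_packets == 0:
        return 0
    # The first packet pays both stages in full; afterwards the slower stage
    # alone determines how often a packet completes.
    return transfer + process + (n_packets - 1) * max(transfer, process)

# Hypothetical stage times in arbitrary time units.
seq = sequential_time(100, transfer=3, process=2)   # 500
pipe = pipelined_time(100, transfer=3, process=2)   # 302
```

In this model the pipelined total approaches the cost of the slower stage alone as the number of packets grows, which is why a write path dominated by inter-node transfer gains from overlapping it with processing.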
16.
A dynamic and adaptive load balancing strategy for parallel file system with large-scale I/O servers
Bin Dong Xiuqiao Li Qimeng Wu Limin Xiao Li Ruan 《Journal of Parallel and Distributed Computing》2012
Many solutions have been proposed to tackle the load imbalance issue of parallel file systems. However, all these solutions either adopt centralized algorithms or fail to consider both the network transmission and the tradeoff between the benefits and side-effects of each dynamic file migration. Therefore, existing solutions will be prohibitively inefficient in large-scale parallel file systems. To address this problem, this paper presents SALB, a dynamic and adaptive load balancing algorithm based entirely on a distributed architecture. To take network transmission into account, SALB adopts an adaptively adjusted load collection threshold to reduce the message exchanges for load collection, and it employs an on-line load prediction model to reduce the decision delay caused by network transmission latency. Moreover, SALB employs an optimization model for selecting the migration candidates so as to balance the benefits and the side-effects of each dynamic file migration. Extensive experiments are conducted to evaluate the effectiveness of SALB. The results show that, among the schemes compared, SALB achieves the best performance in both mean response time and resource utilization. The simulation results also indicate that SALB is able to deliver high scalability.
17.
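This paper introduces the overall architecture and working principles of a secure storage system for classified documents based on a file system filter driver, and describes the basic methods and process of using file system filter driver technology to implement file encryption and decryption and to prevent the unauthorized spread of files.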
18.
Alireza Poshtkohi M.B. Ghaznavi-Ghoushchi 《Parallel Computing》2011,37(2):114-136
The DotGrid platform is a Grid infrastructure integrated with a set of open and standard protocols, recently implemented on top of Microsoft .NET in Windows and MONO .NET in UNIX/Linux. The DotGrid infrastructure, along with its proposed protocols, provides a solid approach to targeting other platforms, e.g., the native C/C++ runtime. In this paper, we propose a new concurrent file transfer protocol called DotDFS as a high-throughput distributed file transfer component for DotGrid. DotDFS introduces some open binary protocols for efficient file transfers on current Grid infrastructures. The DotDFS protocol also provides mechanisms for multiple file streams to gain high-throughput file transfer, similar to the GridFTP protocol, but by proposing and implementing a new parallel TCP connection-oriented paradigm. Almost no research work has been conducted to suggest a concurrent file transfer protocol that simultaneously employs threaded and event-driven models at the protocol level. To our knowledge, DotDFS is the first concurrent file transfer protocol that, from this viewpoint, presents a new computing paradigm in the field of data transmission protocols. In our LAN tests, we have achieved better results than the Globus GridFTP implementation, particularly in multiple TCP streams and directory tree transfers. Our LAN memory-to-memory tests show that DotDFS reaches 94% of the bottleneck bandwidth while GridFTP reaches 91%. In LAN disk-to-disk tests, comparing the DotDFS protocol with the GridFTP protocol unveils a set of interesting technical problems in GridFTP, in both the nature of the protocol and its implementation by Globus. In the WAN experimental studies, we propose a new idea for analytical modeling of file transfer protocols like DotDFS, inspired by sampling, experimentation and mathematical interpolation approaches.
The cross-platform and open standard-based features of DotDFS provide a substantial framework for unifying data access and resource sharing in real heterogeneous Grid environments.
19.
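"Software as a Service" (SaaS) is a new model for distributing and using software over the network. It largely eliminates the need for users to purchase, maintain, and upgrade applications, and is regarded as one of the mainstream models for future software use. This paper proposes a new SaaS model that supports existing Windows desktop software and implements a prototype system, Cloudow. Users can run existing Windows software on demand on any compatible networked computer without installing it, and each user's personalized software configuration is preserved so that it can be restored at the next use. Cloudow uses user-level virtualization to let software run without installation, and a user-level file system design to make software usable transparently over the network. Compared with existing SaaS models based on remote virtual machine computing or on Web applications, Cloudow directly supports a server-side storage, client-side execution model for existing software without code changes, and strikes a good balance between software compatibility and performance. In addition, to minimize the remote data access latency of the Internet environment, Cloudow makes extensive use of metadata, data, and file prefetching and caching strategies, which significantly improve application performance in real deployments. Tests show that, thanks to these optimizations, the additional run-time overhead under Cloudow averages 12% to 20% for many common Windows desktop applications.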
20.
Data processing complexity, partitionability, locality and provenance play a crucial role in the effectiveness of distributed data processing. Dynamics in data processing necessitate effective modeling that allows the fluidity of data processing to be understood and reasoned about. Through virtualization, resources have become scattered, heterogeneous, and dynamic in performance and networking. In this paper, we propose a new distributed data processing model based on automata, where data processing is modeled as state transformations. This approach falls within a category of declarative concurrent paradigms which are fundamentally different from imperative approaches in that communication and function order are not explicitly modeled. This allows an abstraction of concurrency and is thus suited to distributed systems. Automata give us a way to formally describe data processing independently of the underlying processes, while also providing routing information to route data based on its current state in a P2P fashion around networks of distributed processing nodes. Through an implementation of the model, named Pumpkin, we capture the automata schema and routing table in a data processing protocol and show how globally distributed resources can be brought together in a collaborative way to form a processing plane where data objects are self-routable on the plane.