Similar literature
 Found 20 similar documents; search took 15 ms
1.
This paper proposes using file system custom metadata as a bidirectional communication channel between applications and the storage middleware. This channel can be used to pass hints that enable cross-layer optimizations, an option hindered today by the ossified file-system interface. We study this approach in the context of storage system support for large-scale workflow execution systems: our workflow-optimized storage system (WOSS) exploits application hints to provide per-file optimized operations and exposes data location to enable location-aware scheduling. We argue that an incremental path for adopting cross-layer optimizations in storage exists, present the system architecture for a workflow-optimized storage system and its integration with a workflow runtime engine, and evaluate this approach using synthetic and real applications over multiple success metrics (application runtime, generated network stress, and energy). Our performance evaluation demonstrates that this design brings sizeable performance gains. On a large-scale cluster (100 nodes), compared to two production-class distributed storage systems (Ceph and GlusterFS), WOSS achieves up to 6× better performance for the synthetic benchmarks and a 20–40% application-level performance gain for real applications.

2.
Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS, or using the parallel file systems available on HPC clusters, to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former delivers memory-speed I/O performance and the latter provides consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both reads and writes. We expect this two-level storage approach to provide insights into system design for big data analytics on HPC clusters.

3.
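Addressing the MapReduce block-processing mechanism, the distribution characteristics of high-dimensional data, and the requirements of KNN queries, this paper designs a B+-tree-based high-dimensional index structure (iPartition). It proposes an optimized data partitioning strategy based on principal-component discriminability, together with a principle of storing adjacent data domains on separate nodes, so that data is divided evenly among the Slave nodes and as many data domains as possible contribute jointly to the computation, improving the parallelism of MapReduce task processing. A distributed two-level index built from B+ trees quickly filters the data range at query time, reducing the cost of high-dimensional computation. Experiments show that iPartition offers good performance and scalability for approximate queries over high-dimensional data.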

4.
Parallel file systems serve a growing number of applications from various fields. These applications have different I/O workload characteristics and therefore diverse requirements for accessing storage resources. However, parallel file systems often adopt a "one-size-fits-all" solution, which fails to meet specific application needs and hinders the full exploitation of potential performance. This paper presents a framework that enables fine-grained, dynamic file I/O path selection at runtime. The framework adopts a file handle-rich scheme that allows file systems to choose the corresponding optimizations to serve I/O requests. Consistency control algorithms are proposed to ensure data consistency while optimizations are changed at runtime. One case study on our prototype shows that choosing proper optimizations can improve I/O performance for small files and large files by up to 40% and 64.4%, respectively. Another case study shows that data prefetch performance for real-world application traces can be improved by up to 193% by selecting the correct prefetch patterns. Simulations of a large-scale environment also show that our method is scalable and that both the memory consumption and the consistency control overhead are negligible.

5.
Scientific data analysis and visualization have become key components of today's large-scale simulations. Because of rapidly increasing data volumes and awkward I/O patterns among highly structured files, known serial methods and tools do not scale well and usually perform poorly on traditional architectures. In this paper, we propose ParSA (parallel scientific data analysis), a new framework for high-throughput and scalable scientific analysis on a distributed file system. ParSA presents optimization strategies for grouping and splitting logical units to exploit the distributed I/O of the file system, for scheduling the distribution of block replicas to reduce network reads, and for maximizing the overlap of data reading, processing, and transfer during computation. In addition, ParSA provides interfaces similar to those of the NetCDF Operator (NCO), which is used in most climate data diagnostic packages, making the framework easy to use. We use ParSA to accelerate well-known analysis methods for climate models on the Hadoop Distributed File System (HDFS). Experimental results demonstrate the high efficiency and scalability of ParSA, which reaches a maximum throughput of 1.3 GB/s on a six-node Hadoop cluster with five disks per node. By contrast, it reaches only 392 MB/s on a RAID-6 storage node.

6.
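Dawning Nebulae Distributed File System: Massive Small-File Access   Total citations: 2, self-citations: 0, citations by others: 2
With the growth of Internet applications and the rise of cloud computing, services such as online images, audio, video, and microblogs have spread widely, and these applications exhibit data access and storage patterns quite different from those of traditional applications. Large numbers of small files are generated, analyzed, and returned every second in a data center, which poses new challenges for high-throughput, low-latency reads and writes of massive numbers of files under high concurrency. This paper proposes HVFS, a new distributed file system based on distributed table storage, to manage billions of files while supporting both high-throughput and low-latency file access. HVFS manages metadata with an improved distributed extendible hashing scheme and exploits temporal and spatial locality through a log-structured format and column storage. The paper describes the design and implementation of HVFS and reports medium-scale experiments. The experiments show that HVFS's table storage structure scales linearly, delivering more than 240,000 writes per second and 100,000 reads per second of small data (<1 KB) on 82 nodes; the FUSE-based implementation delivers more than 180,000 file creations per second on 32 nodes.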

7.
This paper focuses on data-intensive workflows and addresses the problem of scheduling workflow ensembles under cost and deadline constraints in Infrastructure as a Service (IaaS) clouds. Previous research in this area ignores file transfers between workflow tasks, which, as we show, often have a large impact on workflow ensemble execution. In this paper we propose and implement a simulation model for handling file transfers between tasks, featuring the ability to dynamically calculate bandwidth and supporting a configurable number of replicas, thus allowing us to simulate various levels of congestion. The resulting model is capable of representing a wide range of storage systems available on clouds: from in-memory caches (such as memcached), to distributed file systems (such as NFS servers) and cloud storage (such as Amazon S3 or Google Cloud Storage). We observe that file transfers may have a significant impact on ensemble execution; for some applications up to 90% of the execution time is spent on file transfers. Next, we propose and evaluate a novel scheduling algorithm that minimizes the number of transfers by taking advantage of data caching and file locality. We find that for data-intensive applications it performs better than other scheduling algorithms. Additionally, we modify the original scheduling algorithms to operate effectively in environments where file transfers take non-zero time.

8.
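This paper proposes a scalable cluster file system architecture that distributes metadata and data objects using a deterministic random distribution algorithm. Its metadata management method, which separates directory path attributes from directory objects, has clear advantages in improving system performance, balancing metadata distribution, and reducing metadata migration. The proposed data object layout algorithm, based on dynamic interval mapping, supports weighted distribution and replicas, is statistically optimal in both balancing data distribution and minimizing data migration, and effectively solves the problems of balanced data distribution and scalability in dynamic storage systems.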
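
As a rough illustration of deterministic pseudo-random placement with weights and replicas, the sketch below uses weighted rendezvous hashing. This is a stand-in technique, not the dynamic interval mapping algorithm the abstract refers to, whose details are not given here. The function names `place` and `_unit_hash` and the node names are hypothetical.

```python
import hashlib
import math


def _unit_hash(key: str, node: str) -> float:
    """Map (key, node) deterministically to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{key}:{node}".encode()).digest()
    # Keep 52 bits so that x + 0.5 is exactly representable as a float.
    x = int.from_bytes(digest[:8], "big") >> 12
    return (x + 0.5) / 2**52


def place(key: str, weights: dict, replicas: int = 1) -> list:
    """Return the `replicas` nodes with the highest weighted rendezvous score.

    Score = -weight / ln(h); a larger weight attracts proportionally more keys.
    """
    scores = {
        node: -weight / math.log(_unit_hash(key, node))
        for node, weight in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:replicas]


if __name__ == "__main__":
    nodes = {"osd-a": 1.0, "osd-b": 2.0, "osd-c": 1.0}  # hypothetical nodes
    print(place("object-42", nodes, replicas=2))
```

Because each node's score depends only on the key and that node, removing a node relocates only the objects that were placed on it, which mirrors the minimal-migration goal stated in the abstract.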

9.
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. This variability is caused by concurrently running processes or jobs competing for I/O, or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism that enables application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, so that users see the data as a single, normal file. We demonstrate that our bandwidth probing mechanism can successfully identify temporarily slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC's systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
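
The selection step described in this abstract can be sketched as follows, assuming the probe has already produced one bandwidth measurement per I/O node. The function names, node ids, and the round-robin block assignment are illustrative assumptions, not the paper's implementation.

```python
import heapq


def select_stripe_nodes(probe_bandwidth: dict, stripe_count: int) -> list:
    """Pick the `stripe_count` I/O nodes with the highest probed bandwidth.

    `probe_bandwidth` maps a node id to its measured bandwidth (e.g. MB/s).
    """
    if stripe_count <= 0:
        raise ValueError("stripe_count must be positive")
    ranked = heapq.nlargest(
        stripe_count, probe_bandwidth.items(), key=lambda item: item[1]
    )
    return [node for node, _ in ranked]


def assign_stripes(num_blocks: int, nodes: list) -> list:
    """Assign blocks to the selected nodes round-robin (assumed policy)."""
    return [nodes[i % len(nodes)] for i in range(num_blocks)]


if __name__ == "__main__":
    # Hypothetical probe results; io1 is temporarily slow.
    probed = {"io0": 950.0, "io1": 120.0, "io2": 870.0, "io3": 910.0}
    chosen = select_stripe_nodes(probed, 2)
    print(chosen, assign_stripes(5, chosen))
```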

10.
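To meet the real-time and dynamic-scalability requirements of high-performance computing over multi-source, dynamic, massive data in the transportation domain, this paper proposes an experimental distributed in-memory database platform based on GemFire. The platform uses a key-value data storage structure and distributed dynamic membership, and its computing performance is tested and analyzed by loading real data from a floating car system within a complete cloud computing architecture. The experimental results show that the platform reduces the computation time for data volumes above the ten-million-record level to less than 10% of that of the original system, meeting the application needs of a transportation Internet-of-Things cloud platform that integrates and uses the data resources of its subsystems.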

11.
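A Distributed Layered Resource Management Model for the Blue Whale Distributed File System   Total citations: 10, self-citations: 0, citations by others: 10
To manage massive distributed storage resources efficiently, the Blue Whale distributed file system abandons the traditional centralized approach to resource management and implements a distributed layered resource management model. The model can manage multiple storage servers, supports distributed metadata processing by a cluster of multiple metadata servers, and supports a variety of load-balancing policies for metadata and data. In addition, the model's out-of-band data transfer removes the system's performance bottleneck and improves its ability to support concurrent access. Both theoretical analysis and measured results show that the model can meet a variety of requirements and provides good performance and scalability.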

12.
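Current large-scale storage systems provide large aggregate I/O bandwidth but do not achieve the high metadata scalability needed to manage files distributed across thousands to tens of thousands of storage nodes. This paper proposes server-driven, lock-free metadata operations to improve the scalability of file metadata operations. The server-driven technique simplifies consistency maintenance, while the lock-free technique both avoids resource conflicts and increases the parallelism of metadata operations. File creation and deletion operations are implemented. Experimental results show that the method significantly improves system performance and scalability.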

13.
This article studies the performance and scalability of a geometric multigrid solver implemented within the hierarchical hybrid grids (HHG) software package on current high-performance computing clusters with up to nearly 300,000 cores. HHG is based on unstructured tetrahedral finite elements that are regularly refined to obtain a block-structured computational grid. One challenge is the parallel mesh generation from an unstructured input grid that roughly approximates a human head within a 3D magnetic resonance imaging data set. This grid is then regularly refined to create the HHG grid hierarchy. As test platforms, a BlueGene/P cluster located at the Jülich Supercomputing Centre and an Intel Xeon 5650 cluster located at the local computing center in Erlangen are chosen. To estimate the quality of our implementation and to predict the runtime of the multigrid solver, a detailed performance and communication model is developed and used to evaluate the measured single-node performance, as well as weak and strong scaling experiments on both clusters. Thus, for a given problem size, one can predict the number of compute nodes that minimizes the overall runtime of the multigrid solver. Overall, HHG scales up to the full machines, where the biggest linear system solved on Jugene had more than one trillion unknowns. Copyright © 2012 John Wiley & Sons, Ltd.

14.
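As clustering algorithms must process ever larger datasets under ever tighter time constraints, placing higher demands on their ability to handle big data and on their performance, this paper proposes Spark-FCM, a fuzzy C-means (FCM) algorithm for the Spark distributed in-memory computing platform. First, matrices are partitioned horizontally for distributed storage, with different vectors stored on different nodes. Then, based on the computational characteristics of FCM, distributed and cache-sensitive versions of common matrix operations are designed, including multiplication, transposition, and addition. Finally, Spark-FCM is designed on top of these matrix operations and the characteristics of the Spark platform; its main data structures use distributed matrix storage, so little data moves between nodes and every step is computed in a distributed manner. Tests in single-machine and cluster environments show that the algorithm scales well and adapts to large datasets, that its performance is linear in the data volume, and that performance in the cluster environment is 2 to 3 times higher than on a single machine.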

15.
Data replication is becoming a popular technology in many fields, such as cloud storage, data grids, and P2P systems. By replicating files to other servers or nodes, we can reduce network traffic and file access time and increase data availability in the face of natural and man-made disasters. However, more replicas do not always yield better system performance. Replicas do decrease read access time and provide better fault tolerance, but for write access, maintaining a large number of replicas results in a huge update overhead. Hence, a trade-off between read access time and write update cost is needed. File popularity is an important factor in making decisions about data replication. To avoid the effect of data access fluctuations, historical file popularity can be used to select the files that are genuinely popular. In this research, a dynamic data replication strategy is proposed based on two ideas. The first employs historical access records, which are useful for picking a file to replicate. The second is a proactive deletion method, which is applied to control the number of replicas and reach an optimal balance between read access time and write update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm and other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.

16.
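Zhang Baojun, Pan Ruifang, Journal of Computer Applications (计算机应用), 2015, 35(8): 2158-2163
To solve the problem of storing the massive volume of information in next-generation blog systems, this paper combines cloud storage technology and proposes a new blog system architecture, BlogCloud. The architecture is built around distributed storage, which avoids the performance bottleneck of centralized storage and scales well. It adopts a semi-distributed P2P network topology that can quickly locate storage resources in the network; uses only stable nodes as storage nodes, avoiding the network fluctuations caused by unstable nodes; follows a nearest-storage principle and caches files on the client to reduce network transfers; lets users define the file chunk size, so that large files can be split into chunks and transferred in parallel for higher transfer speed while small files are left unsplit, saving the overhead of splitting and merging; and provides redundant data backup, keeping file replicas on multiple storage nodes in the network with off-site backup, which improves data security and reliability. BlogCloud and the ZSWIN blog system were compared on virtual machines. The tests show that BlogCloud's throughput is clearly higher than ZSWIN's, that using unstable nodes as storage nodes degrades BlogCloud's performance, and that BlogCloud continues to run stably when the number of storage nodes and index nodes is reduced, indicating high reliability. The results indicate that the BlogCloud architecture can meet the storage requirements of next-generation blog systems.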

17.
18.
Pull-based overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime and executing it. This model helps overcome the problems of direct job submission in highly complex grid environments, namely heterogeneity, imprecise status information, relatively high failure rates, and slow adaptation to changes in grid conditions or user priorities. This article presents a distributed scheduling architecture for such late-binding overlays. In this architecture, execution nodes share a distributed hash table and cooperatively perform job assignment. As our experiments show, the scalability problems of centralized matching are avoided, achieving low and predictable scheduling overheads even for the execution of large workflows, and total turnaround times are improved. This is in line with the predictions of a theoretical model of grid workflow execution that the article also discusses. Scalability makes fine-grained scheduling possible and enables new functionalities, such as a distributed data cache shared by the execution nodes, which helps relieve the commonly congested storage services. In addition, we show that our system is more resilient to problems such as communication breakdowns between computation centres. Moreover, the new architecture is better prepared to deal with demanding scenarios such as intense demand for popular data files or remote data processing.

19.
A Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at the data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for a new replica: first, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
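
The two-step replacement idea in this abstract might look roughly like the sketch below. The abstract does not give the selection rule or the scoring formula, so the `cheap_threshold` cutoff and the `value` score combining recency, access count, size, and transfer time are assumptions made for illustration only; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # MB, must be positive
    transfer_time: float  # seconds needed to fetch the file again
    last_access: float    # timestamp of the most recent request
    access_count: int


def choose_victims(replicas, needed, now, cheap_threshold=1.0):
    """Return names of replicas to delete to free at least `needed` MB.

    If the store cannot free enough space, every replica is returned.
    """
    victims, freed = [], 0.0

    # Step 1: delete replicas that are cheapest to transfer again
    # (assumed rule: transfer time at or below `cheap_threshold`).
    cheap = [r for r in replicas if r.transfer_time <= cheap_threshold]
    for r in sorted(cheap, key=lambda r: r.transfer_time):
        if freed >= needed:
            break
        victims.append(r.name)
        freed += r.size

    # Step 2: if space is still short, rank the rest by an assumed value
    # score; old, rarely used, large, cheap-to-fetch files score lowest.
    if freed < needed:
        chosen = set(victims)

        def value(r):
            age = max(now - r.last_access, 1e-9)
            return r.access_count * r.transfer_time / (age * r.size)

        rest = [r for r in replicas if r.name not in chosen]
        for r in sorted(rest, key=value):
            if freed >= needed:
                break
            victims.append(r.name)
            freed += r.size

    return victims
```

A caller would delete the returned replicas before storing the new one; the weighting inside `value` is the part most likely to differ from the published strategy.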

20.
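High-energy physics is a typical data-intensive computing domain in which data access performance is critical to the whole system and closely tied to the application's computing pattern. Starting from an analysis of the typical computing patterns in high-energy physics, this paper summarizes the characteristics of their data access and proposes optimizations targeting several factors, including operating system I/O scheduling and distributed file system caching; after optimization, data access performance and CPU utilization improve markedly. Large-scale storage systems also place high demands on manageability features such as metadata management, data reliability, and capacity expansion. In view of the shortcomings of the existing Lustre parallel file system, the paper proposes a Gluster-based storage system design for high-energy physics. After optimizations to data management and capacity expansion, the system has been put into production use; its data access performance meets the needs of high-energy physics computing, and it offers better scalability and reliability.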
