期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Mlock:building delegable metadata service for the parallel file systems

ZHANG Quan FENG Dan WANG Fang WU Sen 《中国科学:信息科学(英文版)》2015,(3):66-79

The ever-growing demand for high performance computation calls for progressively larger parallel distributed file systems to match their requirement.These file systems can achieve high performance for large I/O operations through distributing load across numerous data servers.However,they fail to provide quality service for applications pertaining to small files.In this paper,we propose a delegable metadata service(DMS)for hiding latency of metadata accesses and optimizing small-file performance.In addition,four techniques have been designed to maintain consistency and efficiency in DMS:pre-allocate serial metahandles,directory-based metadata replacement,packing transaction operations and fine-grained lock revocation.These schemes have been employed in Cappella parallel distributed file system,and various experiments complying with industrial standards have been conducted for evaluation of its efficiency.The results show that our design has achieved significant improvement in performance of both metadata operations and small-file access.Moreover,this scheme is widely applicable for integration within many other distributed file systems. 相似文献

2.

Symmetric active/active metadata service for high availability parallel file systems

Xubin Li Christian Xin Stephen L. 《Journal of Parallel and Distributed Computing》2009,69(12):961-973

High availability data storage systems are critical for many applications as research and business become more data driven. Since metadata management is essential to system availability, multiple metadata services are used to improve the availability of distributed storage systems. Past research has focused on the active/standby model, where each active service has at least one redundant idle backup. However, interruption of service and even some loss of service state may occur during a fail-over depending on the replication technique used. In addition, the replication overhead for multiple metadata services can be very high. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, service node failures do not cause a fail-over to a backup and there is no disruption of service or loss of service state. A fast delivery protocol is further discussed to reduce the latency of the total order broadcast needed. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution. 相似文献

3.

MapReduce based parallel gene selection method

A. K. M. Tauhidul Islam Byeong-Soo Jeong A. T. M. Golam Bari Chae-Gyun Lim Seok-Hee Jeon 《Applied Intelligence》2015,42(2):147-156

相似文献

4.

Exploring large-scale small file storage for search engines

Weizhe Zhang Gangzhao Lu Hui He Qizhen Zhang Chuanliang Yu 《The Journal of supercomputing》2016,72(8):2911-2923

相似文献

5.

Data investigation based on XFS file system metadata

Yongmin Park Hyunsoo Chang Taeshik Shon 《Multimedia Tools and Applications》2016,75(22):14721-14743

相似文献

6.

基于PML结构文件的MapReduce算法优化*

田世海魏志强《计算机应用研究》2016,33(9)

针对目前物联网和云计算技术结合后,物联网RFID产生的小型数据致使云计算中MapReduce算法产生运算瓶颈问题进行了研究。运用PML和EPC编码技术保证了数据存储的完整,采用快速排序和改进XGrind压缩技术对于MapReduce算法进行优化。由实验获得,优化后MapReduce算法减小64%的I/O吞吐和45%的CPU耗用,同时使查询数据效率提高75%,可改善查询物联网获取的海量信息的效能。最终通过对云计算下MapReduce算法的优化,为云计算下的物联网追溯平台数据高效使用提供了技术支撑。相似文献

7.

Fault tolerant file models for parallel file systems: introducing distribution patterns for every file

A. Calderón F. García-Carballeira L. M. Sánchez J. D. García J. Fernandez 《The Journal of supercomputing》2009,47(3):312-334

Parallelism in file systems is obtained by using several independent server nodes supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in one single node can stop the whole system. To avoid this problem, data must be stored using some kind of redundant technique, so any data stored in a faulty element can be recovered. Fault tolerance can be provided in I/O systems by using replication or RAID based schemes. However, most of the current systems apply the same technique for all files in the system. This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. This support can be applied to other parallel file systems with many benefices: fault tolerance at file level, flexible definition of fault tolerance scheme to be used, possibility to change the fault tolerant support used for a file, etc.

A. CalderónEmail:

相似文献

8.

MapReduce based parallel fuzzy-rough attribute reduction using discernibility matrix

Sowkuntla Pandu Prasad P. S. V. S. Sai 《Applied Intelligence》2022,52(1):154-173

Fuzzy-rough set theory is an efficient method for attribute reduction. It can effectively handle the imprecision and uncertainty of the data in the attribute reduction. Despite its efficacy, current approaches to fuzzy-rough attribute reduction are not efficient for the processing of large data sets due to the requirement of higher space complexities. A limited number of accelerators and parallel/distributed approaches have been proposed for fuzzy-rough attribute reduction in large data sets. However, all of these approaches are dependency measure based methods in which fuzzy similarity matrices are used for performing attribute reduction. Alternative discernibility matrix based attribute reduction methods are found to have less space requirements and more amicable to parallelization in building parallel/distributed algorithms. This paper therefore introduces a fuzzy discernibility matrix-based attribute reduction accelerator (DARA) to accelerate the attribute reduction. DARA is used to build a sequential approach and the corresponding parallel/distributed approach for attribute reduction in large data sets. The proposed approaches are compared to the existing state-of-the-art approaches with a systematic experimental analysis to assess computational efficiency. The experimental study, along with theoretical validation, shows that the proposed approaches are effective and perform better than the current approaches.

相似文献

9.

A prediction-based dynamic file assignment strategy for parallel file systems

《Parallel Computing》2015

Nowadays, the rapid development of the internet calls for a high performance file system, and a lot of efforts have already been devoted to the issue of assigning nonpartitioned files in a parallel file system with the aim of pursuing a prompt response to requests. Yet most of the existing strategies still fail to bring about an optimal performance on system mean response time metrics, and new strategies which can achieve better performance in terms of mean response time become indispensable for parallel file systems. This paper, while addressing the issue of assigning nonpartitioned files in parallel file systems where the file accesses exhibit Poisson arrival rates and fixed service times, presents an on-line file assignment strategy, named prediction-based dynamic file assignment (PDFA), to minimize the mean response time among disks under different workload conditions, and a comparison of the PDFA with the well-known file assignment algorithms, such as HP and SOR. Comprehensive experimental results show that PDFA is able to improve the performance consistently in terms of mean response time among all algorithms for comparison. 相似文献

10.

基于MapReduce的微博用户搜索排名算法

梁秋实吴一雷封磊《计算机应用》2012,32(11):2989-2993

在微博搜索领域,单纯依赖于粉丝数量的搜索排名使刷粉行为有了可乘之机,通过将用户看作网页,将用户间的“关注”关系看作网页间的链接关系,使PageRank关于网页等级的基本思想融入到微博用户搜索,并引入一个状态转移矩阵和一个自动迭代的MapReduce工作流将计算过程并行化,进而提出一种基于MapReduce的微博用户搜索排名算法。在Hadoop平台上对该算法进行了实验分析,结果表明,该算法避免了用户排名单纯与其粉丝数量相关,使那些更具“重要性”的用户在搜索结果中的排名获得提升,提高了搜索结果的相关性和质量。相似文献

11.

Two-level Hash/Table approach for metadata management in distributed file systems

Antonio F. Díaz Mancia Anguita Hugo E. Camacho Erik Nieto Julio Ortega 《The Journal of supercomputing》2013,64(1):144-155

相似文献

12.

基于MapReduce的并行MRACO-PAM聚类算法

《计算机工程与科学》2017,(10):1801-1806

聚类分析是数据处理算法中常用的方法,PAM算法自提出以来便成为了最常使用的聚类算法之一。虽然传统PAM算法解决了K-Means算法在聚类过程中对脏数据敏感的问题,但是传统PAM算法存在收敛速度慢、处理大数据集效率不高等问题。针对这些问题,利用蚁群搜索机制来增强PAM算法的全局搜索能力和局部探索能力,并基于MapReduce并行编程框架提出MRACO-PAM算法来实现并行化计算,并进行实验。实验结果表明,基于MapReduce框架的并行MRACO-PAM聚类算法的收敛速度得到了改善,具备处理大规模数据的能力,而且具有良好的可扩展性。相似文献

13.

Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems

Quanqing XU Rajesh Vellore ARUMUGAM Khai Leong YONG Yonggang WEN Yew-Soon ONG Weiya XI 《Frontiers of Computer Science》2015,9(6):904

Big data is an emerging term in the storage industry, and it is data analytics on big storage, i.e., Cloud-scale storage. In Cloud-scale (or EB-scale) file systems, load balancing in request workloads across a metadata server cluster is critical for avoiding performance bottlenecks and improving quality of services.Many good approaches have been proposed for load balancing in distributed file systems. Some of them pay attention to global namespace balancing, making metadata distribution across metadata servers as uniform as possible. However, they do not work well in skew request distributions, which impair load balancing but simultaneously increase the effectiveness of caching and replication. In this paper, we propose Cloud Cache (C²), an adaptive and scalable load balancing scheme for metadata server cluster in EB-scale file systems. It combines adaptive cache diffusion and replication scheme to cope with the request load balancing problem, and it can be integrated into existing distributed metadata management approaches to efficiently improve their load balancing performance. C² runs as follows: 1) to run adaptive cache diffusion first, if a node is overloaded, loadshedding will be used; otherwise, load-stealing will be used; and 2) to run adaptive replication scheme second, if there is a very popular metadata item (or at least two items) causing a node be overloaded, adaptive replication scheme will be used, in which the very popular item is not split into several nodes using adaptive cache diffusion because of its knapsack property. By conducting performance evaluation in trace-driven simulations, experimental results demonstrate the efficiency and scalability of C². 相似文献

14.

Optimization of small file access on object based parallel file system

ZHOU En qiang DONG Yong ZHANG Wei LU Yu tong 《计算机工程与科学》2013,35(12):8

相似文献

15.

基于NoSQL的FITS文件头元数据存储和查询研究

刘应波王锋季凯帆邓辉戴伟梁波《计算机应用研究》2015,(2):461-465

随着大型天文望远镜的投入使用,观测台站正面临PB量级的海量数据存储、快速检索难题;同时由于在数据检索中起着关键作用的FITS文件头的可变性,导致难以使用传统的关系型数据库来建立可适应这种变化需求的非结构化数据模型。针对这个难题,提出了使用NoSQL对天文上广泛使用的FITS文件头中所包含的可变元数据信息进行存储和查询;讨论了关系型数据模型存储可变FITS文件头的不足;分析了NoSQL存储可变FITS头元数据信息的可行性;使用形式化的关系型代数对这种存储查询方式进行了一般化的讨论。通过具体查询实例验证了该方案在存储天文可变FITS文件头的有效性和可行性。相似文献

16.

基于MapReduce的并行频繁项集挖掘算法研究

刘卫明张弛《计算机应用研究》2021,38(3):689-695

针对并行MRPrePost(parallel prepost algorithm based on MapReduce)频繁项集挖掘算法在大数据环境存在运行时间长、内存占用量大和节点负载不均衡的问题,提出一种基于DiffNodeset的并行频繁项集挖掘算法(parallel frequent itemsets mining using DiffNodeset,PFIMD).该算法首先采用一种数据结构DiffNodeset,有效地避免了N-list基数过大的问题;此外提出一种双向比较策略(2-way comparison strategy,T-wcs),以减少两个DiffNod-eset在连接过程中的无效计算,极大地降低了算法时间复杂度;最后考虑到集群负载对并行算法效率的影响,进一步提出了一种基于动态分组的负载均衡策略(load balancing strategy based on dynamic grouping,LBSBDG),该策略通过将频繁1项集F-list中的每项进行均匀分组,降低了集群中每个计算节点上PPC-Tree树的规模,进而减少了先序后序遍历PPC-Tree树所需的时间.实验结果表明,该算法在大数据环境下进行频繁项集挖掘具有较好的效果. 相似文献

17.

MRI:面向并行迭代的MapReduce模型

马志强张力杨双涛《计算机工程与科学》2016,38(12):2434-2441

机器学习领域内的多数模型均需要通过迭代计算以求解其最优参数,而MapReduce模型在迭代计算中的缺陷不足导致其在迭代计算中无法得到广泛应用。为解决上述矛盾,基于MapReduce模型提出并实现了一种可用于模型参数求解的并行迭代模型MRI。MRI模型在保持Map以及Reduce阶段的基础上,新增了Iterate阶段以及相关通信协议,实现了迭代过程中模型参数的更新、分发与迭代控制;通过对MapReduce状态机进行增强,实现了节点任务的重用,避免了迭代过程中节点任务重复创建、初始化以及回收带来的性能开销;在任务节点实现了数据缓存,保障了数据的本地性,并在Map节点增加了基于内存的块缓存机制,进一步提高训练集加载效率,以提高整体迭代效率。基于梯度下降算法的实验结果表明:MRI模型在并行迭代计算方面性能优于MapReduce模型。相似文献

18.

基于MapReduce的并行抽样路径K-匿名隐私保护算法

《电子技术应用》2017,(9):132-136

K-匿名算法及现存K-匿名改进算法大多使用牺牲时间效率降低发布数据信息损失量的方法实现数据的匿名化,但随着数据量的急剧增长,传统的数据匿名化方法已不适用于对较大数据的处理。针对K-匿名算法在单机执行过程中产生大量频繁项集和重复搜索数据表的缺点,将MapReduce模型引入到抽样泛化路径K-匿名算法中对其进行优化。该方法兼具MapReduce及抽样泛化算法的优点,高效分布式匿名化数据集,降低发布数据集信息损失量,提高数据的可用性。实验结果表明:当数据量较大时,该优化算法在时间效率及数据精度方面有显著提高。相似文献

19.

A system planning method based on templates for large-scale manufacturing information systems

《Information & Management》1999,36(1):1-7

In the field of manufacturing, there is a need to develop large-scale manufacturing information systems. This is especially true in the Japanese steel manufacturing industry where CIM is the core management technology. But developing such systems requires large amounts of time and manpower, and furthermore, these type of projects are very difficult to manage. Therefore, in order to ease the process of analysis and design, we propose procedures based on a two-dimensional template with specific criteria for large-scale manufacturing IS architectures. In each manufacturing system, there are two important elements that correspond to the two dimensions of the template. One is a functional category and the other is a management structure. We show here the effectiveness of applying this method to system planning of large-scale IS in one representative steel manufacturing plant. 相似文献

20.

基于MapReduce的轨迹压缩并行化方法

吴家皋夏轩刘林峰《计算机应用》2017,37(5):1282-1286

带有全球定位系统（GPS）功能设备的增多,产生大量的时空轨迹数据,给数据的存储、传输和处理带来了沉重的负担。为了减轻这种负担,各种轨迹压缩方法也随之产生。提出了一种基于MapReduce的并行化轨迹压缩方法,针对并行化导致的分段点前后轨迹的相关性被破坏的问题,首先,采用两种分段点相互交错的划分方法划分轨迹;然后,将分段轨迹分配到多个节点上进行并行化压缩;最后,对压缩结果进行匹配合并。性能测试分析结果表明,所提出的并行化轨迹压缩方法能够大幅提高压缩效率,而且能完全消除因分段导致分段点前后相关性被破坏带来的误差。相似文献