Similar Documents
20 similar documents found.
1.
Research on a Parallel Apriori Algorithm Based on HMT and Hash Trees
To further improve the performance of the HMT- and hash-tree-based Apriori algorithm, a parallelization scheme for a distributed-memory parallel environment is proposed, which makes full use of idle computing resources to raise the efficiency of association-rule mining. The original dataset is divided evenly among the worker nodes of the parallel environment; each worker performs support counting in parallel, and the support counts collected from all workers are merged to obtain the target frequent itemsets, thereby parallelizing the Apriori algorithm. Experimental results show that the parallelization scheme clearly improves the efficiency of the original algorithm.
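As a rough, hedged sketch of the partition-count-merge idea described above (not the authors' implementation; the candidate itemsets and the minimum support threshold are assumed to be given), one pass of parallel support counting might look like this in Python:

```python
from collections import Counter
from itertools import combinations

def local_support(partition, candidates):
    """Count, on one worker's partition, how many transactions contain each candidate itemset."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:          # candidate itemset is contained in the transaction
                counts[cand] += 1
    return counts

def parallel_apriori_pass(partitions, candidates, min_support):
    """Merge per-partition counts and keep candidates meeting the global support threshold."""
    total = Counter()
    for part in partitions:            # in a real cluster each call would run on a different node
        total.update(local_support(part, candidates))
    return {itemset: c for itemset, c in total.items() if c >= min_support}

# toy usage
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
partitions = [transactions[:2], transactions[2:]]                 # "evenly divided" dataset
candidates = [frozenset(p) for p in combinations("abc", 2)]
print(parallel_apriori_pass(partitions, candidates, min_support=2))
```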

2.
PFP_Growth is the MapReduce-based parallelization of the FP_Growth algorithm on the Hadoop platform. Its grouping step does not consider load balancing, so the nodes finish their tasks at very different times, which lowers the execution efficiency of the algorithm. To improve execution efficiency, a Spark-based RPFP algorithm is proposed that optimizes PFP_Growth in two respects: balanced grouping and reduced time complexity. Balanced grouping is achieved by placing each heavily loaded item into the group whose total load is currently the smallest, and the time complexity is reduced by adding a hash table to the header-table structure so that element addresses can be accessed quickly. Experimental results show that by optimizing the PFP algorithm in this way, RPFP effectively improves the efficiency of frequent-itemset mining.
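The balanced-grouping rule described above, assigning each item, heaviest first, to the group whose accumulated load is currently smallest, can be sketched as a simple greedy procedure. This is an illustrative reconstruction, not the RPFP code; the per-item load estimates are assumed inputs.

```python
import heapq

def balanced_grouping(item_loads, num_groups):
    """Greedy load balancing: place each item (heaviest first) into the group
    whose total accumulated load is currently the smallest."""
    heap = [(0, g) for g in range(num_groups)]      # min-heap of (accumulated_load, group_id)
    heapq.heapify(heap)
    groups = {g: [] for g in range(num_groups)}

    for item, load in sorted(item_loads.items(), key=lambda kv: kv[1], reverse=True):
        total, g = heapq.heappop(heap)              # lightest group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups

# toy usage: item -> estimated workload of its conditional pattern base
loads = {"f": 100, "c": 80, "a": 60, "b": 30, "m": 20, "p": 10}
print(balanced_grouping(loads, num_groups=3))
```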

3.
The pseudo-2D matching algorithm compresses the non-continuous-tone regions of screen images very well, but the memory overhead of its hash table is large, which is unfavorable for hardware implementation. To shrink the hash table, the original algorithm is optimized with a three-byte hashing method: the source data are treated as a collection whose elements are YUV triples, and hash values are computed per YUV triple. This not only reduces the amount of hash computation but also saves a large amount of hash-table storage. Experimental results show that the three-byte hashing method reduces the hash-table storage to one third of that of the original algorithm, and the BD-rate of the tested screen images also improves somewhat.
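A minimal sketch of hashing per 3-byte YUV triple instead of per byte; the concrete hash function below (a simple multiplicative mix) is an assumption for illustration, not the one used in the paper.

```python
def yuv_triple_hash(y, u, v, table_bits=16):
    """Hash one 3-byte (Y, U, V) sample; hashing whole triples means one table
    entry per pixel instead of one per byte, shrinking the hash table."""
    key = (y << 16) | (u << 8) | v            # pack the triple into 24 bits
    h = (key * 2654435761) & 0xFFFFFFFF       # Knuth-style multiplicative hash (assumed)
    return h >> (32 - table_bits)             # keep the top bits as the table index

# toy usage: index the pixels of a scanline by their YUV triples
scanline = [(16, 128, 128), (235, 128, 128), (16, 128, 128)]
table = {}
for pos, (y, u, v) in enumerate(scanline):
    table.setdefault(yuv_triple_hash(y, u, v), []).append(pos)
print(table)
```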

4.
A Parallel Optimization Scheme for Large-Scale Sequence Alignment Software
Building on the parallel optimization of the in-silico gene cloning software SiClone and the alternative-splicing analysis software AltSplice, a parallel optimization scheme for large-scale sequence alignment software is proposed. The scheme partitions and deploys the large sequence library to be aligned according to a chosen strategy, so that each node of the cluster processes only one part of the whole library. This effectively reduces disk I/O and inter-node communication overhead and suits inexpensive yet efficient Linux cluster systems.

5.
Based on the Hadoop distributed computing platform, a parallel mining algorithm suitable for large datasets is presented. The algorithm vertically partitions the unstructured raw dataset and the intermediate result files to ensure that complete frequent itemsets can still be obtained, and assigns the vertical partitions to different Hadoop compute nodes. Each compute node therefore stores less data and performs fewer intersection operations, which improves parallel mining efficiency. Experimental results show that the algorithm resolves the problems of heavy data communication, large intermediate data, and massive numbers of intersection operations that arise when mining large datasets, and that it is efficient and scalable.
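The vertical layout that such algorithms rely on stores, for each item, the set of transaction IDs containing it, so the support of an itemset is the size of the intersection of its items' TID sets. A minimal, generic sketch (not the paper's Hadoop implementation):

```python
from functools import reduce

def build_vertical(transactions):
    """Vertical layout: item -> set of transaction IDs that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def support(itemset, tidsets):
    """Support of an itemset = size of the intersection of its items' TID sets."""
    return len(reduce(set.intersection, (tidsets[i] for i in itemset)))

transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}]
tidsets = build_vertical(transactions)    # each vertical block could live on a different node
print(support({"a", "c"}, tidsets))       # -> 2
```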

6.
A Latency-Hiding Algorithm for File Integrity Verification in P2P Downloads
File integrity verification during P2P downloading degrades download performance. To address this, a latency-hiding verification algorithm is proposed. Exploiting the streaming property of the hash algorithm used for integrity checking and the characteristics of the TCP asynchronous receive buffer, the hash computation of a large file block is split into many computations over smaller sub-blocks: as soon as a sub-block arrives, its hash is computed. Because hashing each sub-block costs very little time, the computation latency can be hidden by the TCP asynchronous receive buffer, so hash computation and data reception proceed almost in parallel. This eliminates the impact of verification on P2P file download performance and improves download efficiency.
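The "streaming property" referred to above is the incremental update interface that common hash implementations expose. Below is a minimal Python sketch of hashing sub-blocks as they arrive; the sub-block size, the use of SHA-1 and the receive callback are assumptions, not details from the paper.

```python
import hashlib

SUB_BLOCK = 16 * 1024   # assumed sub-block size; the paper does not fix one here

def download_and_hash(recv_sub_block, expected_digest, block_size):
    """Hash each small sub-block as soon as it arrives, so hashing overlaps reception
    instead of being one large computation after the whole block is downloaded."""
    h = hashlib.sha1()                       # any streaming hash exposes the same update() interface
    received = 0
    while received < block_size:
        chunk = recv_sub_block(min(SUB_BLOCK, block_size - received))
        h.update(chunk)                      # cheap per-sub-block update, hidden by the receive buffer
        received += len(chunk)
    return h.hexdigest() == expected_digest  # final comparison once the block is complete

# toy usage with an in-memory "connection"
data = b"x" * (64 * 1024)
offset = [0]
def fake_recv(n):
    chunk = data[offset[0]:offset[0] + n]
    offset[0] += len(chunk)
    return chunk

print(download_and_hash(fake_recv, hashlib.sha1(data).hexdigest(), len(data)))
```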

7.
Data placement algorithms for network-based storage systems in dynamic network environments are studied. After an analysis of existing placement algorithms, a general weighted distributed hash table algorithm is proposed. Compared with the evaluation functions defined by consistent hashing and the logarithm-based algorithm, it additionally takes into account each node's storage capacity, the physical distance between the node distributing the data and the node storing it, and the network bandwidth. Simulation results show that the algorithm achieves fair data distribution.
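One standard way to fold per-node weights (capacity, distance, bandwidth) into hash-based placement is weighted rendezvous (highest-random-weight) hashing. The sketch below illustrates that generic idea; it is not the evaluation function defined in the paper, and the combined node weights are assumed to be precomputed.

```python
import hashlib
import math

def weighted_rendezvous_place(key, nodes):
    """Pick the node with the highest weighted hash score for this key.
    `nodes` maps node id -> weight combining capacity, distance and bandwidth (assumed given)."""
    def score(node, weight):
        digest = hashlib.md5(f"{node}:{key}".encode()).digest()
        x = (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)   # uniform in (0, 1)
        return -weight / math.log(x)                                # classic weighted-HRW score
    return max(nodes, key=lambda n: score(n, nodes[n]))

# toy usage: weights already combine storage space, distance and bandwidth
nodes = {"nodeA": 1.0, "nodeB": 2.0, "nodeC": 0.5}
for k in ["obj-1", "obj-2", "obj-3"]:
    print(k, "->", weighted_rendezvous_place(k, nodes))
```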

8.
张延松  张宇  黄伟  王珊  陈红 《软件学报》2009,20(Z1):165-175
Based on the characteristics of OLAP queries and the performance profile of main-memory databases, a parallel OLAP query processing system composed of multiple main-memory databases is proposed. The multidimensional aggregation queries of an OLAP application are distributed to the compute nodes, the aggregates are computed in parallel, and the partial results are merged into the final output. Compared with other parallel processing approaches, the algorithm exploits the fact that in an OLAP database the dimension tables are far smaller than the fact table: the database is horizontally partitioned according to the amount of data in the fact table and the processing capacity of each node, the distributable nature of the aggregate functions is used to raise the degree of query parallelism, and the merge step of parallel query processing is deferred. The nodes' parallel processing capability is thus fully utilized, the data communication during parallel query processing is reduced, and the system's parallel query performance improves. The algorithm is easy to implement, scales well, performs well, and suits enterprise-scale massive data processing.

9.
Existing key-value storage systems lack hotspot awareness, which makes them slow and unreliable under highly skewed workloads. To address this, an adaptive hotspot-aware hash index model is proposed that implements a high-performance hash table based on key digests. First, the digest of a key is stored instead of the key itself, compressing key storage and optimizing the bucket data structure of the hash table. Second, probing of the hash table is optimized using the CPU's data-level parallelism and cache lines. Finally, because digests prevent exact key comparison and can incur extra disk I/O, an adaptive key scheduling algorithm is designed that dynamically adjusts where key values are stored according to the currently available memory, the load on the hash index, and the access hotspots. Experiments on YCSB synthetic datasets show that, at the same memory utilization, the adaptive hotspot-aware hash index is up to 1.2 times as fast as the best existing hash tables.
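A toy sketch of the digest idea: the bucket stores a short fingerprint of the key plus the location of the value, and the full key is compared only when fingerprints match. This is an illustrative reconstruction, not the paper's data structure, and the fingerprint width is an assumption.

```python
import hashlib

BUCKETS = 1024

def fingerprint(key: bytes) -> int:
    """Short 16-bit digest stored in the bucket instead of the full key (width assumed)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=2).digest(), "big")

class DigestHashIndex:
    def __init__(self):
        # each bucket holds (fingerprint, value_location) pairs; full keys live elsewhere
        self.buckets = [[] for _ in range(BUCKETS)]
        self.store = {}                     # stand-in for the slower key/value store

    def put(self, key: bytes, value):
        loc = len(self.store)
        self.store[loc] = (key, value)
        b = hash(key) % BUCKETS
        self.buckets[b].append((fingerprint(key), loc))

    def get(self, key: bytes):
        b = hash(key) % BUCKETS
        fp = fingerprint(key)
        for entry_fp, loc in self.buckets[b]:
            if entry_fp == fp:              # cheap in-bucket comparison
                stored_key, value = self.store[loc]
                if stored_key == key:       # full key compared only on a fingerprint match
                    return value
        return None

idx = DigestHashIndex()
idx.put(b"user:42", "alice")
print(idx.get(b"user:42"))
```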

10.
Design and Implementation of a CUDA-Based Parallel Particle Swarm Optimization Algorithm
Particle swarm optimization (PSO) takes too long to compute when processing large volumes of data or solving large, complex problems. This work therefore studies a fine-grained parallel PSO implemented on the graphics processor (GPU). Starting from an analysis of the conventional PSO algorithm and building on the widely used GPU parallel computing techniques, a parallel PSO method is designed and implemented. The method runs on the Compute Unified Device Architecture (CUDA) and uses a large number of GPU threads to process the search of the individual particles in parallel, accelerating the convergence of the whole swarm. The program makes full use of the mathematical libraries shipped with CUDA, which ensures its stability and ease of coding. Experiments on several benchmark optimization test functions show that, for the same convergence, the CUDA-based parallel PSO achieves speedups of up to 90x over the CPU-based serial method.
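The per-particle work that such GPU threads perform is the standard PSO velocity/position update. Below is a hedged, vectorized NumPy sketch of one data-parallel iteration; the inertia and acceleration coefficients are common textbook values, not necessarily those used in the paper.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.729, c1=1.49, c2=1.49):
    """One data-parallel PSO iteration: every row (particle) is updated independently,
    which is what maps naturally to one GPU thread per particle."""
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# toy usage on the sphere function f(x) = sum(x^2)
n_particles, dim = 1024, 10
x = np.random.uniform(-5, 5, (n_particles, dim))
v = np.zeros_like(x)
pbest, pbest_val = x.copy(), (x ** 2).sum(axis=1)
for _ in range(100):
    x, v = pso_step(x, v, pbest, pbest[pbest_val.argmin()])
    val = (x ** 2).sum(axis=1)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = x[improved], val[improved]
print("best value:", pbest_val.min())
```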

11.
Over the past few years, research and development in bioinformatics (e.g. genomic sequence alignment) has grown with each passing day fueling continuing demands for vast computing power to support better performance. This trend usually requires solutions involving parallel computing techniques because cluster computing technology reduces execution times and increases genomic sequence alignment efficiency. One example, mpiBLAST is a parallel version of NCBI BLAST that combines NCBI BLAST with message passing interface (MPI) standards. However, as most laboratories cannot build up powerful cluster computing environments, Grid computing framework concepts have been designed to meet the need. Grid computing environments coordinate the resources of distributed virtual organizations and satisfy the various computational demands of bioinformatics applications. In this paper, we report on designing and implementing a BioGrid framework, called G‐BLAST, that performs genomic sequence alignments using Grid computing environments and accessible mpiBLAST applications. G‐BLAST is also suitable for cluster computing environments with a server node and several client nodes. G‐BLAST is able to select the most appropriate work nodes, dynamically fragment genomic databases, and self‐adjust according to performance data. To enhance G‐BLAST capability and usability, we also employ a WSRF Grid Service Portal and a Grid Service GUI desk application for general users to submit jobs and host administrators to maintain work nodes. Copyright © 2008 John Wiley & Sons, Ltd.

12.
张龙  李巍  李云春 《计算机工程》2008,34(2):147-150
To overcome the poor scalability and robustness caused by centralized management in large distributed systems, an improved structured peer-to-peer network is used to organize the distributed computing resources. The nodes of the logical space are divided into host nodes and resource nodes; host nodes are mapped into the overlay's logical space with consistent hashing, while resource nodes use locality-preserving hashing, so that range queries over resource information are supported.

13.
Design, Analysis and Implementation of Parallel Join Algorithms Based on Parallel B+-Trees
The B+-tree is an effective database storage structure and is widely used in relational database systems. Parallelizing the B+-tree so that it can be used in parallel database systems is therefore clearly important and worthwhile. This paper studies a parallel B+-tree storage structure suitable for parallel databases and proposes two classes of parallel join algorithms based on parallel B+-trees. Theoretical analysis and experimental results show that these algorithms are more efficient than other parallel join algorithms.

14.
Similar alarm sequence alignment algorithms have been used to find similar alarm floods in the historical database for the prediction and prevention of alarm floods. However, the existing modified Smith–Waterman (SW) algorithm has a high computation complexity, preventing its online applications within a tolerable computation time period. This paper proposes a new local alignment algorithm, based on the basic local alignment search tool (BLAST). The novelty of the proposed algorithm is three-fold. First, a priority-based similarity scoring strategy makes the proposed algorithm more sensitive to alarms having higher alarm priorities. Second, a set-based pre-matching mechanism avoids unnecessary computations by excluding all irrelevant alarm floods and alarm tags. Third, the seeding and extending steps of the conventional BLAST are adapted for alarm floods, which reduce the searching space significantly. Owing to the novelties, the proposed algorithm is much faster in computation and provides a higher alignment accuracy than the SW algorithm. The efficiency of the proposed algorithm is demonstrated by industrial case studies based on the historical alarm floods from an oil conversion plant.

15.
Identification and verification of a video clip via its fingerprint find applications in video browsing, database search and security. For this purpose, the video sequence must be collapsed into a short fingerprint using a robust hash function based on signal processing operations. We propose two robust hash algorithms for video, both based on the discrete cosine transform (DCT): one on the classical basis set and the other on a novel randomized basis set (RBT). The robustness and randomness properties of the proposed hash functions are investigated in detail. It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video. The DCT hash is more robust, but lacks a security aspect, as it is easy to find different video clips with the same hash value. The RBT-based hash, being secret-key based, does not allow this and is more secure, at the cost of a slight loss in the receiver operating curves.

16.
单菊林  关振群  宋超 《计算机学报》2007,30(11):1989-1997
To address the efficiency and convergence problems of the three-dimensional advancing front technique (AFT), a complete set of improvements is proposed. A mesh data structure based on topological connectivity and hash-table-based algorithms for inserting, looking up and deleting mesh elements raise the efficiency of the whole algorithm. Dynamically maintaining the size information of the front during mesh generation improves the overall quality of the tetrahedral elements. When a kernel has to be solved by backtracking, a front priority factor is introduced to change the advancing path, which greatly increases the success probability of the backtracking; the very few kernels that still cannot be solved this way are handled with a point-insertion method based on linear programming, which essentially guarantees the convergence of the whole algorithm. After mesh generation, unnecessary interior nodes are deleted, the related tetrahedra are merged, and all interior nodes are optimized based on angles, further improving mesh quality. Numerical examples show that the improved algorithm has nearly linear time complexity and generates high-quality meshes; it has already been applied in engineering.

17.
A distributed hash table (DHT) is an infrastructure to support resource discovery in large distributed systems. In a DHT, data items such as resources, indexes of resources or resource metadata, are distributed across an overlay network based on a hash function. However, this may not be desirable in commercial applications such as Grid and cloud computing whereby the presence of multiple administrative domains leads to the issues of data ownership and self-economic interests. In this paper, we present R-DHT (Read-only DHT), a DHT-based resource discovery scheme without distributing data items. To map each data item back onto its resource owner, a physical host, we virtualize each host into virtual nodes. Nodes are further organized as a segment-based overlay network which increases node failure resiliency without replicating data items. We demonstrate the feasibility of our proposed scheme by presenting R-Chord, an implementation of R-DHT using Chord as the underlying overlay graph, with lookup and maintenance optimizations. Through analytical and simulation analyses, we evaluate the performance of R-DHT and compare it with traditional DHTs in terms of lookup path length, resiliency to node failures, and maintenance overhead. Overall, we found that R-DHT is effective and efficient for resource indexing and discovery in large distributed systems with a strong commercial requirement.

18.
张宇  张延松  陈红  王珊 《软件学报》2017,28(3):490-501
The many-core coprocessor Xeon Phi has become an emerging mainstream high-performance computing platform. For database applications, in-memory analytical processing is a compute-intensive workload whose performance is largely determined by the in-memory foreign-key join between the large fact table and the dimension tables. This paper focuses on a cache-friendly foreign-key join algorithm, as opposed to the cache-conscious partitioned hash join and the cache-oblivious non-partitioned hash join, to match the relatively small LLC and highly concurrent threads of the Xeon Phi coprocessor. By exploiting the surrogate-key property of OLAP schemas, the key-matching hash probe can be simplified into a surrogate-key reference between the fact table and the dimension table under the primary-key/foreign-key referential integrity constraint: the complex hash table and the CPU-expensive hash probe are replaced by mapping the foreign-key value to a memory offset in a surrogate vector and accessing that vector directly. The foreign-key join based on surrogate-vector references is simple to apply and efficient on the Xeon Phi coprocessor, where more cores and highly concurrent threads hide the memory access latency. Experiments compare the traditional hash joins (the non-partitioned hash join and the radix-partitioned hash join) with the surrogate-vector-reference foreign-key join on a 10-core Xeon E5-2650 v3 processor platform and a 60-core Xeon Phi 5110P coprocessor platform; the results give a comprehensive performance profile of mainstream in-memory foreign-key join algorithms on different datasets and platforms.
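A minimal sketch of the surrogate-vector idea: because the foreign keys are dense surrogate keys, the "join" with a dimension column is a direct positional array access rather than a hash probe. NumPy stands in for the columnar storage here; this is an illustration, not the paper's implementation.

```python
import numpy as np

# dimension table: surrogate keys 0..N-1, so the attribute column itself is the surrogate vector
dim_group = np.array([10, 10, 20, 30])        # e.g. group id for dimension rows 0..3

# fact table foreign-key column referencing the dimension's surrogate keys, plus a measure
fact_fk = np.array([0, 2, 2, 3, 1, 0])
fact_measure = np.array([5.0, 1.0, 2.0, 4.0, 3.0, 6.0])

# "join" = direct positional access into the surrogate vector, no hash table involved
joined_group = dim_group[fact_fk]

# group-by aggregation on the joined attribute
groups, inverse = np.unique(joined_group, return_inverse=True)
sums = np.zeros(len(groups), dtype=float)
np.add.at(sums, inverse, fact_measure)
print(dict(zip(groups.tolist(), sums.tolist())))
```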

19.
Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid‐enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini‐Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid‐enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.

20.
In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLAST workflows, using up to 512 cores, show that MemFS has both better performance and better scalability than the state-of-the-art, locality-based file system, AMFS. Furthermore, our evaluation on a public commercial cloud validates our cluster results. On this platform MemFS shows excellent scalability up to 1024 cores and is able to saturate the 10G Ethernet bandwidth when running BLAST and Montage.
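The striping rule MemFS relies on, splitting files into chunks and spreading the chunks over all compute nodes with a distributed hash function, can be illustrated in a few lines; the chunk size and hash choice below are assumptions, not MemFS internals.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024          # assumed chunk size, not MemFS's actual value

def chunk_node(path: str, chunk_index: int, nodes: list) -> str:
    """Deterministically map one chunk of a file to a storage node."""
    h = hashlib.md5(f"{path}#{chunk_index}".encode()).digest()
    return nodes[int.from_bytes(h[:4], "big") % len(nodes)]

def stripe_plan(path: str, file_size: int, nodes: list):
    """Placement plan: every node can recompute where any chunk lives, so no
    central metadata lookup is needed and load spreads over all nodes."""
    n_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
    return {i: chunk_node(path, i, nodes) for i in range(n_chunks)}

nodes = [f"node{i}" for i in range(8)]
print(stripe_plan("/workflow/intermediate.dat", 18 * 1024 * 1024, nodes))
```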
