1.

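Yu Dongjin (俞东进), Zheng Suhang (郑苏杭), Li Wanqing (李万清) 《计算机应用研究》 (Application Research of Computers) 2012, 29(2): 478-481

To make full use of multi-core resources and improve mining performance on multi-core processors, the paper proposes a multi-core parallel sequential pattern mining algorithm that combines dynamic and static task allocation. The algorithm combines data parallelism with task parallelism: each processor core first generates local sequential patterns and then cooperates with the other cores to obtain all global sequential patterns. A parallel local reduction technique eliminates the repeated generation and computation of local sequences, and the combination of static and dynamic task allocation addresses load imbalance among processors. Theoretical analysis and experiments both confirm that the algorithm makes effective use of multi-core platforms and architectures, achieving high running efficiency and speedup.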
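
The local-then-global idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only counts length-1 patterns, and the function names, the chunking factor of 4, and the toy database are assumptions made for illustration. The database is split into chunks up front (a static step), and a worker pool hands the next chunk to whichever worker is idle (a dynamic step). CPython threads do not run CPU-bound code in parallel, so the sketch shows the structure rather than a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(chunk):
    """Count, within one chunk, how many sequences contain each item."""
    counts = Counter()
    for seq in chunk:
        counts.update(set(seq))  # each sequence supports an item at most once
    return counts

def global_frequent_items(db, n_workers, min_support):
    # Static step: split the database into several small chunks up front.
    n_chunks = max(1, n_workers * 4)  # the factor 4 is an arbitrary assumption
    chunks = [db[i::n_chunks] for i in range(n_chunks)]
    total = Counter()
    # Dynamic step: the pool gives the next chunk to whichever worker is free.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(local_counts, chunks):
            total.update(partial)  # merge local counts into global counts
    return {item: c for item, c in total.items() if c >= min_support}

db = [["a", "b", "c"], ["a", "c"], ["b", "c", "a"], ["c", "d"]]
frequent = global_frequent_items(db, n_workers=2, min_support=3)
```

With this toy database, only `a` and `c` reach the support threshold of 3.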

2.

Multi-core parallel closed sequential pattern mining based on BIDE

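Yu Dongjin (俞东进), Zheng Suhang (郑苏杭), Li Wanqing (李万清), Wu Wei (吴为) 《计算机工程》 (Computer Engineering) 2012, 38(12): 55-58

Based on the classic BIDE algorithm, the paper proposes a multi-core parallel closed sequential pattern mining algorithm, MT_BIDE. The algorithm prunes before testing whether a frequent sequence can be extended, and during extension it dynamically adjusts the frequent sequences and their pseudo-projected datasets to balance the closed-pattern mining workload across threads. Experimental results show that the algorithm achieves high running efficiency and speedup.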
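
For readers unfamiliar with the term, a sequential pattern is closed when no longer pattern containing it has the same support. The sketch below applies that definition as a naive post-filter over an already mined pattern set. It is not BIDE or MT_BIDE, which avoid enumerating non-closed patterns in the first place; the function names and the sample supports are assumptions.

```python
def is_subsequence(short, long):
    """True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    return all(x in it for x in short)  # `in` consumes the iterator as it scans

def closed_patterns(patterns):
    """patterns maps a tuple pattern to its support; keep only closed ones."""
    closed = {}
    for p, sup in patterns.items():
        absorbed = any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in patterns.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed

patterns = {("a",): 3, ("a", "b"): 3, ("b",): 4, ("a", "b", "c"): 2}
result = closed_patterns(patterns)
```

Here `("a",)` is dropped because `("a", "b")` contains it and has the same support of 3; the other three patterns are closed.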

3.

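A parallel dynamic bit vector algorithm for mining frequent closed sequential patterns

Chen Qian (陈倩), Liu Yun (刘云), Gao Yuying (高钰莹) 《计算机工程与科学》 (Computer Engineering & Science) 2018, 40(10): 1717-1725

For long-sequence databases, which are costly to mine in both time and space, finding a more efficient and more compact mining pattern that still extracts complete information is an active research topic. The paper proposes a parallel dynamic bit vector algorithm for mining frequent closed sequential patterns (PDBV FCSP). It combines a multi-core processor architecture with the DBV data structure to speed up processing of the sequence database, partitions the search space, and performs closure checking on preprocessed sequences as early as possible. This reduces both the storage required and the execution time for mining frequent closed sequential patterns, and overcomes the communication overhead, synchronization, and data replication problems of existing parallel mining algorithms. A dynamic load-balancing mechanism that redistributes work resolves load imbalance among processors and minimizes CPU idle time. In simulations comparing the DBV VDF algorithm with PDBV FCSP (2 4 cores), PDBV FCSP performs better in running time, memory usage, and scalability, and its advantage grows as the number of cores increases.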

4.

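A parallel sequence mining algorithm based on projection trees

She Chundong (佘春东), Fan Zhihua (范植华), Sun Shixin (孙世新), Hu Siquan (胡四泉), Che Zhuming (车著明) 《计算机工程与应用》 (Computer Engineering and Applications) 2004, 40(14): 4-5, 56

Sequential pattern discovery plays an increasingly important role in many scientific and commercial fields, yet efficient parallel formulations of projection-tree-based algorithms have received little attention. The paper first introduces the basic concepts of frequent sequence mining and then, building on the projection tree algorithm, proposes a distributed-memory parallel sequence mining algorithm and analyzes its performance in detail.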

5.

A multiprocessor load-balancing scheduling algorithm based on task graphs

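Lu Fengliang (芦奉良), Liu Yu (刘羽), Zhang Jun (张军) 《计算机工程》 (Computer Engineering) 2011, 37(11): 77-79, 82

To address load imbalance among processors in shared-memory multiprocessor systems, the paper proposes a new task scheduling algorithm, the multiple wavefront method. Building on task graph partitioning, it improves the original wavefront method with hierarchical scheduling: it traverses and reorganizes the task sequence multiple times to reduce the allocation error of each processor, uses a cyclic scheduling algorithm to improve the precision of the schedule, and gives a parallel implementation of the algorithm. Experimental results show that the algorithm achieves low task allocation error and high...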

6.

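Research on parallel algorithms for sequential pattern mining  Total citations: 1, self-citations: 0, citations by others: 1

Ma Chuanxiang (马传香), Jian Zhong (简钟) 《计算机工程》 (Computer Engineering) 2005, 31(6): 16-17, 136

Sequential patterns have important applications in many fields, and the large volumes of data and patterns involved call for efficient, scalable parallel algorithms. To address problems common to current sequential pattern mining algorithms, the paper proposes PMSP, an algorithm suited to shared-nothing parallel environments, which effectively resolves memory limitations and timeliness problems. PMSP is compared with HPSPM, a relatively strong existing parallel algorithm, and the experiments show that PMSP is effective.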

7.

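MapReduce-based frequent itemset mining for high-dimensional data

Zhao Xincan (赵欣灿), Zhu Yun (朱云), Mao Yimin (毛伊敏) 《计算机工程》 (Computer Engineering) 2022, 48(3): 81-89

When mining large-scale high-dimensional data, traditional data mining algorithms suffer from low accuracy in capturing data features, unbalanced node load, frequent data exchange, and poorly compacted frequent itemsets. The paper proposes PARDG-MR, a MapReduce-based parallel mining algorithm. Drawing on the characteristics of high-dimensional data, it designs the DGPL strategy, based on a dimension granulation algorithm and a load-balancing algorithm, and preprocesses the data to address the difficulty of capturing feature attributes in complex high-dimensional data and the unbalanced node load during data partitioning. It builds PARM, a parallel frequent itemset mining strategy based on the PJPFP-Tree, to parallelize the grouping of frequent itemsets and thereby improve data processing efficiency. On this basis it proposes PJPFP, a merged-node pruning algorithm based on a pruning prefix inference, which improves pruning efficiency during frequent itemset mining and makes the frequent itemsets more compact. Experiments on three datasets (Webdocs, NDC, and Gisette) show that, compared with PFP-growth, PWARM, and MRPrePost, the algorithm shortens running time by about 20% on average, effectively improving mining efficiency and reducing memory usage.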

8.

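A MapReduce-based high-utility sequential pattern mining algorithm

Cheng Siyuan (程思远), Ma Chao (马超), Li Congcong (李聪聪) 《计算机系统应用》 (Computer Systems & Applications) 2015, 24(12): 228-232

As data volumes grow rapidly, the efficiency of high-utility sequential pattern mining algorithms degrades severely. In response, the paper proposes HusMaR, a MapReduce-based high-utility sequential pattern mining algorithm. Built on the MapReduce framework, it uses a utility matrix to generate candidates efficiently, a random mapping strategy to balance computing resources, and a domain-based pruning strategy to prevent combinatorial explosion. Experimental results show that the algorithm achieves high parallel efficiency on large datasets.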

9.

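An algorithm for exploiting memory-level parallelism on multi-core processors

Peng Lin (彭林), Zhang Xiaoqiang (张小强), Liu Defeng (刘德峰), Xie Lunguo (谢伦国), Tian Zuwei (田祖伟) 《计算机研究与发展》 (Journal of Computer Research and Development) 2009, 46(Z2)

In a multi-core processor, the cores can access external memory concurrently, providing memory-level parallelism beyond what a single processor offers. For loops in irregular applications, traditional parallelization methods have difficulty identifying parallelism and cannot fully exploit the memory-level parallelism and parallel computing power of multi-core processors. The paper discusses software-based exploitation of memory-level parallelism on multi-core processors and proposes LLSM (loop level speculative multithreading), a speculative parallel multithreading algorithm that parallelizes loops in irregular applications. Test data on a multi-core processor show that the algorithm effectively exploits both the memory-level parallelism and the computing power of multi-core processors. The paper also points out that the formula for memory-level parallelism in a multi-core environment must account for thread synchronization overhead.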

10.

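Research on parallel mining algorithms for association rules  Total citations: 1, self-citations: 0, citations by others: 1

Chen Tao (陈涛), Shi Weisheng (石伟胜), Chen Qimai (陈启买) 《现代计算机》 (Modern Computer) 2006, (7): 27-30

The paper gives a formal description of parallel association rule mining and a model for parallel mining. After studying parallel implementations of the Apriori algorithm such as CD, DD, IDD, and HD, and in view of their poor scalability and load imbalance, it proposes applying the Sidle scheduling strategy to the IDD and HD algorithms. This effectively solves the critical problem in IDD and HD of partitioning the candidate itemsets among processor nodes, balances the load across nodes as far as possible, and thereby improves efficiency.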
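
As background for the CD (Count Distribution) baseline named in this abstract, here is a minimal single-process Python sketch of one counting pass: every partition counts the full candidate set on its own transactions, and the local counts are then summed into global counts. It does not implement IDD, HD, or the scheduling strategy the paper proposes; the function names and the sample transactions are assumptions, and the loop over partitions stands in for separate processor nodes.

```python
from collections import Counter
from itertools import combinations

def count_candidates(partition, candidates):
    """Local pass: count every candidate itemset on one data partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts

def count_distribution_pass(partitions, candidates, min_support):
    """Sum the local counts of all partitions, then apply the threshold."""
    global_counts = Counter()
    for part in partitions:  # in CD, each partition lives on its own node
        global_counts.update(count_candidates(part, candidates))
    return {c: n for c, n in global_counts.items() if n >= min_support}

partitions = [
    [["bread", "milk"], ["bread", "beer"]],
    [["bread", "milk", "beer"], ["milk"]],
]
items = sorted({i for p in partitions for t in p for i in t})
candidates = [frozenset(pair) for pair in combinations(items, 2)]
frequent_pairs = count_distribution_pass(partitions, candidates, min_support=2)
```

Because every node holds all candidates, CD exchanges only counts, which is simple but does not scale the candidate set across nodes; that limitation is what candidate-partitioning schemes such as IDD and HD address.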

11.

Effect of Data Distribution in Parallel Mining of Associations  Total citations: 1, self-citations: 0, citations by others: 1

David W. Cheung, Yongqiao Xiao 《Data Mining and Knowledge Discovery》 1999, 3(3): 291-314

Association rule mining is an important new problem in data mining. It has crucial applications in decision support and marketing strategy. We propose FPM, an efficient parallel algorithm for mining association rules on a distributed shared-nothing parallel system. Its efficiency is attributed to the incorporation of two powerful candidate set pruning techniques. The two techniques, distributed and global pruning, are sensitive to two data distribution characteristics: data skewness and workload balance. The pruning is very effective when both the skewness and balance are high. We have implemented FPM on an IBM SP2 parallel system. The performance studies show that FPM consistently outperforms CD, which is a parallel version of the representative Apriori algorithm (Agrawal and Srikant, 1994). The results also validate our observation on the effectiveness of the two pruning techniques with respect to the data distribution characteristics. Furthermore, they show that FPM has good scalability and parallelism, which can be tuned for different business applications.

12.

Mining non-redundant sequential rules with dynamic bit vectors and pruning techniques

Minh-Thai Tran, Bac Le, Bay Vo, Tzung-Pei Hong 《Applied Intelligence》 2016, 45(2): 333-342

Most algorithms for mining sequential rules focus on generating all sequential rules. These algorithms produce an enormous number of redundant rules, making mining inefficient in intelligent systems. In order to solve this problem, the mining of non-redundant sequential rules was recently introduced. Most algorithms for mining such rules depend on patterns obtained from existing frequent sequence mining algorithms. Several steps are required to organize the data structure of these sequences before rules can be generated. This process requires a great deal of time and memory. The present study proposes a technique for mining non-redundant sequential rules directly from sequence databases. The proposed method uses a dynamic bit vector data structure and adopts a prefix tree in the mining process. In addition, some pruning techniques are used to remove unpromising candidates early in the mining process. Experimental results show the efficiency of the algorithm in terms of runtime and memory usage.
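
The abstract does not spell out the dynamic bit vector structure, so the following Python sketch rests on one common reading, stated here as an assumption: a bit vector over sequence IDs is stored with its leading and trailing zeros trimmed, together with the position of its first set bit, and two vectors are intersected only over the range where they overlap. Practical implementations work on bytes or machine words; this sketch works per bit for readability, and all names are hypothetical.

```python
def to_dbv(bits):
    """Trim leading/trailing zeros; return (start_position, trimmed_bits)."""
    ones = [i for i, b in enumerate(bits) if b]
    if not ones:
        return (0, [])
    start, end = ones[0], ones[-1]
    return (start, bits[start:end + 1])

def dbv_and(a, b):
    """Intersect two trimmed vectors over their overlapping range only."""
    (sa, va), (sb, vb) = a, b
    lo = max(sa, sb)
    hi = min(sa + len(va), sb + len(vb))
    if lo >= hi:
        return (0, [])  # no overlap, so the intersection is empty
    merged = [va[i - sa] & vb[i - sb] for i in range(lo, hi)]
    # Re-trim, because the AND can create new leading or trailing zeros.
    start, trimmed = to_dbv(merged)
    return (lo + start, trimmed) if trimmed else (0, [])

def support(dbv):
    """Number of set bits, i.e. number of supporting sequences."""
    return sum(dbv[1])

x = to_dbv([0, 0, 1, 1, 0, 1, 0, 0])
y = to_dbv([0, 0, 0, 1, 1, 1, 1, 0])
both = dbv_and(x, y)
```

In this example the eight-bit inputs shrink to four stored bits each, and the intersection touches only positions 3 to 5.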

13.

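Research on a distributed sequential pattern discovery algorithm  Total citations: 12, self-citations: 0, citations by others: 12

Zou Xiang (邹翔), Zhang Wei (张巍), Liu Yang (刘洋), Cai Qingsheng (蔡庆生) 《软件学报》 (Journal of Software) 2005, 16(7): 1262-1269

The paper proposes FDMSP (fast distributed mining of sequential patterns) to solve sequential pattern mining in distributed environments. It first analyzes the properties of sequential patterns in a distributed setting. The algorithm partitions the pattern search space by prefix projection, uses the sequential pattern prefix to designate a polling site that tallies the global support count of a sequence, and reduces the number of candidate sequences through local pruning, polling pruning, and count pruning. The algorithm is split into three subprocesses that run asynchronously, giving it low I/O, memory, and communication overhead so that global sequential patterns are generated efficiently. Experiments show that in a LAN environment with massive data, FDMSP outperforms the approach of centralizing the data and running GSP by 68.5% to 99.5%, and that FDMSP scales well.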

14.

Finding Frequent Patterns Using Length-Decreasing Support Constraints

Masakazu Seno, George Karypis 《Data Mining and Knowledge Discovery》 2005, 10(3): 197-228

Finding prevalent patterns in large amounts of data has been one of the major problems in the area of data mining. In particular, the problem of finding frequent itemsets or sequential patterns in very large databases has been studied extensively over the years, and a variety of algorithms have been developed for each problem. The key feature of most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of these two problems. In general, patterns that contain only a few items tend to be interesting if they have high support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we want to find all the frequent patterns whose support decreases as a function of their length without having to find many uninteresting infrequent short patterns. Developing such algorithms is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short infrequent patterns. In this paper we present two algorithms, LPMiner and SLPMiner. Given a length-decreasing support constraint, LPMiner finds all the frequent itemset patterns from an itemset database, and SLPMiner finds all the frequent sequential patterns from a sequential database. Each of these two algorithms combines a well-studied efficient algorithm for constant-support-based pattern discovery with three effective database pruning methods that dramatically reduce the runtime. Our experimental evaluations show that both LPMiner and SLPMiner, by effectively exploiting the length-decreasing support constraint, are up to two orders of magnitude faster, and their runtime increases gradually as the average length of the input patterns increases.

This work was supported by NSF CCR-9972519, EIA-9986042, ACI-9982274, ACI-0133464, and by Army High Performance Computing Research Center contract number DA/DAAG55-98-1-0441. Access to computing facilities was provided by the Minnesota Supercomputing Institute.

Masakazu Seno has been a system software programmer at Hitachi Software Engineering Co., Ltd. in Japan for eight years. He joined Prof. George Karypis’s research team at the University of Minnesota in 2000 to work on data mining projects, and received his master’s degree in computer science there. He is now back at the company and currently involved in the development of a relational database management system.

George Karypis received his Ph.D. degree in computer science at the University of Minnesota and he is currently an associate professor at the Department of Computer Science and Engineering at the University of Minnesota. His research interests span the areas of parallel algorithm design, data mining, bioinformatics, information retrieval, applications of parallel processing in scientific computing and optimization, sparse matrix computations, parallel preconditioners, and parallel programming languages and libraries. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based recommendation algorithms (SUGGEST), clustering high dimensional datasets (CLUTO), and finding frequent patterns in diverse datasets (PAFI). He has coauthored over ninety journal and conference papers on these topics and a book titled “Introduction to Parallel Computing” (Publ. Addison-Wesley, 2003, 2nd edition). In addition, he is serving on the program committees of many conferences and workshops on these topics and is an associate editor of the IEEE Transactions on Parallel and Distributed Systems.
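
To make the length-decreasing support constraint concrete, the Python sketch below defines a hypothetical constraint function and applies it as a post-filter to patterns whose supports are already known. It is not LPMiner or SLPMiner, which push the constraint into the search through database pruning; the thresholds (50, dropping by 10 per extra item, floor of 10) and the sample supports are invented for illustration.

```python
def min_support_for_length(length, start=50, step=10, floor=10):
    """Hypothetical constraint: the threshold never increases with length."""
    return max(floor, start - step * (length - 1))

def filter_patterns(patterns):
    """Keep patterns whose support meets the threshold for their own length."""
    return {
        p: s for p, s in patterns.items()
        if s >= min_support_for_length(len(p))
    }

patterns = {
    ("a",): 60,
    ("b",): 45,
    ("a", "b"): 41,
    ("b", "c"): 30,
    ("a", "b", "c", "d"): 20,
}
kept = filter_patterns(patterns)
```

The example also shows why downward closure fails under such a constraint: `("b",)` misses its threshold of 50, yet its super-pattern `("a", "b")` satisfies its own threshold of 40, so a short infrequent pattern cannot be used to discard its extensions.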

15.

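Mining frequent jump patterns from graph databases  Total citations: 4, self-citations: 0, citations by others: 4

Liu Yong (刘勇), Li Jianzhong (李建中), Gao Hong (高宏) 《软件学报》 (Journal of Software) 2010, 21(10): 2477-2493

Many frequent subgraph mining algorithms have been proposed, but they produce too many frequent subgraphs for users to exploit effectively. The paper therefore poses a new research problem: mining frequent jump patterns in graph databases. Mining frequent jump patterns greatly reduces the number of output patterns while keeping meaningful graph patterns in the result. Jump patterns are also robust to noise. However, because jump patterns are not anti-monotone, mining them is very challenging. By studying the properties of jump patterns, the paper proposes two new pruning techniques: internal-extension-based pruning and external-extension-based pruning. On this basis it presents an efficient mining algorithm, GraphJP (an algorithm for mining jump patterns from graph databases), and rigorously proves the correctness of the pruning techniques and of GraphJP. Experimental results show that the proposed pruning techniques effectively prune the graph pattern search space and that GraphJP is efficient and scalable.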

16.

Mining top-k co-occurrence items with sequential pattern

《Expert Systems with Applications》 2017

Frequent sequential pattern mining has become one of the most important tasks in data mining. It has many applications, such as sequential analysis, classification, and prediction. How to generate candidates and how to control the combinatorially explosive number of intermediate subsequences are the most difficult problems. Intelligent systems such as recommender systems, expert systems, and business intelligence systems use only a few patterns, namely those that satisfy a number of defined conditions. Challenges include the mining of top-k patterns, top-rank-k patterns, closed patterns, and maximal patterns. In many cases, end users need to find itemsets that occur with a sequential pattern. Therefore, this paper proposes approaches for mining top-k co-occurrence items usually found with a sequential pattern. The Naive Approach Mining (NAM) algorithm discovers top-k co-occurrence items by directly scanning the sequence database to determine the frequency of items. The Vertical Approach Mining (VAM) algorithm is based on vertical database scanning. The Vertical with Index Approach Mining (VIAM) algorithm is based on a vertical database with index scanning. VAM and VIAM use pruning strategies to reduce the search space, thus improving performance. VAM and VIAM are especially effective in mining the co-occurrence items of a long input pattern. The three algorithms were evaluated using real-world databases. The experimental results show that these algorithms perform well, especially VAM and VIAM.
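
The direct-scan idea behind the naive approach can be sketched in a few lines of Python. This is an illustration in the spirit of NAM, not the paper's algorithm. It assumes that sequences are lists of single items, that an item co-occurs with the pattern when it appears anywhere in a sequence containing the pattern, that each sequence counts once per item, and that the pattern's own items are excluded. The names and the sample data are hypothetical.

```python
from collections import Counter

def contains_pattern(seq, pattern):
    """True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(x in it for x in pattern)

def top_k_cooccurrence(db, pattern, k):
    """Scan the whole database and rank items seen alongside the pattern."""
    counts = Counter()
    for seq in db:
        if contains_pattern(seq, pattern):
            counts.update(set(seq) - set(pattern))
    # Sort by descending count, then by item name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

db = [list("abcd"), list("acbd"), list("abd"), list("bad")]
top = top_k_cooccurrence(db, pattern=("a", "b"), k=2)
```

The last sequence, `bad`, is skipped because `b` does not follow `a` in it. A full scan per query is what vertical and indexed layouts aim to avoid.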

17.

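An improved algorithm for sequential patterns based on CTID  Total citations: 2, self-citations: 0, citations by others: 2

Liu Yuebo (刘月波), Lu Jieping (陆阶平), Liu Tongming (刘同明) 《微机发展》 (Microcomputer Development) 2005, 15(3): 20-22, 120

The key to improving the efficiency of sequential pattern mining lies in reducing the time spent discovering frequent sequences. Based on the CTID concept, the paper proposes SPM, an improved frequent sequential pattern mining algorithm. It makes full use of frequent itemsets and intermediate mining results to obtain more valid sequential patterns and simplifies the pruning step, which improves efficiency. Experiments show that the algorithm is feasible.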

18.

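A sequential pattern mining algorithm based on improved PrefixSpan  Total citations: 1, self-citations: 0, citations by others: 1

Gong Wei (公伟), Liu Peiyu (刘培玉), Jia Xian (贾娴) 《计算机应用》 (Journal of Computer Applications) 2011, 31(9): 2405-2407

To address the high cost of constructing projected databases in PrefixSpan, the paper proposes SPMIP, a sequential pattern mining algorithm based on an improved PrefixSpan. By adding a pruning step and reducing scans when generating certain specific sequential patterns, the method shrinks the projected databases and the time spent scanning them, improving efficiency while still producing the required sequential patterns. Experimental results show that SPMIP is more efficient than PrefixSpan without affecting the sequential patterns obtained.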
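
To show where the projection cost comes from, here is a minimal Python sketch of basic PrefixSpan, restricted to sequences of single items. It is the baseline the abstract starts from, not SPMIP: none of the paper's extra pruning or scan reduction is included, and the function name and toy database are assumptions. Each recursive call copies the suffixes that follow the current prefix, and building and scanning those copies is the overhead the abstract refers to.

```python
from collections import Counter

def prefixspan(db, min_support, prefix=()):
    """Yield (pattern, support) pairs; each sequence is a list of items."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # one vote per sequence per item
    for item, sup in sorted(counts.items()):
        if sup < min_support:
            continue
        pattern = prefix + (item,)
        yield pattern, sup
        # Projected database: the suffix after the first occurrence of item.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        yield from prefixspan(projected, min_support, pattern)

db = [list("abc"), list("acb"), list("ab")]
patterns = dict(prefixspan(db, min_support=2))
```

On this toy database the sketch finds five patterns; `("b", "c")` and `("c", "b")` each occur in only one sequence and are not reported.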

19.

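Mining burst patterns in large temporal databases  Total citations: 1, self-citations: 0, citations by others: 1

Zeng Desheng (曾德胜), Zhang Shichao (张师超), Wang Rifeng (王日凤), Xie Chong (谢冲) 《计算机应用》 (Journal of Computer Applications) 2006, 26(10): 2413-2416

The paper first analyzes two problems that can arise when mining an entire large temporal database and proposes a new method to solve them. The method follows a divide-then-merge approach: it first partitions the large database into multiple small datasets, then prunes these datasets four times and evaluates them jointly, and finally mines the potential burst patterns. Experimental results show that the method is accurate and effective. The mined burst patterns give company decision-makers reference and support when making decisions.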