1.
2.
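The nature of data streams requires mining algorithms to produce results in a single scan and with low space complexity. Based on these characteristics, this paper proposes MFIM, a new sliding-window algorithm for mining frequent itemsets in data streams. The algorithm represents the transaction sequence in the sliding window as a matrix of binary vectors and uses this structure to record how the frequent itemsets change over time, so that frequent itemsets in the stream can be mined efficiently. Theoretical analysis and experimental results show that the algorithm achieves good time and space complexity.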
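The bit-vector representation of a sliding window can be illustrated with a minimal sketch. This is not the MFIM algorithm itself, whose details the abstract does not give; the class `BitWindow` and its methods are hypothetical names chosen for illustration. Each item gets one bit vector (stored as a Python integer), and the support of an itemset is the number of set bits in the AND of its items' vectors.

```python
from collections import defaultdict


class BitWindow:
    """Sliding window over transactions, one bit vector per item (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1      # keeps only the newest `size` slots
        self.rows = defaultdict(int)     # item -> bit vector

    def add(self, transaction):
        # Shift every row by one slot; the oldest transaction falls off the mask.
        for item in list(self.rows):
            shifted = (self.rows[item] << 1) & self.mask
            if shifted:
                self.rows[item] = shifted
            else:
                del self.rows[item]      # item no longer occurs in the window
        # Record the new transaction in the lowest bit.
        for item in set(transaction):
            self.rows[item] |= 1

    def support(self, itemset):
        # AND the rows of all items; set bits mark transactions containing all of them.
        if not itemset:
            raise ValueError("itemset must not be empty")
        bits = self.mask
        for item in itemset:
            bits &= self.rows.get(item, 0)
        return bin(bits).count("1")
```

With a window of size 3, adding four transactions evicts the first one, so supports always refer to the three most recent transactions. A single pass over the stream is enough because each transaction is touched only when it enters the window.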
3.
4.
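Mining all frequent itemsets of a dense dataset with the Apriori and FP-growth algorithms is costly. To address this, an association rule mining algorithm based on an array of linked lists is proposed. The method uses the array to build a transaction linked list for each item, so the database needs to be scanned only once to quickly obtain the support of every candidate itemset and thus find the frequent itemsets efficiently. Comparative analysis against the classical algorithms shows that the algorithm mines faster.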
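A rough sketch of the per-item transaction-list idea follows. It is an illustration under assumptions, not the authors' implementation: Python lists stand in for the linked lists, and the function names `build_item_lists`, `intersect`, and `support` are hypothetical.

```python
def build_item_lists(transactions):
    """One pass over the database: item -> ascending list of transaction ids."""
    lists = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            lists.setdefault(item, []).append(tid)
    return lists


def intersect(a, b):
    """Merge-style intersection of two ascending id lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def support(lists, candidate):
    """Count transactions containing every item of a non-empty candidate."""
    # Start from the shortest list so intermediate results stay small.
    items = sorted(candidate, key=lambda x: len(lists.get(x, [])))
    tids = lists.get(items[0], [])
    for item in items[1:]:
        tids = intersect(tids, lists.get(item, []))
    return len(tids)
```

After `build_item_lists` has run once, the support of any candidate is computed from the lists alone, without rescanning the database.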
5.
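To improve the performance of frequent itemset mining algorithms based on the vertical database representation, this paper presents the idea of using an index array to speed up computation. It introduces the concept of the index array and a method for computing it, and proposes a new, efficient frequent itemset mining algorithm, Index-FIMiner. The algorithm greatly reduces unnecessary tidset intersections and the associated frequency checks. The paper also proves that a representative item can be joined directly with every combination of the itemsets in its inclusion index, and that the support of each resulting itemset equals the support of the representative item, which lowers the cost of processing these frequent itemsets and improves performance. Experimental results show that Index-FIMiner achieves high mining efficiency.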
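The equal-support property can be shown with a simplified sketch. It assumes that the inclusion index of an item is the set of other items occurring in every transaction that contains it; the abstract does not define the index precisely, so this definition and the names `tidsets`, `inclusion_index`, and `expand` are assumptions made for illustration.

```python
from itertools import combinations


def tidsets(transactions):
    """Vertical layout: item -> set of ids of the transactions containing it."""
    out = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            out.setdefault(item, set()).add(tid)
    return out


def inclusion_index(vertical):
    """item -> sorted items that occur in every transaction containing it."""
    return {
        item: sorted(o for o, t in vertical.items() if o != item and tids <= t)
        for item, tids in vertical.items()
    }


def expand(item, vertical, index):
    """Itemsets formed from `item` plus any subset of its index, with their support.

    No tidset intersection is needed: each itemset inherits the support of `item`.
    """
    sup = len(vertical[item])
    result = {}
    for k in range(len(index[item]) + 1):
        for extra in combinations(index[item], k):
            result[(item,) + extra] = sup
    return result
```

Because every transaction containing the representative item also contains each item in its index, adding those items cannot shrink the tidset, which is why the support carries over unchanged.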
6.
7.
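Apriori is one of the most classical of the widely used association rule mining algorithms. However, it has to scan the database repeatedly, which incurs a high I/O cost; it generates a huge set of candidate 2-itemsets while deriving the frequent 2-itemsets; and when filtering for frequent k-itemsets it does not exclude elements that should not take part in the combinations. All of this makes the algorithm inefficient. To address these three factors, an algorithm is proposed that obtains the frequent itemsets by multiplying compressed transaction matrices: the database is scanned only once, a transaction matrix is produced after compression, and the frequent itemsets are obtained through matrix operations, which effectively improves the efficiency of association rule mining.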
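One way to picture the matrix approach, for 2-itemsets only, is sketched below. It assumes that "compression" means merging identical transactions into one weighted 0/1 row, which the abstract does not confirm; the function names are hypothetical. The weighted product of the transposed matrix with itself then holds the support of every item pair.

```python
from collections import Counter


def compress(transactions):
    """One scan: merge identical transactions into (0/1 row, weight) pairs."""
    items = sorted({i for t in transactions for i in t})
    counts = Counter(frozenset(t) for t in transactions)
    rows = [[1 if i in t else 0 for i in items] for t in counts]
    weights = list(counts.values())
    return items, rows, weights


def pair_supports(rows, weights):
    """Weighted product M^T W M; entry [i][j] is the support of items i and j together."""
    n = len(rows[0]) if rows else 0
    product = [[0] * n for _ in range(n)]
    for row, w in zip(rows, weights):
        for i in range(n):
            if row[i]:
                for j in range(n):
                    product[i][j] += w * row[j]
    return product


def frequent_pairs(transactions, min_support):
    """Frequent 2-itemsets read off the upper triangle of the product matrix."""
    items, rows, weights = compress(transactions)
    product = pair_supports(rows, weights)
    return {
        (items[i], items[j]): product[i][j]
        for i in range(len(items))
        for j in range(i + 1, len(items))
        if product[i][j] >= min_support
    }
```

The diagonal of the product gives the support of each single item, so infrequent items could be dropped from the matrix before longer itemsets are considered.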
8.
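To address intelligent decision-making for equipment cognitive testability, a method based on clouds and frequent itemsets is proposed for the trade-off optimization of cognitive testability diagnosis schemes. The paper studies how to describe information flows in equipment cognitive testability in the qualitative and quantitative domains and how to convert between them, and gives a data-synopsis-based method for generating the central cloud, which implements the cleaning and filtering of transaction data. It studies a data mining method based on frequent itemsets and newly added itemsets and proposes a data correlation analysis method based on the 2-norm and covariance, which implements the data mining process for the cloud- and frequent-itemset-based trade-off optimization of cognitive testability diagnosis schemes. A simulation diagnosis and trade-off optimization model for cognitive testability, organized into a storage layer, a cloud layer, an application layer, and a decision layer, is obtained and further explained. The scheme can lay a foundation for the intelligent development of trade-off optimization of diagnosis schemes for equipment cognitive testability.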
9.
10.
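Association rule mining is an important part of data mining research and has attracted growing attention. This paper therefore focuses on association rule mining: it first introduces the basic knowledge and concepts, then classifies and discusses the commonly used algorithms, and finally analyzes several typical ones.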
11.
Frequent pattern mining is the most important phase of the association rule mining process because of its time and space complexity. Several methods have attempted to improve the performance of association rule mining by making frequent pattern mining more efficient. Because the datasets to be mined are large, many parallel and distributed approaches have been introduced that divide the datasets or distribute the mining process across multiple processors or computers, and thus improve the efficiency of mining. In this paper, we propose a Hadoop-based parallel implementation of the PrePost+ algorithm for frequent itemset mining. In our parallel approach, the construction of the N-Lists of itemsets is distributed among the mappers, and the final pruning and the extraction of frequent itemsets are carried out by the reducers in a MapReduce parallel programming model. The experimental results show that our Hadoop-based PrePost+ (HBPrePost+) algorithm outperforms one of the best existing parallel methods of frequent itemset mining (PARMA) in terms of execution time.
12.
Uncertain frequent pattern mining has been much discussed in recent decades. It is widely used in various fields and helps analysts understand collected data through the frequencies of items. Past studies have focused on discrete models. However, a discrete model only explains the presence of combinations of items without giving specific data intervals. To compensate for this drawback, we focus on continuous uncertain data and improve a continuous uncertain frequent tree for the extraction of frequent patterns, notably with respect to time cost. Attribute overlapping usually causes the high time cost in the extraction phase. To avoid long branches in the tree, two approaches are proposed. The first approach is to name each attribute at a given level with an uncertain frequent pattern. By using links and reshaping the uncertain frequency tree, the number of combinations decreases. The second approach is called uncertain frequent pattern map transforming. It uses a discrete transformation to decrease the time cost. In experiments, our two approaches were compared with different mainstream approaches. According to the results, our approaches not only took less time to explore frequent patterns but also exhibited high accuracy for continuous uncertain data.
13.
Journal of the Chinese Institute of Engineers, 2012, 35(5): 547-554
The development of algorithms for mining least association rules (ARs) is one of the more challenging areas in data mining. Compared with frequent pattern mining, the main obstacles are exclusive measurements, complexity, and excessive computational cost. Indeed, most previous studies still use Apriori-like algorithms. To address this issue, this article proposes a new correlation measurement called definite factor (DF) and a scalable trie-based algorithm named significant least pattern growth (SLP-Growth). The algorithm generates the least patterns based on interval support and then determines their significance using DF. Experiments with real datasets show that SLP-Growth can discover highly positively correlated and significant least ARs. It also outperforms the fast frequent pattern-Growth algorithm by up to two times, thus verifying its efficiency.
14.
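Multi-relational frequent pattern discovery finds complex frequent patterns involving multiple relations directly from complex structured data, avoiding the limitations of traditional methods. Unlike the mainstream methods based on inductive logic programming, this paper proposes a semantics-oriented, condensed multi-relational frequent pattern discovery method based on containment between conjunctive queries. Its theoretical and technical foundations are novel, and it resolves two kinds of semantic redundancy. Experiments show that the method has advantages in comprehensibility, functionality, efficiency, and scalability.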
15.
Devashish Das, Yong Chen, Shiyu Zhou, Crispian Sievenpiper. Quality and Reliability Engineering International, 2016, 32(4): 1307-1319
Multiple streams of binary data occur commonly in practice. In this paper, we propose a hierarchical statistical model to describe multi-stream binary data that demonstrate over-dispersion. In such a model, a group of binary streams in a multi-stream dataset is modeled by a beta-binomial hierarchical mixture distribution. Using this hierarchical model structure, a cumulative sum (CUSUM) chart based on the log-likelihood ratio is developed to monitor all the data streams simultaneously. The performance of the CUSUM chart is investigated and compared to conventional monitoring schemes through numerical studies and a real-world dataset. It is shown that the CUSUM method using the hierarchical model is effective and advantageous over the conventional methods. Copyright © 2015 John Wiley & Sons, Ltd.
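The general shape of a log-likelihood-ratio CUSUM can be sketched for a single Bernoulli stream. This is a deliberately simplified stand-in, not the hierarchical beta-binomial chart of the paper: the in-control rate `p0`, the out-of-control rate `p1`, the `threshold`, and both function names are assumptions chosen for illustration.

```python
import math


def bernoulli_llr(x, p0, p1):
    """Log-likelihood ratio of one 0/1 observation, out-of-control p1 vs in-control p0."""
    if x:
        return math.log(p1 / p0)
    return math.log((1 - p1) / (1 - p0))


def cusum_alarm(stream, p0, p1, threshold):
    """Return the 0-based index of the first alarm, or None if the chart never signals."""
    s = 0.0
    for t, x in enumerate(stream):
        # Accumulate evidence for the shift, resetting at zero when it turns negative.
        s = max(0.0, s + bernoulli_llr(x, p0, p1))
        if s > threshold:
            return t
    return None
```

The statistic grows when observations are more likely under `p1` than under `p0` and is clipped at zero otherwise, so a sustained run of ones triggers the alarm while isolated ones do not.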
16.
Sequential pattern mining is a prominent method for extracting knowledge from large databases. Common sequential pattern mining algorithms handle static databases. In practice, however, the database keeps growing, which creates the need for mining algorithms designed for updates. Once the database is updated, the previous mining result becomes incorrect, and the entire mining process has to be restarted on the updated sequential database. Incremental mining of sequential patterns avoids rescanning the entire database. Previous approaches are Apriori-based frameworks, but a more advanced technique is needed to give the desired solution. We propose an algorithm called STISPM for incremental mining of sequential patterns using the sequence tree structure. STISPM uses a depth-first approach along with backward tracking and a dynamic lookahead pruning strategy that removes infrequent patterns. The path from the root node to any leaf node represents a sequential pattern in the database. The structural characteristics of the sequence tree make it convenient for incremental sequential pattern mining. The sequence tree also stores all the sequential patterns with their counts, so whenever the support threshold is changed, our algorithm, using the frequent sequence tree as the storage structure, can find all the sequential patterns without mining the database again.