Similar literature (20 results)
1.
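Two Hash Tree Construction Methods in Frequent Itemset Mining   (total citations: 1; self-citations: 0; citations by others: 1)
1 Introduction. Discovering frequent itemsets (patterns) in large databases is at the core of mining problems such as association rules, sequential patterns, causal relationships, maximal patterns, and multidimensional patterns. It has become a hot research topic in data mining in recent years, and many effective mining algorithms have been proposed. Most of these algorithms mine and update frequent itemsets with an approach similar to the Apriori algorithm. Apriori-like algorithms share one feature: to find all frequent k-itemsets (itemsets containing k items, k > 1) in the database, they first generate …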
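The level-wise scheme that Apriori-like algorithms share can be sketched as follows. This is a minimal illustration, not the paper's method: it counts candidates with a plain scan of the transactions rather than with the hash tree the title refers to, and the names `apriori`, `transactions`, and `min_support` are assumptions introduced here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

A hash tree would replace the count step, which is where a plain scan spends most of its time on large candidate sets.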

2.
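Fast Mining of Maximum Frequent Itemsets   (total citations: 95; self-citations: 3; citations by others: 95)
Lu Songfeng, Lu Zhengding. Journal of Software (《软件学报》), 2001, 12(2): 293-297
Discovering maximum frequent itemsets is a key problem in many data mining applications. This paper proposes DMFI (discovery maximum frequent itemsets), a fast algorithm for mining maximum frequent itemsets that combines bottom-up and top-down search strategies. Through its distinctive ordering method and effective pruning strategy, it greatly reduces the number of candidate itemsets generated and thereby significantly lowers CPU time.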

3.
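MAXFP-Miner: Fast Mining of Maximal Frequent Itemsets Using the FP-tree   (total citations: 3; self-citations: 0; citations by others: 3)
To improve the efficiency of frequent itemset mining, this paper introduces the concept of a maximal frequent itemset tree and MAXFP-Miner, an FP-tree-based algorithm for mining maximal frequent itemsets. An FP-tree is built first, and the maximal frequent itemset tree (MAXFP-tree) is then built on top of it. The MAXFP-tree contains all maximal frequent itemsets, which narrows the search space and improves efficiency. Algorithm analysis and experiments show that the algorithm is particularly well suited to dense datasets and datasets with long frequent itemsets.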

4.
From data to global generalized knowledge   (total citations: 1; self-citations: 0; citations by others: 1)
Attribute-oriented induction (AOI) is a useful data mining method that extracts generalized knowledge from relational data and the user's background knowledge. The method uses two thresholds, the relation threshold and the attribute threshold, to guide the generalization process, and outputs generalized knowledge: a set of generalized tuples that describes the major characteristics of the target relation. Although AOI has been widely used in various applications, a potential weakness of this method is that it only provides a snapshot of the generalized knowledge, not a global picture. Different thresholds yield different sets of generalized tuples, each of which also describes the major characteristics of the target relation. A user who wants a global picture of the induction must try different thresholds repeatedly, which is time-consuming and tedious. In this study, we propose a global AOI (GAOI) method, which employs the multiple-level mining technique with multiple minimum supports to generate all interesting generalized knowledge at one time. Experimental results on a real-life dataset show that the proposed method is effective in finding global generalized knowledge.

5.
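A New Generalized Association Rule Mining Algorithm   (total citations: 1; self-citations: 0; citations by others: 1)
This paper proposes GARL, a novel algorithm for mining generalized association rules. The algorithm scans the sequence of database transactions continuously and generates all frequent itemsets after at most two scans. During the first scan it can give the user feedback and lets the user adjust the minimum support. Because it can process transaction sequences continuously, it can be used for online data mining over the web.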

6.
Rare association rules correspond to rare, or infrequent, itemsets, as opposed to frequent ones that are targeted by conventional pattern miners. Rare rules reflect regularities of local, rather than global, scope that can nevertheless provide valuable insights to an expert, especially in areas such as genetics and medical diagnosis where some specific deviations/illnesses occur only in a small number of cases. The work presented here is motivated by the long-standing open question of efficiently mining strong rare rules, i.e., rules with high confidence and low support. We also propose an efficient solution for finding the set of minimal rare itemsets. This set serves as a basis for generating rare association rules.
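To make the notion of a minimal rare itemset concrete, the sketch below enumerates them by brute force. It assumes the usual definition (an itemset whose support is below the threshold while all of its proper subsets are frequent) and is not the efficient solution the abstract refers to; `minimal_rare_itemsets` and its parameters are hypothetical names. Itemsets with zero support are reported too, which a real miner may prefer to handle separately.

```python
from itertools import combinations


def minimal_rare_itemsets(transactions, min_support):
    """Brute-force enumeration of minimal rare itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    minimal_rare = []
    frequent = {frozenset()}  # the empty itemset counts as frequent
    for k in range(1, len(items) + 1):
        level_frequent = set()
        for combo in combinations(items, k):
            # Only itemsets whose (k-1)-subsets are all frequent can be
            # frequent or minimal rare (support is anti-monotone).
            if not all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
                continue
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                level_frequent.add(candidate)
            else:
                minimal_rare.append(candidate)
        if not level_frequent:
            break
        frequent = level_frequent
    return minimal_rare
```

The walk is level-wise: it stops descending along a branch as soon as a rare itemset is found, so every reported itemset sits exactly on the border between frequent and rare.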

7.
The medical diagnosis system described here uses underlying knowledge in the isokinetic domain, obtained by combining the expertise of a physician specialised in isokinetic techniques and data mining techniques applied to a set of existing data. An isokinetic machine is basically a physical support on which patients exercise one of their joints, in this case the knee, according to different ranges of movement and at a constant speed. The data on muscle strength supplied by the machine are processed by an expert system that has built-in knowledge elicited from an expert in isokinetics. It cleans and pre-processes the data and conducts an intelligent analysis of the parameters and morphology of the isokinetic curves. Data mining methods based on the discovery of sequential patterns in time series and the fast Fourier transform, which identifies similarities and differences among exercises, were applied to the processed information to characterise injuries and discover reference patterns specific to populations. The results obtained were applied in two environments: one for the blind and another for elite athletes.

8.
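In recent years, with the rapid development and spread of Internet technology, many social networking platforms have risen quickly. These platforms bring everyday personal relationships closer and provide convenient channels for communication. Research on data mining techniques for social network platforms has accordingly become an indispensable part of network data research. Existing social network data mining techniques rely on traditional data mining algorithms and a data separation model; on big data with multiple heterogeneous features, their mining accuracy drops and their classification logic becomes confused. To address the root causes of these problems, this paper proposes a social network data mining technique based on the naive Bayes algorithm and applies PCIE-FN, a social network data mining platform designed around naive Bayes, to resolve them comprehensively. Experiments show that the proposed technique meets the requirements of everyday social network data mining applications on all measured indicators.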

9.
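Concepts, System Architecture, and Methods of Data Mining   (total citations: 12; self-citations: 5; citations by others: 7)
This paper first summarizes the concept of data mining and the related schools of thought, then presents an architecture for a data mining system and uses it to introduce the system's main functional components, and finally analyzes the main data mining methods.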

10.
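An Improved Association Rule Extraction Algorithm Based on the Reduced Concept Lattice   (total citations: 2; self-citations: 1; citations by others: 2)
Chen Xiang, Wu Yue. Application Research of Computers (《计算机应用研究》), 2011, 28(4): 1293-1295
Concept lattices are an important technique in association rule mining. Generating all frequent itemsets on a concept lattice requires sorting the lattice nodes and comparing them one by one. To make this more efficient, this paper proposes a new algorithm for generating frequent itemsets based on a reduced concept lattice. By exploiting the parent-child relationships between nodes, the algorithm generates all frequent itemsets directly, which eliminates the time spent sorting nodes and greatly reduces the number of node comparisons, thereby improving the efficiency of frequent itemset generation. Experimental results demonstrate its reliability and efficiency.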

11.
In data mining applications, it is important to develop evaluation methods for selecting quality and profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can be further designed as measures to evaluate the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with the association rule can serve as essential measures of rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. An example of market basket analysis is applied to illustrate the DEA-based methodology for measuring the efficiency of association rules with multiple criteria.
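For reference, the two conventional measures named in the abstract above are usually defined as follows. The notation is an assumption introduced here and is not taken from the paper: $D$ is the set of transactions, and $X$ and $Y$ are disjoint itemsets.

```latex
\[
\operatorname{supp}(X \Rightarrow Y) = \frac{\lvert \{\, t \in D : X \cup Y \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} .
\]
```

As a small worked case, if 2 of 4 baskets contain both bread and milk and 3 of 4 contain bread, the rule bread ⇒ milk has support 2/4 = 0.5 and confidence 0.5 / 0.75 ≈ 0.67. Domain measures such as product value or cross-selling profit would enter the DEA ranking alongside these two values.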

12.
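This paper studies two basic derivation relationships between association rules and, on that basis, builds an association rule matrix view for maximal frequent itemsets, in which all rules generated from one frequent itemset are displayed in a single matrix. By studying the relationships among the rule elements of the matrix, it obtains the basis set and the kernel (that is, the minimal rule set) of a frequent itemset or rule matrix. This makes it possible to extract the minimal rule set and the rules of interest to the user from the large number of association rules generated from a large transaction database.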

13.
Nowadays data-sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of formal concept analysis and its extension based on “pattern structures”. Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e. a data reduction of sequential structures) are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analysing interesting patient patterns from a French healthcare data-set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use-case which is the main motivation for this work.

14.
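Xiang Jing, Ren Jie. Computer Engineering and Design (《计算机工程与设计》), 2006, 27(15): 2905-2908
In recent years, a growing number of gene expression techniques have called for in-depth study of cancer cells. Machine learning algorithms are widely used in many fields today but have rarely been applied to bioinformatics. This paper systematically studies the principles and algorithms of decision tree generation and pruning, along with other decision tree issues. Using the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) dataset provided by CAMDA2000 (critical assessment of microarray data analysis), it designs and implements a decision tree classifier based on the ID3 algorithm and simplifies the tree with post-pruning. Experiments verify the algorithm's effectiveness: the classifier achieves high classification accuracy in discriminant analysis of leukemia microarray data, which shows that decision tree algorithms have broad application prospects in medical data mining.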
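ID3 chooses each split by information gain, so the core of such a classifier can be sketched in a few lines. The sketch assumes expression levels have already been discretized into categories; the feature names and values in the usage note are invented for illustration and do not come from the CAMDA2000 data, and post-pruning is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 split rule: pick the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))
```

For example, with four samples labelled `["ALL", "ALL", "AML", "AML"]` and a hypothetical feature `g1` that is `"high"` for the first two and `"low"` for the last two, `g1` separates the classes perfectly and has a gain of 1 bit; a feature that is evenly mixed across both classes has a gain of 0.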

15.
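An Incremental Algorithm for Sequential Pattern Mining   (total citations: 24; self-citations: 0; citations by others: 24)
Zhou Bin, Wu Quanyuan. Chinese Journal of Computers (《计算机学报》), 1999, 22(8): 882-887
Sequential pattern mining is one of the most important research topics in data mining, and mining sequential patterns from time-related data has its own characteristics. The authors propose IMSP, an incremental sequential pattern mining algorithm that reuses the results of the previous run to speed up the current mining run when the database has changed only slightly.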

16.
Much has been written about word of mouth and customer behavior. Telephone call detail records provide a novel way to understand the strength of the relationship between individuals. In this paper, we use call detail records to predict the impact that the behavior of one customer has on another customer's decisions. We study this in the context of churn (a decision to leave a communication service provider) and cross-buying decisions based on an anonymized data set from a telecommunications provider. Call detail records are represented as a weighted graph and a novel statistical learning technique, Markov logic networks, is used in conjunction with logit models based on lagged neighborhood variables to develop the predictive model. In addition, we propose an approach to propositionalization tailored to predictive modeling with social network data. The results show that information on the churn of network neighbors has a significant positive impact on the predictive accuracy and in particular the sensitivity of churn models. The results provide evidence that word of mouth has a considerable impact on customers' churn decisions and also on the purchase decisions, leading to a 19.5% and 8.4% increase in sensitivity of predictive models.

17.
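A New Outlier Mining Algorithm for Time Series Data   (total citations: 11; self-citations: 0; citations by others: 11)
Outlier mining is an important part of data mining. This paper studies outlier mining methods for time series data. The time series is first transformed from the time domain to the frequency domain with the discrete Fourier transform, which maps each series to a point in a multidimensional space. On this basis, a new distance-based outlier mining algorithm is proposed. Simulation experiments on electric power load time series from a steel company demonstrate the algorithm's effectiveness.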
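The two-stage idea in this abstract (map each series to a point via the discrete Fourier transform, then look for points with few neighbours) can be sketched as below. The paper's own algorithm is not given in the abstract, so the details are assumptions: the features are the magnitudes of the first few DFT coefficients, and the outlier test is a generic distance-based rule with hypothetical parameters `radius` and `min_fraction`.

```python
import cmath
import math


def dft_features(series, n_coeffs):
    """Map a series to a point: magnitudes of its first n_coeffs DFT coefficients."""
    n = len(series)
    features = []
    for k in range(n_coeffs):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        features.append(abs(coeff) / n)
    return features


def distance_outliers(points, radius, min_fraction):
    """Indices of points with fewer than min_fraction of the others within radius."""
    outliers = []
    for i, p in enumerate(points):
        near = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        if near < min_fraction * (len(points) - 1):
            outliers.append(i)
    return outliers
```

Keeping only magnitudes discards phase, so two series that differ only by a time shift land on the same point; whether that is desirable depends on the application. The direct DFT here costs O(n) per coefficient, and an FFT would be the natural replacement for long series.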

18.
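A Closed Frequent Itemset Mining Algorithm Based on Apposition Integration of Iceberg Concept Lattices   (total citations: 2; self-citations: 0; citations by others: 2)
Because concept lattices are complete, the time and space complexity of constructing them has long been the main factor limiting their use in lattice-based data mining. Combining the semilattice property of iceberg concept lattices with the idea of lattice integration, this paper first shows theoretically that the iceberg concept lattice obtained by apposition integration is isomorphic to the iceberg lattice obtained by pruning the complete concept lattice. It then analyzes the mapping relationships that arise during iceberg lattice integration, and finally proposes Icegalamera, a novel closed frequent itemset mining algorithm based on the apposition of iceberg concept lattices. The algorithm avoids computing the complete concept lattice and applies integration and pruning strategies during construction, which significantly improves mining efficiency. Experiments confirm the completeness of the closed frequent itemsets it produces. Performance tests on dense and sparse datasets in single-site mode show a clear performance advantage on sparse datasets.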

19.
Biclusters are subsets of genes that exhibit similar behavior over a set of conditions. A biclustering algorithm is a useful tool for uncovering groups of genes involved in the same cellular processes and groups of conditions under which these processes take place. In this paper, we propose a polynomial time algorithm to identify functionally highly correlated biclusters. Our algorithm identifies (1) gene sets that simultaneously exhibit additive, multiplicative, and combined patterns and allow high levels of noise, (2) multiple, possibly overlapping, and diverse gene sets, (3) biclusters that simultaneously exhibit negatively and positively correlated gene sets, and (4) gene sets for which the functional association is very high. We validate the level of functional association in our method by using the GO database, protein-protein interactions and KEGG pathways.

20.
This paper reports on conceptual development in applications of neural networks to data mining and knowledge discovery. Hypothesis generation is one of the significant differences of data mining from statistical analyses. Nonlinear pattern hypothesis generation is a major task of data mining and knowledge discovery. Yet, few methods of nonlinear pattern hypothesis generation are available.

This paper proposes a model of data mining to support nonlinear pattern hypothesis generation. This model is an integration of linear regression analysis model, Kohonen's self-organizing maps, the algorithm for convex polytopes, and back-propagation neural networks.

