Found 20 similar articles (search time: 0 ms)
1.
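Two hash tree construction methods for frequent itemset mining Cited by: 1 (self-citations: 0, citations by others: 1)
1 Introduction. Discovering frequent itemsets/patterns in large databases lies at the core of mining problems such as association rules, sequential patterns, causality, maximal patterns, and multidimensional patterns. It has become a research hotspot in data mining in recent years, and many effective mining algorithms have been proposed. Most of these algorithms mine and update frequent itemsets with methods similar to the Apriori algorithm. Apriori-like algorithms share a common feature: to find all frequent k-itemsets in the database, that is, itemsets containing k (k>1) items, they first generate …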
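The level-wise search that Apriori-like algorithms share can be sketched as follows. This is a minimal illustration, not the method of the article above: the function name `apriori_like`, the `min_count` threshold, and the sample baskets are assumptions made for the example, and candidate supports are counted with a plain scan of the transactions rather than with the hash trees the article discusses.

```python
from itertools import combinations

def apriori_like(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets.

    Hypothetical helper for illustration; counts candidates with a
    linear scan instead of a hash tree.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

# Assumed sample data: five baskets over items a, b, c.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_sets = apriori_like(baskets, min_count=3)
```

With these baskets and `min_count=3`, every single item and every pair is frequent, while the triple {a, b, c} appears in only two baskets and is dropped.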
2.
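Fast mining of maximum frequent itemsets Cited by: 95 (self-citations: 3, citations by others: 95)
Discovering maximum frequent itemsets is a key problem in many data mining applications. This paper proposes DMFI (discovery maximum frequent itemsets), an algorithm for fast mining of maximum frequent itemsets that combines bottom-up and top-down search strategies. Through its distinctive ordering method and effective pruning strategy, it greatly reduces the generation of candidate itemsets and thereby significantly lowers CPU time.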
3.
4.
From data to global generalized knowledge Cited by: 1 (self-citations: 0, citations by others: 1)
Yen-Liang Chen Yu-Ying Wu Ray-I Chang 《Decision Support Systems》2012,52(2):295-307
The attribute-oriented induction (AOI) is a useful data mining method that extracts generalized knowledge from relational data and the user's background knowledge. The method uses two thresholds, the relation threshold and attribute threshold, to guide the generalization process, and outputs generalized knowledge, a set of generalized tuples which describes the major characteristics of the target relation. Although AOI has been widely used in various applications, a potential weakness of this method is that it only provides a snapshot of the generalized knowledge, not a global picture. When thresholds are different, we would obtain different sets of generalized tuples, which also describe the major characteristics of the target relation. If a user wants to ascertain a global picture of induction, he or she must try different thresholds repeatedly. That is time-consuming and tedious. In this study, we propose a global AOI (GAOI) method, which employs the multiple-level mining technique with multiple minimum supports to generate all interesting generalized knowledge at one time. Experimental results on a real-life dataset show that the proposed method is effective in finding global generalized knowledge.
5.
6.
Laszlo Szathmary Petko Valtchev Amedeo Napoli 《International Journal of Software and Informatics》2010,4(3):219-238
Rare association rules correspond to rare, or infrequent, itemsets, as opposed
to frequent ones that are targeted by conventional pattern miners. Rare rules reflect regularities
of local, rather than global, scope that can nevertheless provide valuable insights
to an expert, especially in areas such as genetics and medical diagnosis where some specific
deviations/illnesses occur only in a small number of cases. The work presented here is motivated
by the long-standing open question of efficiently mining strong rare rules, i.e., rules
with high confidence and low support. We also propose an efficient solution for finding the
set of minimal rare itemsets. This set serves as a basis for generating rare association rules.
7.
Fernando Alonso Juan P. Caraça-Valente Ángel L. González César Montes 《Expert Systems with Applications》2002,23(4)
The medical diagnosis system described here uses underlying knowledge in the isokinetic domain, obtained by combining the expertise of a physician specialised in isokinetic techniques and data mining techniques applied to a set of existing data. An isokinetic machine is basically a physical support on which patients exercise one of their joints, in this case the knee, according to different ranges of movement and at a constant speed. The data on muscle strength supplied by the machine are processed by an expert system that has built-in knowledge elicited from an expert in isokinetics. It cleans and pre-processes the data and conducts an intelligent analysis of the parameters and morphology of the isokinetic curves. Data mining methods based on the discovery of sequential patterns in time series and the fast Fourier transform, which identifies similarities and differences among exercises, were applied to the processed information to characterise injuries and discover reference patterns specific to populations. The results obtained were applied in two environments: one for the blind and another for elite athletes.
8.
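Chen Xianghui 《Computer Measurement & Control》2017,25(6):42-42
In recent years, with the rapid development and spread of Internet technology, many social networking platforms have risen quickly. These platforms have brought everyday personal relationships closer and provide convenient channels for communication. At the same time, research on data mining techniques for social network platforms has become an indispensable part of network data research. Existing social network data mining techniques rely on traditional data mining algorithms and data separation modes, which suffer from reduced mining accuracy and confused classification logic when the data are large and have heterogeneous features. To address the root causes of these problems, this paper proposes a social network data mining technique based on the naive Bayes algorithm. The PCIE-FN social network data mining platform, designed around the naive Bayes algorithm, is used to address the problems comprehensively. Experiments show that the proposed technique meets the requirements of everyday social network data mining applications on all measured indicators.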
9.
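Concepts, system architecture, and methods of data mining Cited by: 12 (self-citations: 5, citations by others: 7)
Mao Guojun 《Computer Engineering and Design》2002,23(8):13-17
The paper first summarizes the concept of data mining and the related schools of thought, then presents an architecture for a data mining system and uses it to introduce the system's main functional components, and finally analyzes the main data mining methods.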
10.
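An improved algorithm for association rule extraction based on a reduced concept lattice* Cited by: 2 (self-citations: 1, citations by others: 2)
Concept lattices are an important technique in association rule mining. Generating all frequent itemsets on a concept lattice requires sorting the lattice nodes and comparing them one by one. To improve the efficiency of this step, the paper proposes a new algorithm for generating frequent itemsets based on a reduced concept lattice. By exploiting the parent-child relationships between nodes, the algorithm generates all frequent itemsets directly, which eliminates the time cost of sorting the nodes and greatly reduces the number of node comparisons, thereby improving the efficiency of frequent itemset generation. Experimental results demonstrate its reliability and efficiency.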
11.
In data mining applications, it is important to develop evaluation methods for selecting quality and profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can be further designed as measures to evaluate the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with the association rule can serve as essential measures of rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. An example of market basket analysis is used to illustrate the DEA-based methodology for measuring the efficiency of association rules with multiple criteria.
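The two conventional interestingness measures named in this abstract, support and confidence, can be computed as in the sketch below. It covers only those two measures and does not implement the DEA ranking; the function names and the sample baskets are assumptions made for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as a ratio of supports."""
    base = support(antecedent, transactions)
    both = support(set(antecedent) | set(consequent), transactions)
    # A rule whose antecedent never occurs gets confidence 0 by convention here.
    return both / base if base > 0 else 0.0

# Assumed market-basket sample: four transactions.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, baskets)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, baskets)  # 2 of the 3 bread baskets
```

Domain measures such as product value or cross-selling profit would be additional per-rule attributes alongside these two numbers.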
12.
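This paper studies two basic derivation relations between association rules. On this basis it builds a matrix view of the association rules of a maximal frequent itemset, in which all rules generated by one frequent itemset are displayed in a single matrix. By studying the relationships among the rule elements of the matrix, it obtains the basis set and the kernel (that is, the minimal rule set) of a frequent itemset or rule matrix. This makes it possible to mine the minimal rule set, as well as the rules of interest to users, from the large number of association rules generated from a large transaction database.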
13.
Aleksey Buzmakov Elias Egho Nicolas Jay Sergei O. Kuznetsov Amedeo Napoli Chedy Raïssi 《International Journal of General Systems》2016,45(2):135-159
Nowadays data-sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of formal concept analysis and its extension based on “pattern structures”. Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e. a data reduction of sequential structures) are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analysing interesting patient patterns from a French healthcare data-set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use-case which is the main motivation for this work.
14.
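In recent years, gene expression techniques for the in-depth study of cancer cells have been growing in number. Machine learning algorithms are widely used in many fields today but are seldom applied in bioinformatics. This paper systematically studies the principles and algorithms of decision tree generation and pruning, along with other issues related to decision trees. Using the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) data sets provided by CAMDA2000 (critical assessment of microarray data analysis), it designs and implements a decision tree classifier based on the ID3 algorithm and simplifies the tree with a post-pruning algorithm. Finally, experiments verify the effectiveness of the algorithm: the classifier achieves high classification accuracy in discriminant analysis of leukemia microarray data, which indicates that decision tree algorithms have broad application prospects in medical data mining.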
15.
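An incremental algorithm for sequential pattern mining Cited by: 24 (self-citations: 0, citations by others: 24)
Sequential pattern mining is one of the most important research topics in data mining, and mining sequential patterns from time-related data has its own characteristics. The authors propose an incremental sequential pattern mining algorithm, IMSP, whose aim is to reuse the previous results to speed up the current mining run when the database has changed little.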
16.
Estimating the effect of word of mouth on churn and cross-buying in the mobile phone market with Markov logic networks Cited by: 1 (self-citations: 0, citations by others: 1)
Torsten Dierkes Martin Bichler Ramayya Krishnan 《Decision Support Systems》2011,51(3):361-371
Much has been written about word of mouth and customer behavior. Telephone call detail records provide a novel way to understand the strength of the relationship between individuals. In this paper, we use call detail records to predict the impact that the behavior of one customer has on another customer's decisions. We study this in the context of churn (a decision to leave a communication service provider) and cross-buying decisions based on an anonymized data set from a telecommunications provider. Call detail records are represented as a weighted graph and a novel statistical learning technique, Markov logic networks, is used in conjunction with logit models based on lagged neighborhood variables to develop the predictive model. In addition, we propose an approach to propositionalization tailored to predictive modeling with social network data. The results show that information on the churn of network neighbors has a significant positive impact on the predictive accuracy and in particular the sensitivity of churn models. The results provide evidence that word of mouth has a considerable impact on customers' churn decisions and also on the purchase decisions, leading to a 19.5% and 8.4% increase in sensitivity of predictive models.
17.
18.
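A closed frequent itemset mining algorithm based on apposition integration of iceberg concept lattices Cited by: 2 (self-citations: 0, citations by others: 2)
Because concept lattices are complete, the time and space complexity of constructing them has always been the main factor limiting their use in concept-lattice-based data mining. Combining the semilattice property of iceberg concept lattices with the idea of lattice integration, the paper first shows theoretically that the iceberg concept lattice obtained by apposition integration is isomorphic to the iceberg lattice obtained by pruning the complete concept lattice. It then analyzes the mapping relations in the integration process of iceberg concept lattices. Finally, it proposes a novel closed frequent itemset mining algorithm based on the apposition of iceberg concept lattices (Icegalamera). The algorithm avoids computing the complete concept lattice and uses integration and pruning strategies during construction, which significantly improves mining efficiency. Experiments confirm the completeness of the closed frequent itemsets it produces. Performance tests on dense and sparse data sets in single-site mode show a clear performance advantage on sparse data sets.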
19.
Jaegyoon Ahn 《Information Sciences》2011,181(3):435-449
Biclusters are subsets of genes that exhibit similar behavior over a set of conditions. A biclustering algorithm is a useful tool for uncovering groups of genes involved in the same cellular processes and groups of conditions under which these processes take place. In this paper, we propose a polynomial time algorithm to identify functionally highly correlated biclusters. Our algorithm identifies (1) gene sets that simultaneously exhibit additive, multiplicative, and combined patterns and allow high levels of noise, (2) multiple, possibly overlapped, and diverse gene sets, (3) biclusters that simultaneously exhibit negatively and positively correlated gene sets, and (4) gene sets for which the functional association is very high. We validate the level of functional association in our method by using the GO database, protein-protein interactions and KEGG pathways.
20.
Shouhong 《Data & Knowledge Engineering》2002,40(3):273-283
This paper reports on conceptual development in applications of neural networks to data mining and knowledge discovery. Hypothesis generation is one of the significant differences of data mining from statistical analyses. Nonlinear pattern hypothesis generation is a major task of data mining and knowledge discovery. Yet, few methods of nonlinear pattern hypothesis generation are available.
This paper proposes a model of data mining to support nonlinear pattern hypothesis generation. This model is an integration of linear regression analysis model, Kohonen's self-organizing maps, the algorithm for convex polytopes, and back-propagation neural networks.