Similar Literature
 Found 20 similar articles (search took 15 ms)
1.
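Support-based association rule mining algorithms cannot find itemsets that are infrequent but have high utility, while utility-based association rule mining misses itemsets whose utility is modest but which occur fairly frequently, so that the product of their support and utility (the motivation, 激励) is large. This paper formulates the problem of motivation-based association rule mining and proposes a bottom-up mining algorithm, HM-miner. Motivation combines the strengths of support and utility and measures both the statistical and the semantic importance of an itemset. HM-miner prunes the search using an upper-bound property of motivation and can mine high-motivation itemsets effectively.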
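
To make the support-times-utility measure (激励, rendered here as "motivation") concrete, the following is a minimal sketch. The abstract does not define utility precisely, so this example assumes it is the average utility of an itemset over the transactions that contain it; the toy data and all function names are hypothetical, and this is not the HM-miner algorithm itself.

```python
# Hypothetical toy database: each transaction maps item -> utility (e.g. profit).
transactions = [
    {"a": 5, "b": 2},
    {"a": 5, "c": 1},
    {"a": 5, "b": 2, "c": 1},
    {"b": 2},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t)) / len(db)

def utility(itemset, db):
    """Assumed definition: mean summed utility over the containing transactions."""
    totals = [sum(t[i] for i in itemset) for t in db if itemset.issubset(t)]
    return sum(totals) / len(totals) if totals else 0.0

def motivation(itemset, db):
    """Product of support and utility, as described in the abstract."""
    return support(itemset, db) * utility(itemset, db)

m_a = motivation({"a"}, transactions)        # support 0.75, utility 5.0
m_ab = motivation({"a", "b"}, transactions)  # support 0.5, utility 7.0
```

Under these assumptions the single item `a` scores higher than the pair `{a, b}` even though the pair has the higher utility, because the pair is less frequent.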

2.
This paper introduces a new approach to a problem of data sharing among multiple parties, without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or any other parties. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.

3.
This paper proposes a methodology for text mining relying on the classical knowledge discovery loop, with a number of adaptations. First, texts are indexed and prepared to be processed by frequent itemset levelwise search. Association rules are then extracted and interpreted, with respect to a set of quality measures and domain knowledge, under the control of an analyst. The article includes an experiment on a real-world text corpus on molecular biology.

4.
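Association rule mining is a typical data mining task, and its response time is one of the key concerns of a data mining system. To address this efficiently, the paper introduces the concept of materialized views for association rules together with a corresponding cost model, and proposes a materialized view selection algorithm for data mining environments that achieves good query performance under a storage space constraint. Experimental results show that the algorithm selects materialized views effectively and thereby greatly improves the efficiency of association rule mining.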

5.
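Improvement of the Apriori Algorithm for Association Rule Mining   Cited by: 3 (self-citations: 0, citations by others: 3)
Building on an analysis of the Apriori algorithm for association rule mining and several of its improved variants, this paper further improves Apriori and proposes a new idea based on conditional judgment. Depending on a condition, the improved algorithm combines transaction reduction with candidate itemset reduction, cutting unnecessary overhead and thereby speeding up mining.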
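
As a rough illustration of the two ingredients this abstract mentions, the sketch below is a plain level-wise Apriori with standard subset-based candidate pruning and a simple form of transaction reduction. It is not the paper's conditional variant, whose switching condition the abstract does not give; the function name and the toy baskets are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: count} for every itemset with count >= min_count."""
    db = [frozenset(t) for t in transactions]
    counts = {}
    for t in db:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate reduction: keep a k-itemset only if all of its
        # (k-1)-subsets were frequent at the previous level.
        candidates = set()
        for a in prev:
            for b in prev:
                u = a | b
                if len(u) == k and all(
                    frozenset(s) in prev for s in combinations(u, k - 1)
                ):
                    candidates.add(u)
        counts = {c: 0 for c in candidates}
        reduced = []
        for t in db:
            matched = False
            for c in candidates:
                if c <= t:
                    counts[c] += 1
                    matched = True
            # Transaction reduction: a transaction that contains no candidate
            # k-itemset cannot contain a larger frequent itemset, so drop it.
            if matched:
                reduced.append(t)
        db = reduced
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
freq = apriori(baskets, min_count=3)
```

With this toy input every single item and every pair reaches the threshold, while the triple `{a, b, c}` occurs only twice and is rejected.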

6.
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge number of association rules depending on how parameters are set. However, users are often only interested in finding the strongest rules, and do not want to go through a large number of rules or wait for these rules to be generated. To address those needs, algorithms have been proposed to mine the top-k association rules in databases, where users can directly set a parameter k to obtain the k most frequent rules. However, a major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to more effectively reduce the search space. These properties are applied during the candidate selection process to identify items that should not be used to expand a rule based on its confidence, to reduce the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm both in terms of runtime and memory usage.

7.
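Analyzing temporal association rules yields sets of correlated items, which gives stronger support for decision making. Building on traditional static association rules, this paper proposes a way to represent and mine temporal association rules that takes the rate of change in transaction volume as its object, that is, it considers how the transaction volume of each class of item changes, and discusses both the representation and the implementation of the algorithm.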

8.
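To make traditional association rule mining algorithms more adaptable when applied to a specific domain, the DS-Apriori algorithm is proposed. Built on a semantic ontology, the algorithm dynamically sets the minimum support of each itemset according to the semantic relatedness among its items, and computes itemset semantic relatedness incrementally. Experimental results show that DS-Apriori considerably improves both the efficiency and the quality of association rule mining.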

9.
Pattern Analysis and Applications - Rare association rule mining is an imperative field of data mining that attempts to identify rare correlations among the items in a database. Although numerous...

10.
Two parameters in association rule mining, namely support and confidence, are used to arrange association rules in either increasing or decreasing order. These two parameters are assigned values by counting the number of transactions satisfying the rule, without considering the user's perspective. Hence, an association rule with low support and confidence that is nevertheless meaningful to the user does not receive the importance the user attaches to it. Reflecting the user's perspective is of paramount importance for improving user satisfaction with a given recommendation system. In this paper, we propose a model and an algorithm to extract association rules meaningful to a user, with an ad-hoc support and confidence, by allowing the user to specify the importance of each transaction. In addition, we apply the characteristics of a concept lattice, a core data structure of Formal Concept Analysis (FCA), to reflect the subsumption relation of association rules when assigning a priority to each rule. Finally, we describe experimental results to verify the potential and efficiency of the proposed method.
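
One plausible reading of "allowing the user to specify the importance of each transaction" is a weighted support and confidence, sketched below. The weighting scheme, function names and data are assumptions for illustration and are not taken from the paper; the concept-lattice step is not shown.

```python
def weighted_support(itemset, db, weights):
    """Sum of weights of transactions containing the itemset, over total weight."""
    hit = sum(w for t, w in zip(db, weights) if itemset <= t)
    return hit / sum(weights)

def weighted_confidence(lhs, rhs, db, weights):
    """Weighted support of lhs and rhs together, divided by that of lhs alone.

    Raises ZeroDivisionError if lhs never occurs in the database.
    """
    return weighted_support(lhs | rhs, db, weights) / weighted_support(lhs, db, weights)

# Hypothetical data: the user marks the last transaction as five times
# more important than the others.
db = [{"milk", "bread"}, {"milk"}, {"bread"}, {"milk", "bread", "eggs"}]
weights = [1, 1, 1, 5]

ws = weighted_support({"milk", "eggs"}, db, weights)     # 5 / 8
wc = weighted_confidence({"milk"}, {"eggs"}, db, weights)  # (5/8) / (7/8)
```

With uniform weights the itemset `{milk, eggs}` would have support 1/4; the user-assigned weight lifts it to 5/8, which is the effect the abstract describes.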

11.
In the rapidly changing financial market, investors always have difficulty in deciding the right time to trade. In order to enhance investment profitability, investors desire a decision support system. The proposed artificial intelligence methodology provides investors with the ability to learn the association among different parameters. After the associations are extracted, investors can apply the rules in their decision support systems. In this work, the model is built with the ultimate goal of predicting the level of the Hang Seng Index in Hong Kong. The movement of the Hang Seng Index, which is associated with other economic indices including the gross domestic product (GDP) index, the consumer price index (CPI), the interest rate, and the export value of goods from Hong Kong, is learnt by the proposed method. The case study shows that the proposed method is a feasible way to provide decision support for investors who may not be able to identify the hidden rules between the Hang Seng Index and other economic indices.

12.
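Data mining can extract latent knowledge from large collected data sets, which means it may also expose information involving personal privacy; this gave rise to privacy-preserving data mining. The paper first analyzes MASK, the privacy-preserving association rule mining algorithm proposed by Rizvi, and then improves it with a divide-and-conquer strategy. Both the time complexity analysis and the experimental results show that the improvement to MASK is effective.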

13.
The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds. Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990, and an MSc (first class), also from Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. Since 2000 he has been a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition. Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as an RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association Rule Mining, classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05).
Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.

14.
Recently, a utility-based mining approach has emerged as an alternative mechanism to frequency-based mining in an attempt to reflect not only the statistical correlation but also the semantic significance (e.g., price and quantity) of items. However, existing mining trajectories utilizing high-utility itemsets may not offer firms sufficient business insights unless they can precisely assess the value of association rules, which may vary substantially depending on many business parameters included in the assessment. In this study, we propose a utility-based association-rule mining method that valuates association rules by measuring their specific business benefits accruing to firms. Based on previous studies, three key elements (opportunity, effectiveness, and probability) are identified to define and operationalize a user's preference as a utility function. To apply the utility-based mechanism to the processing of large transaction databases, we constructed functional algorithms, with heightened attention paid to their pruning strategies, and evaluated them based on real-world databases. Experimental results show that the proposed approach can provide users with greater business benefits than the high-utility itemset mining approach, suggesting several important strategic implications for both research and practice.

15.
Data structure for association rule mining: T-trees and P-trees   Cited by: 1 (self-citations: 0, citations by others: 1)
Two new structures for association rule mining (ARM), the T-tree and the P-tree, together with associated algorithms, are described. The authors demonstrate that the structures and algorithms offer significant advantages in terms of storage and execution time.

16.
The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence) with the purpose of supplying indicators to the user in the understanding and use of the newly discovered knowledge. Low-quality datasets have a very bad impact on the quality of the discovered association rules, and one might legitimately wonder if a so-called “interesting” rule noted LHS → RHS is meaningful when 30% of the LHS data are not up-to-date anymore, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well-known for its bad credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed for improving the quality awareness of the knowledge discovery and data mining processes. We propose to integrate data quality indicators for quality-aware association rule mining. We propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality have a great impact on the cost and quality of discovered association rules and confirm our approach for the integrated management of data quality indicators in the KDD process, which ensures the quality of data mining results.

17.
Privacy preserving association rule mining has recently been an active research area. Two different approaches to this problem exist: perturbation based and secure multiparty computation based. One drawback of the perturbation based approach is that it cannot always fully preserve individuals' privacy while achieving precision of mining results. The secure multiparty computation based approach works only in a distributed environment and needs sophisticated protocols, which constrains its practical usage. In this paper, we propose a new approach for preserving privacy in association rule mining. The main idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve privacy while maintaining the precision of mining results. The tradeoff between mining precision and storage requirement is investigated. We also propose a δ-folding technique to further reduce the storage requirement without sacrificing mining precision and running time.
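
The general idea of a keyed Bloom filter can be sketched as follows: bit positions are derived with a keyed hash (HMAC here), so a party without the key cannot test which items a filter encodes. This is a generic illustration under assumed parameters (`m` bits, `k` hash rounds, HMAC-SHA256); the class name is hypothetical, and neither the paper's exact construction nor its δ-folding technique is reproduced.

```python
import hashlib
import hmac

class KeyedBloomFilter:
    """Bloom filter whose bit positions depend on a secret key."""

    def __init__(self, key: bytes, m: int = 64, k: int = 3):
        self.key = key  # secret key shared by authorised parties
        self.m = m      # number of bits in the filter
        self.k = k      # number of keyed hash rounds per item
        self.bits = 0   # bit array stored as a Python integer

    def _positions(self, item: str):
        # One HMAC per round; the round index is mixed into the message.
        for i in range(self.k):
            msg = f"{i}:{item}".encode()
            digest = hmac.new(self.key, msg, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # True for every added item; may also be True for others (false positive).
        return all((self.bits >> p) & 1 for p in self._positions(item))

# Represent one transaction as a filter over its items.
bf = KeyedBloomFilter(key=b"hypothetical-secret")
for item in ("milk", "bread"):
    bf.add(item)
```

A larger `m` lowers the false-positive rate at the cost of storage, which corresponds to the precision versus storage tradeoff the abstract mentions.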

18.
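Fu Yanhua (符燕华), Gu Siyang (顾嗣扬). Journal of Computer Applications (《计算机应用》), 2006, 26(1): 213-215
The scalar product method is used to mine association rules from vertically partitioned data while preserving privacy. A scalar product algorithm is given and its security is analyzed, and an example shows how to use it for mining vertically partitioned distributed data.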
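
The reason a scalar product is enough for vertically partitioned data can be shown in a few lines: if each party holds one 0/1 column over the same transactions, the support count of the two-item set is the dot product of the columns. The sketch below computes that product in the clear purely to show the arithmetic; the privacy-preserving protocol from the paper is not reproduced, and the data and names are hypothetical.

```python
def scalar_product(x, y):
    """Dot product of two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("both parties must cover the same transactions")
    return sum(a * b for a, b in zip(x, y))

# Party A knows which of five transactions contain "milk",
# party B knows which of the same five contain "bread".
a_has_milk = [1, 1, 0, 1, 0]
b_has_bread = [1, 0, 0, 1, 1]

support_count = scalar_product(a_has_milk, b_has_bread)
support = support_count / len(a_has_milk)
```

In a secure setting the same value would be obtained without either party revealing its vector to the other.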

19.
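Causal association rules are an important type of knowledge in a knowledge base and have significant practical value. The paper first analyzes the special properties of causal relations and then, based on the language field and a generalized inductive-logic causal model, discusses research on causal association rules in detail from the aspects of representation, mining, evaluation and application. On this basis it proposes the concept of implicit causal association rules. By using the language field and an inference mechanism, the mining and evaluation of causal association rules gain good logical rigor and extensibility.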

20.
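To address the heavy memory consumption of building an FP-Tree, the CCFP (constraint clip FP-tree) algorithm is proposed. It prunes the transaction database using item-present and item-absent constraints, builds a simplified FP-Tree, and obtains the association rules after one further scan. Experimental results show that, compared with the standard FP-Tree algorithm, it saves a large amount of memory while also running slightly faster.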
