Similar literature
20 similar documents found.
1.
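Traditional association rule mining is one-directional: it cannot identify mutually dependent rules, and the rules it finds are not necessarily meaningful and may even be wrong. In view of this, this paper proposes, on the basis of analysis, a bidirectional association rule mining algorithm, and uses the correlation of the rules to find those that are meaningful to us.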

2.
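Extracting non-redundant association rules from a frequent concept lattice on an FP-tree   Total citations: 1, self-citations: 0, citations by others: 1
To address the low mining efficiency of classical association rule generation algorithms and the high redundancy of the rules they produce, an algorithm is proposed that builds a frequent concept lattice directly on top of an FP-tree and extracts non-redundant association rules from it. Lattice construction can proceed independently for each item, following the index of each item in the FP-tree's frequent-item header table. Nodes are filtered by a support-count constraint, yielding the Hasse diagram of the frequent concept lattice. Each node in the diagram contains a frequent itemset and its support count, and non-redundant association rules can be generated by scanning all leaf nodes. A worked example verifies that the algorithm is effective.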

3.
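A Web log association rule mining method based on rule extraction quantity   Total citations: 2, self-citations: 0, citations by others: 2
A rule extraction quantity metric is introduced, and a Web log association rule mining method based on an immune polyclonal genetic strategy is proposed. The algorithm adds an immune polyclonal operator to a genetic algorithm, which effectively overcomes the genetic algorithm's tendency to become trapped in local optima and gives it stronger global and local search capability. Experimental results show that the algorithm solves the Web log association rule mining problem efficiently.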

4.
5.
Learning to rank, the task of learning ranking functions that sort a set of entities using machine learning techniques, has recently attracted much interest in information retrieval and machine learning research. However, most existing work adopts a supervised learning approach. In this paper, we propose a transductive method that extracts paired preference information from the unlabeled test data. We then design a loss function that incorporates this preference data with the labeled training data, and learn ranking functions by optimizing the loss function via a derived Ranking SVM framework. Experimental results on the LETOR 2.0 benchmark data collections show that our transductive method can significantly outperform the state-of-the-art supervised baseline.

6.
A systematic approach to the assessment of fuzzy association rules   Total citations: 3, self-citations: 0, citations by others: 3
In order to allow for the analysis of data sets including numerical attributes, several generalizations of association rule mining based on fuzzy sets have been proposed in the literature. While the formal specification of fuzzy associations is more or less straightforward, the assessment of such rules by means of appropriate quality measures is less obvious. Particularly, it assumes an understanding of the semantic meaning of a fuzzy rule. This aspect has been ignored by most existing proposals, which must therefore be considered as ad-hoc to some extent. In this paper, we develop a systematic approach to the assessment of fuzzy association rules. To this end, we proceed from the idea of partitioning the data stored in a database into examples of a given rule, counterexamples, and irrelevant data. Evaluation measures are then derived from the cardinalities of the corresponding subsets. The problem of finding a proper partition has a rather obvious solution for standard association rules but becomes less trivial in the fuzzy case. Our results not only provide a sound justification for commonly used measures but also suggest a means for constructing meaningful alternatives.
Henri Prade
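The partition idea in the abstract above can be illustrated with a small sketch. It is not the construction from the paper: it assumes the product t-norm and the standard negation `1 - x`, chosen only because the three degrees then sum to 1 for every record, and the function names `partition_degrees` and `fuzzy_support_confidence` are hypothetical.

```python
def partition_degrees(antecedent_degree, consequent_degree):
    """Split one record into example, counterexample, and irrelevant degrees.

    Assumption (not from the paper): product t-norm and negation 1 - x,
    so the three degrees always add up to 1.
    """
    a, c = antecedent_degree, consequent_degree
    example = a * c                 # antecedent and consequent both hold
    counterexample = a * (1.0 - c)  # antecedent holds, consequent does not
    irrelevant = 1.0 - a            # antecedent does not hold
    return example, counterexample, irrelevant


def fuzzy_support_confidence(records):
    """Derive support and confidence from the fuzzy cardinalities.

    `records` is a list of (antecedent_degree, consequent_degree) pairs,
    each degree in [0, 1].
    """
    if not records:
        raise ValueError("records must not be empty")
    examples = sum(partition_degrees(a, c)[0] for a, c in records)
    counterexamples = sum(partition_degrees(a, c)[1] for a, c in records)
    support = examples / len(records)
    relevant = examples + counterexamples
    # Confidence is undefined when no record supports the antecedent at all.
    confidence = examples / relevant if relevant > 0 else None
    return support, confidence


if __name__ == "__main__":
    data = [(1.0, 1.0), (0.5, 0.5), (0.0, 1.0)]
    print(fuzzy_support_confidence(data))
```

With crisp degrees (0 or 1) the sketch reduces to the usual counting-based support and confidence of a standard association rule.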

7.
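Inductive logic programming (ILP) can be used to learn logical rules of various forms, but a format mismatch problem arises when it is applied to learning information extraction rules for Web pages. This paper gives a data flow diagram of the system structure, analyzes the format mismatch problem in detail, and proposes a solution consisting mainly of a syntax definition for the rules and a dynamic growth method. The generated rules have a clear structure and can be used to extract information from Web pages.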

8.
A new approach to online generation of association rules   Total citations: 6, self-citations: 0, citations by others: 6
We discuss the problem of online mining of association rules in a large database of sales transactions. The online mining is performed by preprocessing the data effectively in order to make it suitable for repeated online queries. We store the preprocessed data in such a way that online processing may be done by applying a graph theoretic search algorithm whose complexity is proportional to the size of the output. The result is an online algorithm which is independent of the size of the transactional data and the size of the preprocessed data. The algorithm is almost instantaneous in the size of the output. The algorithm also supports techniques for quickly discovering association rules from large itemsets. The algorithm is capable of finding rules with specific items in the antecedent or consequent. These association rules are presented in a compact form, eliminating redundancy. The use of nonredundant association rules helps significantly in the reduction of irrelevant noise in the data mining process.

9.
Parallel mining of association rules   Total citations: 15, self-citations: 0, citations by others: 15
We consider the problem of mining association rules on a shared-nothing multiprocessor. We present three algorithms that explore a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem-specific information. The best algorithm exhibits near perfect scaleup behavior, yet requires only minimal overhead compared to the current best serial algorithm.

10.
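Measuring the correlation of association rules   Total citations: 1, self-citations: 0, citations by others: 1
Association rules generated by the Apriori algorithm include useless and even misleading rules. To make the generated rules more effective, the chi-square test from statistics is introduced to test, in a statistical sense, whether the items in a rule are associated. A quantitative relationship between the chi-square statistic and the correlation coefficient is derived, unifying the two methods, and an algorithm based on the correlation coefficient is used to generate association rules.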
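The abstract above does not state the relationship it derives. As a hedged illustration, the sketch below uses the textbook identity for a 2x2 contingency table, where the Pearson chi-square statistic (without continuity correction) equals n times the square of the phi correlation coefficient. The function names `contingency_counts` and `chi_square_and_phi` are hypothetical, and this is not necessarily the formulation used in the paper.

```python
import math


def contingency_counts(transactions, a, b):
    """Count the four cells of the 2x2 table for items a and b."""
    n11 = sum(1 for t in transactions if a in t and b in t)
    n10 = sum(1 for t in transactions if a in t and b not in t)
    n01 = sum(1 for t in transactions if a not in t and b in t)
    n00 = len(transactions) - n11 - n10 - n01
    return n11, n10, n01, n00


def chi_square_and_phi(n11, n10, n01, n00):
    """Return (chi-square, phi) for a 2x2 table, no continuity correction."""
    n = n11 + n10 + n01 + n00
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = row1 * row0 * col1 * col0
    if denom == 0:
        raise ValueError("an item occurs in all or none of the transactions")
    det = n11 * n00 - n10 * n01
    chi2 = n * det ** 2 / denom
    phi = det / math.sqrt(denom)  # the sign shows positive or negative association
    return chi2, phi


if __name__ == "__main__":
    tx = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}, {"c"}]
    chi2, phi = chi_square_and_phi(*contingency_counts(tx, "a", "b"))
    print(chi2, phi, len(tx) * phi ** 2)
```

Because chi-square discards the sign of the association while phi keeps it, a correlation-based filter can distinguish positively from negatively correlated item pairs that yield the same chi-square value.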

11.
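A survey of multi-relational association rule algorithms   Total citations: 2, self-citations: 0, citations by others: 2
Multi-relational data mining is a new data mining topic that draws on ILP techniques and combines them with machine learning methods. Multi-relational association rules are one of the most representative research directions for multi-relational methods in concept description tasks. These methods exploit the pattern expressiveness of multi-relational methods and their ability to use background knowledge, while borrowing the ideas and optimization strategies of mature association rule methods. As a result they achieve high performance, can express complex patterns, and have obtained good results in applications involving complex structured data. After a brief introduction to multi-relational methods, this paper analyzes and compares representative multi-relational association rule algorithms, summarizes the strengths and weaknesses of each, and identifies the main issues currently attracting attention in the field.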

12.
Association rules (AR) are a consolidated tool in data mining applications, as they are able to discover regularities in large data sets. The information mined by the rules is very often difficult to exploit because there are too many associations among which the truly relevant logical implications must be detected. In this framework, AR post-analysis tools are proposed that combine methodological and graphical pruning techniques. The methodological techniques ensure the statistical significance of the AR that are not pruned, while the graphical ones provide interactive and powerful visualization tools.

13.
In this paper we deal with the problem of mining for approximate dependencies (AD) in relational databases. We introduce a definition of AD based on the concept of association rule, by means of suitable definitions of the concepts of item and transaction. This definition allows us to measure both the accuracy and support of an AD. We provide an interpretation of the new measures based on the complexity of the theory (set of rules) that describes the dependence, and we employ this interpretation to compare the new measures with existing ones. A methodology to adapt existing association rule mining algorithms to the task of discovering ADs is introduced. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater than user-defined thresholds. The experiments we have performed show that our approach performs reasonably well over large databases with real-world data.

14.
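An association rule mining method based on concept lattices   Total citations: 3, self-citations: 0, citations by others: 3
The application of concept lattices to association rule mining is studied. By mapping the extents and intents of a concept lattice to the transactions and features of a transaction database, respectively, frequent itemsets can be generated from the lattice and association rules can then be mined. An association rule mining method based on concept lattices is proposed: after object reduction in the formal context, the concept lattice of the reduced context is constructed; a basic rule set is first generated from this new lattice, and rules meaningful to the user are then mined from the basic rule set according to a user-specified support threshold. A description of the algorithm is given. The association rules obtained by this method are consistent with those obtained by the Apriori algorithm.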

15.
The typical model, which involves three measures (support, confidence, and interest), is often adopted for mining association rules. In this model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, we cannot effectively extract the meaningful rules. This paper analyzes the meanings of the parameters and uses regression to design a variety of equations relating the number of rules to the parameters. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coefficients to test the fitting effects of the equations and uses a significance test to verify whether the coefficients of the parameters differ significantly from zero. The regression equation with the larger multiple correlation coefficient is chosen as the optimally fitted equation. With the selected equation, we can predict the number of rules under given parameters, further optimize the choice of the three parameters, and determine their ranges of values.
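The three measures named in the abstract above can be made concrete with a short sketch. The abstract does not define interest, so the code assumes the common lift-style definition, P(A and B) / (P(A) P(B)). The helper names `support`, `rule_measures`, and `count_pair_rules` are hypothetical, and the rule count covers only single-item antecedents and consequents to stay brief.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, interest) for antecedent -> consequent.

    Interest is taken to be lift here, which is an assumption.
    """
    sup_a = support(transactions, antecedent)
    sup_c = support(transactions, consequent)
    sup_rule = support(transactions, set(antecedent) | set(consequent))
    confidence = sup_rule / sup_a if sup_a > 0 else 0.0
    interest = sup_rule / (sup_a * sup_c) if sup_a > 0 and sup_c > 0 else 0.0
    return sup_rule, confidence, interest


def count_pair_rules(transactions, min_support, min_confidence):
    """Count single-item rules x -> y that pass both thresholds."""
    items = sorted(set().union(*transactions))
    count = 0
    for x, y in permutations(items, 2):
        sup, conf, _ = rule_measures(transactions, {x}, {y})
        if sup >= min_support and conf >= min_confidence:
            count += 1
    return count


if __name__ == "__main__":
    tx = [{"bread", "milk"}, {"bread", "butter"},
          {"bread", "milk", "butter"}, {"milk"}]
    print(rule_measures(tx, {"bread"}, {"milk"}))
    for min_conf in (0.3, 0.6, 0.9):
        print(min_conf, count_pair_rules(tx, 0.25, min_conf))
```

Sweeping the thresholds as in the final loop produces (parameters, rule count) pairs, which is the kind of data a regression of rule count on the parameters would be fitted to.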

16.
The high dimensionality of massive data results in the discovery of a large number of association rules. The huge number of rules makes it difficult to interpret and react to all of the rules, especially because many rules are redundant and contained in other rules. We discuss how the sparseness of the data affects the redundancy and containment between the rules and provide a new methodology for organizing and grouping the association rules with the same consequent. It consists of finding metarules, rules that express the associations between the discovered rules themselves. The information provided by the metarules is used to reorganize and group related rules. It is based only on data-determined relationships between the rules. We demonstrate the suggested approach on actual manufacturing data and show its effectiveness on several benchmark data sets.

17.
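Application of neural networks to determining weights in association rule mining algorithms   Total citations: 1, self-citations: 0, citations by others: 1
A method that uses a neural network to determine weights is proposed. The three main attributes of network alarm information serve as inputs to the neural network, whose connection weights are determined by training on samples, so that the weight of each network alarm can be identified. This approach both reflects expert experience and allows the connection weights to be updated as the network topology changes. Modeling and simulation results show that the neural network method is more practical and effective than other weight determination methods.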

18.
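Zhou Yong (周勇), Bao Yu (鲍钰), 《计算机应用》, 2004, 24(8): 54-56
By preprocessing, analyzing, and mining Web log data, supported by a modest amount of programming and the TPARD (Target Pages Association Rule Discovery) algorithm, implicit association rules between target pages on the Internet are discovered. These rules can be used to optimize website structure and further improve the quality of service for Web end users.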

19.
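Improved distance-based clustering of association rules   Total citations: 2, self-citations: 1, citations by others: 1
Association rule mining produces a large number of rules; to identify useful information among them, the rules need to be classified and organized effectively. Existing rule clustering methods usually compute the distance between rules directly, ignoring the relationships between items, and so cannot determine the distance between rules accurately. An improved measure of the distance between rules is proposed: the distance between items is computed first, then the distance between itemsets and the distance between rules, and finally the association rules are clustered with the DBSCAN algorithm using this distance. Experimental results show that the method is effective and feasible and can accurately identify isolated rules.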

20.
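To address the overly sharp partition boundaries in quantitative association rule mining, and the loss of rules that occurs in weighted association rule mining when the downward closure property is enforced, a new weighted fuzzy association mining model and its mining algorithm, NFWARM, are proposed. To avoid the sharp boundaries caused by interval partitioning, the model introduces fuzzy sets to soften the partition boundaries of attributes. At the same time, attribute weights are used to characterize the contribution of elements to a rule, so that rules are not lost while the downward closure of frequent itemsets is preserved. Experimental results show that the algorithm is suitable for rule mining in large databases containing Boolean and numerical data, and that the numbers of frequent itemsets and rules obtained increase significantly.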

