首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Recent research has highlighted the practical benefits of subjective interestingness measures, which quantify the novelty or unexpectedness of a pattern when contrasted with any prior information of the data miner (Silberschatz and Tuzhilin, Proceedings of the 1st ACM SIGKDD international conference on Knowledge discovery and data mining (KDD95), 1995; Geng and Hamilton, ACM Comput Surv 38(3):9, 2006). A key challenge here is the formalization of this prior information in a way that lends itself to the definition of a subjective interestingness measure that is both meaningful and practical. In this paper, we outline a general strategy of how this could be achieved, before working out the details for a use case that is important in its own right. Our general strategy is based on considering prior information as constraints on a probabilistic model representing the uncertainty about the data. More specifically, we represent the prior information by the maximum entropy (MaxEnt) distribution subject to these constraints. We briefly outline various measures that could subsequently be used to contrast patterns with this MaxEnt model, thus quantifying their subjective interestingness. We demonstrate this strategy for rectangular databases with knowledge of the row and column sums. This situation has been considered before using computation intensive approaches based on swap randomizations, allowing for the computation of empirical p-values as interestingness measures (Gionis et al., ACM Trans Knowl Discov Data 1(3):14, 2007). We show how the MaxEnt model can be computed remarkably efficiently in this situation, and how it can be used for the same purpose as swap randomizations but computationally more efficiently. More importantly, being an explicitly represented distribution, the MaxEnt model can additionally be used to define analytically computable interestingness measures, as we demonstrate for tiles (Geerts et al., Proceedings of the 7th international conference on Discovery science (DS04), 2004) in binary databases.  相似文献   

2.
为了减少偏好度量过程中的人为干预,同时提高偏好度量算法的效率和准确性,提出一种基于信任系统的偏好协同度量框架。首先,提出了规则间的距离和规则集的内部距离等概念来具体化规则之间的关系。在此基础上,提出了基于规则集平均内部距离的规则集聚合算法PRA,旨在保证损失最少信息的情况下筛选出最具代表性的全体用户的共同偏好,即共识偏好。之后,提出Common belief的概念和一种改进的信任系统,使用共识偏好作为信任系统的证据,在考虑用户一致性的同时还允许用户保留个性化信息。在信任系统下,提出了基于信任系统的有趣度度量标准,并量化了偏好的信任度和偏离度,用于描述用户偏好和信任系统的一致或相悖程度,并将用户偏好分为泛化偏好或个性化偏好,最终依据信任度和偏离度得出有趣度,从而找出最有趣的规则。在计算有趣度的过程中,提出了一个可以使用不同信任度公式来计算有趣度的可扩展的计算框架。为了进一步验证度量框架的准确性和有效性,以加权的余弦相似度公式和相关系数公式为例,提出了IMCos算法和IMCov算法。实验结果表明,信任度和偏离度有效地反映了偏好的不同特征,并且与两种最新的算法CONTENUM和TKO相比,度量框架发现的Top-K规则在召回率、准确率和F1-Measure等指标上均更优。  相似文献   

3.
Knowledge discovery in databases is used to discover useful and understandable knowledge from large databases. A process of knowledge discovery consists of two steps, the data mining step and the evaluation step. In this paper, evaluating and ranking the interestingness of summaries generated from databases, which is a part of the second step, is studied using diversity measures. Sixteen previously analyzed diversity measures of interestingness are used along with three not previously considered ones, brought from different well-known areas. The latter three measures are evaluated theoretically according to five principles that a measure must satisfy to be qualified acceptable for ranking summaries. A theoretical correlation study between the eight measures that satisfy all five principles is presented based on mathematical proofs. An empirical evaluation is conducted using three real databases. Then, a classification of the eight measures is deduced. The resulting classification is used to reduce the number of measures to only two, which are the best over all criteria, and that produce non-similar results. This helps the user interpret the most important discovered knowledge in his decision making process.  相似文献   

4.
挖掘所关注规则的多策略方法研究   总被引:20,自引:1,他引:19  
通过数据挖掘,从大型数据库中发现了大量规则,如何选取所关注的规则,是知识发现的重要研究内容。该文研究了利用领域知识对规则的主观关注程度进行度量的方法,给出了一个能够度量规则的简洁性和新奇性的客观关注程度的计算函数,提出了选取用户关注的规则的多策略方法。  相似文献   

5.
Because clinical research is carried out in complex environments, prior domain knowledge, constraints, and expert knowledge can enhance the capabilities and performance of data mining. In this paper we propose an unexpected pattern mining model that uses decision trees to compare recovery rates of two different treatments, and to find patterns that contrast with the prior knowledge of domain users. In the proposed model we define interestingness measures to determine whether the patterns found are interesting to the domain. By applying the concept of domain-driven data mining, we repeatedly utilize decision trees and interestingness measures in a closed-loop, in-depth mining process to find unexpected and interesting patterns. We use retrospective data from transvaginal ultrasound-guided aspirations to show that the proposed model can successfully compare different treatments using a decision tree, which is a new usage of that tool. We believe that unexpected, interesting patterns may provide clinical researchers with different perspectives for future research.  相似文献   

6.
This paper presents a framework for exact discovery of the top-k sequential patterns under Leverage. It combines (1) a novel definition of the expected support for a sequential pattern—a concept on which most interestingness measures directly rely—with (2) Skopus: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach. A pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting is the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both experiments confirm the consistency and relevance of our approach with regard to the state of the art.  相似文献   

7.
The discovery of association rules is a very efficient data mining technique that is especially suitable for large amounts of categorical data. This paper shows how the discovery of association rules can be of benefit for numeric data as well. Based on a review of previous approaches we introduce Q2, a faster algorithm for the discovery of multi-dimensional association rules over ordinal data. We experimentally compare the new algorithm with the previous approach, obtaining performance improvements of more than an order of magnitude on supermarket data. In addition, a new absolute measure for the interestingness of quantitative association rules is introduced. It is based on the view that quantitative association rules have to be interpreted with respect to their Boolean generalizations. This measure has two major benefits compared to the previously used relative interestingness measure; first, it speeds up rule extraction and evaluation and second, it is easier to interpret for a user. Finally we introduce a rule browser which supports the exploration of ordinal data with quantitative association rules.  相似文献   

8.
Discovering injective episodes with general partial orders   总被引:1,自引:1,他引:0  
Frequent episode discovery is a popular framework for temporal pattern discovery in event streams. An episode is a partially ordered set of nodes with each node associated with an event type. Currently algorithms exist for episode discovery only when the associated partial order is total order (serial episode) or trivial (parallel episode). In this paper, we propose efficient algorithms for discovering frequent episodes with unrestricted partial orders when the associated event-types are unique. These algorithms can be easily specialized to discover only serial or parallel episodes. Also, the algorithms are flexible enough to be specialized for mining in the space of certain interesting subclasses of partial orders. We point out that frequency alone is not a sufficient measure of interestingness in the context of partial order mining. We propose a new interestingness measure for episodes with unrestricted partial orders which, when used along with frequency, results in an efficient scheme of data mining. Simulations are presented to demonstrate the effectiveness of our algorithms.  相似文献   

9.
Market basket analysis is one of the typical applications in mining association rules. The valuable information discovered from data mining can be used to support decision making. Generally, support and confidence (objective) measures are used to evaluate the interestingness of association rules. However, in some cases, by using these two measures, the discovered rules may be not profitable and not actionable (not interesting) to enterprises. Therefore, how to discover the patterns by considering both objective measures (e.g. probability) and subjective measures (e.g. profit) is a challenge in data mining, particularly in marketing applications. This paper focuses on pattern evaluation in the process of knowledge discovery by using the concept of profit mining. Data Envelopment Analysis is utilized to calculate the efficiency of discovered association rules with multiple objective and subjective measures. After evaluating the efficiency of association rules, they are categorized into two classes, relatively efficient (interesting) and relatively inefficient (uninteresting). To classify these two classes, Decision Tree (DT)‐based classifier is built by using the attributes of association rules. The DT classifier can be used to find out the characteristics of interesting association rules, and to classify the unknown (new) association rules.  相似文献   

10.
关联规则挖掘是数据挖掘研究中的一个重要方面,而其中一个重要问题是对挖掘出的规则的兴趣度的评估,过去的研究发现,在实际应用中往往很容易从数据源中挖掘出大量的规则,但这些规则中的大部分对用户来说是不感兴趣的,本文对规则的兴趣度度量的两个方面作了讨论:一个是主观兴趣度度量,另一个是客观兴趣度度量,最后介绍了如何利用模板进行挖掘有趣的规则。  相似文献   

11.
影响关联规则挖掘的有趣性因素的研究   总被引:7,自引:2,他引:7  
关联规则挖掘是数据挖掘研究中的一个重要方面,而其中一个重要问题是对挖掘出的规则的感兴趣程度的评估。实际应用中可从数据源中挖掘出大量的规则,但这些规则中的大部分对用户来说是不一定感兴趣的。关联规则挖掘中的有趣性问题可从客观和主观两个方面对关联规则的兴趣度进行评测。利用模板将用户感兴趣的规则和不感兴趣的规则区分开,以此来完成关联规则有趣性的主观评测;在关联规则的置信度和支持度基础上对关联规则的有趣性的客观评测增加了约束。  相似文献   

12.
A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.  相似文献   

13.
兴趣度量在关联规则挖掘中常用来发现那些潜在的令人感兴趣的模式,基于FP树结构的FP-growth算法是目前较高效的关联规则挖掘算法之一,如果挖掘潜在的有价值的低支持度模式,这种算法效率较低。为此,本文提出一种新的兴趣度量—项项正相关兴趣度量,该量度具有良好的反单调性,所得到的模式中任意一项在事务中的出现均可提升模式中其余项出现的可能性。同时,提出一种改进的FP挖掘算法,该算法采用一种压缩的FP树结构,并利用非递归调用方法来减少挖掘中建立额外条件模式树的开销。更为重要的是,在频繁项集挖掘中引入项项正相关兴趣度量剪枝策略,有效过滤掉非正相关长模式和无效项集,扩大了可挖掘支持度阈值范围。实验结果表明,该算法是有效和可行的。  相似文献   

14.
Many studies have shown the limits of the support/confidence framework used in Apriori ‐like algorithms to mine association rules. There are a lot of efficient implementations based on the antimonotony property of the support, but candidate set generation (e.g., frequent item set mining) is still costly. In addition, many rules are uninteresting or redundant and one can miss interesting rules like nuggets. We are thus facing a complexity issue and a quality issue. One solution is to not use frequent itemset mining and to focus as soon as possible on interesting rules using additional interestingness measures. We present here a formal framework that allows us to make a link between analytic and algorithmic properties of interestingness measures. We introduce the notion of optimonotony in relation with the optimal rule discovery framework. We then demonstrate a necessary and sufficient condition for the existence of optimonotony. This result can thus be applied to classify the measures. We study the case of 39 classical measures and show that 31 of them are optimonotone. These optimonotone measures can thus be used with an underlying pruning strategy. Empirical evaluations show that the pruning strategy is efficient and leads to the discovery of nuggets using an optimonotone measure and without the support constraint.  相似文献   

15.
Discovering knowledge from data means finding useful patterns in data, this process has increased the opportunity and challenge for businesses in the big data era. Meanwhile, improving the quality of the discovered knowledge is important for making correct decisions in an unpredictable environment. Various models have been developed in the past; however, few used both data quality and prior knowledge to control the quality of the discovery processes and results. In this paper, a multi-objective model of knowledge discovery in databases is developed, which aids the discovery process by utilizing prior process knowledge and different measures of data quality. To illustrate the model, association rule mining is considered and formulated as a multi-objective problem that takes into account data quality measures and prior process knowledge instead of a single objective problem. Measures such as confidence, support, comprehensibility and interestingness are used. A Pareto-based integrated multi-objective Artificial Bee Colony (IMOABC) algorithm is developed to solve the problem. Using well-known and publicly available databases, experiments are carried out to compare the performance of IMOABC with NSGA-II, MOPSO and Apriori algorithms, respectively. The computational results show that IMOABC outperforms NSGA-II, MOPSO and Apriori on different measures and it could be easily customized or tailored to be in line with user requirements and still generates high-quality association rules.  相似文献   

16.
《Knowledge》1999,12(5-6):309-315
This paper discusses several factors influencing the evaluation of the degree of interestingness of rules discovered by a data mining algorithm. This article aims at: (1) drawing attention to several factors related to rule interestingness that have been somewhat neglected in the literature; (2) showing some ways of modifying rule interestingness measures to take these factors into account; (3) introducing a new criterion to measure attribute surprisingness, as a factor influencing the interestingness of discovered rules.  相似文献   

17.
18.
基于灵敏度分析的动态指标选取方法   总被引:7,自引:0,他引:7  
在传统的方法中,采用静态的评价指标来评估动态发展的事物,显然是不合时宜的。为了对动态发展的事物进行客观的评价,该文提出了一种基于灵敏度分析的动态指标选取方法。该方法应用评价指标对评价结果影响程度的大小来进行评价指标的取舍,有力地避免了人为主观因素对评价指标体系建立的影响。该方法具有以下两个优点:指标的动态选取和指标的客观选取。该文以坦克陆上机动性能评估指标体系的建立为例,就如何进行评估中指标及其数量的选取这一问题,详细介绍了这种方法的具体应用。不难看出,只要评价方法正确,应用这种“基于灵敏度分析的动态指标选取”方法。建立一套动态的、客观的评价指标体系完全是正确的、可行的。  相似文献   

19.
20.
While many studies have considered the usability of website homepages, subjective issues such as preference have been under explored. This paper describes a pilot study that investigates subjects' preferences for different homepages. The study applies Berlyne's theory of experimental aesthetics to website homepages. This theory suggests that there is an inverted-U shape relationship between preference for a stimulus and its complexity. Twelve subjects evaluated 12 homepages. The study used a ranking method to measure subjects' preferences and the relationships between complexity, pleasure and interestingness. In addition, verbal reports were collected. No support was found for an inverted-U shape relationship and the findings indicate that complexity is not a predictor of pleasure. However, the results uncovered a number of subjective factors that underlie preference. These factors include individual differences in taste and lifestyle all of which are highly personal factors that change and develop over time. In addition, the findings suggest a link between interestingness and curiosity. Lastly, the findings show an agreement on the judgements of complexity, and disagreement on aesthetic preferences. In conclusion, the paper points out the challenges faced in researching preference because of its highly subjective character.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号