Similar literature
Found 20 similar documents; search took 31 ms
1.
From data to global generalized knowledge   Total citations: 1, self-citations: 0, citations by others: 1
Attribute-oriented induction (AOI) is a useful data mining method that extracts generalized knowledge from relational data and the user's background knowledge. The method uses two thresholds, the relation threshold and the attribute threshold, to guide the generalization process, and outputs generalized knowledge: a set of generalized tuples that describes the major characteristics of the target relation. Although AOI has been widely used in various applications, a potential weakness of the method is that it provides only a snapshot of the generalized knowledge, not a global picture. Different thresholds yield different sets of generalized tuples, each of which also describes the major characteristics of the target relation. A user who wants a global picture of the induction must therefore try different thresholds repeatedly, which is time-consuming and tedious. In this study, we propose a global AOI (GAOI) method, which employs the multiple-level mining technique with multiple minimum supports to generate all interesting generalized knowledge at one time. Experimental results on a real-life dataset show that the proposed method is effective in finding global generalized knowledge.
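To make the threshold dependence described in the abstract above concrete, here is a minimal sketch of plain AOI generalization, not the GAOI method itself. The concept hierarchies, the sample rows, and the names `HIERARCHY`, `ROWS`, and `aoi` are all illustrative assumptions, and only the attribute threshold is modelled (the relation threshold is left out).

```python
from collections import Counter

# Hypothetical concept hierarchies (child value -> parent concept); not from the paper.
HIERARCHY = {
    "city": {"Taipei": "Taiwan", "Kaohsiung": "Taiwan",
             "Tokyo": "Japan", "Osaka": "Japan",
             "Taiwan": "Asia", "Japan": "Asia"},
    "age": {"21": "20s", "25": "20s", "33": "30s", "38": "30s",
            "20s": "adult", "30s": "adult"},
}

def aoi(rows, attrs, attr_threshold):
    """Climb each attribute's hierarchy until it has at most attr_threshold
    distinct values, then merge identical tuples and count them."""
    rows = [dict(r) for r in rows]  # work on a copy
    for attr in attrs:
        parents = HIERARCHY[attr]
        while len({r[attr] for r in rows}) > attr_threshold:
            before = [r[attr] for r in rows]
            for r in rows:
                # Replace the value by its parent; roots stay unchanged.
                r[attr] = parents.get(r[attr], r[attr])
            if before == [r[attr] for r in rows]:
                break  # hierarchy exhausted, cannot generalize further
    return Counter(tuple(r[a] for a in attrs) for r in rows)

# Hypothetical target relation.
ROWS = [
    {"city": "Taipei", "age": "21"},
    {"city": "Kaohsiung", "age": "25"},
    {"city": "Tokyo", "age": "33"},
    {"city": "Osaka", "age": "38"},
    {"city": "Taipei", "age": "25"},
]

# Two thresholds give two different "snapshots" of the same relation.
print(aoi(ROWS, ["city", "age"], 2))
print(aoi(ROWS, ["city", "age"], 1))
```

Each call returns one set of generalized tuples with their counts, so a user of plain AOI would have to rerun it per threshold to see every level; that repetition is what the abstract says GAOI avoids.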

2.
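This paper analyzes several traditional attribute-oriented induction algorithms and, to address their shortcomings, proposes a sampling-based concept hierarchy mining algorithm. The algorithm can handle unbalanced concept hierarchies, and the generalized rules it produces reflect the actual data distribution. In addition, the algorithm has optimal time and space complexity. Experiments show that the algorithm is effective and feasible.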

3.
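Attribute-oriented induction is applied to the presentation of murals, and a mural presentation method based on knowledge discovery is proposed. For strongly correlated dimension attributes of murals, such as content, location, and time, an ontology-style hierarchical description is introduced for comparative presentation. This helps researchers better acquire tacit knowledge about the objects and inspires the discovery of new class descriptions and association rules. Combined with a relevance evaluation method based on features of painting composition, the approach can effectively select the content that researchers care about for comparison and presentation. Experiments on a real Dunhuang mural research project verify the effectiveness of the method in assisting mural research.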

4.
In recent years, Remote Sensing Images (RS-Images) have been widely recognized as an essential kind of geospatial data because of their superior ability to offer abundant and instantaneous ground truth information. One active line of RS-Image research is recommending RS-Images from the Internet that match the user's queried Area-of-Interest (AOI). Although a number of studies on RS-Image ranking and recommendation have been proposed, most of them consider only the spatial distance between the RS-Image and the AOI. This is inappropriate, since both the RS-Image and the AOI carry not only spatial information but also cover range information. In this paper, we propose a novel framework named Location-based rs-Image Finding Engine (LIFE) to rank and recommend a series of relevant RS-Images to users according to the user-specific AOI. In LIFE, we first propose a cluster-based RS-Image index structure to efficiently maintain the large number of RS-Images. Then, two quantitative indicators named Available Space (AS) and Image Extension (IE) are proposed to measure the Extensibility and Centrality between RS-Image and AOI, respectively. To the best of our knowledge, this is the first work on RS-Image recommendation that considers extensibility and centrality simultaneously. Comprehensive experimental evaluations show that the two indicators have distinct ranking behaviors and are able to recommend meaningful RS-Image results. In addition, the experimental results show that the proposed LIFE framework outperforms the state-of-the-art Hausdorff approach in terms of Precision, Recall and Normalized Discounted Cumulative Gain (NDCG).
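The abstract above reports Precision, Recall, and NDCG without defining them. The sketch below shows one common way to compute them; it assumes the linear-gain form of DCG (relevance divided by log2 of rank plus one), which may differ from the variant used in the paper. The function names and sample values are illustrative.

```python
import math

def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against a relevant set."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def dcg(relevances):
    """Discounted cumulative gain: graded relevance discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the ranking (cut at k) divided by the DCG of the ideal ordering."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0

# Hypothetical graded relevance of five recommended images, in ranked order.
print(ndcg([3, 2, 3, 0, 1]))
print(precision_recall(["img1", "img2", "img3", "img4"], ["img1", "img3", "img9"]))
```

NDCG rewards placing the most relevant images first, whereas precision and recall ignore the order within the list, which is why the three are usually reported together.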

5.
In the present scenario of the global economy and the World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining tasks such as mining association rules and correlations. The FP-tree is a versatile data structure for mining frequent patterns: it is a compact representation of a transaction database that contains the frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of the database at a time and updating the FP-tree of the initial (original) database with it. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database for incremental mining. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees of consecutive durations to obtain an FP-tree for the FP-Growth algorithm. BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which to get transactions (i.e., itemsets), unlike other sequential incremental algorithms that read transactions from the database. As a result, BIT_FPGrowth takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the runtime of BIT_FPGrowth grows much more slowly than, and remains the lowest among, that of the other algorithms.
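The idea of treating an FP-tree as a repository of transactions can be illustrated with a small sketch. This is not the BIT_FPGrowth algorithm: it is a naive merge that reads weighted itemsets back out of two FP-trees and rebuilds one tree from them, and every name in it (`Node`, `build_fp_tree`, `tree_transactions`, `merge_trees`, `as_dict`) is an illustrative assumption.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and children keyed by item."""
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_from_weighted(weighted, min_support):
    """Build an FP-tree from (itemset, count) pairs."""
    freq = Counter()
    for items, count in weighted:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None)
    for items, count in weighted:
        # Keep frequent items, ordered by descending frequency (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += count
    return root, freq

def build_fp_tree(transactions, min_support):
    return build_from_weighted([(t, 1) for t in transactions], min_support)

def tree_transactions(node, prefix=()):
    """Read (itemset, count) pairs back out of a tree instead of the database."""
    for child in node.children.values():
        path = prefix + (child.item,)
        ends_here = child.count - sum(c.count for c in child.children.values())
        if ends_here > 0:
            yield path, ends_here
        yield from tree_transactions(child, path)

def merge_trees(root_a, root_b, min_support):
    """Naive merge: re-insert the weighted itemsets of both trees."""
    weighted = list(tree_transactions(root_a)) + list(tree_transactions(root_b))
    return build_from_weighted(weighted, min_support)

def as_dict(node):
    """Nested-dict view of a tree, handy for comparing two trees."""
    return {i: (c.count, as_dict(c)) for i, c in node.children.items()}

batch1 = [["a", "b"], ["b", "c"], ["a", "b", "c"]]
batch2 = [["a", "b"], ["b", "d"]]
tree1, _ = build_fp_tree(batch1, min_support=1)  # keep every item per batch
tree2, _ = build_fp_tree(batch2, min_support=1)
merged, merged_freq = merge_trees(tree1, tree2, min_support=2)
direct, direct_freq = build_fp_tree(batch1 + batch2, min_support=2)
print(as_dict(merged) == as_dict(direct))
```

The per-batch trees are built with a minimum support of 1 so that no item is dropped before the merge; an item that is infrequent in one batch may still be frequent overall.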

6.
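Data mining of traditional Chinese medicine prescriptions based on attribute induction   Total citations: 2, self-citations: 0, citations by others: 2
Traditional attribute-oriented induction (AOI) suffers from coarse generalization and low algorithmic efficiency. To meet the complex requirements of mining traditional Chinese medicine (TCM) prescription data, an attribute-association generalization algorithm driven by TCM data is proposed. Concept trees are created for the associated dimensions, and the correlation between the associated attributes and a base attribute is exploited to improve induction efficiency. TCMDBMiner, a data mining system for attribute-association-oriented induction, is implemented. Experimental results show that the new algorithm improves induction and generalization efficiency by more than 23% over the traditional algorithm, and that the mining results are consistent with TCM theory.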

7.
8.
By identifying useful knowledge embedded in the behavior of search engines, users can provide valuable information for web searching and data mining. Numerous algorithms have been proposed to find the desired interesting patterns, i.e., frequent patterns, in real-world applications. Most of those studies use frequency to measure the interestingness of patterns. However, each object may have a different importance in these real-world applications, and the frequent patterns do not usually contain a large portion of the desired patterns. In this paper, we present a novel method, called exploiting highly qualified patterns with frequency and weight occupancy (QFWO), to suggest possible highly qualified patterns using the ideas of co-occurrence and weight occupancy. By considering item weight, weight occupancy, and the frequency of patterns, we define a new kind of highly qualified pattern. A novel set-enumeration tree called the frequency-weight (FW)-tree and two compact data structures named weight-list and FW-table are designed to hold the global downward closure property and the partial downward closure property of quality and weight occupancy, which further prunes the search space. The proposed method can exploit highly qualified patterns in a recursive manner without candidate generation. Extensive experiments were conducted on both real-world and synthetic datasets to evaluate the effectiveness and efficiency of the proposed algorithm. Results demonstrate that the obtained patterns are reasonable and acceptable. Moreover, the designed QFWO with several pruning strategies is quite efficient in terms of runtime and search space.

9.
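Research on mining global negative association rules in multiple databases   Total citations: 1, self-citations: 0, citations by others: 1
Mining global negative association rules is an important topic in association mining over multiple databases, with a wide range of applications and practical value. Existing methods commonly merge the negative association rules of the individual sub-databases, but their efficiency suffers from high data density, incomplete rule sets, and long running times. This paper presents a new algorithm for mining global negative association rules, with the following steps: (1) scan each sub-database and build a multi-database frequent pattern tree; (2) prune the multi-database frequent pattern tree according to the principle of global consistency of frequent itemsets; (3) on this basis, generate the global minimal infrequent itemsets; (4) generate the global infrequent itemsets according to the upward closure principle of maximal frequent itemsets; (5) extract the global negative association rules on the basis of rule correlation. Extensive comparative experiments show that the algorithm can discover global negative association rules quickly.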
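As background for the abstract above, the sketch below shows the textbook quantities behind a negative rule A ⇒ ¬B evaluated globally over several databases. It does not implement the five-step, frequent-pattern-tree-based algorithm of the paper; the function names and the sample databases are illustrative assumptions.

```python
def count(itemset, database):
    """Number of transactions in one database that contain the itemset."""
    return sum(1 for t in database if itemset <= t)

def global_count(itemset, databases):
    """Occurrences of the itemset summed over all sub-databases."""
    return sum(count(itemset, db) for db in databases)

def negative_rule(a, b, databases):
    """Global support and confidence of the negative rule A => not B.

    count(A and not B) = count(A) - count(A union B), so
    confidence(A => not B) = 1 - confidence(A => B).
    """
    total = sum(len(db) for db in databases)
    count_a = global_count(a, databases)
    count_a_not_b = count_a - global_count(a | b, databases)
    support = count_a_not_b / total
    confidence = count_a_not_b / count_a if count_a else 0.0
    return support, confidence

# Two hypothetical sub-databases of transactions.
db1 = [{"tea", "milk"}, {"tea"}, {"coffee"}, {"tea"}]
db2 = [{"tea", "coffee"}, {"coffee"}]
print(negative_rule({"tea"}, {"coffee"}, [db1, db2]))
```

Summing counts before dividing gives each sub-database a weight proportional to its size, which is one simple way to make a rule "global"; the paper's correlation-based rule extraction is not reproduced here.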

10.
In item promotion applications, there is a strong need for tools that can help to unlock the hidden profit within each individual customer’s transaction history. Discovering association patterns with data mining techniques is helpful for this purpose. However, the conventional association mining approach, while generating “strong” association rules, cannot detect potential profit-building opportunities that can be exposed by “soft” association rules, which recommend items with looser but still significant associations. This paper proposes a novel mining method that automatically detects hidden profit-building opportunities by discovering soft associations among items from historical transactions. Specifically, this paper proposes a relaxation method of association mining with a new support measurement, called soft support, that can be used for mining soft association patterns expressed with the “most” fuzzy quantifier. In addition, a novel measure for validating the soft-associated rules is proposed based on the estimated possibility of a conditioned quantified fuzzy event. The new measure is shown to be effective by comparison with several existing measures. A new association mining algorithm based on a modification of the FT-Tree algorithm is proposed to accommodate this new support measure. Finally, the mining algorithm is applied to several data sets to investigate its effectiveness in finding soft patterns and in content recommendation.

11.
Hyperclique pattern discovery   Total citations: 6, self-citations: 0, citations by others: 6
Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
Vipin Kumar
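The h-confidence measure named in the hyperclique abstract above is commonly defined as the support of the pattern divided by the largest support of any single item in it. The sketch below computes it from raw counts and also shows the cross-support upper bound (smallest item support over largest item support). The function names and sample transactions are illustrative assumptions, and this is not the hyperclique miner algorithm.

```python
def count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def h_confidence(pattern, transactions):
    """supp(pattern) / max over items of supp({item}), computed from counts."""
    largest = max(count({i}, transactions) for i in pattern)
    return count(pattern, transactions) / largest if largest else 0.0

def cross_support_bound(pattern, transactions):
    """Upper bound on h-confidence: min item support / max item support.

    It holds because supp(pattern) can never exceed the support of its
    rarest item, so patterns whose bound is below the h-confidence
    threshold can be discarded without counting the pattern itself.
    """
    counts = [count({i}, transactions) for i in pattern]
    return min(counts) / max(counts) if max(counts) else 0.0

# Hypothetical transactions.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"a"}]
print(h_confidence({"a", "b"}, transactions))
print(cross_support_bound({"a", "c"}, transactions))
```

A pattern is reported as a hyperclique when its h-confidence reaches a user-chosen threshold, which is what lets the approach work at very low support levels.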

12.
Twitter data has recently been used for a large variety of advanced analyses. Analyzing Twitter data poses new challenges because the data distribution is intrinsically sparse, owing to the large number of messages posted every day using a wide vocabulary. To address this issue, generalized itemsets (sets of items at different abstraction levels) can be effectively mined and used to discover interesting multiple-level correlations among data supplied with taxonomies. Each generalized itemset is characterized by a correlation type (positive, negative, or null) according to the strength of the correlation among its items. This paper presents a novel data mining approach that supports different and interesting targeted analyses, such as topic trend analysis and context-aware service profiling, by analyzing Twitter posts. We aim at discovering contrasting situations by means of generalized itemsets. Specifically, we focus on comparing itemsets discovered at different abstraction levels, and we select large subsets of specific (descendant) itemsets that show correlation type changes with respect to their common ancestor. To this end, a novel kind of pattern, namely the Strong Flipping Generalized Itemset (SFGI), is extracted from Twitter messages and contextual information supplied with taxonomy hierarchies. Each SFGI consists of a frequent generalized itemset X and the set of its descendants showing a correlation type change with respect to X. Experiments performed on both real and synthetic datasets demonstrate the effectiveness of the proposed approach in discovering interesting and hidden knowledge from Twitter data.

13.
The utility of an itemset is considered to be the value of this itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns, such as association rules, from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets, so that the execution time of mining all high utility itemsets in data streams is reduced substantially. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods, such as the Two-Phase algorithm, under various experimental conditions.
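For readers unfamiliar with utility mining, the sketch below computes the usual utility of an itemset (purchased quantity times unit profit, summed over the transactions that contain the whole itemset) inside a sliding window, together with the transaction-weighted utility used as an upper bound by Two-Phase-style algorithms. It is not THUI-Mine; the profit table, the stream, the window size, and the function names are illustrative assumptions.

```python
# Hypothetical unit profits and a stream of transactions (item -> quantity).
PROFIT = {"a": 5, "b": 2, "c": 1}
STREAM = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]

def utility(itemset, transactions, profit):
    """Sum of quantity * profit of the itemset's items, over transactions
    that contain every item of the itemset."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in transactions if all(i in t for i in itemset))

def transaction_weighted_utility(itemset, transactions, profit):
    """Sum of the full utilities of the transactions containing the itemset.

    It is an upper bound on utility() and is downward closed, which is why
    candidate-generation methods prune with it.
    """
    return sum(sum(q * profit[i] for i, q in t.items())
               for t in transactions if all(i in t for i in itemset))

window = STREAM[-3:]  # current time window: the three most recent transactions
print(utility({"a", "c"}, window, PROFIT))
print(transaction_weighted_utility({"a", "c"}, window, PROFIT))
```

An itemset would be reported as high utility in the window when its utility reaches a user-chosen threshold; as the window slides, the same computation is repeated on the new slice of the stream.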

14.
Real-world optimization problems typically involve multiple objectives to be optimized simultaneously under multiple constraints and with respect to several variables. While multi-objective optimization itself can be a challenging task, equally difficult is the ability to make sense of the obtained solutions. In this two-part paper, we deal with data mining methods that can be applied to extract knowledge about multi-objective optimization problems from the solutions generated during optimization. This knowledge is expected to provide deeper insights about the problem to the decision maker, in addition to assisting the optimization process in future design iterations through an expert system. The current paper surveys several existing data mining methods and classifies them by methodology and type of knowledge discovered. Most of these methods come from the domain of exploratory data analysis and can be applied to any multivariate data. We specifically look at methods that can generate explicit knowledge in a machine-usable form. A framework for knowledge-driven optimization is proposed, which involves both online and offline elements of knowledge discovery. One of the conclusions of this survey is that while there are a number of data mining methods that can deal with data involving continuous variables, only a few ad hoc methods exist that can provide explicit knowledge when the variables involved are of a discrete nature. Part B of this paper proposes new techniques that can be used with such datasets and applies them to discrete variable multi-objective problems related to production systems.

15.
Numerous interestingness measures have been proposed in statistics and data mining to assess object relationships. This is especially important in recent studies of association or correlation pattern mining. However, it is still not clear whether there is any intrinsic relationship among many proposed measures, and which one is truly effective at gauging object relationships in large data sets. Recent studies have identified a critical property, null-(transaction) invariance, for measuring associations among events in large data sets, but many measures do not have this property. In this study, we re-examine a set of null-invariant interestingness measures and find that they can be expressed as the generalized mathematical mean, leading to a total ordering of them. Such a unified framework provides insights into the underlying philosophy of the measures and helps us understand and select the proper measure for different applications. Moreover, we propose a new measure called Imbalance Ratio to gauge the degree of skewness of a data set. We also discuss the efficient computation of interesting patterns of different null-invariant interestingness measures by proposing an algorithm, GAMiner, which complements previous studies. Experimental evaluation verifies the effectiveness of the unified framework and shows that GAMiner speeds up the state-of-the-art algorithm by an order of magnitude.
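The claim in the abstract above that null-invariant measures are generalized means can be illustrated for two itemsets A and B. The sketch assumes the commonly used definitions: all-confidence, cosine, Kulczynski, and max-confidence are the minimum, geometric mean, arithmetic mean, and maximum of P(A|B) and P(B|A), and the Imbalance Ratio is |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A ∪ B)). The function names and sample counts are illustrative, and GAMiner is not reproduced.

```python
import math

def power_mean(x, y, k):
    """Generalized mean of x and y with exponent k (limits handled explicitly).

    Only called with k in {-inf, 0, 1, +inf} below; a negative finite k
    would need x, y > 0.
    """
    if k == -math.inf:
        return min(x, y)
    if k == math.inf:
        return max(x, y)
    if k == 0:
        return math.sqrt(x * y)
    return ((x ** k + y ** k) / 2) ** (1 / k)

def null_invariant_measures(sup_a, sup_b, sup_ab):
    """Measures built only from sup(A), sup(B), sup(AB); the number of
    transactions containing neither A nor B never enters the formulas."""
    p_b_given_a = sup_ab / sup_a
    p_a_given_b = sup_ab / sup_b
    return {
        "all_confidence": power_mean(p_a_given_b, p_b_given_a, -math.inf),
        "cosine": power_mean(p_a_given_b, p_b_given_a, 0),
        "kulczynski": power_mean(p_a_given_b, p_b_given_a, 1),
        "max_confidence": power_mean(p_a_given_b, p_b_given_a, math.inf),
    }

def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Skewness of the two supports: 0 when balanced, close to 1 when skewed."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)

# Hypothetical counts: A occurs 1000 times, B 100 times, together 100 times.
m = null_invariant_measures(1000, 100, 100)
print(m)
print(imbalance_ratio(1000, 100, 100))
```

Because the generalized mean is non-decreasing in its exponent, the four values always come out in the same order, which is the total ordering the abstract refers to; a high Imbalance Ratio flags cases where the measures disagree most.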

16.
Fuzzy utility mining has been an emerging research issue because of its simplicity and comprehensibility. Unlike traditional fuzzy data mining, fuzzy utility mining considers not only the quantities of items in transactions but also their profits when deriving high fuzzy utility itemsets. In this paper, we introduce a new fuzzy utility measure with the fuzzy minimum operator to evaluate the fuzzy utilities of itemsets. In addition, an effective fuzzy utility upper-bound model based on the proposed measure is designed to provide the downward-closure property in fuzzy sets, thus reducing the search space for finding high fuzzy utility itemsets. A two-phase fuzzy utility mining algorithm, named TPFU, is also proposed and described for solving the problem of fuzzy utility mining. Finally, the experimental results on both synthetic and real datasets show that the proposed algorithm has good performance.

17.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, in 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.

18.
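Application of the C4.5 algorithm to the analysis of national defense students' quality   Total citations: 1, self-citations: 0, citations by others: 1
The quality of national defense students directly affects the quality of the university-based training program, yet the Selection and Training Office currently assesses student quality only roughly or by experience. This paper discusses using attribute-oriented induction and the C4.5 decision tree algorithm from data mining to analyze the basic information of national defense students, in order to find rules and patterns that affect student quality. These help the Selection and Training Office carry out targeted education and management, and also provide a reference for selecting national defense students.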

19.
Temporal data mining remains an important research topic, since many application areas need knowledge extracted from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. Although there are many studies on temporal data mining, they do not deal with discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs. We propose a new temporal data mining technique that extracts temporal interval relation rules from temporal interval data by using Allen’s theory. It consists of a preprocessing algorithm designed for the generalization of temporal interval data and a temporal relation algorithm for mining temporal relation rules from the generalized temporal interval data. This technique can provide more useful knowledge than conventional data mining techniques.
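Allen's theory, which the abstract above relies on, distinguishes thirteen possible relations between two time intervals. The sketch below classifies a pair of intervals into one of them; it assumes each interval is a `(start, end)` pair with `start < end`, and the function name and sample intervals are illustrative. It is not the preprocessing or rule-mining algorithm of the paper.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only proper partial overlaps remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"

# Hypothetical patient history: fever on days 1-5, rash on days 3-8.
fever, rash = (1, 5), (3, 8)
print(allen_relation(fever, rash))
print(allen_relation(rash, fever))
```

A temporal relation rule could then be counted over many such pairs, for example how often "fever overlaps rash" holds across patient histories.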

20.