首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Polygons provide natural representations for many types of geospatial objects, such as countries, buildings, and pollution hotspots. Thus, polygon-based data mining techniques are particularly useful for mining geospatial datasets. In this paper, we propose a polygon-based clustering and analysis framework for mining multiple geospatial datasets that have inherently hidden relations. In this framework, polygons are first generated from multiple geospatial point datasets by using a density-based contouring algorithm called DCONTOUR. Next, a density-based clustering algorithm called Poly-SNN with novel dissimilarity functions is employed to cluster polygons to create meta-clusters of polygons. Finally, post-processing analysis techniques are proposed to extract interesting patterns and user-guided summarized knowledge from meta-clusters. These techniques employ plug-in reward functions that capture a domain expert’s notion of interestingness to guide the extraction of knowledge from meta-clusters. The effectiveness of our framework is tested in a real-world case study involving ozone pollution events in Texas. The experimental results show that our framework can reveal interesting relationships between different ozone hotspots represented by polygons; it can also identify interesting hidden relations between ozone hotspots and several meteorological variables, such as outdoor temperature, solar radiation, and wind speed.  相似文献   

2.
The evaluation of the process of mining associations is an important and challenging problem in database systems and especially those that store critical data and are used for making critical decisions. Within the context of spatial databases we present an evaluation framework in which we use probability distributions to model spatial regions, and Bayesian networks to model the joint probability distribution and the structural relationships among spatial and non-spatial predicates. We demonstrate the applicability of the proposed framework by evaluating representatives from two well-known approaches that are used for learning associations, i.e., dependency analysis (using statistical tests of independence) and Bayesian methods. By controlling the parameters of the framework we provide extensive comparative results of the performance of the two approaches. We obtain measures of recovery of known associations as a function of the number of samples used, the strength, number and type of associations in the model, the number of spatial predicates associated with a particular non-spatial predicate, the prior probabilities of spatial predicates, the conditional probabilities of the non-spatial predicates, the image registration error, and the parameters that control the sensitivity of the methods. In addition to performance we investigate the processing efficiency of the two approaches.  相似文献   

3.
The science of bioinformatics has been accelerating at a fast pace, introducing more features and handling bigger volumes. However, these swift changes have, at the same time, posed challenges to data mining applications, in particular efficient association rule mining. Many data mining algorithms for high-dimensional datasets have been put forward, but the sheer numbers of these algorithms with varying features and application scenarios have complicated making suitable choices. Therefore, we present a general survey of multiple association rule mining algorithms applicable to high-dimensional datasets. The main characteristics and relative merits of these algorithms are explained, as well, pointing out areas for improvement and optimization strategies that might be better adapted to high-dimensional datasets, according to previous studies. Generally speaking, association rule mining algorithms that merge diverse optimization methods with advanced computer techniques can better balance scalability and interpretability.  相似文献   

4.
基于支持度的关联规则挖掘算法无法找到那些非频繁但效用很高的项集,基于效用的关联规则会漏掉那些效用不高但发生比较频繁、支持度和效用值的积(激励)很大的项集。提出了基于激励的关联规则挖掘问题及一种自下而上的挖掘算法HM-miner。激励综合了支持度与效用的优点,能同时度量项集的统计重要性和语义重要性。HM-miner利用激励的上界特性进行减枝,能有效挖掘高激励项集。  相似文献   

5.
通过对两种传统的CAD数据到GIS数据转换方法的系统研究,分析了转换过程中存在的信息丢失等问题。基于空间数据关联规则挖掘思想,从问题着手,设计了一种全新的CAD的文本数据到GIS的点层数据的转换方案与挖掘算法。最后,以一个实际的例子,实现了对CAD的文本数据的空间关联规则挖掘,提取文本的坐标信息和属性信息,建立GIS空间数据库,并对转换的几何精度和属性精度进行了评价。  相似文献   

6.
This paper introduces a new approach to a problem of data sharing among multiple parties, without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or any other parties. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.  相似文献   

7.
Recently, a utility-based mining approach has emerged as an alternative mechanism to frequency-based mining in an attempt to reflect not only the statistical correlation but also the semantic significance (e.g., price and quantity) of items. However, existing mining trajectories utilizing high-utility itemsets may not offer firms sufficient business insights unless they can precisely assess the value of association rules, which may vary substantially depending on many business parameters included in the assessment. In this study, we propose a utility-based association-rule mining method that valuates association rules by measuring their specific business benefits accruing to firms. Based on previous studies, three key elements (opportunity, effectiveness, and probability) are identified to define and operationalize a users’ preference as a utility function. To apply the utility-based mechanism to the processing of large transaction databases, we constructed functional algorithms, with heightened attention paid to their pruning strategies, and evaluated them based on real-world databases. Experimental results show that the proposed approach can provide users with greater business benefits than the high-utility itemset mining approach, suggesting several important strategic implications for both research and practice.  相似文献   

8.
9.
A spatial co-location pattern represents relationships between spatial features that are frequently located in close proximity to one another. Such a pattern is one of the most important concepts for geographic context awareness of ubiquitous Geographic Information System (GIS). We constructed a framework for co-location pattern mining using the transaction-based approach, which employs maximal cliques as a transaction-type dataset; we first define transaction-type data and verify that the definition satisfies the requirements, and we also propose an efficient way to generate all transaction-type data. The constructed framework can play a role as a theoretical methodology of co-location pattern mining, which supports geographic context awareness of ubiquitous GIS.  相似文献   

10.
随着旅游业的发展,从海量旅行数据中挖掘旅客类型和环境因素之间内在的、隐含的相关性,是分析旅游市场状况、预测对相关行业影响的一种有效方法。结合旅行数据特点,并针对现有约束方法的局限性,提出一种基于关系延展路径约束的关联规则并行挖掘算法。该算法有效结合MapReduce并行机制,在关系延展路径约束下生成事务集,提升后续并行效率;同时利用并行方法改进Apriori算法的逐层搜索,带来“二次”效率提升,从而更好更快地把握旅游业发展动态,调整旅游业宏观政策。  相似文献   

11.
The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds. Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990; and an MSc (first class), also Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. From 2000 onwards he is a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition. Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as a RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association rule Mining, Classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05). Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.  相似文献   

12.
《Information Systems》2002,27(1):41-74
Most algorithms for association rule mining are variants of the basic Apriori algorithm (Agarwal and Srikant, Fast algorithms for mining association rules in databases, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB’94), Santiago, Chile, 1994, pp. 487–499). One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds, with the size of the itemsets incremented by one per round. The number of database scans required by Apriori-based algorithms thus depends on the size of the biggest frequent itemsets. In this paper, we devise a more general candidate set generation algorithm, LGen, which generates candidate itemsets of multiple sizes during each database scan. We present an algorithm FindLarge which uses LGen to find frequent itemsets. We show that, given a reasonable set of suggested frequent itemsets, FindLarge can significantly reduce the number of I/O passes required. In the best cases, only two passes are sufficient to discover all the frequent itemsets irrespective of the size of the biggest ones.Two I/O-saving algorithms, namely DIC and Pincher-Search, are compared with FindLarge in a series of experiments. We discuss the conditions under which FindLarge significantly outperforms the others in terms of I/O efficiency.  相似文献   

13.
The paper focuses on the adaptive relational association rule mining problem. Relational association rules represent a particular type of association rules which describe frequent relations that occur between the features characterizing the instances within a data set. We aim at re-mining an object set, previously mined, when the feature set characterizing the objects increases. An adaptive relational association rule method, based on the discovery of interesting relational association rules, is proposed. This method, called ARARM (Adaptive Relational Association Rule Mining) adapts the set of rules that was established by mining the data before the feature set changed, preserving the completeness. We aim to reach the result more efficiently than running the mining algorithm again from scratch on the feature-extended object set. Experiments testing the method's performance on several case studies are also reported. The obtained results highlight the efficiency of the ARARM method and confirm the potential of our proposal.  相似文献   

14.
Data structure for association rule mining: T-trees and P-trees   总被引:1,自引:0,他引:1  
Two new structures for association rule mining (ARM), the T-tree, and the P-tree, together with associated algorithms, are described. The authors demonstrate that the structures and algorithms offer significant advantages in terms of storage and execution time.  相似文献   

15.
16.
Association rule mining is one of the most important techniques for intelligent system design and has been widely applied in a large number of real applications. However, classical mining algorithms cannot process very large databases in a reasonable amount of time. The sampling approach that processes a subset of the whole database is a viable alternative. Obviously, such an approach cannot extract perfectly accurate rules. Previous works have tried to improve the accuracy by removing “outliers” from the initial sample based on global statistical properties in the sample. In this paper, we take the view that the initial sample may actually consist of multiple possibly overlapping subsets or clusters. It is more reasonable to apply data clustering techniques to the initial sample before outlier removal is performed on the resulting clusters, so that outliers are removed based on local properties of individual clusters. However, clustering transactional data with very high dimensions is a difficult problem by itself. We solve this problem by interpreting locality sensitive hashing as a means for data clustering. Previously proposed algorithms may be then optionally used to remove the outliers in the individual clusters. We propose several concrete algorithms based on this general strategy. Using an extensive set of synthetic data and real datasets, we evaluate our proposed algorithms and find that our proposals exhibit better accuracy or execution time, or both, than previously proposed algorithms.  相似文献   

17.
This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used in prediction of a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies of road traffic data sets with different sizes. The experimental results show the merits and capability of the proposed KD model in FARs based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base to be utilized within the FIS for prediction evaluation. Experimental results have shown that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.  相似文献   

18.
Recent research has shown that association rules are useful in gene expression data analysis. Interestingness measure plays an important role in the association rule mining on small sample size, high dimensionality, and noisy gene expression data. This work introduces two interestingness measures by exploring prior knowledge contained in open biological databases. They are Max-Pathway-Distance (MaxPD), which explores the gene’s relativity in Kyoto encyclopedia of genes and genomes Pathway, and Max-Chromosomal-Distance (MaxCD), which makes use of the distance among genes in the chromosome. The properties of our proposed interestingness measures are also explored to mine the interesting rules efficiently. Experimental results on four real-life gene expression datasets show the effectiveness of MaxPD and MaxCD in both classification accuracy and biological interpretability.  相似文献   

19.
基于TD-FP-growth的模糊关联规则挖掘算法   总被引:1,自引:0,他引:1  
霍纬纲  邵秀丽 《控制与决策》2009,24(10):1504-1508
提出一种基于TD_FP-growth的模糊关联规则挖掘算法.首先,使用3种t-模算子以及由其产生的蕴涵算子计算模糊频繁项的支持度和规则的蕴涵度,产生的关联规则能表示模糊项间的确定性和渐近性逻辑语义;然后,以事务的惟一标识为键值,散列存储每个事务相对FP-tree中每个结点所表示模糊项的隶属度,使TD-FP-growth适用于模糊频繁项的挖掘,并分析了算法的时间和空间复杂度;最后,实验结果表明该算法比基于apriori的模糊频繁项挖掘算法在时间方面更加有效.
Abstract:
An algorithm based on TD-FP-growth is proposed for mining fuzzy association rule, which uses three kinds of t-norm operator to calculate the support degree of fuzzy frequent items, and adopts corresponding implication operator to measure implication degree of fuzzy association rule.The association rule mined by the algorithm can express the logic semantic of graduality and certainty between fuzzy items.Each transaction's membership degree versus fuzzy item denoted by FP-tree's node is stored by hash technology, and each transaction's identifier is regarded as key value, which adapts TD-FP-growth to mine fuzzy frequent items.The time and space complexity of the algorithm are analyzed.The experimental results show that the algorithm is more effective than the fuzzy frequent item mining algorithm based on apriori in term of time.  相似文献   

20.
关联规则挖掘是数据挖掘问题中一个典型任务。其挖掘响应时间是数据挖掘系统中重要的问题之一。为了高效解决这一问题,给出了关联规则实视图的概念以及相应的代价模型;提出了针对数据挖掘环境的实视图选择算法,以便在存储空间约束的条件下,取得较好的查询性能。实验结果表明,该算法能有效地选取实视图,从而大大提高关联规则挖掘算法的效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号