Similar literature
 20 similar documents found (search time: 15 ms)
1.
Regional association rule mining and scoping are motivated by two facts: global statistics seldom provide useful insight, and most relationships in spatial datasets are geographically regional rather than global. Furthermore, traditional association rule mining frequently fails to discover regional patterns because their global confidence and/or support is insufficient. In this paper, we systematically study this problem and address the unique challenges of regional association mining and scoping: (1) region discovery: how to identify interesting regions from which novel and useful regional association rules can be extracted; (2) regional association rule scoping: how to determine the scope of regional association rules. We investigate the duality between regional association rules and the regions where the associations are valid: interesting regions are identified to seek novel regional patterns, and a regional pattern has a scope consisting of the set of regions in which the pattern is valid. In particular, we present a reward-based region discovery framework that employs divisive grid-based supervised clustering for region discovery. We evaluate our approach in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply. Our experimental results confirm and validate research results in the study of arsenic contamination, and our work leads to novel findings to be further explored by domain scientists.
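
The point that a rule can fail global thresholds yet hold strongly inside one region can be illustrated with a small sketch. It is not the reward-based framework of the abstract: the regions are given up front rather than discovered, and the region names, item names, and toy records are all hypothetical.

```python
from collections import defaultdict

def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical toy data: (region id, items observed at one well).
records = [
    ("north", {"deep_well", "high_arsenic"}),
    ("north", {"deep_well", "high_arsenic"}),
    ("north", {"deep_well", "high_arsenic"}),
    ("north", {"deep_well"}),
    ("south", {"deep_well"}),
    ("south", {"deep_well"}),
    ("south", {"deep_well"}),
    ("south", {"deep_well"}),
    ("south", {"shallow_well"}),
    ("south", {"shallow_well", "high_arsenic"}),
]

# Group transactions by their (pre-assigned) region.
by_region = defaultdict(list)
for region, items in records:
    by_region[region].append(items)

rule = ({"deep_well"}, {"high_arsenic"})
global_stats = support_confidence([items for _, items in records], *rule)
regional_stats = {r: support_confidence(ts, *rule) for r, ts in by_region.items()}

print("global:", global_stats)
for region, stats in regional_stats.items():
    print(region, stats)
```

With these toy records the rule has a global confidence of 0.375 but a confidence of 0.75 in the "north" region, so a global minimum confidence of, say, 0.5 would discard a pattern that is valid regionally.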

2.
Polygons provide natural representations for many types of geospatial objects, such as countries, buildings, and pollution hotspots. Thus, polygon-based data mining techniques are particularly useful for mining geospatial datasets. In this paper, we propose a polygon-based clustering and analysis framework for mining multiple geospatial datasets that have inherently hidden relations. In this framework, polygons are first generated from multiple geospatial point datasets by using a density-based contouring algorithm called DCONTOUR. Next, a density-based clustering algorithm called Poly-SNN with novel dissimilarity functions is employed to cluster polygons to create meta-clusters of polygons. Finally, post-processing analysis techniques are proposed to extract interesting patterns and user-guided summarized knowledge from meta-clusters. These techniques employ plug-in reward functions that capture a domain expert’s notion of interestingness to guide the extraction of knowledge from meta-clusters. The effectiveness of our framework is tested in a real-world case study involving ozone pollution events in Texas. The experimental results show that our framework can reveal interesting relationships between different ozone hotspots represented by polygons; it can also identify interesting hidden relations between ozone hotspots and several meteorological variables, such as outdoor temperature, solar radiation, and wind speed.

3.
4.
Data streams have become ubiquitous in recent years because advances in hardware technology have enabled the automated recording of large amounts of data. The primary constraint in the effective mining of streams is the large volume of data that must be processed in real time. In many cases, it is desirable to store a summary of the data stream segments in order to perform data mining tasks. Since density estimation provides a comprehensive overview of the probabilistic data distribution of a stream segment, it is a natural choice for this purpose. A direct use of density distributions can, however, turn out to be an inefficient storage and processing mechanism in practice. In this paper, we introduce the concept of cluster histograms, which provide an efficient way to estimate and summarize the most important data distribution profiles over different stream segments. These profiles can be constructed in a supervised or unsupervised way, depending on the nature of the underlying application. The profiles can also be used for change detection, anomaly detection, segmental nearest neighbor search, or supervised stream segment classification. Furthermore, these techniques can be used for modeling other kinds of data, such as text and categorical data. The flexibility of the tasks that can be performed with the cluster histogram framework follows from its generality in storing the historical density profile of the data stream. As a result, this method provides a holistic framework for density-based mining of data streams. We discuss and test the application of the cluster histogram framework to a variety of interesting data mining applications.
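
The abstract does not say how a cluster histogram is built, so the following is only one plausible reading, sketched under stated assumptions: a fixed set of centroids is shared by all segments, each segment is summarized by the fraction of its points nearest to each centroid, and two segments are compared by the L1 distance between their profiles as a crude change-detection signal. All names and the toy data are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def cluster_histogram(segment, centroids):
    """Summarize a stream segment as the fraction of points per centroid."""
    counts = [0] * len(centroids)
    for point in segment:
        counts[nearest(point, centroids)] += 1
    total = len(segment)
    return [c / total for c in counts] if total else [0.0] * len(centroids)

def l1_distance(h1, h2):
    """L1 distance between two profiles; larger values suggest a change."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Hypothetical centroids and two consecutive stream segments.
centroids = [(0.0, 0.0), (10.0, 10.0)]
segment_a = [(0, 1), (1, 0), (9, 10), (10, 9)]
segment_b = [(0, 1), (10, 9), (9, 9), (11, 10)]

profile_a = cluster_histogram(segment_a, centroids)
profile_b = cluster_histogram(segment_b, centroids)
print(profile_a, profile_b, l1_distance(profile_a, profile_b))
```

Each segment is reduced to one number per centroid, which is the storage saving the abstract attributes to summarizing density profiles rather than keeping full density estimates.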

5.
In recent years, the integration of spatial data from different sources has become a crucial issue for many geographical applications, especially in the process of building and maintaining a Spatial Data Infrastructure (SDI). In this context, new methodologies are needed to acquire and update spatial datasets by collecting new measurements from different sources. The traditional approach implemented in GIS for updating spatial data usually does not consider the accuracy of the data, but simply replaces the old geometries with the new ones. Applying this approach to an SDI, where continuous and incremental updates occur, soon leads to a spatial dataset that is inconsistent with respect to spatial relations and relative distances among objects. This paper addresses this problem and proposes a framework for representing multi-accuracy spatial databases, based on a statistical representation of the objects' geometry, together with a method for the incremental and consistent update of the objects that applies a customized version of the Kalman filter. Moreover, the framework also considers the spatial relations among objects, since they represent a particular kind of observation that can be derived from geometries or observed independently in the real world. Spatial relations among objects also need to be compared in spatial data integration, and we show that they are necessary to obtain a correct result when merging object geometries.
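
The contrast between replacing an old value and updating it according to accuracy can be sketched with the textbook scalar Kalman measurement update. This is not the customized filter of the abstract, which operates on object geometries and spatial relations; it fuses a single coordinate, and the function name and numbers are hypothetical.

```python
def kalman_update(x_old, var_old, z_new, var_new):
    """Fuse a stored estimate with a new measurement, weighting by variance.

    x_old, var_old: stored coordinate and its variance (accuracy).
    z_new, var_new: new measurement of the same coordinate and its variance.
    """
    gain = var_old / (var_old + var_new)   # trust the new value more if it is more accurate
    x = x_old + gain * (z_new - x_old)     # move the estimate toward the measurement
    var = (1.0 - gain) * var_old           # the fused estimate is more accurate than either input
    return x, var

# Hypothetical example: stored x = 100.0 m (variance 4.0), new survey x = 102.0 m (variance 1.0).
x, var = kalman_update(100.0, 4.0, 102.0, 1.0)
print(x, var)  # approximately 101.6 and 0.8
```

Plain replacement would set the coordinate to 102.0 and discard the older observation; the weighted update keeps information from both and records the improved accuracy for later updates.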

6.
7.
The goal of data mining is to discover interesting and meaningful patterns in large databases. In some real applications, much of the data is quantitative or linguistic. Fuzzy data mining was thus proposed to discover fuzzy knowledge from this kind of data. In the past, two mining algorithms based on ant colony systems were proposed to find suitable membership functions for fuzzy association rules. They transformed the problem into a multi-stage graph, with each route representing a possible set of membership functions, and then used the ant colony system to solve it. They, however, searched for solutions in a discrete solution space in which the end points of membership functions could be adjusted only in discrete steps. This paper thus extends the original approaches to a continuous search space and proposes a fuzzy mining algorithm based on the continuous ant approach. The end points of the membership functions may be moved in the continuous real-number space. The encoding representation and the operators are also designed for the continuous space, so that the actual global optimal solution is contained in the search space. In addition, the proposed approach does not have fixed edges and nodes in the search process. It can dynamically produce search edges according to the distribution functions of pheromones in the solution space. Thus, it can obtain a better near-global-optimal solution than the previous two ant-based fuzzy mining approaches. The experimental results also show the good performance of the proposed approach.

8.
An ACS-based framework for fuzzy data mining   Total citations: 1, self-citations: 0, citations by others: 1
Data mining is often used to discover interesting and meaningful patterns in huge databases. It may generate different kinds of knowledge, such as classification rules, clusters, and association rules, among others. Much research has been done on data mining, most of it focused on mining binary-valued data. Fuzzy data mining was thus proposed to discover fuzzy knowledge from linguistic or quantitative data. Recently, ant colony systems (ACS) have been successfully applied to optimization problems. However, little work has been done on applying ACS to fuzzy data mining. This thesis thus attempts to propose an ACS-based framework for fuzzy data mining. In the framework, the membership functions are first encoded into binary bits and then fed into the ACS to search for the optimal set of membership functions. The problem is then transformed into a multi-stage graph, with each route representing a possible set of membership functions. When the termination condition is reached, the best membership function set (the one with the highest fitness value) can be used to mine fuzzy association rules from a database. Finally, experiments are conducted to compare the proposed framework with other approaches and to show its performance.

9.
10.
With the high availability of digital video content on the internet, users need more assistance in accessing digital videos. Much research has been done on video summarization and semantic video analysis to help satisfy these needs. These works develop condensed versions of a full-length video stream by identifying the most important and pertinent content within the stream. Most existing work in these areas focuses on event mining. Event mining from video streams improves the accessibility and reusability of large media collections, and it has been an active area of research with notable recent progress. Event mining covers a wide range of multimedia domains, such as surveillance, meetings, broadcast, news, sports, documentaries, and films, as well as personal and online media collections. Because event mining techniques are so numerous and varied, in this paper we suggest an analytical framework for classifying event mining techniques and evaluating them against important functional measures. This framework could lead to empirical and technical comparison of event mining methods and to the development of more efficient structures in the future.

11.
Scalable parallel data mining for association rules   Total citations: 3, self-citations: 0, citations by others: 3
The authors propose two new parallel formulations of the Apriori algorithm (R. Agrawal and R. Srikant, 1994), which is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations, CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD and requires substantially less communication overhead than DD. However, IDD suffers from the added cost of communicating transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.

12.
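A comparative study of association rule mining algorithms in data mining   Total citations: 15, self-citations: 12, citations by others: 15
This paper analyzes the current state of research on association rule mining algorithms in data mining and proposes a new method for measuring the value of association rules, together with directions for further research on association rule mining. Taking the core Apriori algorithm as the starting point, it surveys typical association rule mining algorithms through literature review and comparative analysis: (1) even after optimization, Apriori retains some inherent defects that cannot be overcome and that require further study; (2) future research will focus on improving the efficiency of algorithms that process extremely large volumes of data and unstructured data, on integration with OLAP, and on visualization of the generated results.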

13.
International Journal of Computer Mathematics, 2012, 89(11): 2233-2245
A data mining algorithm such as Apriori discovers a huge number of association rules (ARs), so ranking all of these rules efficiently is an important issue. This paper suggests a data envelopment analysis (DEA) method for ranking the discovered ARs using maximum discrimination between the interestingness criteria defined for all ARs. It is shown that the proposed DEA model has a unique optimal solution, which can be computed efficiently when the maximum discrimination between the criteria, that is, the difference between DEA weights, is considered. The contribution of this study is as follows. First, we show that using the conventional DEA model for ranking ARs may produce an invalid result because the weights corresponding to the interestingness criteria would not discriminate between the criteria. This is investigated for a dataset consisting of 46 ARs with four criteria, namely support, confidence, itemset value, and cross-selling. The paper also introduces the maximum discrimination between the weights of the criteria and obtains the optimal solution of the corresponding DEA model efficiently, without the need to solve the related mathematical models. In addition, this model yields a smaller number of useful rules. A comparative analysis is then used to show the advantage of the proposed DEA method.

14.
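In horizontal association rule mining based on spatial transactions, an efficient algorithm for mining spatial topological associations is proposed in order to extract spatial topological association rules effectively from massive data; it is suited to mining multi-level horizontal spatial association rules. The algorithm stores spatial topological relations as binary numbers, which establishes a correspondence between spatial transactions and numbers, and it generates candidate frequent itemsets by incrementing numbers. When computing support counts, the algorithm uses logical operations and also exploits numerical properties to reduce the number of spatial transactions scanned, which greatly improves efficiency. Experimental results show that, when extracting multi-level spatial topological association rules, it is faster and more effective than existing algorithms.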
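
The binary encoding and bitwise support counting described above can be sketched as follows. The abstract does not specify which numerical property is used to skip transactions, so this sketch assumes one that certainly holds: a transaction whose integer code is smaller than the candidate's code cannot contain the candidate. The predicate names and transactions are hypothetical.

```python
from bisect import bisect_left

# Hypothetical spatial topological predicates, one bit each.
PREDICATES = ["touches_river", "within_city", "intersects_road", "contains_park"]
BIT = {name: 1 << i for i, name in enumerate(PREDICATES)}

def encode(items):
    """Encode a set of predicates as an integer bit mask."""
    code = 0
    for name in items:
        code |= BIT[name]
    return code

def support_count(sorted_codes, candidate):
    """Count transactions containing the candidate itemset.

    If t & candidate == candidate then t >= candidate, so every code below
    the candidate can be skipped with a binary search (assumed pruning rule).
    """
    start = bisect_left(sorted_codes, candidate)
    return sum(1 for t in sorted_codes[start:] if t & candidate == candidate)

transactions = [
    {"touches_river", "within_city"},
    {"within_city", "intersects_road"},
    {"touches_river", "within_city", "intersects_road"},
    {"touches_river"},
    {"within_city", "contains_park"},
]
codes = sorted(encode(t) for t in transactions)

candidate = encode({"within_city", "intersects_road"})
print(codes, candidate, support_count(codes, candidate))
```

Because every itemset over k predicates corresponds to an integer between 1 and 2^k - 1, stepping through integers is one way to enumerate candidates, which matches the abstract's description of generating candidates by incrementing numbers.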

15.
The complexity of industrial production plants makes it difficult to obtain an overall view of plant performance and to find the weak links that degrade performance and product quality. While control loop performance assessment is a popular subject, few authors address the problem of combining low-level performance indices into subsystem-level and plant-level indices. This paper presents a performance assessment framework that enables the creation of different views of plant performance. In the proposed method, low-level performance indices are scaled to the same interval so that their interpretation, comparison, and combination are more straightforward. Different methods for combining low-level indices to create subsystem-level and plant-level performance measures are studied. Implementation in a large-scale industrial process is described.
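
Scaling indices to a common interval and then combining them can be sketched as below. The abstract names neither the interval nor the combination methods, so this sketch assumes the interval [0, 1] (1 is best) and shows two common choices, a weighted mean and a worst-case minimum. The index values and their ranges are hypothetical.

```python
def scale_index(value, worst, best):
    """Map a raw index linearly onto [0, 1], where 1 means best performance."""
    if best == worst:
        raise ValueError("best and worst must differ")
    scaled = (value - worst) / (best - worst)
    return max(0.0, min(1.0, scaled))  # clip values outside the expected range

def combine_weighted_mean(scaled, weights=None):
    """Average view: a subsystem is as good as its loops are on the whole."""
    if weights is None:
        weights = [1.0] * len(scaled)
    return sum(w * s for w, s in zip(weights, scaled)) / sum(weights)

def combine_worst_case(scaled):
    """Weak-link view: a subsystem is only as good as its worst loop."""
    return min(scaled)

# Hypothetical low-level indices with different units and directions.
loop_indices = [
    scale_index(0.2, worst=1.0, best=0.0),       # oscillation index, lower is better
    scale_index(75.0, worst=50.0, best=100.0),   # time in automatic mode, percent
    scale_index(120.0, worst=50.0, best=100.0),  # out-of-range value, clipped to 1.0
]
subsystem_mean = combine_weighted_mean(loop_indices)
subsystem_worst = combine_worst_case(loop_indices)
print(loop_indices, subsystem_mean, subsystem_worst)
```

The two combinations answer different questions: the mean gives an overall view, while the minimum points at the weak link that the abstract says is hard to find.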

16.
This paper investigates two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets, and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used to predict a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies of road traffic data sets of different sizes. The experimental results show the merits and capability of the proposed KD model in FAR-based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base to be used within the FIS for prediction evaluation. Experimental results show that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.

17.
The minimal frequency constraint in classical association mining algorithms turns out to be a challenging bottleneck in discovering the large number of infrequent associations that may be rich in knowledge content. Choosing a lower frequency threshold not only incurs the huge cost of pattern explosion but also reduces the reliability of the discovered knowledge. The goal of the present paper is to devise a new framework that addresses two needs. The first is the discovery of confident associations without the classical minimal frequency constraint. The second is to ensure the quality of the discovered rules. We propose a new property among items, termed cohesion, and develop cohesion-based scalable algorithms for confident association discovery. To assess the quality of rules in terms of knowledge content, we propose two new measures, accuracy and predictability, based on documented associations. Experiments with market-basket data as well as microarray data establish the superiority of the cohesion-based technique in terms of both the amount and the quality of discovered knowledge.

18.
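In this era of big data, spatial data is accumulating rapidly in every field. Spatial data mining, as a part of data mining, has become a key discipline for studying spatial data. This paper introduces the basic concepts, general steps, and latest methods of spatial data mining, and presents views on the current state of spatial data mining. Finally, it discusses future research directions for spatial data mining in greater depth.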

19.
This paper presents an informatics framework that applies the feature-based engineering concept to cost estimation, supported by data mining algorithms. The purpose of this research is to provide a practical procedure for more accurate cost estimation using the commonly available manufacturing process data associated with ERP systems. The proposed method combines linear regression and data mining techniques, leverages the unique strengths of both, and creates a mechanism for discovering cost features. The final estimation function takes into account the user’s confidence level in each member technique, so that the method can be phased in gradually in practice as the data mining capability is built up. A case study demonstrates the proposed framework and compares the results from empirical cost prediction and data mining. The case study results indicate that the combined method is flexible and promising for determining the costs of the example welding features. In the comparison between the empirical prediction and five different data mining algorithms, the ANN algorithm proves to be the most accurate for welding operations.

20.
A data warehouse is an important decision support system that provides cleaned and integrated data for knowledge discovery and data mining systems. In practice, data warehouse mining systems have provided many applicable solutions in industry, yet many problems remain that make knowledge discovery harder for users or even prevent them from obtaining the real and useful knowledge they need. To improve the overall data warehouse mining process, we present an intelligent data warehouse mining approach that incorporates a schema ontology, a schema constraint ontology, a domain ontology, and a user preference ontology. The structures of these ontologies are illustrated, and how they benefit the mining process is demonstrated through examples using rule mining. Finally, we present a prototype multidimensional association mining system which, with intelligent assistance from the ontologies, can help users build useful data mining models, prevent ineffective pattern generation, discover concept-extended rules, and provide an active knowledge re-discovery mechanism.
