Similar documents
 20 similar documents found (search time: 31 ms)
1.
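A spatial co-location pattern is a subset of spatial features whose instances frequently appear together within a neighborhood. Spatial co-location pattern mining methods usually assume that spatial instances are mutually independent. They measure the importance of a spatial feature in a pattern by how frequently its instances participate in pattern instances (the participation ratio), and they measure the interestingness of a pattern by the minimum participation ratio of its features (the participation index), which ignores some important relationships among spatial features. Dominant-feature co-location patterns were therefore proposed to reveal dominance relationships among spatial features. Existing dominant-feature pattern mining methods are based on traditional prevalent patterns and their clique instance model; however, the clique instance model may miss dominance relationships among spatial features that do not form cliques. This paper therefore studies dominant-feature mining of spatial sub-prevalent co-location patterns based on the star instance model, in order to better reveal dominance relationships among spatial features and to mine more valuable dominant-feature patterns. First, two metrics for measuring feature dominance are defined; second, an efficient algorithm for mining dominant-feature co-location patterns is designed; finally, extensive experiments on synthetic and real data sets verify the effectiveness of the proposed algorithm and the practicality of dominant-feature patterns.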
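The participation ratio and participation index mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the row instances of a pattern have already been found; the function name `participation_index`, the instance ids, and the feature counts are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def participation_index(pattern, row_instances, feature_counts):
    """Return (per-feature participation ratios, participation index).

    pattern: tuple of feature names, e.g. ("A", "B")
    row_instances: tuples of instance ids, aligned position-wise with `pattern`
    feature_counts: total number of instances of each feature in the data set
    """
    # Collect the distinct instances of each feature that take part in
    # at least one row instance of the pattern.
    participating = defaultdict(set)
    for row in row_instances:
        for feature, instance_id in zip(pattern, row):
            participating[feature].add(instance_id)

    # Participation ratio: fraction of a feature's instances that participate.
    ratios = {f: len(participating[f]) / feature_counts[f] for f in pattern}
    # Participation index: the minimum ratio over the pattern's features.
    return ratios, min(ratios.values())


# Hypothetical data: feature A has 4 instances, feature B has 2.
rows = [("A1", "B1"), ("A1", "B2"), ("A3", "B2")]
ratios, pi = participation_index(("A", "B"), rows, {"A": 4, "B": 2})
print(ratios, pi)
```

In this toy data only two of the four A instances participate, so A limits the index even though every B instance participates.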

2.
A Joinless Approach for Mining Spatial Colocation Patterns   Total citations: 9, self-citations: 0, citations by others: 9
Spatial colocations represent the subsets of features which are frequently located together in geographic space. Colocation pattern discovery presents challenges since spatial objects are embedded in a continuous space, whereas classical data is often discrete. A large fraction of the computation time is devoted to identifying the instances of colocation patterns. We propose a novel joinless approach for efficient colocation pattern mining. The joinless colocation mining algorithm uses an instance-lookup scheme instead of an expensive spatial or instance join operation for identifying colocation instances. We prove the joinless algorithm is correct and complete in finding colocation rules. We also describe a partial join approach for spatial data sets that are often clustered in neighborhood areas. We provide algebraic cost models to characterize the performance dominance zones of the joinless method and the partial join method relative to a current join-based colocation mining method, and compare their computational complexities. In the experimental evaluation, using synthetic and real-world data sets, our methods performed more efficiently than the join-based method and scaled better on dense data.

3.
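A spatial co-location pattern is a subset of spatial features whose instances are frequently associated in space. Existing spatial data mining algorithms mainly target positive patterns, but patterns with strong negative correlation, such as negative co-location patterns, also exist in space, and mining them is equally important in some applications. Existing negative co-location mining algorithms have high time complexity and produce a huge number of patterns. To address this problem, the paper explores the upward inclusion property of negative co-location patterns, proposes minimal negative co-location patterns, and proves that all prevalent negative co-location patterns can be derived from the minimal ones. In negative co-location mining, computing the table instances of patterns is the fundamental bottleneck for efficiency, so three pruning strategies are proposed that effectively improve the algorithm's efficiency. Extensive experiments on real and synthetic data sets verify the correctness and efficiency of the proposed method. In particular, the results show that minimal negative co-location patterns reduce the number of prevalent negative co-location patterns by more than 80%.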

4.
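When permutation tests are used for contrast pattern mining, the returned results contain many redundant contrast patterns. Using the Charm method to mine contrast patterns from the sample set, this paper proposes the FSPRP and FEPRP algorithms, based on fixed-attribute permutation, which build a null distribution for contrast patterns of each length in turn and thereby filter out redundant contrast patterns. FSPRP builds the null distribution by generating a certain number of permuted sample sets, while FEPRP builds it by computing and merging the distributions of each pattern's contrast measure values. Experimental results show that, compared with the comparison-constraint method, FSPRP and FEPRP filter out more redundant contrast patterns, and that the null distribution generated by FEPRP is closer to the exact null distribution.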

5.
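A spatial co-location pattern is a subset of spatial features that are frequently co-located in space. Spatial co-location pattern mining usually assumes that spatial instances are mutually independent; in practice, however, different spatial features and different instances often interact with or depend on one another. A key feature of a spatial co-location is a feature that plays a dominant role in the pattern. Identifying the prevalent co-location patterns that contain key features, and extracting those key features, gives users a more concise mining result and improves the usability of co-location patterns, which makes it important for co-location pattern mining. This paper first defines the new concept of a significant prevalent co-location pattern containing key features, along with a series of measures for identifying the key features in such patterns; second, it gives an algorithm for mining significant prevalent co-location patterns and their key features; finally, extensive experiments on simulated and real data sets verify the effectiveness and performance of the proposed algorithm.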

6.
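A spatial co-location pattern is a set of spatial features whose instances frequently appear together in geographic space. Traditional co-location pattern mining usually relies on a user-specified proximity threshold to determine neighbor relationships between instances. Judging the proximity of two spatial instances with a single threshold may miss neighbor relationships, and it ignores the effect of different distances on the neighbor relationship. Traditional methods also measure prevalence mainly with a prevalence threshold, which makes algorithm efficiency rather sensitive to that threshold. Because frequently co-located features have a high degree of proximity, a clustering algorithm can group them together; moreover, both proximity and co-location between features are fuzzy concepts. The paper therefore combines fuzzy set theory with clustering to study fuzzy mining techniques for spatial co-location pattern mining. On the basis of a defined fuzzy neighbor relationship, it defines a function that measures the proximity between features and uses a fuzzy clustering algorithm on feature proximity to mine co-location patterns. Extensive experiments verify the practicality, efficiency, and robustness of the proposed method.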

7.
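A spatial co-location pattern is a subset of a spatial feature set whose instances frequently co-occur within a spatial neighborhood. Existing interestingness measures for spatial co-location pattern mining do not adequately account for differences between features or between different instances of the same feature. In addition, the results of traditional data-driven spatial co-location mining methods often contain a large amount of knowledge that is useless or of no interest to the user. To address these problems, the paper proposes a more general object of study, spatial instances with utility values, and defines a new utility participation index (UPI) as the interestingness measure for high-utility co-location patterns. It formalizes domain knowledge as three kinds of semantic rules, applies them in the mining process, and proposes a domain-driven, multi-iteration mining framework. Finally, extensive experiments compare the mining results under different interestingness measures in terms of utility share and prevalence, and examine how the results change once semantic rules based on domain knowledge are introduced. The experimental results show that the proposed UPI is a more reasonable measure that balances prevalence and utility, and that the domain-driven method effectively mines the patterns users are truly interested in.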

8.
Rapid growth of spatial datasets requires methods to find spatial knowledge in these sets (semi-)automatically. Spatial collocation patterns represent subsets of spatial features whose instances are frequently located together in a spatial neighborhood. In recent years, efficient methods for collocation discovery have been developed; however, none of them assume a limited size of operational memory or limited access to memory with short access times. Such restrictions are especially important given the large size of the data structures required for efficient identification of collocation instances. In this work we present and compare three algorithms for collocation pattern mining in a limited-memory environment. The first algorithm is based on the well-known joinless method introduced by Shekhar and Yoo. The second and third algorithms are inspired by a tree structure (iCPI-tree) presented by Wang et al. In our experimental evaluation, we compared the efficiency of the algorithms on both synthetic and real-world datasets.

9.
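Research on Mining Spatial Maximal Co-location Patterns   Total citations: 1, self-citations: 0, citations by others: 1
A spatial co-location pattern represents a subset of spatial features whose instances are frequently associated in space. Spatial co-location pattern mining has been studied extensively, but very little work addresses maximal co-location pattern mining. This paper proposes a novel algorithm for mining spatial maximal co-location patterns. It first scans the data set to obtain the size-2 prevalent patterns, then converts them into a graph, then applies a maximal clique algorithm to obtain the maximal cliques of spatial features, and finally verifies those cliques against the table instances of the size-2 prevalent patterns to obtain the maximal prevalent co-location patterns. Experiments show that the algorithm mines maximal prevalent co-location patterns well.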
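The graph step of this pipeline can be sketched in Python as follows. The abstract does not say which maximal clique algorithm is used; this sketch assumes the basic Bron-Kerbosch recursion, and the function name `maximal_cliques` and the feature pairs are hypothetical. It produces only candidate feature sets; the final verification against table instances is not shown.

```python
def maximal_cliques(edges):
    """Enumerate maximal cliques of an undirected graph given as feature pairs."""
    # Build an adjacency map from the size-2 prevalent patterns (the edges).
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidate vertices, x: already-processed vertices
        if not p and not x:
            cliques.append(sorted(r))  # nothing can extend r, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical size-2 prevalent patterns over features A..D.
size2_prevalent = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
candidates = maximal_cliques(size2_prevalent)
print(candidates)
```

Each clique is only a candidate: a feature set whose pairs are all prevalent need not be prevalent itself, which is why the abstract includes a verification step.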

10.
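Spatial co-location pattern mining discovers, in large volumes of spatial data, subsets of spatial features whose instances frequently appear together in geographic space. Traditional spatial co-location mining algorithms usually adopt a level-wise framework, generating candidate patterns starting from low-order patterns and computing their participation index (the prevalence measure for spatial co-location patterns). Although this framework yields correct and complete results, its time and space ...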

11.
Spatial co-location pattern discovery without thresholds   Total citations: 2, self-citations: 0, citations by others: 2
Spatial co-location pattern mining discovers the subsets of features whose events are frequently located together in geographic space. Current research on this topic adopts a threshold-based approach that requires users to specify the distance and prevalence thresholds in advance. In practice, however, it is not easy to specify suitable thresholds. In this article, we propose a novel iterative mining framework that discovers spatial co-location patterns without predefined thresholds. Using the absolute and relative prevalence of spatial co-locations, our method allows users to iteratively select informative edges to construct the neighborhood relationship graph until every significant co-location has enough confidence, and eventually to discover all spatial co-location patterns. Experimental results on real-world data sets indicate that our framework is effective for discovering prevalent co-locations.

12.
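Spatial data mining aims to discover and extract valuable latent knowledge from spatial databases. Spatial co-location pattern mining has long been one of the important research directions in spatial data mining; its goal is to discover subsets of spatial features that frequently appear near one another, while spatial high-utility co-location pattern mining also takes the utility attribute of features into account. When measuring the neighbor relationship of spatial instances, both generally require a distance threshold to be given in advance...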

13.
With the evolution of geographic information capture and the emergence of volunteered geographic information, it is increasingly important to extract spatial knowledge automatically from large spatial datasets. Spatial co-location patterns represent the subsets of spatial features whose objects are often located in close geographic proximity. Such patterns are among the most important concepts for geographic context awareness in location-based services (LBS). In the literature, most existing co-location mining methods are designed for events taking place in a homogeneous and isotropic space with Euclidean distance, while physical movement in LBS is usually constrained by a road network. As a result, the interestingness value of co-location patterns involving network-constrained events cannot be computed accurately. In this paper, we propose a different method for co-location mining that takes the network configuration of the geographical space into account. First, we define the network model with linear referencing and refine the neighborhood of traditional methods using network distances rather than Euclidean ones. Then, considering that co-location mining in networks suffers from expensive spatial-join operations, we propose an efficient way to find all neighboring object pairs for generating clique instances. Compared with previous approaches based on Euclidean distance, this approach can accurately calculate the probability of occurrence of a spatial co-location on a network. Our experimental results on real and synthetic data sets show that the proposed approach is efficient and effective in identifying co-location patterns that actually rely on a network.
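The difference between network and Euclidean neighborhoods can be illustrated with a short Python sketch. It makes simplifying assumptions that are not from the paper: events sit exactly on graph nodes (the paper uses linear referencing along edges), the road graph is a plain adjacency dict, and the names `network_distances` and `network_neighbors` are hypothetical. Shortest paths are computed with Dijkstra's algorithm.

```python
import heapq
import math


def network_distances(graph, source):
    """Shortest-path distances from `source` over a weighted road graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry, a shorter path was already found
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def network_neighbors(graph, source, threshold):
    """Nodes whose network distance from `source` is within `threshold`."""
    return {
        node
        for node, d in network_distances(graph, source).items()
        if node != source and d <= threshold
    }


# Hypothetical road network: p and r may be close in a straight line,
# but the only route between them runs through q.
road = {
    "p": {"q": 2.0},
    "q": {"p": 2.0, "r": 2.0},
    "r": {"q": 2.0},
}
print(network_neighbors(road, "p", 3.0))
```

With a threshold of 3.0, `r` is not a network neighbor of `p` because the route through `q` has length 4.0, regardless of how close the two points are in the plane.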

14.
Mining spatial colocation patterns: a different framework   Total citations: 2, self-citations: 0, citations by others: 2
Recently, there has been considerable interest in mining spatial colocation patterns from large spatial datasets. Spatial colocation patterns represent the subsets of spatial events whose instances are often located in close geographic proximity. Most studies of spatial colocation mining require the specification of two parameter constraints to find interesting colocation patterns. One is a minimum prevalence threshold for colocations, and the other is a distance threshold that defines the spatial neighborhood. However, it is difficult for users to choose appropriate threshold values without prior knowledge of their task-specific spatial data. In this paper, we propose a different framework for spatial colocation pattern mining. To remove the first constraint, we pose the problem of finding the N-most prevalent colocated event sets, where N is the desired number of colocated event sets with the highest interest measure values for each pattern size. We developed two alternative algorithms for mining the N-most patterns. They reduce candidate events effectively and use a filter-and-refine strategy for efficiently finding colocation instances in a spatial dataset. We prove the algorithms are correct and complete in finding the N-most prevalent colocation patterns. For the second constraint, a distance threshold for spatial neighborhood determination, we present various methods to estimate appropriate distance bounds from user input data. The result can help a user set a distance for a conceptualization of spatial neighborhood. Our experimental results with real and synthetic datasets show that our algorithmic design is computationally effective in finding the N-most prevalent colocation patterns. The discovered patterns differed depending on the distance threshold, which shows that it is important to select appropriate neighbor distances.
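The N-most prevalent problem statement can be made concrete with a small Python sketch. It only illustrates the definition (the top N patterns by interest measure for each pattern size) and assumes all patterns have already been scored; the paper's algorithms instead prune candidates during mining. The name `n_most_prevalent` and the scores are hypothetical.

```python
import heapq
from collections import defaultdict


def n_most_prevalent(scored_patterns, n):
    """Return, for each pattern size, the n patterns with the highest score."""
    # Group (score, pattern) pairs by the number of features in the pattern.
    by_size = defaultdict(list)
    for pattern, score in scored_patterns.items():
        by_size[len(pattern)].append((score, pattern))

    # heapq.nlargest returns the entries in descending order of score.
    return {
        size: [pattern for _, pattern in heapq.nlargest(n, items)]
        for size, items in by_size.items()
    }


# Hypothetical interest-measure values for four candidate patterns.
scores = {
    ("A", "B"): 0.8,
    ("A", "C"): 0.5,
    ("B", "C"): 0.6,
    ("A", "B", "C"): 0.4,
}
top2 = n_most_prevalent(scores, 2)
print(top2)
```

No prevalence threshold appears anywhere: the user supplies N, and the cut-off value for each size falls out of the ranking.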

15.
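A spatial co-location pattern represents a subset of spatial features whose instances are frequently associated in space, and mining such patterns is an important research direction in spatial data mining. This survey first gives the basic concepts of co-location patterns, then describes the algorithms proposed for different data domains, focusing on the ideas behind each algorithm and its main characteristics, and finally discusses future research directions for co-location pattern mining.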

16.
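吴军, 欧阳艾嘉, 张琳. 《计算机工程》 (Computer Engineering), 2021, 47(8): 45-53, 61
Traditional contrast sequential pattern mining algorithms return a certain number of false-positive contrast sequential patterns, and the erroneous information they carry interferes with decisions in downstream tasks. This paper designs the IEP-DSP algorithm to filter out false-positive contrast sequential patterns. It uses the spade method and the WRAcc contrast measure to find candidate contrast sequential patterns and the contrast sequential patterns in all permuted data sets. By simulating the permutation process, it uses an independent exact permutation test to build an independent exact null distribution for patterns of each length, computes the exact p-value of every candidate contrast sequential pattern, and uses a false discovery rate measure to control the number of false-positive patterns of each length at statistical significance level α. Experiments on real and simulated data sets show that IEP-DSP filters out a large number of false-positive contrast sequential patterns and retains more true contrast sequential patterns than methods based on statistical significance testing, confirming the advantage of the independent exact permutation test over the standard permutation test.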
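The false discovery rate step can be sketched in Python. The abstract does not name the FDR procedure; this sketch assumes the Benjamini-Hochberg step-up procedure, applied to the p-values of the candidate patterns of one length. The function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Rank the hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank

    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical exact p-values for four candidate patterns of the same length.
kept = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], alpha=0.05)
print(kept)
```

Running the procedure separately per pattern length would mirror the abstract's statement that false positives are controlled for each length.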

17.
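曾新, 李晓伟, 杨健. 《计算机科学》 (Computer Science), 2018, 45(Z6): 482-486, 464
In practical applications, spatial features carry not only spatial information; their instances also come with attribute information, which matters greatly for knowledge discovery and scientific decision-making. Existing co-location pattern mining algorithms, when computing the proximity distance between instances of two different features, do not consider the weight that the values of different attributes contribute to that distance, so some attributes are weighted too heavily and distort the mining results. This paper normalizes attribute values so that all attributes have equal weight and proposes DNRA, a join-based data normalization algorithm. It also studies in depth the problem that the range of the distance threshold is hard to determine, and derives the range of distance thresholds for DNRA to help users choose a suitable one. Finally, extensive experiments analyze and compare the performance of DNRA.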
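The effect of normalizing attribute values before computing distances can be sketched in Python. The abstract does not specify the normalization used by DNRA; this sketch assumes min-max scaling to [0, 1], and the function name `min_max_normalize` and the sample rows are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Scale each attribute (column) of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for value, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]


# Hypothetical instances with two attributes on very different scales.
raw = [(0.0, 1000.0), (10.0, 3000.0), (5.0, 2000.0)]
norm = min_max_normalize(raw)

# Before scaling, the second attribute dominates the Euclidean distance.
print(math.dist(raw[0], raw[2]))
# After scaling, both attributes contribute equally.
print(math.dist(norm[0], norm[2]))
```

After scaling, every attribute spans the same range, which also bounds the possible distances and so narrows the range of sensible distance thresholds.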

18.
Knowledge and Information Systems - A co-location pattern indicates a group of spatial features whose instances are frequently located together in a proximate geographic area. Spatial co-location...

19.
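曾新, 李晓伟, 杨健. 《计算机应用》 (Journal of Computer Applications), 2018, 38(2): 491-496
Most spatial co-location pattern mining uses a distance threshold as the criterion for the neighbor relationship between instances of different objects and then mines prevalent co-location patterns, without considering the mutual influence between neighboring instances or the gain ratio of patterns. This paper introduces the interaction rate between instances and the quarterly average revenue of objects into spatial co-location pattern mining, defines concepts such as the object interaction rate, total suite revenue (套间总收益), and gain ratio, and proposes a basic algorithm (NAGA) for mining high-gain-ratio co-location patterns together with an effective pruning algorithm (NAGA_JZ). Extensive experiments verify the correctness and practicality of the basic algorithm, and a comparison of the mining efficiency of the two algorithms confirms the efficiency of the pruning algorithm.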

20.
Efficient discovery of interesting statements in databases   Total citations: 3, self-citations: 0, citations by others: 3
The Explora system supports Discovery in Databases by large-scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. These problems arise because of the variety of patterns and the immense combinatorial possibilities of generating instances when studying relations between variables in subsets of data. First, the user must be saved from being overwhelmed by a deluge of findings. To restrict the search with respect to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organization principles of search can further limit the search effort. One principle is to organize search hierarchically and to evaluate first the statistical or information-theoretic evidence of the general hypotheses. More special hypotheses can then be eliminated from further search if a more general hypothesis has already been verified. But this approach alone has some drawbacks, and even in moderately sized data it does not prevent large sets of findings. Therefore, in a second evaluation phase, further aspects of interestingness are assessed. A refinement strategy selects the most interesting of the statistically significant statements. A second problem for discovery systems is efficiency. Each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.
