A Joinless Approach for Mining Spatial Colocation Patterns   (total citations: 9; self-citations: 0; citations by others: 9)
Spatial colocations represent the subsets of features which are frequently located together in geographic space. Colocation pattern discovery presents challenges since spatial objects are embedded in a continuous space, whereas classical data is often discrete. A large fraction of the computation time is devoted to identifying the instances of colocation patterns. We propose a novel joinless approach for efficient colocation pattern mining. The joinless colocation mining algorithm uses an instance-lookup scheme instead of an expensive spatial or instance join operation for identifying colocation instances. We prove the joinless algorithm is correct and complete in finding colocation rules. We also describe a partial join approach for spatial data sets that are often clustered in neighborhood areas. We provide algebraic cost models to characterize the performance dominance zones of the joinless method and the partial join method relative to a current join-based colocation mining method, and compare their computational complexities. In the experimental evaluation, using synthetic and real-world data sets, our methods were more efficient than the join-based method and scaled better on dense data.

Mining regional co-location patterns with kNNG   (total citations: 2; self-citations: 0; citations by others: 2)
Spatial co-location pattern mining discovers the subsets of features whose events are frequently located together in geographic space. Current research on this topic adopts a distance threshold, which has limitations in spatial data sets with varying magnitudes of neighborhood distances, especially when mining regional co-location patterns. In this paper, we propose a hierarchical co-location mining framework that accounts for both the variety of neighborhood distances and spatial heterogeneity. By adopting a k-nearest neighbor graph (kNNG) instead of a distance threshold, we propose the “distance variation coefficient” as a new measure to drive the mining operations and determine an individual neighborhood relationship graph for each region. The proposed mining algorithm outputs a set of regions, each with its own set of regional co-location patterns. The experimental results on both synthetic and real-world data sets show that our framework is effective at discovering these regional co-location patterns.
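
The idea of replacing a global distance threshold with a k-nearest neighbor graph can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `knn_graph` and the brute-force search are assumptions made for illustration, and the "distance variation coefficient" is not modeled because the abstract does not define it.

```python
from math import dist


def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} for a list of (x, y).

    Brute force, O(n^2 log n); ties are broken by point index.
    """
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort candidate neighbors by Euclidean distance to point i.
        others.sort(key=lambda j: (dist(p, points[j]), j))
        graph[i] = others[:k]
    return graph


# Two groups with different spacing: 1.0 apart on the left, 0.5 apart on the right.
points = [(0, 0), (1, 0), (5, 0), (5.5, 0)]
print(knn_graph(points, 1))
```

With a single distance threshold of 0.75, the left pair would have no neighbors at all; the k-nearest neighbor relation adapts to the local spacing in each group.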

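Spatial co-location pattern mining discovers, from large volumes of spatial data, subsets of spatial features whose instances frequently occur together in geographic space. Traditional co-location mining algorithms usually adopt a level-wise framework that starts from low-order patterns, generates candidate patterns, and computes their participation index (the prevalence measure for co-location patterns). Although this framework yields correct and complete results, the time and space overhead it incurs...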

Rapid growth of spatial datasets requires methods to find spatial knowledge (semi-)automatically in these sets. Spatial collocation patterns represent subsets of spatial features whose instances are frequently located together in a spatial neighborhood. In recent years, efficient methods for collocation discovery have been developed; however, none of them assumes limited size of the operational memory or limited access to memory with short access times. Such restrictions are especially important given the large size of the data structures required for efficient identification of collocation instances. In this work we present and compare three algorithms for collocation pattern mining in a limited memory environment. The first algorithm is based on the well-known joinless method introduced by Shekhar and Yoo. The second and third algorithms are inspired by a tree structure (iCPI-tree) presented by Wang et al. In our experimental evaluation, we compare the efficiency of the algorithms on both synthetic and real-world datasets.

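Spatial data mining aims to discover and extract valuable latent knowledge from spatial databases. Spatial co-location pattern mining has long been one of the important research directions in spatial data mining; its goal is to discover subsets of spatial features that frequently occur in proximity, while spatial high-utility co-location pattern mining additionally considers the utility attributes of features. When measuring the proximity relationship between spatial instances, both generally require a distance threshold to be given in advance...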

Mining minimal distinguishing subsequence patterns with gap constraints   (total citations: 1; self-citations: 4; citations by others: 1)
Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in one class of sequences and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets and can be useful in applications such as protein comparison, document comparison and building sequential classification models. Mining MDS patterns is a challenging task and is significantly different from mining contrasts between relational/transactional data. One particularly important type of constraint that can be integrated into the mining process is the gap constraint. We present an efficient algorithm called ConSGapMiner (Contrast Sequences with Gap Miner) to mine all MDSs satisfying a minimum and maximum gap constraint, plus a maximum length constraint. It employs highly efficient bitset and boolean operations for powerful gap-based pruning within a prefix growth framework. A performance evaluation with both sparse and dense datasets demonstrates the scalability of ConSGapMiner and shows its ability to mine patterns from high dimensional datasets at low supports.
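
To make the gap constraint concrete, the following is a minimal sketch of checking whether a pattern occurs in a sequence when the number of skipped positions between consecutive matched symbols must lie in a given range. It is not ConSGapMiner: the function names `occurs_with_gap` and `support` are hypothetical, and plain Python sets stand in for the bitset operations the abstract mentions.

```python
def occurs_with_gap(seq, pattern, min_gap, max_gap):
    """True if `pattern` is a subsequence of `seq` such that the number of
    skipped positions between consecutive matched symbols is within
    [min_gap, max_gap]."""
    if not pattern:
        return True
    # Positions where a match of the current pattern prefix can end.
    ends = {i for i, c in enumerate(seq) if c == pattern[0]}
    for sym in pattern[1:]:
        nxt = set()
        for e in ends:
            lo = e + 1 + min_gap
            hi = min(len(seq), e + 2 + max_gap)  # exclusive upper bound
            for i in range(lo, hi):
                if seq[i] == sym:
                    nxt.add(i)
        ends = nxt
        if not ends:
            return False
    return True


def support(seqs, pattern, min_gap, max_gap):
    """Fraction of sequences in `seqs` that contain `pattern` under the gap constraint."""
    hits = sum(occurs_with_gap(s, pattern, min_gap, max_gap) for s in seqs)
    return hits / len(seqs)


print(occurs_with_gap("ACGTAC", "AT", 0, 1))  # the only T is 2 positions past the first A
print(occurs_with_gap("ACGTAC", "AT", 0, 2))
```

A distinguishing pattern in the sense of the abstract would have high `support` in one class of sequences and low `support` in the other; the minimality test is omitted here.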

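In recent years, spatial colocation pattern mining has been extended from traditional data to uncertain and fuzzy data. At the fuzzy-data level, however, there has been only a small amount of research on fuzzy objects, and research on fuzzy space as a domain of discourse is still absent. Building on the theory of classical colocation pattern mining, this paper proposes colocation pattern mining for fuzzy space together with the related definitions, which increases the depth and breadth of research on fuzzy data. Combining fuzzy mathematics with the characteristics of spatial colocation mining, it establishes a basic FS algorithm with good applicability for the case where the fuzzy distance membership function is unknown. The algorithm departs from the complicated practice of verifying "clique instances" on classical data sets, which greatly improves performance. For the case where the fuzzy distance membership function is known, the paper gives a general method that increases data completeness and applies to both classical and fuzzy data. By introducing fuzzy orientation, it also gives an FS supplementary algorithm, entirely different from previous ones, that increases data completeness and enables conversion from fuzzy data space to classical data space.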

Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in practice, it is difficult for users to provide an appropriate min_support threshold. To overcome this difficulty, we propose an alternative mining task: mining top-k frequent closed sequential patterns of length no less than min_ℓ, where k is the desired number of closed sequential patterns to be mined and min_ℓ is the minimal length of each pattern. We mine the set of closed patterns because it is a compact representation of the complete set of frequent patterns. An efficient algorithm, called TSP, is developed for mining such patterns without min_support. Starting at (absolute) min_support=1, the algorithm makes use of the length constraint and the properties of top-k closed sequential patterns to perform dynamic support raising and projected database pruning. Our extensive performance study shows that TSP has high performance. In most cases, it outperforms the efficient closed sequential pattern-mining algorithm, CloSpan, even when the latter is running with the best tuned min_support threshold. Thus, we conclude that, for sequential pattern mining, mining top-k frequent closed sequential patterns without min_support is preferable to the traditional min_support-based mining.
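
The "dynamic support raising" idea can be sketched as follows: keep the k best supports seen so far in a min-heap and use the smallest of them as the current pruning threshold. This is only an illustration under stated assumptions, not TSP itself: the class name `TopKSupport` is hypothetical, and the closedness check and the minimum-length constraint are left out.

```python
import heapq


class TopKSupport:
    """Tracks the k largest supports offered so far (illustrative helper)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap holding at most k supports

    def threshold(self):
        # Until k patterns are known, nothing can be pruned: absolute support 1.
        return self.heap[0] if len(self.heap) == self.k else 1

    def offer(self, support):
        """Record the support of a newly found pattern."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, support)
        elif support > self.heap[0]:
            heapq.heapreplace(self.heap, support)


tracker = TopKSupport(k=2)
for s in (3, 5, 4):
    tracker.offer(s)
    print(s, "->", tracker.threshold())
```

Any candidate whose support falls below `threshold()` cannot enter the top k, so the branch of the search that produced it can be skipped.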

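Zeng Xin, Li Xiaowei, Yang Jian. Journal of Computer Applications (《计算机应用》), 2018, 38(2): 491-496
Most spatial co-location pattern mining uses a distance threshold as the criterion for the proximity relationship between instances of different objects and then mines prevalent co-location patterns, without considering the mutual influence between neighboring instances or the gain ratio of patterns. This paper introduces the interaction rate between instances and the quarterly average revenue of objects into the mining process, defines concepts such as the object interaction rate, the total suite revenue (套间总收益), and the gain ratio, and proposes a basic algorithm (NAGA) for mining high-gain-ratio co-location patterns together with an effective pruning algorithm (NAGA_JZ). Finally, extensive experiments verify the correctness and practicality of the basic algorithm and compare the mining efficiency of the basic and pruning algorithms, confirming the efficiency of the pruning algorithm.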

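Zeng Xin, Li Xiaowei, Yang Jian. Computer Science (《计算机科学》), 2018, 45(Z6): 482-486, 464
In practical applications, spatial features contain not only spatial information; their instances also carry attribute information, which plays an important role in knowledge discovery and scientific decision-making. When existing co-location mining algorithms compute the proximity distance between instances of two different features, they do not consider the weight that the values of different attributes carry in that distance, so some attributes are weighted too heavily and distort the mining results. This paper normalizes attribute values so that all attributes carry equal weight, and proposes DNRA, a join-based data normalization algorithm. It also studies the problem that the range of the distance threshold is hard to determine, and derives the range of distance threshold values in DNRA to help users choose an appropriate threshold. Finally, extensive experiments analyze and compare the performance of DNRA.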
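
The effect of normalizing attribute values before computing distances can be sketched as follows. The abstract does not say which normalization DNRA uses, so min-max scaling is an assumption here, and the function name `min_max_normalize` is hypothetical.

```python
from math import dist


def min_max_normalize(rows):
    """Rescale every attribute (column) of `rows` to the range [0, 1].

    A constant column is mapped to 0.0 to avoid division by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans))
        for row in rows
    ]


# First attribute ranges over 0..10, second over 1000..3000.
rows = [(0, 1000), (5, 3000), (10, 2000)]
scaled = min_max_normalize(rows)
print(scaled)
print(dist(rows[0], rows[1]), dist(scaled[0], scaled[1]))
```

On the raw values the second attribute dominates the Euclidean distance; after scaling, both attributes contribute on the same footing.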

With the evolution of geographic information capture and the emergence of volunteered geographic information, it is becoming more important to extract spatial knowledge automatically from large spatial datasets. Spatial co-location patterns represent the subsets of spatial features whose objects are often located in close geographic proximity. Such patterns are among the most important concepts for geographic context awareness in location-based services (LBS). In the literature, most existing methods of co-location mining are used for events taking place in a homogeneous and isotropic space with distance expressed as Euclidean, while physical movement in LBS is usually constrained by a road network. As a result, the interestingness value of co-location patterns involving network-constrained events cannot be accurately computed. In this paper, we propose a different method for co-location mining that considers the network configuration of the geographical space. First, we define the network model with linear referencing and refine the neighborhood of traditional methods using network distances rather than Euclidean ones. Then, considering that co-location mining in networks suffers from expensive spatial-join operations, we propose an efficient way to find all neighboring object pairs for generating clique instances. In comparison with previous approaches based on Euclidean distance, this approach can accurately calculate the probability of occurrence of a spatial co-location on a network. Our experimental results on real and synthetic data sets show that the proposed approach is efficient and effective in identifying co-location patterns that actually rely on a network.
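
The difference between network distance and Euclidean distance can be shown with a minimal sketch that runs Dijkstra's algorithm over a small road graph. The adjacency-list representation and the function name `network_distances` are assumptions for illustration; the paper's linear-referencing model and its neighbor-pair search are not reproduced.

```python
import heapq


def network_distances(graph, source):
    """Shortest path lengths from `source` in a graph given as
    {node: [(neighbor, edge_length), ...]} with non-negative lengths."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


# A road a-b-c that bends back on itself: a and c may be close "as the crow
# flies", but travel between them must pass through b.
roads = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0)],
}
print(network_distances(roads, "a"))
```

Two objects would then count as neighbors when their network distance, rather than their straight-line distance, is within the chosen threshold.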

Success of anomaly detection, similar to other spatial data mining techniques, relies on neighborhood definition. In this paper, we argue that the anomalous behavior of spatial objects in a neighborhood can be truly captured when both (a) spatial autocorrelation (similar behavior of nearby objects due to proximity) and (b) spatial heterogeneity (distinct behavior of nearby objects due to difference in the underlying processes in the region) are taken into consideration for the neighborhood definition. Our approach begins by generating micro neighborhoods around spatial objects encompassing all the information about a spatial object. We selectively merge these based on spatial relationships accounting for autocorrelation and inferential relationships accounting for heterogeneity, forming macro neighborhoods. In such neighborhoods, we then identify (i) spatio-temporal outliers, where individual sensor readings are anomalous, (ii) spatial outliers, where the entire sensor is an anomaly, and (iii) spatio-temporally coalesced outliers, where a group of spatio-temporal outliers in the macro neighborhood are separated by a small time lag indicating the traversal of the anomaly. We demonstrate the effectiveness of our approach in neighborhood formation and anomaly detection with experimental results in (i) water monitoring and (ii) highway traffic monitoring sensor datasets. We also compare the results of our approach with an existing approach for spatial anomaly detection.

Discovering colocation patterns from spatial data sets: a general approach   (total citations: 12; self-citations: 0; citations by others: 12)
Given a collection of Boolean spatial features, the colocation pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology data set may reveal symbiotic species. The spatial colocation rule problem is different from the association rule problem since there is no natural notion of transactions in spatial data sets which are embedded in continuous geographic space. We provide a transaction-free approach to mine colocation patterns by using the concept of proximity neighborhood. A new interest measure, a participation index, is also proposed for spatial colocation patterns. The participation index is used as the measure of prevalence of a colocation for two reasons. First, this measure is closely related to the cross-K function, which is often used as a statistical measure of interaction among pairs of spatial features. Second, it also possesses an antimonotone property which can be exploited for computational efficiency. Furthermore, we design an algorithm to discover colocation patterns. This algorithm includes a novel multiresolution pruning technique. Finally, experimental results are provided to show the strength of the algorithm and design decisions related to performance tuning.
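
The participation index can be sketched in a few lines. The abstract does not spell out the formula; the sketch assumes the commonly cited definition, in which the participation ratio of a feature is the fraction of its instances that appear in at least one instance of the colocation, and the participation index is the minimum of those ratios. The function name and the input layout are hypothetical.

```python
def participation_index(row_instances, feature_counts):
    """Compute (participation index, per-feature participation ratios).

    row_instances: list of dicts {feature: instance_id}, one per colocation instance.
    feature_counts: {feature: total number of instances of that feature}.
    """
    participating = {f: set() for f in feature_counts}
    for row in row_instances:
        for feature, instance_id in row.items():
            participating[feature].add(instance_id)
    ratios = {f: len(participating[f]) / feature_counts[f] for f in feature_counts}
    return min(ratios.values()), ratios


# Colocation {A, B}: A has 4 instances in total, B has 2.
rows = [{"A": 1, "B": 1}, {"A": 1, "B": 2}, {"A": 2, "B": 2}]
pi, ratios = participation_index(rows, {"A": 4, "B": 2})
print(pi, ratios)
```

Because adding a feature to a colocation can only shrink the set of participating instances, the index never increases for a superset pattern, which is the antimonotone property the abstract relies on for pruning.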

Multi-way set enumeration in weight tensors   (total citations: 2; self-citations: 0; citations by others: 2)
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns, the n-way equivalent of itemsets. More precisely, an n-set pattern consists of specific subsets of the n instance sets such that all possible associations between the corresponding instances are observed in the data. In contrast, traditional itemset mining approaches consider only two-way data, namely items versus transactions. The n-set patterns provide a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree; that is, we are also interested in n-sets where most (although not all) of the possible associations have been recorded in the data. Second, we take association weights into account. In fact, we propose a method to enumerate all n-sets that satisfy a minimum threshold with respect to the average association weight. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world datasets from different domains.
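
The average association weight of an n-set can be sketched as follows. The sparse dictionary representation, the function name `average_weight`, and the choice to count a missing observation as weight 0 are assumptions made for illustration; the reverse search enumeration itself is not shown.

```python
from itertools import product


def average_weight(tensor, n_set):
    """Mean weight over all cells spanned by an n-set.

    tensor: {(i1, ..., in): weight}; cells absent from the dict count as 0.0.
    n_set: a tuple of n non-empty collections, one subset per mode.
    """
    cells = list(product(*n_set))
    return sum(tensor.get(cell, 0.0) for cell in cells) / len(cells)


# A 3-way example (gene, tissue, condition) with one unobserved combination.
tensor = {
    ("g1", "t1", "c1"): 1.0,
    ("g1", "t2", "c1"): 0.5,
    ("g2", "t1", "c1"): 1.0,
}
n_set = (["g1", "g2"], ["t1", "t2"], ["c1"])
print(average_weight(tensor, n_set))
```

An n-set would be reported when this average meets the minimum threshold, so a pattern with one missing association, as in the example, can still qualify.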

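Spatial co-location pattern mining aims to discover association relationships among spatial features and is an important research direction in spatial data mining. The column-calculation-based co-location mining method (the CPM-Col algorithm) avoids table instance generation, the most time-consuming step of the mining process, and searches directly for the participating instances of a pattern, making it one of the most efficient current methods. However, searching for participating instances by backtracking remains the bottleneck of this method, especially on dense data and long patterns. To accelerate the search, this paper makes full use of the row instances that CPM-Col obtains while searching for participating instances and improves CPM-Col in two ways without adding extra computation. First, the row instances found by CPM-Col are stored as partial table instances, and the partial table instances of sub-patterns are used to determine participating instances quickly, avoiding backtracking for a large number of instances. Second, after CPM-Col obtains a row instance, the sub-cliques of that row instance are applied back to the first feature to obtain participating instances of the first feature, avoiding a backtracking search for those instances. On this basis, the paper proposes CPM-iCol, a co-location mining algorithm based on improved column calculation, and discusses its complexity, correctness, and completeness. In experiments on synthetic and real data sets, compared with the classical join-less algorithm and CPM-Col, CPM-iCol clearly reduces mining time and the number of backtracking steps. The results show that the algorithm has better performance and scalability than CPM-Col, particularly on dense data sets.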

Spatial co-location pattern discovery without thresholds   (total citations: 2; self-citations: 0; citations by others: 2)
Spatial co-location pattern mining discovers the subsets of features whose events are frequently located together in geographic space. The current research on this topic adopts a threshold-based approach that requires users to specify in advance the thresholds of distance and prevalence. However, in practice, it is not easy to specify suitable thresholds. In this article, we propose a novel iterative mining framework that discovers spatial co-location patterns without predefined thresholds. With the absolute and relative prevalence of spatial co-locations, our method allows users to iteratively select informative edges to construct the neighborhood relationship graph until every significant co-location has enough confidence, and eventually to discover all spatial co-location patterns. The experimental results on real-world data sets indicate that our framework is effective at discovering prevalent co-locations.

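A spatial co-location pattern is a subset of spatial features whose instances are frequently associated in space. In spatial data mining, existing algorithms mainly target the mining of positive patterns, yet patterns with strong negative correlation, such as negative co-location patterns, also exist in space, and mining them is likewise important in some applications. Existing negative co-location pattern mining algorithms have high time complexity, and the mined...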

The multivehicle covering tour problem (m‐CTP) is a transportation problem with different kinds of locations, where a set of locations must be visited while another set must be close enough to planned routes. Given two sets of vertices V and W, where V represents the set of vertices that may be visited and W is a set of vertices that must be covered by up to m vehicles, the m‐CTP problem is to minimize vehicle routes on a subset of V including T, which represents the subset of vertices that must be visited through the use of potential locations in V. The variant of m‐CTP without a route‐length constraint is treated in this paper. To tackle this problem, we propose a variable neighborhood search heuristic based on variable neighborhood descent method. Experiments were conducted using the datasets based on traveling salesman problem library instances.

Spatial data mining algorithms depend heavily on the efficient processing of neighborhood relations, since the neighbors of many objects have to be investigated in a single run of a typical algorithm. Therefore, providing general concepts for neighborhood relations as well as an efficient implementation of these concepts will allow a tight integration of spatial data mining algorithms with a spatial database management system. This will speed up both the development and the execution of spatial data mining algorithms. In this paper, we define neighborhood graphs and paths and a small set of database primitives for their manipulation. We show that typical spatial data mining algorithms are well supported by the proposed basic operations. For finding significant spatial patterns, only certain classes of paths “leading away” from a starting object are relevant. We discuss filters that allow only such neighborhood paths, which significantly reduces the search space for spatial data mining algorithms. Furthermore, we introduce neighborhood indices to speed up the processing of our database primitives. We implemented the database primitives on top of a commercial spatial database management system. The effectiveness and efficiency of the proposed approach were evaluated using an analytical cost model and an extensive experimental study on a geographic database.

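Yang Hao, Duan Lei, Hu Bin, Deng Song, Wang Wentao, Qin Pan. Journal of Software (《软件学报》), 2015, 26(11): 2994-3009
Distinguishing sequential patterns express the differences between sets of sequence data and are widely used in areas such as product recommendation, user behavior analysis, and power supply forecasting. Existing algorithms for mining distinguishing sequential patterns require the user to set a positive support threshold and a negative support threshold. Without sufficient prior knowledge, users find it hard to set appropriate support thresholds and may therefore miss some patterns with significant contrast. To address this, the paper proposes kDSP-Miner (top-k distinguishing sequential patterns with gap constraint miner), a top-k distinguishing sequential pattern mining algorithm with gap constraints. In kDSP-Miner the user only sets the desired number of most-contrasting patterns, which avoids setting contrast support thresholds directly. Accordingly, the algorithm is easier to use and its results are easier to interpret. To improve execution efficiency, several pruning and heuristic strategies are designed. A multi-threaded version of kDSP-Miner is further designed to improve its handling of high-dimensional sequence elements. Detailed experiments on real-world data sets verify the effectiveness and execution efficiency of the algorithm.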
