Similar literature
 Found 20 similar documents; search took 46 ms
1.
Research and applications on georeferenced multimedia: a survey   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, the emergence of georeferenced media, such as geotagged photos, on the Internet has opened up a new world of possibilities for geography-related research and applications. Despite its short history, georeferenced media has attracted attention from several major research communities, including Computer Vision, Multimedia, Digital Libraries and KDD. This paper provides a comprehensive survey of recent research and applications on online georeferenced media. Specifically, the survey focuses on four aspects: (1) organizing and browsing georeferenced media resources, (2) mining semantic/social knowledge from georeferenced media, (3) learning landmarks in the world, and (4) estimating the geographic location of a photo. Furthermore, based on current technical achievements, open research issues and challenges are identified, and directions that can lead to compelling applications are suggested.

2.
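To address the lack of semantic topic diversity when traditional image retrieval systems search for images by keyword, a representative image selection algorithm based on mutual nearest neighbor consistency and affinity propagation is proposed; for each query it selects a set of relevant images covering different semantic topics. The algorithm uses mutual nearest neighbor consistency to adjust the similarity between images, then applies affinity propagation (AP) clustering to partition the image set into clusters, and finally ranks the clusters to select representative clusters, taking the center image of each as a representative image. Experiments show that the method outperforms the K-means-based and Greedy K-means-based methods, and that the selected images summarize the content of the source image set intuitively and effectively while being semantically diverse.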

3.
Mining geo-tagged social photo media has recently received considerable attention from researchers. Mining points of interest (POIs) from a collection of geo-tagged photos is one such problem. POI mining refers to the processes of pattern recognition (namely clustering), extraction and semantic annotation. However, with unsupervised clustering methods, many POIs might not be mined. Additionally, properly annotating the data clusters with semantics after clustering is a great challenge. In practice, many applications, such as POI recommendation, require accurate semantic annotation and high-quality pattern recognition. In this paper, we study POI mining from a collection of geo-tagged photos in combination with proper semantic annotation by using additional POI information from high-coverage external POI databases. We propose a novel POI mining framework using two-level clustering, random walk and constrained clustering. In the random walk clustering step, we separate a large-scale collection of geo-tagged photos into many clusters. In the constrained clustering step, we further divide the clusters that include many POIs into sub-clusters, where the geo-tagged photos in a sub-cluster are associated with a particular POI. Experimental results on two datasets of geo-tagged Flickr photos of two cities in California, USA, show that the proposed method substantially outperforms existing approaches adapted to handle the problem.

4.
Clustering is a useful data mining technique that groups data points such that the points within a single group have similar characteristics, while the points in different groups are dissimilar. Density-based clustering algorithms such as DBSCAN and OPTICS are a widely used class of clustering algorithms. As more and more applications must deal with vast amounts of data, clustering such big data is a challenging problem. Recently, parallelizing clustering algorithms on a large cluster of commodity machines using the MapReduce framework has received a lot of attention. In this paper, we first propose a new density-based clustering algorithm, called DBCURE, which is robust in finding clusters with varying densities and suitable for parallelization with MapReduce. We next develop DBCURE-MR, which is a parallelized DBCURE using MapReduce. While traditional density-based algorithms find each cluster one by one, our DBCURE-MR finds several clusters together in parallel. We prove that both DBCURE and DBCURE-MR find the clusters correctly based on the definition of density-based clusters. Our experimental results with various data sets confirm that DBCURE-MR finds clusters efficiently without being sensitive to clusters with varying densities, and scales up well with the MapReduce framework.

5.
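Research and application of GIS-based spatial location relationship clustering   Total citations: 6, self-citations: 0, citations by others: 6
李宁宁 (Li Ningning), 刘玉树 (Liu Yushu), 《微机发展》 (Microcomputer Development), 2004, 14(6): 8-9, 12
Cluster analysis is a method of spatial data mining; clustering algorithms can discover useful cluster structures directly from spatial databases. To support cluster analysis of complex spatial geographic objects, the paper presents GIS and spatial cluster analysis techniques and introduces a GIS-based clustering algorithm built on spatial location relationships. The algorithm groups spatially adjacent objects into the same cluster according to their adjacency relationships. In a concrete application, the algorithm joined large, spatially adjacent position areas into contiguous regions and removed small areas that did not meet the conditions, forming position clusters with satisfactory results. The approach finds clusters of arbitrary shape that satisfy specific constraints.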
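The adjacency-based grouping described in the abstract above can be illustrated with a small union-find sketch. This is not the paper's implementation: the input format (region ids plus a list of adjacent pairs), the function name `cluster_adjacent`, and the use of a minimum region count as the "constraint" are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_adjacent(regions, adjacent_pairs, min_size=1):
    """Group regions that are connected through spatial adjacency.

    regions: iterable of hashable region ids (hypothetical input format).
    adjacent_pairs: iterable of (a, b) pairs marking two regions as adjacent.
    min_size: clusters with fewer regions are dropped; a stand-in for the
        unspecified constraint used to discard small areas.
    """
    parent = {r: r for r in regions}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every adjacent pair.
    for a, b in adjacent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for r in parent:
        groups[find(r)].append(r)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)
```

Because clusters are formed purely from adjacency, they can take any shape; a real GIS workflow would derive `adjacent_pairs` from polygon topology and would more likely filter by area than by count.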

6.
7.
The growing volume of geotagged social media data has become a potential repository for finding common trajectory patterns. Various spatial trajectory behaviors have been studied in previous work. In this paper, we extract common trajectory patterns at the semantic level. We enrich trajectories with additional contextual semantic annotations and propose a density-based method to find semantic common trajectory patterns with a novel similarity measure. Real geotagged photo data are used in our experiments. Experimental results demonstrate that our methods are able to generate semantic common trajectory patterns.

8.
As one of the most fundamental yet important methods of data clustering, the center-based partitioning approach clusters the dataset into k subsets, each of which is represented by a centroid or medoid. In this paper, we propose a new medoid-based k-partitions approach called Clustering Around Weighted Prototypes (CAWP), which works with a similarity matrix. In CAWP, each cluster is characterized by multiple objects with different representative weights. With this new cluster representation scheme, CAWP aims to simultaneously produce clusters of improved quality and a set of ranked representative objects for each cluster. An efficient algorithm is derived to alternately update the clusters and the representative weights of objects with respect to each cluster. An annealing-like optimization procedure is incorporated to alleviate the local optimum problem for better clustering results and, at the same time, to make the algorithm less sensitive to parameter settings. Experimental results on benchmark document datasets show that CAWP achieves favorable effectiveness and efficiency in clustering, and also provides useful information for cluster-specific analysis.

9.
Clustering in Dynamic Spatial Databases   Total citations: 2, self-citations: 0, citations by others: 2
Efficient clustering in dynamic spatial databases is currently an open problem with many potential applications. Most traditional spatial clustering algorithms are inadequate because they do not efficiently support incremental clustering. In this paper, we propose DClust, a novel clustering technique for dynamic spatial databases. DClust is able to provide a multi-resolution view of the clusters, generate arbitrarily shaped clusters in the presence of noise, generate clusters that are insensitive to the ordering of input data, and support incremental clustering efficiently. DClust utilizes a density criterion that captures arbitrary cluster shapes and sizes to select a number of representative points, and builds the Minimum Spanning Tree (MST) of these representative points, called R-MST. After the initial clustering, a summary of the cluster structure is built. This summary enables quick localization of the effect of data updates on the current set of clusters. Our experimental results show that DClust outperforms existing spatial clustering methods such as DBSCAN, C2P, DENCLUE, Incremental DBSCAN and BIRCH in terms of clustering time and accuracy of clusters found.

10.
Major challenges of clustering geo-referenced data include identifying arbitrarily shaped clusters, properly utilizing spatial information, coping with diverse extrinsic characteristics of clusters and supporting region discovery tasks. The goal of region discovery is to identify interesting regions in geo-referenced datasets based on a domain expert’s notion of interestingness. Almost all agglomerative clustering algorithms focus only on the first challenge. The goal of the proposed work is to develop agglomerative clustering frameworks that deal with all four challenges. In particular, we propose a generic agglomerative clustering framework for geo-referenced datasets (GAC-GEO) that generalizes agglomerative clustering by allowing for three plug-in components. GAC-GEO agglomerates neighboring clusters, maximizing a plug-in fitness function that captures the notion of interestingness of clusters. It enhances typical agglomerative clustering algorithms in two ways: fitness functions support task-specific clustering, whereas generic neighboring relationships increase the number of merging candidates. We also demonstrate that existing agglomerative clustering algorithms can be considered specific cases of GAC-GEO. We evaluate the proposed framework on an artificial dataset and two real-world applications involving region discovery. The experimental results show that GAC-GEO is capable of identifying arbitrarily shaped hotspots for different data mining tasks.

11.
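As an unsupervised learning method, clustering usually requires the number of clusters to be supplied manually. When prior knowledge is lacking, specifying clustering parameters by hand is impractical. Cluster validity indexes, studied in recent years, are used to estimate the number of clusters and the quality of a clustering. This paper proposes a new validity-index-based clustering algorithm that requires no clustering parameters. At each step the algorithm merges the two clusters whose merger increases the validity index the most or decreases it the least. A gravitational model is used to measure similarity, and possible outliers are handled by a homogenization step. Experiments show that the algorithm correctly finds the number of clusters in specific datasets, yields a lower error rate than other clustering methods, and is more efficient than typical validity-index-based clustering algorithms.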

12.
“Best K”: critical clustering structures in categorical datasets   Total citations: 2, self-citations: 2, citations by others: 0
The demand for cluster analysis of categorical data has continued to grow over the last decade. A well-known problem in categorical clustering is determining the best number of clusters K. Although several categorical clustering algorithms have been developed, surprisingly, none has satisfactorily addressed the problem of the best K for categorical clustering. Since categorical data does not have an inherent distance function as the similarity measure, traditional cluster validation techniques based on geometric shapes and density distributions are not appropriate for categorical data. In this paper, we study the entropy property between the clustering results of categorical data with different numbers of clusters K, and propose the BKPlot method to address three important cluster validation problems: (1) How can we determine whether there is significant clustering structure in a categorical dataset? (2) If there is significant clustering structure, what is the set of candidate “best Ks”? (3) If the dataset is large, how can we efficiently and reliably determine the best Ks?

13.
K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to the initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems, though, such as when classifying georeferenced data for mapping applications, standardization of the clustering methodology, specifically the ability to arrive at the same cluster assignment on every run of the method (i.e. replicability of the methodology), may be of greater significance than any perceived measure of accuracy, especially when the solution is known to be non-unique, as in the case of k-means clustering. Here we propose a simple initial seed selection algorithm for k-means clustering along one attribute that draws initial cluster boundaries along the “deepest valleys” or greatest gaps in the dataset. Thus, it incorporates a measure to maximize the distance between consecutive cluster centers, which augments the conventional k-means optimization for minimum distance between a cluster center and its cluster members. Unlike existing initialization methods, no additional parameters or degrees of freedom are introduced to the clustering algorithm. This improves the replicability of cluster assignments by as much as 100% over k-means and k-means++, virtually reducing the variance over different runs to zero. Further, the proposed method is more computationally efficient than k-means++ and, in some cases, more accurate.
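The seeding idea in the abstract above (drawing initial boundaries at the greatest gaps along one attribute) can be sketched as follows. The abstract does not say how a seed is derived from each resulting segment; taking the segment mean is an assumption here, and the function name `gap_seeds` is hypothetical.

```python
def gap_seeds(values, k):
    """Return k initial 1-D cluster centres by cutting at the k-1 largest gaps.

    Assumption: each seed is the mean of the values between two consecutive
    cuts; the abstract only specifies where the boundaries are drawn.
    """
    xs = sorted(values)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of values")
    # Index i stands for the gap between xs[i] and xs[i + 1].
    gaps = sorted(range(len(xs) - 1),
                  key=lambda i: xs[i + 1] - xs[i],
                  reverse=True)
    cuts = sorted(gaps[:k - 1])
    seeds, start = [], 0
    # The final pseudo-cut closes the last segment.
    for cut in cuts + [len(xs) - 1]:
        segment = xs[start:cut + 1]
        seeds.append(sum(segment) / len(segment))
        start = cut + 1
    return seeds
```

The procedure involves no random draws, so the same data always yields the same seeds, which is the replicability property the abstract emphasizes. The seeds would then be handed to an ordinary k-means loop.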

14.
Clustering is a standard approach for achieving efficient and scalable performance in wireless sensor networks. Traditionally, clustering algorithms aim at generating a number of disjoint clusters that satisfy some criteria. In this paper, we formulate a novel clustering problem that aims at generating overlapping multihop clusters. Overlapping clusters are useful in many sensor network applications, including intercluster routing, node localization, and time synchronization protocols. We also propose a randomized, distributed multihop clustering algorithm (KOCA) for solving the overlapping clustering problem. KOCA aims at generating connected overlapping clusters that cover the entire sensor network with a specific average overlapping degree. Through analysis and simulation experiments, we show how to select the values of the parameters to achieve the objectives of the clustering process. Moreover, the results show that KOCA produces approximately equal-sized clusters, which allows the load to be distributed evenly over different clusters. In addition, KOCA is scalable; cluster formation terminates in constant time regardless of the network size.

15.
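To address problems with cluster analysis algorithms in data mining applications, this paper combines a genetic algorithm with the traditional K-means clustering algorithm to improve it, proposing a new clustering algorithm for mixed-type data that extends the range of applications of cluster analysis. Experimental results show that the algorithm achieves good clustering performance.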

16.
In this paper we introduce a class of fuzzy clusterwise regression models with an LR fuzzy response variable and numeric explanatory variables, which incorporates fuzzy clustering into a fuzzy regression framework. The model bypasses the heterogeneity problem that could arise in fuzzy regression by subdividing the dataset into homogeneous clusters and performing a separate fuzzy regression on each cluster. The integration of the clustering model into the regression framework allows us to simultaneously estimate the regression parameters and the membership degree of each observation to each cluster by optimizing a single objective function. The class of models proposed here includes, as special cases, the fuzzy clusterwise linear regression model and the fuzzy clusterwise polynomial regression model. We also introduce a set of goodness-of-fit indices to evaluate the fit of the regression model within each cluster as well as in the whole dataset. Finally, we consider some cluster validity criteria that are useful in identifying the “optimal” number of clusters. Several applications are provided to illustrate the approach.

17.
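Clustering is a fundamental problem in big data analysis and data mining. The article "Clustering by fast search and find of density peaks", published in Science in 2014, proposed a clustering algorithm based on fast search of density peaks; the algorithm is simple and practical, but its results depend on an empirically chosen parameter dc. This paper proposes an improved density peak clustering algorithm that introduces density estimation entropy to optimize the parameter adaptively. Comparative experiments show that the improved method not only overcomes the original algorithm's reliance on manually set parameters but also achieves relatively better clustering performance.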
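To make the role of the cutoff distance dc concrete, here is a minimal sketch of the two quantities the original density-peaks method computes for each point: a local density rho (here a hard cutoff count of neighbours within dc) and delta, the distance to the nearest point of higher density. It does not implement the entropy-based adaptive choice of dc proposed in the abstract above; the function name `density_peaks` and the tuple-based point format are assumptions.

```python
import math


def density_peaks(points, dc):
    """Return (rho, delta) per point using a hard cutoff kernel of radius dc."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Points with both large rho and large delta are candidate cluster centres. Changing `dc` changes every rho value, which is why the result is sensitive to that parameter.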

18.
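Classification of imbalanced data is one of the key challenges in data mining. Oversampling is an effective way to address imbalanced classification. Traditional oversampling methods do not consider within-class imbalance, so an oversampling method based on improved spectral clustering is proposed. The method first determines the number of clusters automatically and applies spectral clustering to the minority-class samples; it then determines the number of samples to synthesize within each cluster from the ratio of the number of samples in that cluster to the total number of minority-class samples; finally, it oversamples within each cluster to obtain a balanced new dataset. The effectiveness of the algorithm is verified on four real datasets. The results of k-means clustering and improved spectral clustering are also compared on a two-dimensional synthetic dataset to explain the performance difference between oversampling algorithms based on the two clustering methods.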

19.
External validation indexes allow similarities between two clustering solutions to be quantified. With classical external indexes, it is possible to quantify how similar two disjoint clustering solutions are, where each object can only belong to a single cluster. However, in practical applications, it is common for an object to have more than one label, thereby belonging to overlapped clusters; for example, subjects that belong to multiple communities in social networks. In this study, we propose a new index based on an intuitive probabilistic approach that is applicable to overlapped clusters. Given the recent, remarkable increase in the analysis of data with naturally overlapped clusters, this new index allows clustering algorithms to be compared correctly. After presenting the new index, experiments with artificial and real datasets are shown and analyzed. Results on a real social network are also presented and discussed. The results indicate that the new index can correctly measure the similarity between two partitions of the dataset when there are different levels of overlap in the analyzed clusters.

20.
Image content clustering is an effective way to organize large databases, thereby making the content-based image retrieval process much easier. However, clustering images with varied background and foreground is quite challenging. In this paper, we propose a novel image content clustering paradigm suitable for clustering large and diverse image databases. In our approach, images are represented in a continuous domain based on a probabilistic Gaussian Mixture Model (GMM), with the images modeled as a mixture of Gaussian distributions in the selected feature space. The distance metric between the Gaussian distributions is defined in the sense of Kullback–Leibler (KL) divergence. The clustering is done using a semi-supervised learning framework in which labeled data in the form of cluster templates is used to classify the unlabelled data. The clusters are formed around initially chosen seeds and are updated in due course based on user inputs. In our clustering approach, the user interaction is structured so as to get maximum input from the user in a limited time. We propose two methods to carry out the structured user interaction, through which the cluster templates are updated to improve the quality of the clusters formed. The proposed method is experimentally evaluated on benchmark datasets that are specifically chosen to include a wide variation of images around a common theme, as is typically encountered in applications like photo-summarization, and that pose a major semantic gap challenge to conventional clustering approaches. The experimental results presented demonstrate the effectiveness of the proposed approach.
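As a small illustration of a KL-based distance between Gaussians, the sketch below evaluates the standard closed form for two univariate normal distributions. The abstract above does not give its exact formulation (the paper works with mixtures in a multi-dimensional feature space), so the univariate restriction and the function name `kl_gaussian` are assumptions.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate normals P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)
```

KL divergence is not symmetric, so a distance built on it typically combines both directions, for example by summing `kl_gaussian(p, q)` and `kl_gaussian(q, p)`; whether the paper does so is not stated in the abstract.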
