Similar literature
 20 similar documents found
1.
Clustering entities into dense parts is an important issue in social network analysis. Real social networks usually evolve over time, and efficiently clustering dynamic social networks remains an open problem. In this paper, a dynamic social network is modeled as an initial graph with an infinite change stream, called the change stream model, which naturally eliminates the parameter-setting problem of the snapshot graph model. Based on the change stream model, the incremental version of the well-known k-clique clustering problem is studied, and incremental k-clique clustering algorithms are proposed based on a local DFS (depth-first search) forest updating technique. It is theoretically proved that the proposed algorithms outperform the corresponding static ones and the incremental spectral clustering algorithm in terms of time complexity. The practical performance of the algorithms is extensively evaluated and compared with the baseline algorithms on the ENRON and DBLP datasets. Experimental results show that the incremental k-clique clustering algorithms are much more efficient than the corresponding static ones, do not suffer from the accumulating errors of the incremental spectral clustering algorithm, and can capture the evolving details of the clusters that snapshot graph model based algorithms miss.

2.
Finding the k nearest neighbor objects in spatial databases is a fundamental problem in many geospatial systems, and direction is one of the key features of a spatial object. Moreover, the recent tremendous growth of sensor technologies in mobile devices produces an enormous number of spatio-directional (i.e., spatially and directionally encoded) objects such as photos. Therefore, efficient and proper utilization of the direction feature is a new challenge. Inspired by this issue and the traditional k nearest neighbor search problem, we devise a new type of query, called the direction-constrained k nearest neighbor (DCkNN) query. The DCkNN query finds the k nearest neighbors of the query location such that the direction of each neighbor is within a certain range of the query direction. We develop a new index structure, called MULTI, to efficiently answer the DCkNN query with two novel index access algorithms based on cost analysis. Furthermore, our problem and solution can be generalized to objects with spatial and circular dimensions, such as a direction or cyclic periods of time (an hour, a day, or a week). Experimental results show that our proposed index structure and access algorithms outperform two algorithms adapted from existing kNN algorithms.
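
The query semantics described above can be illustrated with a brute-force sketch. This is not the MULTI index or its access algorithms; it only shows what a DCkNN query returns. The function names, the (x, y, direction in degrees) tuple layout, and the `max_angle` parameter are assumptions made for illustration.

```python
import math

def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees (0..180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def dcknn(objects, query_point, query_direction, max_angle, k):
    """Brute-force direction-constrained kNN (illustrative, not the paper's index).

    objects: list of (x, y, direction_deg) tuples (hypothetical layout).
    Returns the k nearest objects whose direction lies within max_angle
    degrees of query_direction.
    """
    qx, qy = query_point
    # Keep only objects that satisfy the direction constraint.
    candidates = [
        (math.hypot(x - qx, y - qy), (x, y, d))
        for (x, y, d) in objects
        if angular_difference(d, query_direction) <= max_angle
    ]
    # Rank the survivors by Euclidean distance to the query location.
    candidates.sort(key=lambda pair: pair[0])
    return [obj for _, obj in candidates[:k]]
```

Note that the direction test wraps around 360 degrees, so an object facing 350 degrees is within 30 degrees of a query facing 0 degrees.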

3.
We establish a refined search tree technique for the parameterized DOMINATING SET problem on planar graphs. Here, we are given an undirected graph and we ask for a set of at most k vertices such that every other vertex has at least one neighbor in this set. We describe algorithms with running times O(8^k n) and O(8^k k + n^3), where n is the number of vertices in the graph, based on bounded search trees. We describe a set of polynomial time data-reduction rules for a more general “annotated” problem on black/white graphs that asks for a set of k vertices (black or white) that dominate all the black vertices. An intricate argument based on the Euler formula then establishes an efficient branching strategy for reduced inputs to this problem. In addition, we give a family of examples showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final search tree algorithm is easy to implement; its analysis, however, is involved.
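
To make the problem statement concrete, here is a minimal sketch that checks the dominating set condition and decides the parameterized question by exhaustive enumeration. It is deliberately naive and is not the refined search tree or the reduction rules of the abstract; the adjacency representation (a dict mapping each vertex to a set of neighbors) and the function names are assumptions.

```python
from itertools import combinations

def is_dominating_set(adjacency, chosen):
    """True if every vertex is in `chosen` or has a neighbor in `chosen`."""
    chosen = set(chosen)
    return all(v in chosen or bool(adjacency[v] & chosen) for v in adjacency)

def has_dominating_set(adjacency, k):
    """Decide by brute force whether a dominating set of size at most k exists.

    adjacency: dict mapping each vertex to the set of its neighbors
    (hypothetical representation). Exponential in k; for illustration only.
    """
    vertices = list(adjacency)
    for size in range(min(k, len(vertices)) + 1):
        for subset in combinations(vertices, size):
            if is_dominating_set(adjacency, subset):
                return True
    return False
```

On a path with five vertices, for instance, no single vertex dominates the graph, but the second and fourth vertices together do.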

4.
Traditional fast k-nearest neighbor search algorithms based on pyramid structures require either a large amount of extra memory or a long search time. This paper proposes a fast k-nearest neighbor search algorithm based on the wavelet transform, which exploits the important information hidden in the transform coefficients to reduce the computational complexity. The study indicates that the Haar wavelet transform yields two kinds of important pyramids. Two elimination criteria derived from the transform coefficients are used to reject impossible candidates. Experimental results on texture classification verify the effectiveness of the proposed algorithm.
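
The idea of rejecting candidates from transform coefficients can be sketched with the simplest possible case. The coarsest Haar approximation coefficient of a vector is proportional to the sum of its components, and by the Cauchy-Schwarz inequality the squared Euclidean distance between two d-dimensional vectors is at least (sum(x) - sum(y))^2 / d. The sketch below uses only this single bound; it is an assumption-laden illustration, not the two elimination criteria of the paper, and the function name is hypothetical.

```python
def knn_with_sum_elimination(data, query, k):
    """Exact kNN that skips candidates using a cheap lower bound.

    Lower bound used: ||x - q||^2 >= (sum(x) - sum(q))^2 / d.
    Returns (indices of the k nearest vectors, number of skipped candidates).
    """
    d = len(query)
    q_sum = sum(query)
    sums = [sum(x) for x in data]  # would be precomputed offline in practice
    best = []  # sorted list of (squared distance, index), at most k entries
    skipped = 0
    for i, x in enumerate(data):
        lower_bound = (sums[i] - q_sum) ** 2 / d
        # A candidate cannot improve the result if its lower bound already
        # reaches the current k-th smallest squared distance.
        if len(best) == k and lower_bound >= best[-1][0]:
            skipped += 1
            continue
        dist2 = sum((a - b) ** 2 for a, b in zip(x, query))
        best.append((dist2, i))
        best.sort()
        best = best[:k]
    return [i for _, i in best], skipped
```

The full distance is computed only for candidates that survive the bound, which is where the savings come from when most vectors are far from the query.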

5.
In this paper, we present and study a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, find a k-way partitioning of the smaller graph, and then uncoarsen and refine it to construct a k-way partitioning for the original graph. These algorithms compute a k-way partitioning of a graph G = (V, E) in O(|E|) time, which is faster by a factor of O(log k) than previously proposed multilevel recursive bisection algorithms. A key contribution of our work is in finding a high-quality and computationally inexpensive refinement algorithm that can improve upon an initial k-way partitioning. We also study the effectiveness of the overall scheme for a variety of coarsening schemes. We present experimental results on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that this new scheme produces partitions that are of comparable or better quality than those produced by the multilevel bisection algorithm and requires substantially less time. Graphs containing up to 450,000 vertices and 3,300,000 edges can be partitioned into 256 domains in less than 40 s on a workstation such as SGI's Challenge. Compared with the widely used multilevel spectral bisection algorithm, our new algorithm is usually two orders of magnitude faster and produces partitions with substantially smaller edge-cut.

6.
In this paper, a new approach called ‘instance variant nearest neighbor’ approximates the regression surface of a function using the concept of k nearest neighbors. Instead of a fixed k for the entire dataset, our assumption is that there is an optimal number k of neighbors for each data instance that best approximates the original function by fitting the local regions. This approach can be beneficial to noisy datasets where local regions form data characteristics that are different from the major data clusters. We formulate the problem of finding such k neighbors for each data instance as a combinatorial optimization problem, which is solved by particle swarm optimization. The particle swarm optimization is extended with a rounding scheme that rounds continuous-valued candidate solutions up or down to integers, each representing a number k of neighbors. We apply our new approach to five real-world regression datasets and compare its prediction performance with other function approximation algorithms, including the standard k nearest neighbor, multi-layer perceptron, and support vector regression. We observed that the instance variant nearest neighbor outperforms these algorithms on several datasets. In addition, our new approach provides consistent outputs on the five datasets where other algorithms perform poorly.

7.
Tabucol is a tabu search algorithm that tries to determine whether the vertices of a given graph can be colored with a fixed number k of colors such that no edge has both endpoints with the same color. This algorithm was proposed in 1987, one year after Fred Glover's article that launched tabu search. While better-performing local search algorithms have now been proposed, Tabucol remains very popular and is often chosen as a subroutine in hybrid algorithms that combine a local search with a population-based method. In order to explain this unfailing success, we make a thorough survey of local search techniques for graph coloring problems, and we point out the main differences between all these techniques.
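
A compact sketch in the spirit of Tabucol is given below: start from a random k-coloring, repeatedly recolor one conflicting vertex with the move that leaves the fewest conflicts, and forbid moving the vertex back to its old color for a few iterations. The tabu tenure rule, the aspiration criterion, the adjacency representation (dict of neighbor sets, no self-loops, k >= 1), and all names are illustrative assumptions rather than the exact published algorithm.

```python
import random

def tabucol(adjacency, k, max_iters=10000, seed=0):
    """Try to find a proper k-coloring; return a dict vertex -> color, or None."""
    rng = random.Random(seed)
    vertices = list(adjacency)
    color = {v: rng.randrange(k) for v in vertices}
    tabu_until = {}  # (vertex, color) -> first iteration at which the move is allowed

    def same_color_neighbors(v, c):
        return sum(1 for u in adjacency[v] if color[u] == c)

    # Number of edges whose endpoints share a color (each counted twice, then halved).
    conflicts = sum(same_color_neighbors(v, color[v]) for v in vertices) // 2
    best = conflicts
    for it in range(max_iters):
        if conflicts == 0:
            return color
        moves = []  # (conflicts after the move, vertex, new color)
        for v in vertices:
            current = same_color_neighbors(v, color[v])
            if current == 0:
                continue  # only conflicting vertices are recolored
            for c in range(k):
                if c == color[v]:
                    continue
                after = conflicts - current + same_color_neighbors(v, c)
                allowed = tabu_until.get((v, c), 0) <= it
                if allowed or after < best:  # aspiration: accept a new best
                    moves.append((after, v, c))
        if not moves:
            continue  # every move is tabu; wait for entries to expire
        lowest = min(m[0] for m in moves)
        after, v, c = rng.choice([m for m in moves if m[0] == lowest])
        # Forbid returning v to its old color for a randomized number of iterations.
        tabu_until[(v, color[v])] = it + 1 + rng.randrange(10) + int(0.6 * conflicts)
        color[v] = c
        conflicts = after
        best = min(best, conflicts)
    return color if conflicts == 0 else None
```

A return value of None only means that no proper coloring was found within `max_iters` iterations; like any local search, the sketch cannot prove that a graph is not k-colorable.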

8.
K nearest neighbor and Bayesian methods are effective methods of machine learning. Expectation maximization is an effective Bayesian classifier. In this work, a data elimination approach is proposed to improve data clustering. The proposed method is based on the hybridization of the k nearest neighbor and expectation maximization algorithms. The k nearest neighbor algorithm is used as a preprocessor for the expectation maximization algorithm to reduce the amount of training data that makes learning difficult. The suggested method is tested on the well-known machine learning data sets iris, wine, breast cancer, glass and yeast. Simulations are done in the MATLAB environment and performance results are reported.

9.
In this paper, we propose a modified version of the k-nearest neighbor (kNN) algorithm. We first introduce a new affinity function for measuring the distance between a test point and a training point, an approach based on local learning. A new similarity function using this affinity function is then proposed for the classification of the test patterns. The widely used convention for k, i.e., k = [√N], is employed, where N is the number of data points used for training. The proposed modified kNN algorithm is applied to fifteen numerical datasets from the UCI machine learning data repository. Both 5-fold and 10-fold cross-validations are used. The average classification accuracy obtained from our method is found to exceed that of some well-known clustering algorithms.
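
The k = [√N] convention can be shown with a plain kNN classifier. The sketch below uses ordinary Euclidean distance and majority voting, not the affinity and similarity functions proposed in the abstract, and it assumes the brackets denote the integer part (floor) of √N; both choices, as well as the function name, are assumptions.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, test_point):
    """Classify test_point by majority vote among its k = floor(sqrt(N)) nearest neighbors."""
    n = len(train_points)
    k = max(1, math.isqrt(n))  # k = [sqrt(N)], read here as the integer part
    # Indices of training points ordered by Euclidean distance to the test point.
    order = sorted(range(n), key=lambda i: math.dist(train_points[i], test_point))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With nine training points, for example, the rule gives k = 3, so each prediction is the majority label among the three closest training points.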

10.
Recently, negative databases (NDBs) have been proposed for privacy protection. As with traditional databases, some basic operations can be conducted over NDBs, such as select, intersection, update, delete and so on. However, neither classification nor clustering in negative databases has yet been studied. Therefore, two algorithms, i.e., a k nearest neighbor (kNN) classification algorithm and a k-means clustering algorithm in NDBs, are proposed in this paper. The core of these two algorithms is a novel method for estimating the Hamming distance between a binary string and an NDB. Experimental results demonstrate that classifying and clustering in NDBs are promising.

11.
Due to the advancement of wireless internet and mobile positioning technology, location-based services (LBSs) have become popular among mobile users. Since users have to send their exact locations to obtain the service, this may lead to several privacy threats. To solve this problem, a cloaking method has been proposed to blur users’ exact locations into a cloaked spatial region with a required privacy threshold (k). With the cloaked region, an LBS server can carry out a k-nearest neighbor (k-NN) search algorithm. Some recent studies have proposed methods to search for the k nearest POIs while protecting a user’s privacy. However, they have at least one major problem, such as inefficient query processing or low precision of the retrieved result. To resolve these problems, in this paper, we propose a novel k-NN query processing algorithm for a cloaking region that satisfies both requirements: fast query processing time and high precision of the retrieved result. To achieve fast query processing time, we propose a new pruning technique based on a 2D-coordinate scheme. In addition, we make use of a Voronoi diagram for retrieving the nearest POIs efficiently. To satisfy the requirement of high precision of the retrieved result, we guarantee that the result of our k-NN query processing algorithm always contains the exact set of k nearest neighbors. Our performance analysis shows that our algorithm achieves better performance in terms of query processing time and the number of candidate POIs compared with other algorithms.

12.
Similarity search in graph databases has been widely investigated. It is worthwhile to develop a fast algorithm to support similarity search in large-scale graph databases. In this paper, we investigate a k-NN (k-Nearest Neighbor) similarity search problem by locality sensitive hashing (LSH). We propose an innovative fast graph search algorithm named LSH-GSS, which first transforms complex graphs into vectorial representations based on prototypes in the database and later accelerates a query in Euclidean space by employing LSH. Because images can be represented as attributed graphs, we propose an approach to transform attributed graphs into n-dimensional vectors and apply LSH-GSS to execute further image retrieval. Experiments on three real graph datasets and two image datasets show that our methods are highly accurate and efficient.

13.
With the fast-growing number of images on photo-sharing websites such as Flickr and Picasa, there is an urgent need to develop scalable multi-label propagation algorithms for image indexing, management and retrieval. It is well acknowledged that analysis at the semantic region level may greatly improve image annotation performance compared to analysis at the holistic image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. In this work, we present a novel framework to effectively compute pairwise image similarity by accumulating the information of semantic image regions. Firstly, each image is encoded as a Bag-of-Regions based on multiple image segmentations. Secondly, all image regions are separated into buckets with an efficient locality-sensitive hashing (LSH) method, which guarantees high collision probabilities for similar regions. The k-nearest neighbors of each image and the corresponding similarities can be efficiently approximated with these indexed patches. Lastly, the sparse and region-aware image similarity matrix is fed into the multi-label extension of the entropic graph regularized semi-supervised learning algorithm [1]. In combination, they naturally yield the capability of handling large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of our proposed framework for region-aware and scalable multi-label propagation.

14.
Baljeet, Ioanis, Janelle. Computer Networks, 2008, 52(13): 2582-2593
Target tracking is an important application for wireless sensor networks. One important aspect of tracking is target classification. Classification helps in selecting particular target(s) of interest. In this paper, we address the problem of classification of moving ground vehicles. The basis of classification is the audible signals produced by these vehicles. We present a distributed framework to classify vehicles based on features extracted from their acoustic signals. The main features used in our study are based on FFT (fast Fourier transform) and PSD (power spectral density). We propose three distributed algorithms for classification that are based on the k-nearest neighbor (k-NN) classification method. An experimental study has been conducted using real acoustic signals of different vehicles recorded in the city of Edmonton. We compare our proposed algorithms with a naive distributed implementation of the k-NN algorithm. Performance results reveal that our proposed algorithms are energy efficient, and thus suitable for sensor network deployment.

15.
We study in this paper the problem of finding in a graph a subset of k edges whose deletion causes the largest increase in the weight of a minimum spanning tree. We propose for this problem an explicit enumeration algorithm whose complexity, when compared to the current best algorithm, is better for general k but very slightly worse for fixed k. More interestingly, unlike in the previous algorithms, we can easily adapt our algorithm so as to transform it into an implicit enumeration algorithm based on a branch and bound scheme. We also propose a mixed integer programming formulation for this problem. Computational results show a clear superiority of the implicit enumeration algorithm both over the explicit enumeration algorithm and the mixed integer program.
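
The problem itself can be stated in a few lines of code. The sketch below enumerates every subset of k edges, removes it, and recomputes the minimum spanning tree weight with Kruskal's algorithm. It is a naive baseline for understanding the problem, not the enumeration algorithm of the abstract; skipping subsets whose removal disconnects the graph, the (u, v, weight) edge format, and the function names are assumptions.

```python
from itertools import combinations

def mst_weight(num_vertices, edges):
    """Kruskal's algorithm; returns infinity if the graph is disconnected."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, used = 0, 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            used += 1
    return total if used == num_vertices - 1 else float("inf")

def most_vital_edges(num_vertices, edges, k):
    """Return (removed edges, resulting MST weight) maximizing the MST weight.

    Subsets that disconnect the graph are skipped (an assumption of this sketch).
    Returns (None, None) if every subset of k edges disconnects the graph.
    """
    best_subset, best_weight = None, None
    for removed in combinations(range(len(edges)), k):
        removed_set = set(removed)
        remaining = [e for i, e in enumerate(edges) if i not in removed_set]
        weight = mst_weight(num_vertices, remaining)
        if weight == float("inf"):
            continue
        if best_weight is None or weight > best_weight:
            best_subset, best_weight = [edges[i] for i in removed], weight
    return best_subset, best_weight
```

The number of subsets grows combinatorially with k, which is why smarter enumeration or branch and bound schemes matter in practice.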

16.
Mining regional co-location patterns with kNNG (total citations: 2; self-citations: 0; citations by others: 2)
Spatial co-location pattern mining discovers the subsets of features whose events are frequently located together in geographic space. The current research on this topic adopts a distance threshold that has limitations in spatial data sets with various magnitudes of neighborhood distances, especially for mining regional co-location patterns. In this paper, we propose a hierarchical co-location mining framework that accounts for both the variety of neighborhood distances and spatial heterogeneity. By adopting a k-nearest neighbor graph (kNNG) instead of a distance threshold, we propose the “distance variation coefficient” as a new measure to drive the mining operations and determine an individual neighborhood relationship graph for each region. The proposed mining algorithm outputs a set of regions, each with its own set of regional co-location patterns. The experimental results on both synthetic and real-world data sets show that our framework is effective at discovering these regional co-location patterns.

17.
In this paper, we consider how to recover the structure of a Bayesian network from a moral graph. We present a more accurate characterization of moral edges, based on which a complete subset (i.e., a separator) contained in the neighbor set of one vertex of the putative moral edge in some prime block of the moral graph can be chosen. This results in a set of separators needing to be searched generally smaller than the sets required by some existing algorithms. A so-called structure-finder algorithm is proposed for structural learning. The complexity analysis of the proposed algorithm is discussed and compared with those for several existing algorithms. We also demonstrate how to construct the moral graph locally from, separately, the Markov blanket, domain knowledge and d-separation trees. Simulation studies are used to evaluate the performances of various strategies for structural learning. We also analyze a gene expression data set by using the structure-finder algorithm.

18.
Recent developments in wireless communication technologies and the popularity of smart phones are making location-based services (LBS) popular. However, sending queries to LBS servers with users’ exact locations may threaten the privacy of users. Therefore, there has been much research on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve the k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.

19.
In multimedia databases, k-nearest neighbor queries are popular and frequently contain non-spatial predicates. Among the available techniques for such queries, the incremental nearest neighbor algorithm proposed by Hjaltason and Samet is known as the most useful algorithm [16]. The reason is that if k′ > k neighbors are needed, it can provide the next neighbor for the upper operator without restarting the query from scratch. However, the R-tree in their algorithm has no facility capable of partially pruning tuple candidates that will turn out not to satisfy the remaining predicates, which makes their algorithm inefficient. In this paper, we propose an RS-tree-based incremental nearest neighbor algorithm complementary to their algorithm. The RS-tree used in our algorithm is a hybrid of the R-tree and the S-tree, as its buddy tree, based on the hierarchical signature file. Experimental results show that our RS-tree enhances the performance of Hjaltason and Samet's algorithm.

20.
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over 100 processors and indicate that similar speedup can be obtained with several hundred processors.
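
The definition of a knn graph given above translates directly into a single-machine, brute-force sketch. It is quadratic in the number of points and has nothing to do with the distributed, message-passing computation of the abstract; the Euclidean metric, the dict-of-lists output format, and the function name are assumptions.

```python
import math

def knn_graph(points, k):
    """Brute-force knn graph: maps each point index to the indices of its k closest points."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Order all other points by Euclidean distance to point i.
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph
```

The resulting graph is directed: point j can be among the k closest points of i without i being among the k closest points of j.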
