20 similar documents were retrieved; search time: 0 ms
1.
In a wireless mobile environment, data broadcasting provides an efficient way to disseminate data; via broadcasting, a server can provide location-based services to a large client population. Among location-based services, the k nearest neighbors (kNN) search, which finds the k objects closest to a given point, is particularly important. However, the kNN search in a broadcast environment is challenging because data on a broadcast channel can only be accessed sequentially. We propose protocols for the kNN search on a broadcast R-tree, a popular multi-dimensional index tree, that are efficient in terms of latency, tuning time, and memory usage. We investigate how a server schedules the broadcast and provide the corresponding kNN search algorithms for mobile clients. One of our protocols further allows a kNN search to start at an arbitrary time instant, skipping the wait for the beginning of a broadcast cycle and thereby reducing latency. The experimental results validate that our mechanisms achieve these objectives.
2.
Clustered SVD (CSVD), which combines clustering with singular value decomposition (SVD), outperforms SVD applied globally without prior clustering. Datasets of feature vectors in various application domains exhibit local correlations, which allow CSVD to attain a higher dimensionality reduction than SVD for the same normalized mean square error. We specify an exact method for processing k-nearest-neighbor queries under CSVD, which guarantees 100% recall and is experimentally shown to require less CPU time than the approximate method originally specified for CSVD.
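The 100% recall of an exact filter-and-refine scheme of this kind can rest on a simple fact: a Euclidean distance computed after an orthogonal projection never exceeds the distance in the original space, so projected distances serve as lower bounds that prune candidates without discarding any true neighbor. Below is a minimal Python sketch of that idea; the helper names (build_csvd_index, exact_knn), the fixed number of retained dimensions per cluster, and the toy cluster assignment are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

def build_csvd_index(X, labels, n_dims):
    """Per-cluster truncated SVD bases (illustrative CSVD-style index)."""
    index = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Xc = X[idx]
        mu = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        V = Vt[:n_dims].T                       # d x n_dims orthonormal basis
        index.append((idx, mu, V, (Xc - mu) @ V))
    return index

def exact_knn(X, index, q, k):
    """Filter-and-refine kNN: projected distances lower-bound true distances."""
    best = []                                   # sorted list of (true_dist, point_id)
    for idx, mu, V, P in index:
        qp = (q - mu) @ V
        lb = np.linalg.norm(P - qp, axis=1)     # lower bounds (projection is a contraction)
        for j in np.argsort(lb):
            if len(best) == k and lb[j] >= best[-1][0]:
                break                           # no remaining point in this cluster can improve
            d = np.linalg.norm(X[idx[j]] - q)   # refine with the original vector
            best.append((d, idx[j]))
            best.sort()
            best = best[:k]
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
labels = (X[:, 0] > 0).astype(int)              # toy two-cluster split
csvd_index = build_csvd_index(X, labels, n_dims=8)
print(exact_knn(X, csvd_index, rng.normal(size=32), k=5))
```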
3.
Continuous reverse k nearest neighbor (CRkNN) monitoring in road networks has recently received increasing attention. However, efficient CRkNN algorithms for road networks are still lacking. In road networks, moving query objects and data objects are restricted by the connectivity of the road network, and both object–query and object–object distance updates affect the result of CRkNN queries. In this paper, we present a novel algorithm for continuous and incremental evaluation of CRkNN queries in road networks. Our method is based on a novel data structure we propose, the dual-layer multiway tree (DLM tree), which represents the whole monitoring region of a CRkNN query q. We propose several lemmas to reduce the monitoring region of q and the number of candidate objects as much as possible. Moreover, by associating a variable NN_count with each candidate object, we can simplify the monitoring of candidate objects. A large number of objects roam in a road network, and many of them are irrelevant to a specific CRkNN query of a query object q. To minimize processing cost, we give, for each road in the network, an IQL list and an IQCL list that specify the sets of query objects and data objects whose location updates must be maintained for CRkNN processing. Our CRkNN method consists of two phases: an initial result generation phase and an incremental maintenance phase. In each phase, high-performance algorithms are proposed to make our CRkNN method more efficient. Extensive simulation experiments show that our approach is efficient and scalable in processing CRkNN queries in road networks.
4.
Toh Koon Charlie Neo, Dan Ventura 《Pattern Recognition Letters》2012,33(1):92-102
Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study on 10 standard databases from the UCI repository shows that the new Boosted k-NN algorithm achieves higher generalization accuracy on the majority of the datasets and never performs worse than standard k-NN.
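For reference, the sketch below shows a plain distance-weighted k-NN classifier in Python, the kind of base learner such an ensemble builds on; the per-feature weight vector w is simply the hook a boosting procedure could modify locally, and none of this reproduces the paper's actual boosting algorithm.

```python
import numpy as np
from collections import Counter

def weighted_knn_predict(X_train, y_train, x, k=5, w=None):
    """Distance-weighted k-NN vote; w scales each feature (all ones = plain k-NN)."""
    w = np.ones(X_train.shape[1]) if w is None else w
    d = np.sqrt((((X_train - x) * w) ** 2).sum(axis=1))   # weighted Euclidean distances
    nn = np.argsort(d)[:k]
    votes = Counter()
    for i in nn:
        votes[y_train[i]] += 1.0 / (d[i] + 1e-12)          # closer neighbors vote more
    return votes.most_common(1)[0][0]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(weighted_knn_predict(X, y, rng.normal(size=4), k=7))
```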
5.
Wlodzimierz Ogryczak 《Information Processing Letters》2003,85(3):117-122
Given a collection of n functions defined on ℝ^d, and a polyhedral set Q ⊆ ℝ^d, we consider the problem of minimizing the sum of the k largest functions of the collection over Q. Specifically, we focus on collections of linear functions and several classes of convex, piecewise linear functions which are defined by location models. We present simple linear programming formulations for these optimization models which give rise to linear time algorithms when the dimension d is fixed. Our results improve complexity bounds of several problems reported recently by Tamir [Discrete Appl. Math. 109 (2001) 293-307], Tokuyama [Proc. 33rd Annual ACM Symp. on Theory of Computing, 2001, pp. 75-84] and Kalcsics, Nickel, Puerto and Tamir [Oper. Res. Lett. 31 (1984) 114-127].
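For linear functions f_i(x) = c_i^T x + b_i, one standard LP reformulation of minimizing the sum of the k largest values introduces a scalar t and nonnegative deviations d_i; it is shown below as an illustration of the kind of formulation meant, not as the paper's exact model.

```latex
% Minimize the sum of the k largest of f_i(x) = c_i^T x + b_i over x in Q:
\begin{align*}
\min_{x,\;t,\;d}\quad & k\,t + \sum_{i=1}^{n} d_i \\
\text{s.t.}\quad & d_i \ \ge\ c_i^{\top}x + b_i - t, & i = 1,\dots,n,\\
                 & d_i \ \ge\ 0,                     & i = 1,\dots,n,\\
                 & x \in Q.
\end{align*}
```

For any fixed x, minimizing kt + Σ_i max(f_i(x) − t, 0) over t yields exactly the sum of the k largest f_i(x), so the linear program has only d + n + 1 variables and 2n additional constraints.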
6.
A k-factor of graph G is defined as a k-regular spanning subgraph of G. For instance, a 2-factor of G is a set of cycles that span G. 2-factors have multiple applications in Graph Theory, Computer Graphics, and Computational Geometry. We define a simple 2-factor as a 2-factor without degenerate cycles. In general, simple k-factors are defined as k-regular spanning subgraphs where no edge is used more than once. We propose a new algorithm for computing simple k-factors for all values of k ≥ 2.
7.
Ruijuan Li 《Information Processing Letters》2010,110(16):651-654
For a positive integer k, a graph G is k-ordered hamiltonian if for every ordered sequence of k vertices there is a hamiltonian cycle that encounters the vertices of the sequence in the given order. In this paper, we show that if G is a ⌊3k/2⌋-connected graph of order n ≥ 100k, and d(u)+d(v) ≥ n for any two vertices u and v with d(u,v)=2, then G is k-ordered hamiltonian. Our result implies the theorem of G. Chen et al. [Ars Combin. 70 (2004) 245-255], which requires the degree sum condition for all pairs of non-adjacent vertices, not just those distance 2 apart.
8.
Applying k-Means to minimize the sum of the intra-cluster variances is the most popular clustering approach. However, a bad initialization can easily lead to poor local optima. To tackle the initialization problem of k-Means, we propose the MinMax k-Means algorithm, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective. Weights are learned together with the cluster assignments through an iterative procedure. The proposed weighting scheme limits the emergence of large-variance clusters and allows high-quality solutions to be systematically uncovered, irrespective of the initialization. Experiments verify the effectiveness of our approach and its robustness to bad initializations, as it compares favorably to both k-Means and other methods from the literature that consider the k-Means initialization problem.
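A minimal Python sketch of a variance-weighted k-Means iteration in this spirit is given below; the exponent p, the weight update proportional to a power of each cluster's variance, and the assignment step under the weighted distortion are plausible choices consistent with the abstract, not the paper's exact update rules.

```python
import numpy as np

def minmax_style_kmeans(X, k, p=0.5, n_iter=50, seed=0):
    """Weighted k-means: higher-variance clusters get larger weights, which
    discourages any single cluster from growing too spread out."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # assignment step under the weighted distortion w_c * ||x - m_c||^2
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = (weights * d2).argmin(axis=1)
        # update step: cluster means and per-cluster variances
        var = np.zeros(k)
        for c in range(k):
            pts = X[labels == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)
                var[c] = ((pts - centers[c]) ** 2).sum()
        # weights grow with cluster variance (normalized); p is in (0, 1)
        w = np.maximum(var, 1e-12) ** (1.0 / (1.0 - p))
        weights = w / w.sum()
    return centers, labels, weights

X = np.vstack([np.random.default_rng(1).normal(m, 1.0, (100, 2)) for m in (0, 5, 10)])
print(minmax_style_kmeans(X, 3)[2])
```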
9.
In this paper, the conventional k-modes-type algorithms for clustering categorical data are extended by representing the clusters with k-populations instead of the hard-type centroids used in the conventional algorithms. A population-based centroid representation makes it possible to preserve the uncertainty inherent in the data for as long as possible before actual decisions are made. In various experiments, the k-populations algorithm was found to give markedly better clustering results.
10.
We say that a distribution over {0,1}^n is (ε,k)-wise independent if its restriction to every k coordinates results in a distribution that is ε-close to the uniform distribution. A natural question regarding (ε,k)-wise independent distributions is how close they are to some k-wise independent distribution. We show that there exist (ε,k)-wise independent distributions whose statistical distance is at least n^{O(k)}·ε from any k-wise independent distribution. In addition, we show that for any (ε,k)-wise independent distribution there exists some k-wise independent distribution whose statistical distance from it is n^{O(k)}·ε.
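One standard way to state the definition above formally is through the statistical (total variation) distance of every k-coordinate marginal from the uniform distribution, as in the following rendering.

```latex
% D is a distribution over \{0,1\}^n and x|_S denotes the restriction of x to
% a coordinate set S.  D is (\varepsilon,k)-wise independent iff, for every
% S \subseteq [n] with |S| = k,
\frac{1}{2}\sum_{z\in\{0,1\}^{k}}
  \Bigl|\,\Pr_{x\sim D}\bigl[x|_{S}=z\bigr]-2^{-k}\Bigr| \;\le\; \varepsilon .
```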
11.
To protect individual privacy in data mining, when a miner collects data from respondents, the respondents should remain anonymous. The existing technique of Anonymity-Preserving Data Collection partially solves this problem, but it assumes that the data do not contain any identifying information about the corresponding respondents. On the other hand, the existing technique of Privacy-Enhancing k-Anonymization can make the collected data anonymous by eliminating the identifying information. However, it assumes that each respondent submits her data through an unidentified communication channel. In this paper, we propose k-Anonymous Data Collection, which has the advantages of both Anonymity-Preserving Data Collection and Privacy-Enhancing k-Anonymization but does not rely on their assumptions described above. We give rigorous proofs for the correctness and privacy of our protocol, and experimental results for its efficiency. Furthermore, we extend our solution to the fully malicious model, in which a dishonest participant can deviate from the protocol and behave arbitrarily.
12.
Dejan Delić 《Information Processing Letters》2010,110(16):662-665
Let k be a positive integer, and let G=(V,E) be a graph with minimum degree at least k−1. A function f:V→{−1,1} is said to be a signed k-dominating function (SkDF) if ∑_{u∈N[v]} f(u) ≥ k for every v∈V. An SkDF f of a graph G is minimal if there exists no SkDF g such that g≠f and g(v) ≤ f(v) for every v∈V. The maximum of the values of ∑_{v∈V} f(v), taken over all minimal SkDFs f, is called the upper signed k-domination number Γ_{kS}(G). In this paper, we present a sharp upper bound on this number for a general graph.
13.
Adil M. Bagirov, Julien Ugon 《Pattern Recognition》2011,44(4):866-876
The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points; the global k-means and the modified global k-means algorithms are based on such an approach, iteratively adding one cluster center at a time. Numerical experiments show that these algorithms considerably improve on k-means. However, they require storing the whole affinity matrix or computing it at each iteration, which makes both algorithms time-consuming and memory-demanding even for moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset, and we exploit information gathered in previous iterations of the incremental algorithm to eliminate the need to compute or store the whole affinity matrix, thereby reducing computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms.
14.
Aristidis Likas, Nikos Vlassis, Jakob J. Verbeek 《Pattern Recognition》2003,36(2):451-461
We present the global k-means algorithm, an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions of the k-means algorithm from suitable initial positions, where N is the size of the data set. We also propose modifications of the method that reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and compare favorably to the k-means algorithm with random restarts.
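A compact Python sketch of this incremental scheme follows; the bare Lloyd-iteration helper is written for the example, and scanning every data point as the candidate starting position for the new center corresponds to the basic, unaccelerated variant described above.

```python
import numpy as np

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations from the given initial centers; returns centers and error."""
    centers = centers.copy().astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.min(axis=1).sum()

def global_kmeans(X, K):
    """Add one center at a time; try every data point as the new center's start."""
    centers = X.mean(axis=0, keepdims=True)          # optimal solution for k = 1
    for _ in range(2, K + 1):
        best = None
        for x in X:                                   # N candidate initial positions
            c, err = kmeans(X, np.vstack([centers, x]))
            if best is None or err < best[1]:
                best = (c, err)
        centers = best[0]
    return centers

X = np.vstack([np.random.default_rng(2).normal(m, 0.5, (50, 2)) for m in (0, 4, 8)])
print(global_kmeans(X, 3))
```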
15.
Kazuyuki Amano 《Information Processing Letters》2003,87(2):111-117
We show that the number of satisfying assignments of a k-CNF formula is determined uniquely from the numbers of unsatisfying assignments for clause-sets of size up to ⌊log k⌋ + 2. This amount of information is also shown to be necessary.
16.
Ming-Chao Chiang, Chun-Wei Tsai, Chu-Sing Yang 《Information Sciences》2011,181(4):716-731
This paper presents an efficient algorithm, called pattern reduction (PR), for reducing the computation time of k-means and k-means-based clustering algorithms. The proposed algorithm works by compressing and removing, at each iteration, patterns that are unlikely to change their membership thereafter. Not only is the proposed algorithm simple and easy to implement, but it can also be applied to many other iterative clustering algorithms, such as kernel-based and population-based clustering algorithms. Our experiments, covering 2 to 1000 dimensions and 150 to 10,000,000 patterns, indicate that with a small loss of quality the proposed algorithm can significantly reduce the computation time of all state-of-the-art clustering algorithms evaluated in this paper, especially for large and high-dimensional data sets.
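The sketch below illustrates the compress-and-remove idea with one simple heuristic: a pattern whose nearest center is much closer than its second-nearest is frozen, and its contribution is folded into a single weighted point per cluster so later iterations no longer touch it. The freezing threshold and the weighted-mean bookkeeping are illustrative choices, not the authors' actual reduction rule.

```python
import numpy as np

def kmeans_with_pattern_reduction(X, k, ratio=4.0, n_iter=30, seed=0):
    """k-means in which 'stable' patterns are compressed into weighted points."""
    rng = np.random.default_rng(seed)
    pts = X.astype(float).copy()
    wts = np.ones(len(pts))                                  # mass carried by each point
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        order = np.argsort(d2, axis=1)
        labels = order[:, 0]
        for c in range(k):                                   # weighted mean update
            m = labels == c
            if wts[m].sum() > 0:
                centers[c] = (pts[m] * wts[m][:, None]).sum(axis=0) / wts[m].sum()
        # freeze patterns whose second-nearest center is much farther than the nearest
        d_first = np.take_along_axis(d2, order[:, :1], axis=1)[:, 0]
        d_second = np.take_along_axis(d2, order[:, 1:2], axis=1)[:, 0]
        stable = d_second > ratio * d_first
        if stable.any():
            new_pts, new_wts = [pts[~stable]], [wts[~stable]]
            for c in range(k):                               # one weighted point per cluster
                m = stable & (labels == c)
                if wts[m].sum() > 0:
                    new_pts.append((pts[m] * wts[m][:, None]).sum(axis=0, keepdims=True)
                                   / wts[m].sum())
                    new_wts.append(np.array([wts[m].sum()]))
            pts, wts = np.vstack(new_pts), np.concatenate(new_wts)
    return centers

X = np.vstack([np.random.default_rng(3).normal(m, 0.6, (2000, 2)) for m in (0, 5, 10)])
print(kmeans_with_pattern_reduction(X, 3))
```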
17.
Yonghong Xiang 《Information Sciences》2011,181(1):239-256
We define an interconnection network AQ_{n,k}, which we call the augmented k-ary n-cube, by extending a k-ary n-cube in a manner analogous to the existing extension of an n-dimensional hypercube to an n-dimensional augmented cube. We prove that the augmented k-ary n-cube AQ_{n,k} has a number of attractive properties (in the context of parallel computing). For example, we show that AQ_{n,k}: is a Cayley graph, and so is vertex-symmetric, but not edge-symmetric unless n = 2; has connectivity 4n − 2 and wide-diameter at most max{(n − 1)k − (n − 2), k + 7}; has diameter , when n = 2; and has diameter at most , for n ≥ 3 and k even, and at most , for n ≥ 3 and k odd.
18.
The k-nearest neighbour estimation method is one of the main tools used in multi-source forest inventories. It is a powerful non-parametric method whose estimates are easy to compute and relatively accurate. One downside of the method is that it lacks an uncertainty measure for predicted values and for areas of arbitrary size. We present a variogram-model-based method to estimate the prediction uncertainty and derive the necessary formulas for the k-nn method. The approach is illustrated on multi-source forest inventory data, and the results are compared at the pixel level to the conventional RMSE method. We find that the variogram-model-based method, which is analytic, is competitive with the RMSE method.
19.
M. Emre Celebi 《Image and Vision Computing》2011,29(4):260-271
Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. However, despite its popularity as a general purpose clustering algorithm, k-means has not received much respect in the color quantization literature because of its high computational requirements and sensitivity to initialization. In this paper, we investigate the performance of k-means as a color quantizer. We implement fast and exact variants of k-means with several initialization schemes and then compare the resulting quantizers to some of the most popular quantizers in the literature. Experiments on a diverse set of images demonstrate that an efficient implementation of k-means with an appropriate initialization strategy can in fact serve as a very effective color quantizer.
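A bare-bones Python sketch of k-means color quantization is shown below: pixels are clustered in RGB space and each pixel is replaced by its cluster centroid. The greedy farthest-point seeding is just one simple initialization stand-in, and the sketch does not reproduce the fast and exact variants studied in the paper.

```python
import numpy as np

def quantize(image, n_colors=16, n_iter=20, seed=0):
    """Reduce an H x W x 3 image to n_colors by k-means in RGB space."""
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1, 3).astype(float)
    # greedy farthest-point seeding (a simple stand-in for a smarter initializer)
    centers = [pixels[rng.integers(len(pixels))]]
    for _ in range(n_colors - 1):
        d2 = np.min([((pixels - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(pixels[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_colors):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(axis=0)
    return centers[labels].reshape(image.shape).astype(image.dtype)

img = np.random.default_rng(4).integers(0, 256, (64, 64, 3), dtype=np.uint8)
print(quantize(img, n_colors=8).shape)
```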
20.
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As research in these fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding the resources available to a single machine. In this work, we efficiently distribute the computation of knn graphs across clusters of processors using message passing. Extensions of our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over 100 processors and indicate that similar speedup can be obtained with several hundred processors.
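A small sketch of the underlying computation is given below: the point set is split into blocks, each block computes the k nearest neighbors of its own points against the full set, and the per-block results are concatenated. In a message-passing setting each block would be handled by a different process; here the "processes" are just a loop, and the brute-force Euclidean metric and block count are illustrative choices, not the paper's framework.

```python
import numpy as np

def knn_graph_blocks(X, k, n_blocks=4):
    """k-NN graph by blocks: each block's rows could be computed by a separate worker."""
    n = len(X)
    edges = np.empty((n, k), dtype=int)
    for block in np.array_split(np.arange(n), n_blocks):      # one block per worker
        d2 = ((X[block][:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2[np.arange(len(block)), block] = np.inf             # exclude self-edges
        edges[block] = np.argsort(d2, axis=1)[:, :k]          # ids of the k closest points
    return edges

X = np.random.default_rng(5).normal(size=(500, 8))
print(knn_graph_blocks(X, k=5)[:3])
```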