
1.

Probabilistic enhancement of approximate indexing in metric spaces

Takao Murakami Kenta Takahashi Susumu Serita Yasuhiro Fujii 《Information Systems》2013

Some approximate indexing schemes have been recently proposed in metric spaces which sort the objects in the database according to pseudo-scores. It is known that (1) some of them provide a very good trade-off between response time and accuracy, and (2) probability-based pseudo-scores can provide an optimal trade-off in range queries if the probabilities are correctly estimated. Based on these facts, we propose a probabilistic enhancement scheme which can be applied to any pseudo-score based scheme. Our scheme computes probability-based pseudo-scores using pseudo-scores obtained from a pseudo-score based scheme. In order to estimate the probability-based pseudo-scores, we use the object-specific parameters in logistic regression and learn the parameters using MAP (Maximum a Posteriori) estimation and the empirical Bayes method. We also propose a technique which speeds up learning the parameters using pseudo-scores. We applied our scheme to the two state-of-the-art schemes: the standard pivot-based scheme and the permutation-based scheme, and evaluated them using various kinds of datasets from the Metric Space Library. The results showed that our scheme outperformed the conventional schemes, with regard to both the number of distance computations and the CPU time, in all the datasets.
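The idea of mapping a pseudo-score to a probability with logistic regression, with parameters learned by MAP estimation, can be illustrated with a minimal sketch. This is not the authors' scheme: it fits a single global slope and intercept (the abstract describes object-specific parameters and an empirical Bayes step, neither of which is modelled here), and the function names, the Gaussian prior variance `prior_var`, the learning rate, and the toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_map_logistic(scores, labels, prior_var=10.0, lr=0.1, steps=2000):
    """Gradient ascent on the log-posterior of p(relevant | score) = sigmoid(a*score + b).

    A zero-mean Gaussian prior with variance prior_var (an assumed value) is
    placed on both a and b, which is what makes this MAP rather than plain
    maximum likelihood.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = -a / prior_var  # gradient of the log-prior
        grad_b = -b / prior_var
        for s, y in zip(scores, labels):
            err = y - sigmoid(a * s + b)  # gradient of the log-likelihood
            grad_a += err * s
            grad_b += err
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Hypothetical training data: small pseudo-scores tend to belong to objects
# that fall inside the query range (label 1).
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [1, 1, 1, 0, 1, 0, 0, 0]
a, b = fit_map_logistic(train_scores, train_labels)

def probability(score):
    return sigmoid(a * score + b)

# Candidates would then be visited in order of decreasing estimated probability.
candidates = [0.75, 0.15, 0.5]
ranked = sorted(candidates, key=probability, reverse=True)
```

In this toy setting the fitted slope is negative, so ranking by probability coincides with ranking by the raw pseudo-score; a per-object parameterization, as the abstract describes, is what would allow the two orderings to differ.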

2.

Quicker range- and k-NN joins in metric spaces

《Information Systems》2015


3.

CM-tree: A dynamic clustered index for similarity search in metric databases

Lior Israel 《Data & Knowledge Engineering》2007,63(3):919-946

Repositories of unstructured data types, such as free text, images, audio and video, have been recently emerging in various fields. A general searching approach for such data types is that of similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic capabilities of insertions and deletions both of single objects and in bulk. Distinctive from other methods, it is especially designed to achieve a structure of tight and low overlapping clusters via its primary construction algorithms (instead of post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, clustering based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs.

4.

Exploiting distance coherence to speed up range queries in metric indexes

Kimmo Fredriksson 《Information Processing Letters》2005,95(1):287-292


5.

Classification of gene-expression data: The manifold-based metric learning way

Jianguo Changshui 《Pattern recognition》2006,39(12):2450-2463

Classification of microarray gene-expression data can potentially help medical diagnosis, and has become an important topic in bioinformatics. However, microarray data sets are usually of small sample size relative to an overwhelming number of genes. This makes the classification problem fairly challenging. Instance-based learning (IBL) algorithms, such as nearest neighbor (k-NN), are usually the baseline algorithm due to their simplicity. However, practice shows that k-NN does not perform very well in this field. This paper introduces manifold-based metric learning to improve the performance of IBL methods. A novel metric learning algorithm is proposed by utilizing both local manifold structural information and local discriminant information. In addition, a random subspace extension is also presented. We apply the proposed algorithm to the gene-classification problem in three ways: one in the original feature space, another in the reduced feature space, and the third via the random subspace extension. Statistical evaluation shows that the proposed algorithm can achieve promising results, and gain significant performance improvement over traditional IBL algorithms.

6.

Region-based theory of discrete spaces: A proximity approach

Ivo Düntsch Dimiter Vakarelov 《Annals of Mathematics and Artificial Intelligence》2007,49(1-4):5-14

We introduce Boolean proximity algebras as a generalization of Efremovič proximities which are suitable in reasoning about discrete regions. Following Stone’s representation theorem for Boolean algebras, it is shown that each such algebra is isomorphic to a substructure of a complete and atomic Boolean proximity algebra. Co-operation was supported by EC COST Action 274 “Theory and Applications of Relational Structures as Knowledge Instruments” (TARSKI), and NATO Collaborative Linkage Grant PST.CLG 977641.

7.

Ptolemaic access methods: Challenging the reign of the metric space model

Magnus Lie Hetland Tomáš Skopal Jakub Lokoč Christian Beecks 《Information Systems》2013

Metric indexing is the state of the art in general distance-based retrieval. Relying on the triangular inequality, metric indexes achieve significant online speed-up beyond a linear scan. Recently, the idea of Ptolemaic indexing was introduced, which substitutes Ptolemy's inequality for the triangular one, potentially yielding higher efficiency for the distances where it applies. In this paper we have adapted several metric indexes to support Ptolemaic indexing, thus establishing a class of Ptolemaic access methods (PtoAM). In particular, we include Ptolemaic Pivot tables, Ptolemaic PM-Trees and the Ptolemaic M-Index. We also show that the most important and promising family of distances suitable for Ptolemaic indexing is the signature quadratic form distance, an adaptive similarity measure which can cope with flexible content representations of multimedia data, among other things. While this distance has shown remarkable qualities regarding the search effectiveness, its high computational complexity underscores the need for efficient search methods. We show that these distances are Ptolemaic metrics and present a study where we apply Ptolemaic indexing methods on real-world image databases, resolving exact queries nearly four times as fast as the state-of-the-art metric solution, and up to three orders of magnitude times as fast as sequential scan.
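To make the contrast between the two inequalities concrete, here is a small sketch of the lower bounds they give on the query-to-object distance when distances to pivots are precomputed. It uses the plain Euclidean distance (which is a Ptolemaic metric) rather than the signature quadratic form distance from the paper; the function names and the sample points are assumptions for illustration only.

```python
import math

def triangle_lower_bound(d_qp, d_op):
    """Triangle inequality with one pivot p: d(q, o) >= |d(q, p) - d(o, p)|."""
    return abs(d_qp - d_op)

def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Ptolemy's inequality with two pivots p and s:
    d(q, o) >= |d(q, p) * d(o, s) - d(q, s) * d(o, p)| / d(p, s).
    """
    return abs(d_qp * d_os - d_qs * d_op) / d_ps

def can_prune(lower_bound, radius):
    """In a range query, an object whose lower bound exceeds the radius
    cannot be a result, so its true distance never needs computing."""
    return lower_bound > radius

# Hypothetical 2-D points: query q, database object o, pivots p and s.
q, o = (0.0, 0.0), (3.0, 4.0)
p, s = (1.0, 0.0), (0.0, 2.0)

d_qp, d_qs = math.dist(q, p), math.dist(q, s)  # computed once per query
d_op, d_os = math.dist(o, p), math.dist(o, s)  # stored in the index
d_ps = math.dist(p, s)                         # stored in the index

true_distance = math.dist(q, o)
lb_triangle = max(triangle_lower_bound(d_qp, d_op),
                  triangle_lower_bound(d_qs, d_os))
lb_ptolemy = ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps)
```

Both bounds are valid for any Ptolemaic metric, so an index can take the larger of the two. Which one is tighter depends on the geometry of the query, object, and pivots.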

8.

Improving the space cost of k-NN search in metric spaces by using distance estimators

Benjamin Bustos Gonzalo Navarro 《Multimedia Tools and Applications》2009,41(2):215-233

Similarity searching in metric spaces has a vast number of applications in several fields like multimedia databases, text retrieval, computational biology, and pattern recognition. In this context, one of the most important similarity queries is the k nearest neighbor (k-NN) search. The standard best-first k-NN algorithm uses a lower bound on the distance to prune objects during the search. Although optimal in several aspects, the disadvantage of this method is that its space requirements for the priority queue that stores unprocessed clusters can be linear in the database size. Most of the optimizations used in spatial access methods (for example, pruning using MinMaxDist) cannot be applied in metric spaces, due to the lack of geometric properties. We propose a new k-NN algorithm that uses distance estimators, aiming to reduce the storage requirements of the search algorithm. The method stays optimal, yet it can significantly prune the priority queue without altering the output of the query. Experimental results with synthetic and real datasets confirm the reduction in storage space of our proposed algorithm, showing savings of up to 80% of the original space requirement.


Benjamin Bustos is an assistant professor in the Department of Computer Science at the University of Chile. He is also a researcher at the Millennium Nucleus Center for Web Research. His research interests are similarity searching and multimedia information retrieval. He has a doctoral degree in natural sciences from the University of Konstanz, Germany. Contact him at bebustos@dcc.uchile.cl. Gonzalo Navarro earned his PhD in Computer Science at the University of Chile in 1998, where he is now Full Professor. His research interests include similarity searching, text databases, compression, and algorithms and data structures in general. He has coauthored a book on string matching and around 200 international papers. He has (co)chaired international conferences SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR Posters 2005, IFIP TCS 2006, and ENC 2007 Scalable Pattern Recognition track; and belongs to the Editorial Board of Information Retrieval Journal. He is currently Head of the Department of Computer Science at University of Chile, and Head of the Millennium Nucleus Center for Web Research, the largest Chilean project in Computer Science research.
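The standard best-first k-NN search that this entry's abstract takes as its baseline can be sketched as follows. This is only the baseline, not the proposed distance-estimator variant. The cluster representation (center, covering radius, member list), the function name, and the sample data are assumptions for illustration; the lower bound max(0, d(q, c) - r) follows from the triangle inequality.

```python
import heapq
import math

def best_first_knn(query, clusters, k, dist=math.dist):
    """Best-first k-NN over clusters given as (center, covering_radius, members).

    Every member must lie within covering_radius of its center, so
    max(0, d(query, center) - covering_radius) is a lower bound on the
    distance from the query to any member.
    """
    queue = []  # min-heap of (lower_bound, cluster_index): the unprocessed clusters
    for idx, (center, radius, _members) in enumerate(clusters):
        heapq.heappush(queue, (max(0.0, dist(query, center) - radius), idx))

    result = []  # max-heap of the k best so far, stored as (-distance, object)
    while queue:
        lower, idx = heapq.heappop(queue)
        # Stop once no remaining cluster can improve on the current k-th neighbor.
        if len(result) == k and lower >= -result[0][0]:
            break
        for obj in clusters[idx][2]:
            d = dist(query, obj)
            if len(result) < k:
                heapq.heappush(result, (-d, obj))
            elif d < -result[0][0]:
                heapq.heapreplace(result, (-d, obj))
    return sorted((-neg_d, obj) for neg_d, obj in result)

# Hypothetical data: two well-separated clusters of 2-D points.
clusters = [
    ((0.0, 0.0), 1.5, [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.0)]),
    ((10.0, 10.0), 1.5, [(10.0, 10.0), (11.0, 10.0), (10.0, 9.0)]),
]
neighbors = best_first_knn((0.2, 0.3), clusters, k=2)
```

In this flat sketch the queue never holds more entries than there are clusters. In a hierarchical index the queue also receives child nodes as parents are expanded, which is where the space growth discussed in the abstract arises.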

9.

Slicing the metric space to provide quick indexing of complex data in the main memory

Caio César Mori Carélo Ives Renê Venturini Pola Ricardo Rodrigues Ciferri Agma Juci Machado Traina Caetano Traina Jr Cristina Dutra de Aguiar Ciferri 《Information Systems》2011

Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests.

10.

Distributed computation of the knn graph for large high-dimensional point sets

Erion Plaku Lydia E. Kavraki 《Journal of Parallel and Distributed Computing》2007

High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over 100 processors and indicate that similar speedup can be obtained with several hundred processors.
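The definition of the knn graph given in the abstract above (connect each point to its k closest points) can be written down directly as a single-machine, brute-force sketch. It does not show the distributed, message-passing computation that the paper contributes; the function name and the sample points are assumptions for illustration.

```python
import math

def knn_graph(points, k, dist=math.dist):
    """Return {point_index: [indices of its k closest other points]}.

    Brute force: every pair of points is compared, so the cost is quadratic
    in the number of points. Any distance function can be passed as dist.
    """
    graph = {}
    for i, p in enumerate(points):
        others = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _d, j in others[:k]]
    return graph

# Hypothetical data: two pairs of nearby points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
graph = knn_graph(points, k=1)
```

Each point's neighbor list is computed independently of the others, which is the property that makes this kind of computation a natural candidate for splitting across processors.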

11.

Randomization for robot tasks: Using dynamic programming in the space of knowledge states

Michael Erdmann 《Algorithmica》1993,10(2-4):248-291

This paper explores the use of randomization as a primitive action in the solution of robot tasks. An example of randomization is the strategy of shaking a bin containing a part in order to orient the part in a desired stable state with some high probability. Further examples include tapping, vibrating, twirling, and random search. For instance, it is sometimes beneficial for a system to execute random motions purposefully when the precise motions required to perform an operation are unknown, as when they lie below the available sensor resolution. The purpose of this paper is to provide a theoretical framework for the planning and execution of randomized strategies for robot tasks. This framework is built on the standard backchaining approach of dynamic programming. Specifically, a randomizing planner backchains from the goal in a state space whose states describe the knowledge available to the system at run-time. By choosing random actions in a principled manner at run-time, a system can sometimes obtain a probabilistic strategy for accomplishing a task even when no guaranteed strategy exists for accomplishing that task. In other cases, the system may be able to obtain a speedup over an existing guaranteed strategy. The main result of this paper consists of two examples. One example shows that randomization can sometimes speed up task completion from exponential time to polynomial time. The other example shows that such a speedup is not always possible. The author is now with the School of Computer Science and the Robotics Institute at Carnegie-Mellon University. The work reported here was performed while at the MIT Artificial Intelligence Laboratory.
This work was supported in part by the Office of Naval Research under the University Research Initiative Program through Contract N00014-86-K-0685, by an NSF Presidential Young Investigator Award to Tomás Lozano-Pérez, by a fellowship from NASA's Jet Propulsion Laboratory, and by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research Contract N00014-85-K-0124. Additional support at CMU was provided under NSF grant IRI-9010686.

12.

Rethinking the design of real-coded evolutionary algorithms: Making discrete choices in continuous search domains

William E. Hart 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2005,9(4):225-235

Although real-coded evolutionary algorithms (EAs) have been applied to optimization problems for over thirty years, the convergence properties of these methods remain poorly understood. We discuss the use of discrete random variables to perform search in real-valued EAs. Although most real-valued EAs perform mutation with continuous random variables, we argue that EAs using discrete random variables for mutation can be much easier to analyze. In particular, we present and analyze two simple EAs that make discrete choices of mutation steps.

13.

Path planning in construction sites: performance evaluation of the Dijkstra, A*, and GA search algorithms (Total citations: 1; self-citations: 0; citations by others: 1)

A. R. Soltani H. Tawfik J. Y. Goulermas T. Fernando 《Advanced Engineering Informatics》2002,16(4)

This paper presents the application of path planning in construction sites according to multiple objectives. It quantitatively evaluates the performance of three optimisation algorithms namely: Dijkstra, A*, and Genetic algorithms that are used to find multi-criteria paths in construction sites based on transportation and safety-related cost. During a construction project, site planners need to select paths for site operatives and vehicles, which are characterised by short distance, low risks and high visibility. These path evaluation criteria are combined using a multi-objective approach. The criteria can be optimised to present site planners with the shortest path, the safest path, the most visible path or a path that reflects a combination of short distance, low risk and high visibility. The accuracy of the path solutions and the time complexities of the optimisation algorithms are compared and critically analysed.
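One simple way to combine distance, risk, and visibility into a single path cost is a weighted sum of per-edge criteria, which Dijkstra's algorithm can then minimise. The sketch below assumes that combination rule; the abstract does not say how the paper combines its criteria, and the graph, the weights, and the "low visibility" penalty encoding are hypothetical.

```python
import heapq

def multi_criteria_dijkstra(graph, source, target, weights):
    """Dijkstra's algorithm on a weighted-sum edge cost.

    graph maps node -> list of (neighbor, criteria), where criteria is a tuple
    (distance, risk, low_visibility_penalty). Weights must be non-negative so
    that edge costs stay non-negative, as Dijkstra's algorithm requires.
    """
    def cost(criteria):
        return sum(w * c for w, c in zip(weights, criteria))

    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, criteria in graph.get(u, []):
            nd = d + cost(criteria)
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]

# Hypothetical site layout: the route via B is short but risky,
# the route via C is longer but safe and more visible.
site = {
    "A": [("B", (1.0, 5.0, 1.0)), ("C", (2.0, 0.0, 0.0))],
    "B": [("D", (1.0, 5.0, 1.0))],
    "C": [("D", (2.0, 0.0, 0.0))],
}
shortest, _ = multi_criteria_dijkstra(site, "A", "D", weights=(1.0, 0.0, 0.0))
safest, _ = multi_criteria_dijkstra(site, "A", "D", weights=(0.0, 1.0, 0.0))
balanced, balanced_cost = multi_criteria_dijkstra(site, "A", "D", weights=(1.0, 1.0, 1.0))
```

Changing the weight vector switches between the shortest, safest, and most visible path, or a compromise between them, without changing the search algorithm.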