10 similar documents found; search time: 187 ms
1.
Similarity search implemented via k-nearest neighbor (k-NN)
queries on multidimensional indices is an extremely useful paradigm for content-based image retrieval. As the dimensionality
of feature vectors increases, the curse of dimensionality sets in: the performance of k-NN search on disk-resident indices in the R-tree family degrades rapidly due to the overlap of index pages in high dimensions.
This problem is dealt with in this study by utilizing the double filtering effect of clustering and indexing. The clustering
algorithm ensures that the largest cluster fits into main memory and that only clusters closest to a query point need to be
searched and hence loaded into main memory. We organize the data in each cluster according to the ordered-partition (OP) tree, a
main-memory-resident index that is not prone to the curse of dimensionality and is highly efficient for processing k-NN queries. We serialize an OP-tree by writing its dynamically allocated nodes into contiguous memory locations, optimize
its parameters, and make it persistent by writing it to disk. The time to read and write clusters constituting an OP-tree
with a single sequential access to disk benefits from higher data transfer rates of modern disk drives. The performance of
the index is further improved by applying the Karhunen–Loève transformation (KLT) to the dataset, since this results in a more efficient computation of distances for k-NN queries. We compare OP-trees and sequential scans with and without a KL transformation and with and without using a shortcut
method in calculating Euclidean distances. A comparison against the OMNI-sequential scan is also reported. We finally compare
a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-file method. CPU time
is measured and elapsed time is estimated in this study. It is observed that the OP-tree index outperforms the other two methods
and that the improvement increases with the number of dimensions.
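The shortcut method for Euclidean distances that the abstract evaluates is not spelled out above; a common variant is partial-distance early termination, which abandons a candidate as soon as its running squared distance exceeds the current k-th best. A minimal sketch under that assumption (all names below are illustrative, not from the paper):

```python
import heapq

def knn_with_shortcut(data, query, k):
    """Brute-force k-NN with the partial-distance shortcut: accumulate
    squared differences coordinate by coordinate and abandon a candidate
    as soon as the running sum reaches the current k-th best distance."""
    heap = []  # max-heap via negation: holds (-sq_dist, index) of the k best so far
    for i, v in enumerate(data):
        bound = -heap[0][0] if len(heap) == k else float("inf")
        s = 0.0
        for a, b in zip(v, query):
            s += (a - b) ** 2
            if s >= bound:      # shortcut: this candidate cannot enter the top k
                break
        else:
            if len(heap) < k:
                heapq.heappush(heap, (-s, i))
            else:
                heapq.heappushpop(heap, (-s, i))
    return sorted((-d, i) for d, i in heap)
```

After a KLT, most of the variance concentrates in the first coordinates, so the running sum crosses the bound after fewer terms, which is why the transformation makes this shortcut more effective.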
2.
Hanxiong Chen Jianquan Liu Kazutaka Furuse Jeffrey Xu Yu Nobuo Ohbo 《Knowledge and Information Systems》2011,27(2):165-192
Similarity search is important in information retrieval applications where objects are usually represented as vectors of high
dimensionality. This leads to an increasing need to support the indexing of high-dimensional data. Unfortunately,
indexing structures based on space partitioning are ineffective because of the well-known “curse of dimensionality”, and linear
scan of the data with approximation is more efficient for high-dimensional similarity search. However, approaches so far
have concentrated on reducing I/O and have ignored the computation cost. For an expensive distance function such as the L_p norm with fractional p, the computation cost becomes the bottleneck. We propose a new technique to address expensive distance functions by “indexing
the function” by pre-computing some key values of the function once. Then, the values are used to develop the upper/lower
bounds of the distance between a data vector and the query vector. The technique is extremely efficient since it avoids most
of the distance function computations; moreover, it does not involve any extra secondary storage because no index is constructed
and stored. The efficiency is confirmed by cost analysis, as well as experiments on synthetic and real data.
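The "indexing the function" idea can be illustrated with a toy version: tabulate the monotone function f(t) = t**p once, then derive a lower bound on the L_p distance with no pow() calls at query time. This sketches the general principle only, not the paper's actual bounding scheme, and all names are ours; breakpoints are assumed to start at 0.

```python
import numpy as np

def build_function_index(p, breakpoints):
    """Tabulate f(t) = t**p once at sorted breakpoints (starting at 0),
    so the expensive pow() is never called again at query time."""
    bp = np.asarray(breakpoints, dtype=float)
    return bp, bp ** p

def lp_lower_bound(x, y, bp, fvals):
    """Lower bound on sum_i |x_i - y_i|**p: because t**p is increasing
    for t >= 0, rounding each |x_i - y_i| down to the nearest tabulated
    breakpoint and summing the stored values underestimates the true sum."""
    d = np.abs(np.asarray(x) - np.asarray(y))
    j = np.searchsorted(bp, d, side="right") - 1  # largest breakpoint <= d_i
    return float(fvals[np.maximum(j, 0)].sum())
```

At query time a candidate is skipped whenever `lp_lower_bound` already exceeds the current k-th best distance, so the exact fractional-power sum is computed only for the survivors.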
3.
Finding the nearest k objects to a query object is a fundamental operation for many data mining algorithms. With the recent focus on privacy,
it is not surprising that there is strong interest in privacy-preserving k-NN queries to enable clustering, classification, and outlier-detection tasks. However, previous approaches to privacy-preserving
k-NN have been costly and can only realistically be applied to small data sets. In this paper, we provide efficient solutions
for k-NN queries over vertically partitioned data. We provide the first solution for the L_∞ (or chessboard) metric, as well as detailed privacy-preserving computation of all other Minkowski metrics. We enable privacy-preserving
L_∞ by providing a practical approach to Yao's millionaires problem with more than two parties. This is based on a pragmatic
and implementable solution to Yao’s millionaires problem with shares. We also provide privacy-preserving algorithms for combinations
of local metrics into a global metric that handles the large dimensionality and diversity of attributes common in vertically
partitioned data. To manage very large data sets, we provide a privacy-preserving SASH (a very successful data structure for associative queries in high dimensions). Besides providing a theoretical analysis,
we illustrate the efficiency of our approach with an empirical evaluation.
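The paper's full protocols (shares-based Yao comparisons, the privacy-preserving SASH) do not fit in a few lines, but the basic building block for combining local metrics from vertically partitioned parties, an additive secret-sharing sum, can be sketched. This toy assumes integer-scaled distance contributions and is our own illustration, not the authors' protocol:

```python
import random

MODULUS = 2 ** 61 - 1  # public modulus; contributions are integers below it

def make_shares(value, n_parties, rng):
    """Split an integer into n additive shares mod MODULUS; any n-1
    shares together look uniformly random and reveal nothing."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(local_values, seed=0):
    """Each party splits its local (integer-scaled) distance contribution
    into shares and distributes them; only the sum of all shares, i.e.
    the combined global metric, is ever reconstructed."""
    rng = random.Random(seed)
    all_shares = [make_shares(v, len(local_values), rng) for v in local_values]
    # each column is what one receiving party would hold after the exchange
    return sum(sum(col) % MODULUS for col in zip(*all_shares)) % MODULUS
```

In a vertical partitioning, each `local_values[i]` would be party i's squared-distance contribution over its own attributes, so the global Euclidean (or other Minkowski) metric is assembled without any party revealing its raw values.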
4.
Chih-Hui Chiu Ya-Fu Peng You-Wei Lin 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(10):2029-2040
In this study, a robust intelligent backstepping tracking control (RIBTC) system that combines an adaptive output recurrent cerebellar
model articulation controller (AORCMAC) with an H_∞ control technique is proposed for wheeled inverted pendulums (WIPs) with unknown system dynamics and external disturbances.
The AORCMAC is a nonlinear adaptive system with simple computation, good generalization capability, and a fast learning property,
so the WIP can stably stand upright while moving to a designated position. In the proposed control system, an AORCMAC
is used to mimic an ideal backstepping control, and a robust H_∞ controller is designed to attenuate the effect of the residual approximation errors and external disturbances to a desired
attenuation level. Moreover, all the adaptation laws of the RIBTC system are derived from Lyapunov stability analysis,
the Taylor linearization technique, and H_∞ control theory, so that the stability of the closed-loop system and the H_∞ tracking performance are guaranteed. Simulation results show that the proposed control scheme is practical and efficacious for WIPs.
5.
Jing Yang Gabriel Pui Cheong Fung Wei Lu Xiaofang Zhou Hong Chen Xiaoyong Du 《World Wide Web》2012,15(1):33-60
In a typical Web recommendation system, objects are often described by many attributes, and the system must serve many users
with a diversified range of preferences. In other words, it must be capable of efficiently supporting high-dimensional preference
queries that allow the user to explore the data space effectively without imposing specific preference weightings for each
dimension. The skyline query, which can produce a set of objects guaranteed to contain all top ranked objects for any linear
attribute preference combination, has been proposed to support this type of recommendation applications. However, it suffers
from the problem known as the ‘dimensionality curse’, as the size of the skyline result set can grow exponentially with the number
of dimensions. Therefore, when the dimensionality is high, a large percentage of objects can become skyline points. This problem
makes such a recommendation system less usable for users. In this paper, we propose a stronger type of skyline query, called
core skyline query, that adopts a new quality measure called vertical dominance to return only an interesting subset of the traditional skyline points. An efficient query processing method is proposed to find core skyline points using
a novel indexing structure called Linked Multiple B'-trees (LMB). Our approach finds such superior skyline points progressively, without first computing the entire set of skyline points.
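For reference, the classical skyline that the core skyline refines can be computed with a naive dominance test (larger values are assumed better in every dimension); vertical dominance itself is the paper's contribution and is not reproduced here.

```python
def dominates(a, b):
    """a dominates b if a is at least as good in every dimension and
    strictly better in at least one (larger = preferred)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep every point not dominated by another."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

The quadratic cost, and the fact that in high dimensions almost every point survives this test, is exactly the usability problem the abstract describes.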
6.
M. French Cs. Szepesvári E. Rogers 《Mathematics of Control, Signals, and Systems (MCSS)》2002,15(2):145-176
We consider the adaptive tracking problem for a chain of integrators, where the uncertainty is static and functional. The
uncertainty is specified by L_2/L_∞ or weighted L_2/L_∞ norm bounds. We analyse a standard Lyapunov-based adaptive design which utilises a function approximator to induce a parametric
uncertainty, on which the adaptive design is completed. Performance is measured by a modified LQ cost functional, penalising
both the tracking error transient and the control effort. With such a cost functional, it is shown that a standard control
design has divergent performance when the resolution of a “mono-resolution” approximator is increased. The class of “mono-resolution”
approximators includes models popular in applications. A general construction of a class of approximators and their associated
controllers which have a uniformly bounded performance independent of the resolution of the approximator is given.
Date received: 20 April 1999. Date revised: 29 October 2001.
7.
Only O(h_n^{-1} n^{d-1}) instead of O(h_n^{-d}) grid points and unknowns are involved. Here d denotes the dimension of the feature space and h_n = 2^{-n}
gives the mesh size. To be precise, we suggest using the sparse grid combination technique [42], where the classification
problem is discretized and solved on a certain sequence of conventional grids with uniform mesh sizes in each coordinate direction.
The sparse grid solution is then obtained from the solutions on these different grids by linear combination. In contrast to
other sparse grid techniques, the combination method is simpler to use and can be parallelized in a natural and straightforward
way. We describe the sparse grid combination technique for the classification problem in terms of the regularization network
approach. We then give implementational details and discuss the complexity of the algorithm. It turns out that the method
scales only linearly with the number of instances, i.e. the amount of data to be classified. Finally we report on the quality
of the classifier built by our new method. Here we consider standard test problems from the UCI repository and problems with
huge synthetic data sets in up to 9 dimensions. It turns out that our new method achieves correctness rates that are competitive
with those of the best existing methods.
Received April 25, 2001
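The combination rule's grid bookkeeping in 2-D can be sketched as follows (a toy illustration of the point counts only, not the authors' solver; the helper names are ours): partial solutions on anisotropic grids with levels l1 + l2 = n enter with coefficient +1 and those with l1 + l2 = n - 1 with coefficient -1.

```python
def combination_grids(n):
    """2-D combination technique: grids of levels (i, j) with i + j = n
    get coefficient +1, those with i + j = n - 1 get coefficient -1,
    so the coefficients always sum to 1."""
    grids = [((i, n - i), +1) for i in range(n + 1)]
    grids += [((i, n - 1 - i), -1) for i in range(n)]
    return grids

def total_unknowns(n):
    """Total grid points over all combination grids (2**l + 1 points per
    axis at level l), to compare with the (2**n + 1)**2 full grid."""
    return sum((2 ** lx + 1) * (2 ** ly + 1)
               for (lx, ly), _ in combination_grids(n))
```

Even for moderate n the combined grids hold far fewer unknowns than the full grid, which is the O(h_n^{-1} n^{d-1}) versus O(h_n^{-d}) saving the abstract quotes, and each grid can be solved independently, hence the natural parallelism.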
8.
Martin Dietzfelbinger Jonathan E. Rowe Ingo Wegener Philipp Woelfel 《Algorithmica》2011,59(3):301-322
We investigate the effects of precision on the efficiency of various local search algorithms on 1-D unimodal functions. We
present a (1+1)-EA with adaptive step size which finds the optimum in O(log n) steps, where n is the number of points used. We then consider binary (base-2) and reflected Gray code representations with single bit mutations.
The standard binary method does not guarantee locating the optimum, whereas using the reflected Gray code does so in Θ((log n)^2) steps. A (1+1)-EA with a fixed mutation probability distribution is then presented which also runs in O((log n)^2) steps. Moreover, a recent result shows that this is optimal (up to some constant scaling factor), in that there exist unimodal
functions for which a lower bound of Ω((log n)^2) holds regardless of the choice of mutation distribution. For continuous multimodal functions, the algorithm also locates
the global optimum in O((log n)^2) steps. Finally, we show that it is not possible for a black-box algorithm to efficiently optimise unimodal functions in two or
more dimensions (in terms of the precision used).
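A simplified, deterministic cousin of the adaptive-step-size (1+1)-EA conveys the idea: double the step while moves improve, halve it otherwise, so only O(log n) distinct step sizes are ever used. This is our own illustrative variant for unimodal functions over {0, ..., n-1}, not the randomized algorithm analyzed in the paper.

```python
def adaptive_hill_climb(f, n):
    """Adaptive-step hill climber on a unimodal f over {0, ..., n-1}:
    try a move of the current step size in each direction; on success
    double the step, on failure halve it, stopping when it reaches 0.
    At step 1 with no improving neighbour, x is the peak."""
    x, step = 0, 1
    while step >= 1:
        moved = False
        for d in (step, -step):
            y = min(max(x + d, 0), n - 1)   # clamp to the domain
            if f(y) > f(x):
                x, moved = y, True
                break
        step = step * 2 if moved else step // 2
    return x
```

Because f strictly improves on every accepted move and the domain is finite, the loop terminates, and the doubling/halving pattern keeps the number of distinct step sizes logarithmic in n.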
9.
Jiyuan An Hanxiong Chen Kazutaka Furuse Nobuo Ohbo 《Knowledge and Information Systems》2005,7(3):337-357
Similarity search is important in information-retrieval applications where objects are usually represented as vectors of high dimensionality. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. The proposed technique drops, for each data vector, the dimensions whose coordinates are below a critical value. This flexible datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed across coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), a revised version of the VA file, is developed. By using a CVA file, the size of index files is reduced further, while the tightness of the index bounds is maximally preserved. The effectiveness is confirmed on synthetic and real data.
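The per-vector ("datawise") reduction can be pictured with a toy sketch: for each vector, keep only the coordinates at or above a critical value eps together with their positions. The real CVA file then quantizes the kept coordinates VA-style; that part is omitted, and all names below are ours.

```python
import numpy as np

def datawise_reduce(v, eps):
    """Keep only the coordinates of v that reach the critical value eps,
    together with their positions; the other dimensions are dropped
    for this particular vector."""
    v = np.asarray(v, dtype=float)
    idx = np.flatnonzero(v >= eps)
    return idx, v[idx]

def approx_sq_dist(reduced, dim, q):
    """Squared distance against a reduced vector, treating each dropped
    coordinate as 0 (its true value lies in [0, eps))."""
    idx, vals = reduced
    dense = np.zeros(dim)
    dense[idx] = vals
    return float(((dense - np.asarray(q)) ** 2).sum())
```

For skewed, mostly-small coordinates each vector stores only its few significant dimensions, which is where the index-size saving over a plain VA file comes from.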
10.
Many estimation methods for independent component analysis (ICA) require prewhitening of the observed signals. This paper proposes
a new prewhitening method, named β-prewhitening, that minimizes the empirical β-divergence over the space of all Gaussian distributions. The value of the tuning parameter β plays the key role in the performance of our proposal, and an adaptive selection procedure
for β is proposed for this algorithm. Finally, a performance index is proposed for assessing prewhitening procedures. Simulation
results show that β-prewhitening efficiently improves on standard prewhitening when outliers exist and matches it
otherwise. Performance of the proposed method is compared with standard prewhitening using both FastICA and our proposed
performance index.
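For context, standard prewhitening, the baseline that β-prewhitening generalizes and robustifies, can be sketched in a few lines. This is the textbook covariance-based version, not the β-divergence method of the paper:

```python
import numpy as np

def prewhiten(X):
    """Standard prewhitening: center the d x n signal matrix and apply
    the inverse square root of its sample covariance, so the whitened
    signals have identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    vals, vecs = np.linalg.eigh(cov)            # cov is symmetric PSD
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric whitening matrix
    return W @ Xc
```

β-prewhitening replaces the implicit Gaussian fit behind this covariance estimate with a minimum β-divergence fit, which downweights outliers; that is the source of its robustness in the reported simulations.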