10 similar documents found; search time: 187 ms
1.
Similarity search implemented via k-nearest neighbor (k-NN)
queries on multidimensional indices is an extremely useful paradigm for content-based image retrieval. As the dimensionality
of feature vectors increases, the curse of dimensionality sets in: the performance of k-NN search on disk-resident indices in the R-tree family degrades rapidly due to the overlap of index pages in high dimensions.
This problem is dealt with in this study by utilizing the double filtering effect of clustering and indexing. The clustering
algorithm ensures that the largest cluster fits into main memory and that only clusters closest to a query point need to be
searched and hence loaded into main memory. We organize the data in each cluster according to the ordered-partition (OP) tree, a
main-memory-resident index that is not prone to the curse of dimensionality and is highly efficient for processing k-NN queries. We serialize an OP-tree by writing its dynamically allocated nodes into contiguous memory locations, optimize
its parameters, and make it persistent by writing it to disk. The time to read and write clusters constituting an OP-tree
with a single sequential access to disk benefits from higher data transfer rates of modern disk drives. The performance of
the index is further improved by applying the Karhunen–Loève transformation (KLT) to the dataset, since this results in a more efficient computation of distances for k-NN queries. We compare OP-trees and sequential scans with and without a KL transformation and with and without using a shortcut
method in calculating Euclidean distances. A comparison against the OMNI-sequential scan is also reported. We finally compare
a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-file method. CPU time
is measured and elapsed time is estimated in this study. It is observed that the OP-tree index outperforms the other two methods
and that the improvement increases with the number of dimensions.
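The shortcut method for Euclidean distances that the abstract evaluates is not spelled out above; a common variant is partial-distance early termination, which abandons a candidate as soon as its running squared distance exceeds the current k-th best. A minimal sketch under that assumption (all names below are illustrative, not from the paper):

```python
import heapq

def knn_with_shortcut(data, query, k):
    """Brute-force k-NN with the partial-distance shortcut: accumulate
    squared differences coordinate by coordinate and abandon a candidate
    as soon as the running sum reaches the current k-th best distance."""
    heap = []  # max-heap via negation: holds (-sq_dist, index) of the k best so far
    for i, v in enumerate(data):
        bound = -heap[0][0] if len(heap) == k else float("inf")
        s = 0.0
        for a, b in zip(v, query):
            s += (a - b) ** 2
            if s >= bound:      # shortcut: this candidate cannot enter the top k
                break
        else:
            if len(heap) < k:
                heapq.heappush(heap, (-s, i))
            else:
                heapq.heappushpop(heap, (-s, i))
    return sorted((-d, i) for d, i in heap)
```

After a KLT, most of the variance concentrates in the first coordinates, so the running sum crosses the bound after fewer terms, which is why the transformation makes this shortcut more effective.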
2.
Hanxiong Chen Jianquan Liu Kazutaka Furuse Jeffrey Xu Yu Nobuo Ohbo 《Knowledge and Information Systems》2011,27(2):165-192
Similarity search is important in information retrieval applications where objects are usually represented as vectors of high
dimensionality. This leads to an increasing need to support the indexing of high-dimensional data. Unfortunately,
indexing structures based on space partitioning are ineffective because of the well-known “curse of dimensionality”, and linear
scan of the data with approximation is more efficient for high-dimensional similarity search. However, approaches so far
have concentrated on reducing I/O and have ignored the computation cost. For an expensive distance function such as the L_p norm with fractional p, the computation cost becomes the bottleneck. We propose a new technique to address expensive distance functions by “indexing
the function” by pre-computing some key values of the function once. Then, the values are used to develop the upper/lower
bounds of the distance between a data vector and the query vector. The technique is extremely efficient since it avoids most
of the distance function computations; moreover, it does not involve any extra secondary storage because no index is constructed
and stored. The efficiency is confirmed by cost analysis, as well as experiments on synthetic and real data.
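The "indexing the function" idea can be illustrated with a toy version: tabulate the monotone function f(t) = t**p once, then derive a lower bound on the L_p distance with no pow() calls at query time. This sketches the general principle only, not the paper's actual bounding scheme, and all names are ours; breakpoints are assumed to start at 0.

```python
import numpy as np

def build_function_index(p, breakpoints):
    """Tabulate f(t) = t**p once at sorted breakpoints (starting at 0),
    so the expensive pow() is never called again at query time."""
    bp = np.asarray(breakpoints, dtype=float)
    return bp, bp ** p

def lp_lower_bound(x, y, bp, fvals):
    """Lower bound on sum_i |x_i - y_i|**p: because t**p is increasing
    for t >= 0, rounding each |x_i - y_i| down to the nearest tabulated
    breakpoint and summing the stored values underestimates the true sum."""
    d = np.abs(np.asarray(x) - np.asarray(y))
    j = np.searchsorted(bp, d, side="right") - 1  # largest breakpoint <= d_i
    return float(fvals[np.maximum(j, 0)].sum())
```

At query time a candidate is skipped whenever `lp_lower_bound` already exceeds the current k-th best distance, so the exact fractional-power sum is computed only for the survivors.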
3.
Finding the nearest k objects to a query object is a fundamental operation for many data mining algorithms. With the recent focus on privacy,
it is not surprising that there is strong interest in privacy-preserving k-NN queries to enable clustering, classification, and outlier-detection tasks. However, previous approaches to privacy-preserving
k-NN have been costly and can only realistically be applied to small data sets. In this paper, we provide efficient solutions
for k-NN queries over vertically partitioned data. We provide the first solution for the L_∞ (or chessboard) metric, as well as detailed privacy-preserving computation of all other Minkowski metrics. We enable privacy-preserving
L_∞ by providing a practical approach to Yao's millionaires problem with more than two parties. This is based on a pragmatic
and implementable solution to Yao’s millionaires problem with shares. We also provide privacy-preserving algorithms for combinations
of local metrics into a global metric that handles the large dimensionality and diversity of attributes common in vertically
partitioned data. To manage very large data sets, we provide a privacy-preserving SASH (a very successful data structure for associative queries in high dimensions). Besides providing a theoretical analysis,
we illustrate the efficiency of our approach with an empirical evaluation.
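The paper's full protocols (shares-based Yao comparisons, the privacy-preserving SASH) do not fit in a few lines, but the basic building block for combining local metrics from vertically partitioned parties, an additive secret-sharing sum, can be sketched. This toy assumes integer-scaled distance contributions and is our own illustration, not the authors' protocol:

```python
import random

MODULUS = 2 ** 61 - 1  # public modulus; contributions are integers below it

def make_shares(value, n_parties, rng):
    """Split an integer into n additive shares mod MODULUS; any n-1
    shares together look uniformly random and reveal nothing."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(local_values, seed=0):
    """Each party splits its local (integer-scaled) distance contribution
    into shares and distributes them; only the sum of all shares, i.e.
    the combined global metric, is ever reconstructed."""
    rng = random.Random(seed)
    all_shares = [make_shares(v, len(local_values), rng) for v in local_values]
    # each column is what one receiving party would hold after the exchange
    return sum(sum(col) % MODULUS for col in zip(*all_shares)) % MODULUS
```

In a vertical partitioning, each `local_values[i]` would be party i's squared-distance contribution over its own attributes, so the global Euclidean (or other Minkowski) metric is assembled without any party revealing its raw values.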
4.
Chih-Hui Chiu Ya-Fu Peng You-Wei Lin 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(10):2029-2040
In this study, a robust intelligent backstepping tracking control (RIBTC) system that combines an adaptive output recurrent cerebellar
model articulation controller (AORCMAC) with an H_∞ control technique is proposed for wheeled inverted pendulums (WIPs) with unknown system dynamics and external disturbances.
The AORCMAC is a nonlinear adaptive system with simple computation, good generalization capability, and a fast learning property,
so the WIP can stably stand upright while moving to a designated position. In the proposed control system, an AORCMAC
is used to mimic an ideal backstepping control, and a robust H_∞ controller is designed to attenuate the effect of the residual approximation errors and external disturbances to a desired
attenuation level. Moreover, all the adaptation laws of the RIBTC system are derived from Lyapunov stability analysis,
the Taylor linearization technique, and H_∞ control theory, so that the stability of the closed-loop system and the H_∞ tracking performance are guaranteed. Simulation results show that the proposed control scheme is practical and efficacious for WIPs.
5.
Jing Yang Gabriel Pui Cheong Fung Wei Lu Xiaofang Zhou Hong Chen Xiaoyong Du 《World Wide Web》2012,15(1):33-60
In a typical Web recommendation system, objects are often described by many attributes, and the system must serve many users
with a diversified range of preferences. In other words, it must be capable of efficiently supporting high-dimensional preference
queries that allow the user to explore the data space effectively without imposing specific preference weightings for each
dimension. The skyline query, which can produce a set of objects guaranteed to contain all top ranked objects for any linear
attribute preference combination, has been proposed to support this type of recommendation applications. However, it suffers
from the problem known as the ‘dimensionality curse’, as the size of the skyline result set can grow exponentially with the number
of dimensions. Therefore, when the dimensionality is high, a large percentage of objects can become skyline points. This problem
makes such a recommendation system less usable for users. In this paper, we propose a stronger type of skyline query, called
core skyline query, that adopts a new quality measure called vertical dominance to return only an interesting subset of the traditional skyline points. An efficient query processing method is proposed to find core skyline points using
a novel indexing structure called Linked Multiple B'-trees (LMB). Our approach finds such superior skyline points progressively, without first computing the entire set of skyline points.
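For reference, the classical skyline that the core skyline refines can be computed with a naive dominance test (larger values are assumed better in every dimension); vertical dominance itself is the paper's contribution and is not reproduced here.

```python
def dominates(a, b):
    """a dominates b if a is at least as good in every dimension and
    strictly better in at least one (larger = preferred)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep every point not dominated by another."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

The quadratic cost, and the fact that in high dimensions almost every point survives this test, is exactly the usability problem the abstract describes.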
6.
M. French Cs. Szepesvári E. Rogers 《Mathematics of Control, Signals, and Systems (MCSS)》2002,15(2):145-176
We consider the adaptive tracking problem for a chain of integrators, where the uncertainty is static and functional. The
uncertainty is specified by L_2/L_∞ or weighted L_2/L_∞ norm bounds. We analyse a standard Lyapunov-based adaptive design which utilises a function approximator to induce a parametric
uncertainty, on which the adaptive design is completed. Performance is measured by a modified LQ cost functional, penalising
both the tracking error transient and the control effort. With such a cost functional, it is shown that a standard control
design has divergent performance when the resolution of a “mono-resolution” approximator is increased. The class of “mono-resolution”
approximators includes models popular in applications. A general construction of a class of approximators and their associated
controllers which have a uniformly bounded performance independent of the resolution of the approximator is given.
Date received: 20 April 1999. Date revised: 29 October 2001.
7.
Only O(h_n^{-1} n^{d-1}) instead of O(h_n^{-d}) grid points and unknowns are involved. Here d denotes the dimension of the feature space and h_n = 2^{-n}
gives the mesh size. To be precise, we suggest using the sparse grid combination technique [42], where the classification
problem is discretized and solved on a certain sequence of conventional grids with uniform mesh sizes in each coordinate direction.
The sparse grid solution is then obtained from the solutions on these different grids by linear combination. In contrast to
other sparse grid techniques, the combination method is simpler to use and can be parallelized in a natural and straightforward
way. We describe the sparse grid combination technique for the classification problem in terms of the regularization network
approach. We then give implementational details and discuss the complexity of the algorithm. It turns out that the method
scales only linearly with the number of instances, i.e. the amount of data to be classified. Finally we report on the quality
of the classifier built by our new method. Here we consider standard test problems from the UCI repository and problems with
huge synthetic data sets in up to 9 dimensions. It turns out that our new method achieves correctness rates that are competitive
with those of the best existing methods.
Received April 25, 2001
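The combination rule's grid bookkeeping in 2-D can be sketched as follows (a toy illustration of the point counts only, not the authors' solver; the helper names are ours): partial solutions on anisotropic grids with levels l1 + l2 = n enter with coefficient +1 and those with l1 + l2 = n - 1 with coefficient -1.

```python
def combination_grids(n):
    """2-D combination technique: grids of levels (i, j) with i + j = n
    get coefficient +1, those with i + j = n - 1 get coefficient -1,
    so the coefficients always sum to 1."""
    grids = [((i, n - i), +1) for i in range(n + 1)]
    grids += [((i, n - 1 - i), -1) for i in range(n)]
    return grids

def total_unknowns(n):
    """Total grid points over all combination grids (2**l + 1 points per
    axis at level l), to compare with the (2**n + 1)**2 full grid."""
    return sum((2 ** lx + 1) * (2 ** ly + 1)
               for (lx, ly), _ in combination_grids(n))
```

Even for moderate n the combined grids hold far fewer unknowns than the full grid, which is the O(h_n^{-1} n^{d-1}) versus O(h_n^{-d}) saving the abstract quotes, and each grid can be solved independently, hence the natural parallelism.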
8.
Martin Dietzfelbinger Jonathan E. Rowe Ingo Wegener Philipp Woelfel 《Algorithmica》2011,59(3):301-322
We investigate the effects of precision on the efficiency of various local search algorithms on 1-D unimodal functions. We
present a (1+1)-EA with adaptive step size which finds the optimum in O(log n) steps, where n is the number of points used. We then consider binary (base-2) and reflected Gray code representations with single bit mutations.
The standard binary method does not guarantee locating the optimum, whereas using the reflected Gray code does so in Θ((log n)^2) steps. A (1+1)-EA with a fixed mutation probability distribution is then presented which also runs in O((log n)^2) steps. Moreover, a recent result shows that this is optimal (up to some constant scaling factor), in that there exist unimodal
functions for which a lower bound of Ω((log n)^2) holds regardless of the choice of mutation distribution. For continuous multimodal functions, the algorithm also locates
the global optimum in O((log n)^2) steps. Finally, we show that it is not possible for a black-box algorithm to efficiently optimise unimodal functions in two or
more dimensions (in terms of the precision used).
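A simplified, deterministic cousin of the adaptive-step-size (1+1)-EA conveys the idea: double the step while moves improve, halve it otherwise, so only O(log n) distinct step sizes are ever used. This is our own illustrative variant for unimodal functions over {0, ..., n-1}, not the randomized algorithm analyzed in the paper.

```python
def adaptive_hill_climb(f, n):
    """Adaptive-step hill climber on a unimodal f over {0, ..., n-1}:
    try a move of the current step size in each direction; on success
    double the step, on failure halve it, stopping when it reaches 0.
    At step 1 with no improving neighbour, x is the peak."""
    x, step = 0, 1
    while step >= 1:
        moved = False
        for d in (step, -step):
            y = min(max(x + d, 0), n - 1)   # clamp to the domain
            if f(y) > f(x):
                x, moved = y, True
                break
        step = step * 2 if moved else step // 2
    return x
```

Because f strictly improves on every accepted move and the domain is finite, the loop terminates, and the doubling/halving pattern keeps the number of distinct step sizes logarithmic in n.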
9.
Jiyuan An Hanxiong Chen Kazutaka Furuse Nobuo Ohbo 《Knowledge and Information Systems》2005,7(3):337-357
Similarity search is important in information-retrieval applications where objects are usually represented as vectors of high dimensionality. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. The proposed technique drops, for each data vector, the dimensions whose coordinates are below a critical value. This flexible datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed across coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), a revised version of the VA file, is developed. By using a CVA file, the size of index files is reduced further, while the tightness of the index bounds is maximally preserved. The effectiveness is confirmed on synthetic and real data.
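The per-vector ("datawise") reduction can be pictured with a toy sketch: for each vector, keep only the coordinates at or above a critical value eps together with their positions. The real CVA file then quantizes the kept coordinates VA-style; that part is omitted, and all names below are ours.

```python
import numpy as np

def datawise_reduce(v, eps):
    """Keep only the coordinates of v that reach the critical value eps,
    together with their positions; the other dimensions are dropped
    for this particular vector."""
    v = np.asarray(v, dtype=float)
    idx = np.flatnonzero(v >= eps)
    return idx, v[idx]

def approx_sq_dist(reduced, dim, q):
    """Squared distance against a reduced vector, treating each dropped
    coordinate as 0 (its true value lies in [0, eps))."""
    idx, vals = reduced
    dense = np.zeros(dim)
    dense[idx] = vals
    return float(((dense - np.asarray(q)) ** 2).sum())
```

For skewed, mostly-small coordinates each vector stores only its few significant dimensions, which is where the index-size saving over a plain VA file comes from.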
10.
Many estimation methods for independent component analysis (ICA) require prewhitening of the observed signals. This paper proposes
a new prewhitening method, named β-prewhitening, that minimizes the empirical β-divergence over the space of all Gaussian distributions. The value of the tuning parameter β plays the key role in the performance of our proposal, and an adaptive selection procedure
for β is proposed for this algorithm. Finally, a performance index is proposed for assessing prewhitening procedures. Simulation
results show that β-prewhitening efficiently improves on standard prewhitening when outliers exist and matches it
otherwise. Performance of the proposed method is compared with standard prewhitening using both FastICA and our proposed
performance index.
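For context, standard prewhitening, the baseline that β-prewhitening generalizes and robustifies, can be sketched in a few lines. This is the textbook covariance-based version, not the β-divergence method of the paper:

```python
import numpy as np

def prewhiten(X):
    """Standard prewhitening: center the d x n signal matrix and apply
    the inverse square root of its sample covariance, so the whitened
    signals have identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    vals, vecs = np.linalg.eigh(cov)            # cov is symmetric PSD
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric whitening matrix
    return W @ Xc
```

β-prewhitening replaces the implicit Gaussian fit behind this covariance estimate with a minimum β-divergence fit, which downweights outliers; that is the source of its robustness in the reported simulations.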