首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Reverse Nearest Neighbor Search in Metric Spaces   总被引:7,自引:0,他引:7  
Given a set {cal D} of objects, a reverse nearest neighbor (RNN) query returns the objects o in {cal D} such that o is closer to a query object q than to any other object in {cal D}, according to a certain similarity metric. The existing RNN solutions are not sufficient because they either 1) rely on precomputed information that is expensive to maintain in the presence of updates or 2) are applicable only when the data consists of "Euclidean objects” and similarity is measured using the L_2 norm. In this paper, we present the first algorithms for efficient RNN search in generic metric spaces. Our techniques require no detailed representations of objects, and can be applied as long as their mutual distances can be computed and the distance metric satisfies the triangle inequality. We confirm the effectiveness of the proposed methods with extensive experiments.  相似文献   

2.
Manifold learning is a well-known dimensionality reduction scheme which can detect intrinsic low-dimensional structures in non-linear high-dimensional data. It has been recently widely employed in data analysis, pattern recognition, and machine learning applications. Isomap is one of the most promising manifold learning algorithms, which extends metric multi-dimensional scaling by using approximate geodesic distance. However, when Isomap is conducted on real-world applications, it may have some difficulties in dealing with noisy data. Although many applications represent a special sample by multiple feature vectors in different spaces, Isomap employs samples in unique observation space. In this paper, two extended versions of Isomap to multiple feature spaces problem, namely fusion of dissimilarities and fusion of geodesic distances, are presented. We have employed the advantages of several spaces and depicted the Euclidean distance on learned manifold that is more compatible to the semantic distance. To show the effectiveness and validity of the proposed method, some experiments have been carried out on the application of shape analysis on MPEG7 CE Part B and Fish data sets.  相似文献   

3.
2D-Shape Analysis Using Conformal Mapping   总被引:1,自引:0,他引:1  
The study of 2D shapes and their similarities is a central problem in the field of vision. It arises in particular from the task of classifying and recognizing objects from their observed silhouette. Defining natural distances between 2D shapes creates a metric space of shapes, whose mathematical structure is inherently relevant to the classification task. One intriguing metric space comes from using conformal mappings of 2D shapes into each other, via the theory of Teichmüller spaces. In this space every simple closed curve in the plane (a “shape”) is represented by a ‘fingerprint’ which is a diffeomorphism of the unit circle to itself (a differentiable and invertible, periodic function). More precisely, every shape defines to a unique equivalence class of such diffeomorphisms up to right multiplication by a Möbius map. The fingerprint does not change if the shape is varied by translations and scaling and any such equivalence class comes from some shape. This coset space, equipped with the infinitesimal Weil-Petersson (WP) Riemannian norm is a metric space. In this space, the shortest path between each two shapes is unique, and is given by a geodesic connecting them. Their distance from each other is given by integrating the WP-norm along that geodesic. In this paper we concentrate on solving the “welding” problem of “sewing” together conformally the interior and exterior of the unit circle, glued on the unit circle by a given diffeomorphism, to obtain the unique 2D shape associated with this diffeomorphism. This will allow us to go back and forth between 2D shapes and their representing diffeomorphisms in this “space of shapes”. We then present an efficient method for computing the unique shortest path, the geodesic of shape morphing between each two end-point shapes. The group of diffeomorphisms of S1 acts as a group of isometries on the space of shapes and we show how this can be used to define shape transformations, like for instance ‘adding a protruding limb’ to any shape.  相似文献   

4.
This paper proposes a new unsupervised classification approach for automatic analysis of polarimetric synthetic aperture radar (SAR) image. Classification of the information in multi-dimensional polarimetric SAR data space by dynamic clustering is addressed as an optimization problem and two recently proposed techniques based on particle swarm optimization (PSO) are applied to find optimal (number of) clusters in a given input data space, distance metric and a proper validity index function. The first technique, so-called multi-dimensional (MD) PSO, re-forms the native structure of swarm particles in such a way that they can make inter-dimensional passes with a dedicated dimensional PSO process. Therefore, in a multi-dimensional search space where the optimum dimension is unknown, swarm particles can seek both positional and dimensional optima. Nevertheless, MD PSO is still susceptible to premature convergence due to lack of divergence. To address this problem, fractional global best formation (FGBF) technique is then presented, which basically collects all promising dimensional components and fractionally creates an artificial global-best particle (aGB) that has the potential to be a better “guide” than the PSO’s native gbest particle. In this study, the proposed dynamic clustering process based on MD-PSO and FGBF techniques is applied to automatically classify the color-coded representations of the polarimetric SAR information (i.e. the type of scattering, backscattering power) extracted by means of the Pauli or the Cloude–Pottier decomposition algorithms. The performance of the proposed method is evaluated based on fully polarimetric SAR data of the San Francisco Bay acquired by the NASA/Jet Propulsion Laboratory Airborne SAR (AIRSAR) at L-band. The proposed unsupervised technique determines the number of classes within polarimetric SAR image for optimal classification performance while preserving spatial resolution and textural information in the classified results. Additionally, it is possible to further apply the proposed dynamic clustering technique to higher dimensional (N-D) feature spaces of fully polarimetric SAR data.  相似文献   

5.
3D model hashing can be very useful for the authentication, indexing, copy detection, and watermarking of 3D content, in a manner similar to image hashing. 3D models can be easily modified by graphics editing while preserving the geometric shape, and the modeling representations are not regular, unlike an image with a fixed pixel array. A 3D model must be authenticated, indexed, or watermarked while being robust against graphics attacks and irregular representations. For these purposes, this paper presents a 3D mesh model hashing based on object feature vectors with the robustness, security, and uniqueness. The proposed hashing groups the distances from feature objects with the highest surface area in a 3D model that consists of a number of objects, permutes indices of groups in feature objects, and generates a binary hash through the binarization of feature values that are calculated by two combinations of group values and a random key. The robustness of a hash can be improved by group coefficients that are obtained from the distribution of vertex distances in feature objects, and the security and uniqueness can be improved by both the permutation of groups, feature vectors, and random key. Experimental results verified that the proposed hashing is robust against various perceptual geometrical and topological attacks and has the security and uniqueness of a hash.  相似文献   

6.
An Index Structure for Data Mining and Clustering   总被引:2,自引:0,他引:2  
In this paper we present an index structure, called MetricMap, that takes a set of objects and a distance metric and then maps those objects to a k-dimensional space in such a way that the distances among objects are approximately preserved. The index structure is a useful tool for clustering and visualization in data-intensive applications, because it replaces expensive distance calculations by sum-of-square calculations. This can make clustering in large databases with expensive distance metrics practical. We compare the index structure with another data mining index structure, FastMap, recently proposed by Faloutsos and Lin, according to two criteria: relative error and clustering accuracy. For relative error, we show that (i) FastMap gives a lower relative error than MetricMap for Euclidean distances, (ii) MetricMap gives a lower relative error than FastMap for non-Euclidean distances (i.e., general distance metrics), and (iii) combining the two reduces the error yet further. A similar result is obtained when comparing the accuracy of clustering. These results hold for different data sizes. The main qualitative conclusion is that these two index structures capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations can be done in minutes. Received February 1998 / Revised July 1999 / Accepted in revised form September 1999  相似文献   

7.
Image and video classification tasks often suffer from the problem of high-dimensional feature space. How to discover the meaningful, low-dimensional representations of such high-order, high-dimensional observations remains a fundamental challenge. In this paper, we present a unified framework for tensor based dimensionality reduction including a new tensor distance (TD) metric and a novel multilinear globality preserving embedding (MGPE) strategy. Different with the traditional Euclidean distance, which is constrained by orthogonality assumption, TD measures the distance between data points by considering the relationships among different coordinates of high-order data. To preserve the natural tensor structure in low-dimensional space, MGPE directly works on the high-order form of input data and employs an iterative strategy to learn the transformation matrices. To provide faithful global representation for datasets, MGPE intends to preserve the distances between all pairs of data points. According to the proposed TD metric and MGPE strategy, we further derive two algorithms dubbed tensor distance based multilinear multidimensional scaling (TD-MMDS) and tensor distance based multilinear isometric embedding (TD-MIE). TD-MMDS finds the transformation matrices by keeping the TDs between all pairs of input data in the embedded space, while TD-MIE intends to preserve all pairwise distances calculated according to TDs along shortest paths in the neighborhood graph. By integrating tensor distance into tensor based embedding, TD-MMDS and TD-MIE perform tensor based dimensionality reduction through the whole learning procedure and achieve obvious performance improvement on various standard datasets.  相似文献   

8.
大多数超椭球聚类(hyper-ellipsoidal clustering,HEC)算法都使用马氏距离作为距离度量,已经证明在该条件下划分聚类的代价函数是常量,导致HEC无法实现椭球聚类.本文说明了使用改进高斯核的HEC算法可以解释为寻找体积和密度都紧凑的椭球分簇,并提出了一种实用HEC算法-K-HEC,该算法能够有效地处理椭球形、不同大小和不同密度的分簇.为实现更复杂形状数据集的聚类,使用定义在核特征空间的椭球来改进K-HEC算法的能力,提出了EK-HEC算法.仿真实验证明所提出算法在聚类结果和性能上均优于K-means算法、模糊C-means算法、GMM-EM算法和基于最小体积椭球(minimum-volume ellipsoids,MVE)的马氏HEC算法,从而证明了本文算法的可行性和有效性.  相似文献   

9.
A fundamental goal of unsupervised feature selection is denoising, which aims to identify and reduce noisy features that are not discriminative. Due to the lack of information about real classes, denoising is a challenging task. The noisy features can disturb the reasonable distance metric and result in unreasonable feature spaces, i.e., the feature spaces in which common clustering algorithms cannot effectively find real classes. To overcome the problem, we make a primary observation that the relevance of features is intrinsic and independent of any metric scaling on the feature space. This observation implies that feature selection should be invariant, at least to some extent, with respect to metric scaling. In this paper, we clarify the necessity of considering the metric invariance in unsupervised feature selection and propose a novel model incorporating metric invariance. Our proposed method is motivated by the following observations: if the statistic that guides the unsupervised feature selection process is invariant with respect to possible metric scaling, the solution of this model will also be invariant. Hence, if a metric-invariant model can distinguish discriminative features from noisy ones in a reasonable feature space, it will also work on the unreasonable counterpart transformed from the reasonable one by metric scaling. A theoretical justification of the metric invariance of our proposed model is given and the empirical evaluation demonstrates its promising performance.  相似文献   

10.
Properties of embedding methods for similarity searching in metric spaces   总被引:2,自引:0,他引:2  
Complex data types-such as images, documents, DNA sequences, etc.-are becoming increasingly important in modern database applications. A typical query in many of these applications seeks to find objects that are similar to some target object, where (dis)similarity is defined by some distance function. Often, the cost of evaluating the distance between two objects is very high. Thus, the number of distance evaluations should be kept at a minimum, while (ideally) maintaining the quality of the result. One way to approach this goal is to embed the data objects in a vector space so that the distances of the embedded objects approximates the actual distances. Thus, queries can be performed (for the most part) on the embedded objects. We are especially interested in examining the issue of whether or not the embedding methods will ensure that no relevant objects are left out. Particular attention is paid to the SparseMap, FastMap, and MetricMap embedding methods. SparseMap is a variant of Lipschitz embeddings, while FastMap and MetricMap are inspired by dimension reduction methods for Euclidean spaces. We show that, in general, none of these embedding methods guarantee that queries on the embedded objects have no false dismissals, while also demonstrating the limited cases in which the guarantee does hold. Moreover, we describe a variant of SparseMap that allows queries with no false dismissals. In addition, we show that with FastMap and MetricMap, the distances of the embedded objects can be much greater than the actual distances. This makes it impossible (or at least impractical) to modify FastMap and MetricMap to guarantee no false dismissals.  相似文献   

11.
12.
Multi-label classification aims to assign a set of proper labels for each instance, where distance metric learning can help improve the generalization ability of instance-based multi-label classification models. Existing multi-label metric learning techniques work by utilizing pairwise constraints to enforce that examples with similar label assignments should have close distance in the embedded feature space. In this paper, a novel distance metric learning approach for multi-label classification is proposed by modeling structural interactions between instance space and label space. On one hand, compositional distance metric is employed which adopts the representation of a weighted sum of rank-1 PSD matrices based on component bases. On the other hand, compositional weights are optimized by exploiting triplet similarity constraints derived from both instance and label spaces. Due to the compositional nature of employed distance metric, the resulting problem admits quadratic programming formulation with linear optimization complexity w.r.t. the number of training examples.We also derive the generalization bound for the proposed approach based on algorithmic robustness analysis of the compositional metric. Extensive experiments on sixteen benchmark data sets clearly validate the usefulness of compositional metric in yielding effective distance metric for multi-label classification.  相似文献   

13.
In pattern recognition field, objects are usually represented by multiple features (multimodal features). For example, to characterize a natural scene image, it is essential to extract a set of visual features representing its color, texture, and shape information. However, integrating multimodal features for recognition is challenging because: (1) each feature has its specific statistical property and physical interpretation, (2) huge number of features may result in the curse of dimensionality (When data dimension is high, the distances between pairwise objects in the feature space become increasingly similar due to the central limit theory. This phenomenon influences negatively to the recognition performance), and (3) some features may be unavailable. To solve these problems, a new multimodal feature selection algorithm, termed Grassmann manifold feature selection (GMFS), is proposed. In particular, by defining a clustering criterion, the multimodal features are transformed into a matrix, and further treated as a point on the Grassmann manifold in Hamm and Lee (Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the 25th international conference on machine learning (ICML), pp. 376–383, Helsinki, Finland [2008]). To deal with the unavailable features, L2-Hausdorff distance, a metric between different-sized matrices, is computed and the kernel is obtained accordingly. Based on the kernel, we propose supervised/unsupervised feature selection algorithms to achieve a physically meaningful embedding of the multimodal features. Experimental results on eight data sets validate the effectiveness the proposed approach.  相似文献   

14.
《Image and vision computing》2002,20(9-10):701-707
We address the problem of using scale-orientation pixel signatures to characterise local structure in X-ray mammograms, though the method we report is of general application. When signatures are treated as vectors for statistical analysis, the Euclidean metric is not well behaved. We have previously described a Best Partial Match (BPM) metric that measures signature similarity more naturally, but at high computational cost. We present a method for transforming signatures into a new space in which Euclidean distance approximates BPM distance, allowing BPM distance to be estimated at low computational cost. The new space is constructed using multi-dimensional scaling. The nonlinear transformation between the old and new spaces is learned using support vector regression. We present experimental results for mammographic data.  相似文献   

15.
最近特征空间嵌入NFSE方法在训练过程中选取最近特征空间时采用传统的欧氏距离度量会导致类内离散度和类间离散度变化同步;测试时,最近邻规则也使用欧氏距离度量,而高维空间样本间直线距离具有趋同性。这些都会降低识别率,为解决此问题,提出了基于非线性距离和夹角组合的最近特征空间嵌入方法。在训练阶段,该方法使用非线性距离度量选取最近特征空间,使类内离散度的变化速度远小于类间离散度的变化速度,从而使转换空间中同类样本距离更小,不同类样本距离更大。在匹配阶段,使用结合夹角度量的最近邻分类器,充分利用样本相似性与样本夹角的关系,更适合高维空间中样本分类。仿真实验表明,基于非线性距离和夹角组合的最近特征空间嵌入方法的性能总体上优于对比算法。  相似文献   

16.
17.
Analysis of planar shapes using geodesic paths on shape spaces   总被引:11,自引:0,他引:11  
For analyzing shapes of planar, closed curves, we propose differential geometric representations of curves using their direction functions and curvature functions. Shapes are represented as elements of infinite-dimensional spaces and their pairwise differences are quantified using the lengths of geodesics connecting them on these spaces. We use a Fourier basis to represent tangents to the shape spaces and then use a gradient-based shooting method to solve for the tangent that connects any two shapes via a geodesic. Using the Surrey fish database, we demonstrate some applications of this approach: 1) interpolation and extrapolations of shape changes, 2) clustering of objects according to their shapes, 3) statistics on shape spaces, and 4) Bayesian extraction of shapes in low-quality images.  相似文献   

18.
大多数的空间聚类算法主要针对欧几何空间中的数据对象.然而在大多真实的应用中,空间对象的访问主要受限于空间网络(如道路网络),因此,对道路网络中的对象进行聚类分析更具有现实意义.道路网络中对象之间的距离度量需要通过基于网络的最短路径距离来重新定义,其计算代价高,这使得已有的基于欧几何距离的聚类算法不能直接运用到这种环境中.因此,通过开发道路网络的特征提出了两种新的聚类算法.算法使用网络中的边和结点信息来缩减搜索空间,避免了一些不必要的距离计算.实验结果表明,算法对于真实道路网络中的对象聚类是高效的.  相似文献   

19.
20.
传统多维度文本聚类一般是从文本内容中提取特征,而很少考虑数据中用户与文本的交互信息(如:点赞、转发、评论、关注、引用等行为信息),且传统的多维度文本聚类主要是将多个空间维度线性结合,没能深入考虑每个维度中属性间的关系。为有效利用与文本相关的用户行为信息,提出一种结合用户行为信息的多维度文本聚类模型(MTCUBC)。根据文本间的相似性在不同空间上应该保持一致的原则,该模型将用户行为信息作为文本内容聚类的约束来调节相似度,然后结合度量学习方法来改善文本间的距离,从而提高聚类效果。通过实验表明,与线性结合的多维度聚类相比,MTCUBC模型在高维稀疏数据中表现出明显的优势。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号