Similar Articles
20 similar articles found
1.
In multimedia information retrieval, multimedia data are represented as vectors in high-dimensional space. To search these vectors efficiently, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically with increasing dimensionality, a phenomenon known as the dimensionality curse. To resolve it, dimensionality reduction methods have been proposed: they map feature vectors in high-dimensional space into vectors in low-dimensional space before the data are indexed. This paper proposes a novel method for dimensionality reduction based on a function that approximates the Euclidean distance from the norm and angle components of a vector. First, we identify the causes of, and discuss basic solutions to, errors in angle approximation during the approximation of the Euclidean distance. Then, we propose a new method for dimensionality reduction that extracts a set of subvectors from a feature vector and maintains only the norm and the approximated angle for every subvector. The selection of a good reference vector is crucial for accurate approximation of the angle component. We present criteria for a good reference vector and propose a method that chooses one. We also define a novel distance function using the norm and angle components, and formally prove that it consistently lower-bounds the Euclidean distance, which implies that information retrieval with this function does not incur any false dismissals. Finally, the superiority of the proposed approach is verified via extensive experiments with synthetic and real-life data sets.
Byung-Uk Choi
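A minimal sketch of the lower-bounding idea, applied to whole vectors rather than the paper's per-subvector decomposition, and with a random reference vector rather than a carefully chosen one: given only the norms of x and q and their angles to a shared reference r, the triangle inequality for angles yields a distance that never exceeds the true Euclidean distance.

```python
import numpy as np

def angle_to(v, r):
    """Angle between vector v and reference vector r, in [0, pi]."""
    c = np.dot(v, r) / (np.linalg.norm(v) * np.linalg.norm(r))
    return np.arccos(np.clip(c, -1.0, 1.0))

def lower_bound_dist(x, q, r):
    """Lower bound on ||x - q|| using only the norms of x and q and
    their angles to a shared reference r. Follows from the triangle
    inequality for angles: angle(x, q) >= |angle(x, r) - angle(q, r)|."""
    nx, nq = np.linalg.norm(x), np.linalg.norm(q)
    dtheta = abs(angle_to(x, r) - angle_to(q, r))
    return np.sqrt(max(nx**2 + nq**2 - 2 * nx * nq * np.cos(dtheta), 0.0))

rng = np.random.default_rng(0)
x, q, r = rng.normal(size=64), rng.normal(size=64), rng.normal(size=64)
# The bound never exceeds the true distance, so no false dismissals.
assert lower_bound_dist(x, q, r) <= np.linalg.norm(x - q) + 1e-9
```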

2.
There is great interest in dimensionality reduction techniques for tackling the problem of high-dimensional pattern classification. This paper addresses supervised learning of a linear dimension reduction mapping suitable for classification problems. The proposed optimization procedure minimizes an estimate of the nearest neighbor classifier's error probability, and it learns a linear projection together with a small set of prototypes that support the class boundaries. The learned classifier is very computationally efficient, making classification much faster than state-of-the-art classifiers such as SVMs, while achieving competitive recognition accuracy. The approach has been assessed through a series of experiments, showing uniformly good behavior and competitive results compared with some recently proposed supervised dimensionality reduction techniques.
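A toy sketch of this idea under stated assumptions: the paper's specific error estimate and optimizer are not reproduced; instead, a smooth sigmoid surrogate of the nearest-prototype error is minimized over a projection matrix B and one prototype per class (the names B and P, the sigmoid slope 10.0, and the use of Nelder-Mead are illustrative choices, not the paper's).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
d_in, d_out = 5, 1                                     # project 5-D -> 1-D

def loss(params):
    B = params[:d_in * d_out].reshape(d_in, d_out)     # linear projection
    P = params[d_in * d_out:].reshape(2, d_in)         # one prototype/class
    Z, Q = X @ B, P @ B
    D = ((Z[:, None, :] - Q[None, :, :]) ** 2).sum(-1) + 1e-12
    # ratio > 1 means the wrong-class prototype is nearer (an error)
    r = D[np.arange(len(y)), y] / D[np.arange(len(y)), 1 - y]
    return np.mean(1.0 / (1.0 + np.exp(-10.0 * (r - 1.0))))  # soft error

x0 = np.concatenate([rng.normal(size=d_in * d_out),
                     X[y == 0].mean(0), X[y == 1].mean(0)])
res = minimize(loss, x0, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6})
print("soft nearest-prototype error:", res.fun)
```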

3.
An Improved Computation of Concept Semantic Similarity
In similarity computation, ontologies play an important role because they express concepts and their interrelations explicitly and formally. To make similarity results more accurate, this paper makes fuller use of the relations in an ontology and of the characteristics of similarity computation in specific application domains, and proposes an improved similarity computation model. Hyponymy (is-a) relations are used to compute similarity and non-hyponymy relations to compute relatedness, and the two are combined; the asymmetry of similarity computation in semantic retrieval is also taken into account. Experiments verify that the method is effective and accurate.

4.
In this paper, we propose a novel method named Mixed Kernel CCA (MKCCA) to achieve an easy yet accurate implementation of dimensionality reduction. MKCCA consists of two major steps. First, the high-dimensional data space is mapped into a reproducing kernel Hilbert space (RKHS), rather than a plain Hilbert space, with a mixture of kernels, i.e., a linear combination of a local kernel and a global kernel; a uniform design for experiments with mixtures is introduced for model selection. Second, in the new RKHS, kernel CCA is further improved by performing Principal Component Analysis (PCA) followed by CCA for effective dimensionality reduction. We prove that MKCCA can be decomposed into two separate components, PCA and CCA, which can better remove noise and tackle the trivial-learning issue present in CCA and traditional kernel CCA. The proposed MKCCA can then be applied, with the reduced data, to multiple types of learning, such as multi-view, supervised, semi-supervised, and transfer learning. Extensive experimental results show its superiority over existing methods in these settings.
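A minimal sketch of the two-step pipeline on synthetic two-view data, assuming an RBF kernel as the local kernel and a linear kernel as the global one; the mixing weight alpha and the gamma value are illustrative, not the paper's uniform-design choices.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.decomposition import KernelPCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                                      # view 1
Y = X @ rng.normal(size=(10, 8)) + 0.1 * rng.normal(size=(100, 8))  # view 2

def mixed_kernel(A, alpha=0.5, gamma=0.1):
    # Linear combination of a local (RBF) and a global (linear) kernel.
    return alpha * rbf_kernel(A, gamma=gamma) + (1 - alpha) * linear_kernel(A)

# Step 1: map each view with the mixed kernel; step 2: PCA, then CCA.
Zx = KernelPCA(n_components=5, kernel="precomputed").fit_transform(mixed_kernel(X))
Zy = KernelPCA(n_components=5, kernel="precomputed").fit_transform(mixed_kernel(Y))
Zx_c, Zy_c = CCA(n_components=2).fit_transform(Zx, Zy)
print(Zx_c.shape, Zy_c.shape)   # (100, 2) canonical projections per view
```

The intermediate kernel-PCA step is what removes noise directions before CCA correlates the two views, which is the decomposition the abstract refers to.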

5.
Dimensionality reduction with adaptive graph
Graph-based dimensionality reduction (DR) methods have been applied successfully to many practical problems, such as face recognition, where graphs play a crucial role in modeling the data distribution or structure. In practice, however, the ideal graph is difficult to discover; one usually constructs a graph empirically according to various motivations, priors, or assumptions, independently of the subsequent DR mapping calculation. Differing from previous works, this paper attempts to learn a graph closely linked with the DR process, and proposes an algorithm called dimensionality reduction with adaptive graph (DRAG), whose idea is to learn, while seeking the projection matrix, a graph in the neighborhood of a pre-specified one. The pre-specified graph is treated as a noisy observation of the ideal one, and the squared Frobenius divergence measures their difference in the objective function. The result is an elegant graph update formula that naturally fuses the original and transformed data information: the optimal graph is shown to be a weighted sum of the pre-defined graph in the original space and a new graph depending on the transformed space. Empirical results on several face datasets demonstrate the effectiveness of the proposed algorithm.
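A sketch of one plausible reading of that update rule; the exact objective is not reproduced here, so the alternating scheme, the LPP-style eigenproblem, and the heat-kernel graph in the projected space are all assumptions. The working graph is a convex combination of the pre-specified graph and a graph rebuilt in the projected space.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

def drag_sketch(X, W0, d=2, lam=0.5, n_iter=5):
    """Toy alternating scheme in the spirit of DRAG (assumed form)."""
    W = W0.copy()
    for _ in range(n_iter):
        L = np.diag(W.sum(1)) - W                       # graph Laplacian
        # Smallest generalized eigenvectors of X^T L X v = mu X^T X v
        A, Bm = X.T @ L @ X, X.T @ X + 1e-6 * np.eye(X.shape[1])
        mu, V = eigh(A, Bm)
        P = V[:, :d]                                    # projection matrix
        S = rbf_kernel(X @ P, gamma=1.0)                # graph in new space
        W = (W0 + lam * S) / (1 + lam)                  # fused graph update
    return P, W

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
W0 = rbf_kernel(X, gamma=0.1)                           # pre-specified graph
P, W = drag_sketch(X, W0)
print(P.shape)   # (10, 2) learned projection
```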

6.
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern affect the success of subsequent classification. Feature extraction is the process of deriving new features from original features to reduce the cost of feature measurement, increase classifier efficiency, and allow higher accuracy. Many feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and classification efficiency, it does not necessarily reduce the number of features to be measured since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection and extraction and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a feature weight vector used to scale the individual features in the original pattern vectors. A masking vector is also employed for simultaneous selection of a feature subset. We employ this technique in combination with the k nearest neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection, and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces.
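A compact sketch of the weight-plus-mask idea on a stand-in dataset (Iris), assuming a plain generational GA with one-point crossover and Gaussian weight mutation; the population size, rates, and the 3-NN/3-fold fitness are illustrative parameters, not the paper's.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_feat, pop_size, n_gen = X.shape[1], 20, 15

def fitness(w, m):
    # k-NN accuracy on features scaled by w and selected by mask m.
    if not m.any():
        return 0.0
    Xt = X[:, m] * w[m]
    return cross_val_score(KNeighborsClassifier(3), Xt, y, cv=3).mean()

# Each chromosome: continuous weight vector + binary feature mask.
pop = [(rng.uniform(0, 1, n_feat), rng.random(n_feat) > 0.3)
       for _ in range(pop_size)]
for _ in range(n_gen):
    parents = sorted(pop, key=lambda c: fitness(*c), reverse=True)[:pop_size // 2]
    children = []
    while len(children) < pop_size - len(parents):
        i, j = rng.choice(len(parents), 2, replace=False)
        (w1, m1), (w2, m2) = parents[i], parents[j]
        cut = rng.integers(1, n_feat)                   # one-point crossover
        w = np.concatenate([w1[:cut], w2[cut:]]) + rng.normal(0, 0.05, n_feat)
        m = np.concatenate([m1[:cut], m2[cut:]])
        children.append((np.clip(w, 0, 1), m ^ (rng.random(n_feat) < 0.1)))
    pop = parents + children

best_w, best_m = max(pop, key=lambda c: fitness(*c))
print("selected features:", np.flatnonzero(best_m), "weights:", best_w[best_m])
```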

7.
Computing Sentence Semantic Similarity
Sentence or text-fragment similarity plays an increasingly important role in Web-related tasks. Building on semantic similarity between concepts, this paper proposes a sentence semantic similarity method, SSBS, and reports experiments with it. Compared with other methods, SSBS quantifies features using not only the semantic similarity between concept pairs of the two sentences and the string edit distance, but also the influence that concepts of different parts of speech have on sentence similarity.
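A hedged sketch of an SSBS-style combination (requires the NLTK WordNet corpus): WordNet concept-pair similarity weighted by part of speech is mixed with a string edit-distance similarity. The 0.7/0.3 mixing weights and the POS weight table are assumptions, since the abstract does not give the exact quantities.

```python
import difflib
from nltk.corpus import wordnet as wn   # run nltk.download("wordnet") once

# Assumed POS weights: nouns count most, adverbs least.
POS_WEIGHT = {"n": 1.0, "v": 0.8, "a": 0.6, "r": 0.4}

def concept_sim(w1, w2):
    """Best POS-weighted WordNet path similarity over all sense pairs."""
    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(s2_word := w2):
            sim = s1.path_similarity(s2)
            if sim:
                best = max(best, sim * POS_WEIGHT.get(s1.pos(), 0.5))
    return best

def sentence_sim(a, b, alpha=0.7):
    wa, wb = a.lower().split(), b.lower().split()
    sem = sum(max(concept_sim(x, y) for y in wb) for x in wa) / len(wa)
    edit = difflib.SequenceMatcher(None, a, b).ratio()  # string edit side
    return alpha * sem + (1 - alpha) * edit

print(sentence_sim("the cat sits on the mat", "a cat is on a rug"))
```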

8.
Semantic Similarity Between Concepts and Documents
Ontologies are introduced as background knowledge into the computation of concept-to-concept and document-to-document similarity. A graph model represents the concepts in the ontology and the semantic relations among them; it is used to expand a concept or a document into a semantic fuzzy set, and similarity is then computed between the fuzzy sets. Document similarity is computed on top of concept similarity. The concept similarity computation incorporates a semantic similarity matrix and a fuzzy similarity measure based on co-information theory.

9.
This paper describes the use of dimensionality reduction techniques for computer facial animation. Techniques such as Principal Component Analysis (PCA), the Expectation-Maximization (EM) algorithm for PCA, Multidimensional Scaling (MDS), and Locally Linear Embedding (LLE) are compared for the purpose of facial animation of different emotions. The experimental results on our facial animation data demonstrate the usefulness of dimensionality reduction for both space and time reduction. In particular, the EM-PCA algorithm performed especially well on our dataset, with a negligible error of only 1-2%.
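A short sketch comparing three of these techniques on stand-in data with scikit-learn (EM-PCA has no stock scikit-learn implementation, so ordinary PCA stands in for it; the random frames merely take the place of real facial-marker trajectories):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, LocallyLinearEmbedding

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 90))   # e.g. 30 facial markers x 3 coords

for name, model in [
    ("PCA", PCA(n_components=5)),
    ("MDS", MDS(n_components=5)),
    ("LLE", LocallyLinearEmbedding(n_components=5, n_neighbors=10)),
]:
    Z = model.fit_transform(frames)
    if name == "PCA":
        # Only PCA offers a direct inverse map to measure reconstruction.
        recon = model.inverse_transform(Z)
        err = np.linalg.norm(frames - recon) / np.linalg.norm(frames)
        print(f"{name}: relative reconstruction error {err:.3f}")
    else:
        print(f"{name}: embedded shape {Z.shape}")
```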

10.
The transfer function matrix of a large multi-input/multi-output linear stochastic time-invariant control system may be computationally difficult to estimate and, further, may not provide a good interpretation of the underlying structure of the system. Priestley, Rao and Tong (1973) have suggested a method, using information-theoretic and statistical criteria, for reducing the dimensions of such a system, and thus, to some extent, overcoming these difficulties. The authors have undertaken a computational study of the application of this theory, illuminating the problems encountered and demonstrating the feasibility of the method in practice.

11.
Dimensionality reduction of clustered data sets
We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.
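The punchline, that the maximum likelihood solution generalizes LDA without labels, can be illustrated with an assumed two-stage stand-in (not the paper's single latent variable model): cluster first, then discriminate on the inferred labels.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Clustered data; the true labels y_true are never shown to the model.
X, y_true = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)

labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
Z = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, labels)
print(Z.shape)   # (300, 2): an LDA-like projection found unsupervised
```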

12.
Dimensionality reduction (DR) methods based on sparse representation, one of the hottest research topics in recent years, have achieved remarkable performance in many applications. However, it is a challenge for existing sparse-representation-based methods to solve nonlinear problems, owing to the limitation of seeking the sparse representation of data in the original space. Motivated by kernel tricks, we propose a new framework called empirical kernel sparse representation (EKSR) to solve nonlinear problems. In this framework, nonlinearly separable data are mapped into a kernel space in which the nonlinear similarity can be captured, and the data in kernel space are then reconstructed by sparse representation to preserve the sparse structure, obtained by minimizing an ℓ1 regularization-related objective function. EKSR provides new insights into dimensionality reduction and extends two models: (1) empirical kernel sparsity preserving projection (EKSPP), a feature extraction method based on sparsity preserving projection (SPP); and (2) empirical kernel sparsity score (EKSS), a feature selection method based on sparsity score (SS). Both methods can choose neighborhoods automatically, thanks to the natural discriminative power of sparse representation. Compared with several existing approaches, the proposed framework reduces computational complexity and is more convenient in practice.
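A sketch of the framework's first two steps under stated assumptions (an RBF kernel, and Lasso with alpha=0.01 as the ℓ1 solver; the downstream EKSPP/EKSS projections are not reproduced): build an empirical kernel map from the eigendecomposition of the kernel matrix, then sparsely reconstruct each mapped sample from the others.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 12))

# Empirical kernel map: phi(x_i) = Lambda^{-1/2} U^T K[:, i].
K = rbf_kernel(X, gamma=0.1)
vals, vecs = np.linalg.eigh(K)
keep = vals > 1e-8
Phi = K @ vecs[:, keep] / np.sqrt(vals[keep])

# l1 sparse reconstruction of each mapped sample from all the others.
S = np.zeros((len(X), len(X)))          # sparse coefficient matrix
for i in range(len(X)):
    others = np.delete(np.arange(len(X)), i)
    lasso = Lasso(alpha=0.01, max_iter=5000).fit(Phi[others].T, Phi[i])
    S[i, others] = lasso.coef_
# S would then drive a sparsity-preserving projection (EKSPP) or a
# sparsity-score ranking of features (EKSS).
print("avg. nonzeros per sample:", (np.abs(S) > 1e-8).sum(1).mean())
```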

13.
To address problems in similarity computation during ontology mapping, a comprehensive similarity computation method is proposed. It first judges whether two ontologies are related; if so, it takes the various relevant factors fully into account and compares the ontologies at both the semantic level and the concept level. A comprehensive similarity measure for ontologies is then given. Finally, the method is evaluated on two test data sets and compared experimentally with the probabilistic/statistical method of the GLUE system. The results show that the method effectively ensures the accuracy of similarity computation.

14.
15.
In this paper, a decomposition method for binary tensors, the generalized multi-linear model for principal component analysis (GMLPCA), is proposed. To the best of our knowledge, there is at present no other principled, systematic framework for decomposition or topographic mapping of binary tensors. In the model formulation, we constrain the natural parameters of the Bernoulli distributions for each tensor element to lie in a subspace spanned by a reduced set of basis (principal) tensors. We evaluate and compare the proposed GMLPCA technique with existing real-valued tensor decomposition methods in two scenarios: (1) a series of controlled experiments involving synthetic data; (2) a real-world biological dataset of DNA subsequences from different functional regions, with sequences represented by binary tensors. The experiments suggest that the GMLPCA model is better suited to modelling binary tensors than its real-valued counterparts. Furthermore, we extended the GMLPCA model to the semi-supervised setting by forcing the model to search for a natural-parameter subspace that represents a user-specified compromise between modelling quality and the degree of class separation.
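A toy sketch of the core constraint, flattened from tensors to matrices for brevity: the Bernoulli natural (logit) parameters are forced into a low-rank factorization and fitted by gradient ascent on the log-likelihood. The rank of 3, learning rate, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Binary data whose logits truly have low rank.
true_logits = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
Xb = (rng.random((50, 40)) < 1 / (1 + np.exp(-true_logits))).astype(float)

U, V = rng.normal(0, 0.1, (50, 3)), rng.normal(0, 0.1, (3, 40))
lr = 0.05
for _ in range(500):
    P = 1 / (1 + np.exp(-(U @ V)))      # Bernoulli means from logits
    G = Xb - P                          # gradient of the log-likelihood
    U, V = U + lr * G @ V.T, V + lr * U.T @ G

P = 1 / (1 + np.exp(-(U @ V)))
print("reconstruction accuracy:", np.mean((P > 0.5) == Xb))
```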

16.
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A relevant case of FDA is when the observed functions are density functions. Among the particular characteristics of density functions, the most is made of the fact that they are an example of infinite-dimensional compositional data (parts of some whole which carry only relative information). Several dimensionality reduction methods for this particular type of data are compared: functional principal components analysis, with or without a prior data transformation, and multidimensional scaling for different inter-density distances, one of them taking into account the compositional nature of density functions. The emphasis is on the steps before and after applying a particular dimensionality reduction method: care must be taken in choosing the right density transformation and/or the appropriate distance between densities before performing dimensionality reduction; subsequently, the graphical representation of the results must take into account that the observed objects are density functions. The different methods are applied to artificial and real data (population pyramids for 223 countries in the year 2000). As a global conclusion, the use of multidimensional scaling based on the compositional distance is recommended.
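A minimal sketch of the recommended route, assuming densities discretized into bins: the centred log-ratio (clr) transform turns compositions into ordinary vectors, Euclidean distance on clr vectors is the Aitchison (compositional) distance, and MDS maps the resulting distance matrix to the plane.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
raw = rng.gamma(2.0, 1.0, size=(30, 20))        # 30 "countries", 20 age bins
dens = raw / raw.sum(axis=1, keepdims=True)     # each row is a density

# Centred log-ratio transform: log density minus its row-wise mean log.
clr = np.log(dens) - np.log(dens).mean(axis=1, keepdims=True)
D = np.linalg.norm(clr[:, None, :] - clr[None, :, :], axis=-1)  # Aitchison

emb = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = emb.fit_transform(D)
print(coords.shape)   # (30, 2) planar map of the densities
```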

17.
18.
Automatic acoustic-based vehicle detection is a common task in security and surveillance systems. Usually, a recording device is placed in a designated area, and a hardware/software system processes the sounds intercepted by this device to identify vehicles as they pass by. An algorithm suitable for online automatic detection of vehicles from their acoustic recordings is proposed. The scheme uses dimensionality reduction methodologies such as random projections, instead of traditional signal processing methods, to extract features; it uncovers characteristic features of the recorded sounds without any assumptions about the structure of the signal. The set of features is classified by the application of PCA. The microphone is open at all times, and the algorithm filters out many background noises such as wind, footsteps, speech, and airplanes. The algorithm is generic and can be applied to various signal types to solve different detection and classification problems.
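A sketch of the feature pipeline on a synthetic stand-in signal (window length, hop, and component counts are illustrative): overlapping windows are projected with a Gaussian random projection, replacing hand-crafted signal-processing features, and PCA then gives the compact space in which detection would be performed.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
fs, win = 8000, 2048
signal = rng.normal(size=fs * 10)               # stand-in for a recording

# Slice the stream into half-overlapping windows.
windows = np.array([signal[i:i + win]
                    for i in range(0, len(signal) - win, win // 2)])

rp = GaussianRandomProjection(n_components=128, random_state=0)
feats = rp.fit_transform(windows)               # structure-free features
Z = PCA(n_components=10).fit_transform(feats)   # compact representation
print(Z.shape)   # one 10-D feature vector per window
```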

19.
Wan Xiaoji, Li Hailin, Zhang Liping, Wu Yenchun Jim. The Journal of Supercomputing, 2022, 78(7): 9862-9878.

A multivariate time series is one of the most important objects of research in data mining. Time and variables are two of its distinctive characteristics, and they add to the complexity of the algorithms applied in data mining. Dimensionality reduction is often regarded as an effective way to address these issues. In this paper, we propose a method based on principal component analysis (PCA) to reduce the dimensionality effectively. We call it "piecewise representation based on PCA" (PPCA): it segments the multivariate time series into several sequences, calculates the covariance matrix of the variables for each of them, and employs PCA to obtain the principal components of the average covariance matrix. The results of the experiments, including retained-information analysis, classification, and a comparison of CPU time consumption, demonstrate that PPCA is superior to prior methods for reducing the dimensionality of multivariate time series.
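The abstract's recipe translates almost directly into code; the sketch below follows it step by step (segment count, series length, and component count are illustrative):

```python
import numpy as np

def ppca(X, n_segments, n_components):
    """Piecewise representation based on PCA (PPCA), as described:
    split the series into segments, average the per-segment covariance
    matrices over the variables, take the leading eigenvectors."""
    segments = np.array_split(X, n_segments)          # split along time
    covs = [np.cov(seg, rowvar=False) for seg in segments]
    avg_cov = np.mean(covs, axis=0)                   # average covariance
    vals, vecs = np.linalg.eigh(avg_cov)
    P = vecs[:, ::-1][:, :n_components]               # top components
    return [seg @ P for seg in segments], P

rng = np.random.default_rng(0)
series = rng.normal(size=(1000, 8))                   # 8-variable series
reduced_segments, P = ppca(series, n_segments=10, n_components=3)
print(P.shape, reduced_segments[0].shape)             # (8, 3) (100, 3)
```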


20.
To address shortcomings common to current word similarity algorithms, namely a single information source, nonlinearly inflated results, and inconsistency between computational performance and efficiency, a WordNet word similarity method based on edge weights is proposed. Building on path and depth, the method uses edge weights to compensate for the uneven hierarchy of the WordNet structure, introduces an encoding notion to uniquely identify the similarity between two concepts, and applies a cosine function to correct the nonlinear bias of the results. Experimental results show that on the MC30 and RG65 test sets, the Pearson correlation between similarity values computed by this method and human judgments reaches 0.87; moreover, the method maintains a high level of both computational performance and efficiency.
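A hedged sketch of a path-and-depth similarity with depth-dependent edge weights (requires the NLTK WordNet corpus; the decay factor 0.8 and the depth/(depth+1) weighting are assumed stand-ins, since the abstract does not give the exact edge-weight formula or the encoding and cosine-correction steps):

```python
from nltk.corpus import wordnet as wn   # run nltk.download("wordnet") once

def weighted_path_sim(w1, w2, decay=0.8):
    """Best depth-weighted path similarity over all sense pairs."""
    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(w2):
            lcs = s1.lowest_common_hypernyms(s2)
            path = s1.shortest_path_distance(s2)
            if not lcs or path is None:
                continue
            depth = lcs[0].max_depth()
            # Path length decays similarity; a deeper common ancestor
            # (more specific shared meaning) boosts it.
            best = max(best, decay ** path * (depth / (depth + 1)))
    return best

print(weighted_path_sim("car", "automobile"))   # high: shared synset
print(weighted_path_sim("car", "banana"))       # much smaller
```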

