1.
In multimedia information retrieval, multimedia data are represented as vectors in high-dimensional space. To search these vectors efficiently, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically with increasing dimensionality, which is known as the dimensionality curse. To resolve the dimensionality curse, dimensionality reduction methods have been proposed. They map feature vectors in high-dimensional space into vectors in low-dimensional space before the data are indexed. This paper proposes a novel method for dimensionality reduction based on a function that approximates the Euclidean distance using the norm and angle components of a vector. First, we identify the causes of, and discuss basic solutions to, errors in angle approximation during the approximation of the Euclidean distance. Then, we propose a new method for dimensionality reduction that extracts a set of subvectors from a feature vector and maintains only the norm and the approximated angle for every subvector. The selection of a good reference vector is crucial for accurate approximation of the angle component. We present criteria for a good reference vector and propose a method that chooses one. Also, we define a novel distance function using the norm and angle components, and formally prove that it consistently lower-bounds the Euclidean distance. This implies that information retrieval with this function does not incur any false dismissals. Finally, the superiority of the proposed approach is verified through extensive experiments with synthetic and real-life data sets.
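A minimal sketch of the kind of lower-bounding function the abstract describes, assuming a single shared reference vector r (the paper keeps one norm/angle pair per subvector and has its own reference-vector selection; all names here are illustrative). The bound follows from the spherical triangle inequality and the law of cosines:

```python
import numpy as np

def norm_angle_features(x, r):
    """Reduce x to (norm, angle-to-reference) under reference vector r."""
    norm = np.linalg.norm(x)
    cos_theta = np.clip(x @ r / (norm * np.linalg.norm(r)), -1.0, 1.0)
    return norm, np.arccos(cos_theta)

def lower_bound_distance(p_feat, q_feat):
    """Lower-bounds ||p - q||: the angle between p and q is at least
    |theta_p - theta_q| (spherical triangle inequality), and cos is
    decreasing on [0, pi], so the law of cosines gives a lower bound."""
    (np_, tp), (nq, tq) = p_feat, q_feat
    d2 = np_**2 + nq**2 - 2 * np_ * nq * np.cos(abs(tp - tq))
    return np.sqrt(max(d2, 0.0))

p, q, r = np.random.randn(3, 128)
lb = lower_bound_distance(norm_angle_features(p, r), norm_angle_features(q, r))
assert lb <= np.linalg.norm(p - q) + 1e-9
```

Because the stored representation is only two numbers per (sub)vector, filtering with this bound can never produce false dismissals, exactly as the abstract claims for the full method.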
2.
There is a great interest in dimensionality reduction techniques for tackling the problem of high-dimensional pattern classification. This paper addresses the topic of supervised learning of a linear dimension reduction mapping suitable for classification problems. The proposed optimization procedure is based on minimizing an estimation of the nearest neighbor classifier error probability, and it learns a linear projection and a small set of prototypes that support the class boundaries. The learned classifier has the property of being very computationally efficient, making the classification much faster than state-of-the-art classifiers, such as SVMs, while having competitive recognition accuracy. The approach has been assessed through a series of experiments, showing uniformly good behavior and competitive results compared with some recently proposed supervised dimensionality reduction techniques.
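The authors' exact optimizer is not reproduced here, but a closely related, off-the-shelf technique illustrates the idea: scikit-learn's Neighborhood Components Analysis also learns a linear projection by minimizing a (stochastic) nearest-neighbor classification error, and pairs naturally with a k-NN classifier:

```python
# Not the paper's algorithm: a related, off-the-shelf technique with
# the same flavor (projection learned from a soft NN-error objective).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(n_components=2, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
clf.fit(X_tr, y_tr)
print("2-D projection, k-NN accuracy:", clf.score(X_te, y_te))
```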
3.
In similarity computation, ontologies play an important role because they can express concepts and their interrelations explicitly and formally. To make similarity computation more accurate, an improved similarity computation model is proposed that makes fuller use of the relations in an ontology and accounts for the characteristics of similarity computation in specific application domains. Similarity is computed from hyponymy (is-a) relations and relatedness from non-hyponymy relations; the two are then combined, while also taking into account the asymmetry of similarity computation in semantic retrieval. Experiments verify that the method is effective and accurate.
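A deliberately toy sketch of the combination the abstract describes, assuming path-length-based scores; the ontology, edge sets, and the `alpha`/`asym` parameters are all illustrative placeholders, not the paper's:

```python
from collections import deque

# Toy ontology: is-a edges drive similarity, other relations drive relatedness.
IS_A = {"dog": ["mammal"], "cat": ["mammal"], "mammal": ["animal"]}
RELATED = {"dog": ["leash"]}

def hops(start, goal, edges):
    """Shortest number of (undirected) edges between two concepts, or None."""
    graph = {}
    for u, vs in edges.items():
        for v in vs:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def combined_similarity(c1, c2, alpha=0.7, asym=0.9):
    """Weighted blend of is-a similarity and relatedness; `asym` models
    the direction-dependence of similarity in semantic retrieval."""
    d_sim = hops(c1, c2, IS_A)
    d_rel = hops(c1, c2, RELATED)
    sim = 1.0 / (1 + d_sim) if d_sim is not None else 0.0
    rel = 1.0 / (1 + d_rel) if d_rel is not None else 0.0
    return asym * (alpha * sim + (1 - alpha) * rel)

print(combined_similarity("dog", "cat"))   # siblings under "mammal"
```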
4.
In this paper, we propose a novel method named Mixed Kernel CCA (MKCCA) to achieve easy yet accurate implementation of dimensionality reduction. MKCCA consists of two major steps. First, the high dimensional data space is mapped into the reproducing kernel Hilbert space (RKHS) rather than the Hilbert space, with a mixture of kernels, i.e. a linear combination of a local kernel and a global kernel. Meanwhile, a uniform design for experiments with mixtures is also introduced for model selection. Second, in the new RKHS, Kernel CCA is further improved by performing Principal Component Analysis (PCA) followed by CCA for effective dimensionality reduction. We prove that MKCCA can actually be decomposed into two separate components, i.e. PCA and CCA, which can be used to better remove noises and tackle the issue of trivial learning existing in CCA or traditional Kernel CCA. After this, the proposed MKCCA can be implemented in multiple types of learning, such as multi-view learning, supervised learning, semi-supervised learning, and transfer learning, with the reduced data. We show its superiority over existing methods in different types of learning by extensive experimental results.
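A rough sketch of the two MKCCA steps under stated assumptions: an RBF kernel stands in for the local kernel and a polynomial kernel for the global one, and the mixing weight `lam` is fixed rather than chosen by the uniform-design model selection the paper uses:

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # view 1
Y = rng.normal(size=(100, 15))   # view 2

def mixed_kernel_pca(Z, lam=0.5, n_components=5):
    """Step 1: mix a local (RBF) and a global (polynomial) kernel;
    step 2: kernel PCA on the precomputed mixture."""
    K = lam * rbf_kernel(Z) + (1 - lam) * polynomial_kernel(Z, degree=2)
    return KernelPCA(n_components=n_components, kernel="precomputed").fit_transform(K)

# PCA-then-CCA in the mixed-kernel feature space.
Xc, Yc = CCA(n_components=2).fit_transform(mixed_kernel_pca(X), mixed_kernel_pca(Y))
```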
5.
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern affect the success of subsequent classification. Feature extraction is the process of deriving new features from original features to reduce the cost of feature measurement, increase classifier efficiency, and allow higher accuracy. Many feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and classification efficiency, it does not necessarily reduce the number of features to be measured since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a feature weight vector used to scale the individual features in the original pattern vectors. A masking vector is also employed for simultaneous selection of a feature subset. We employ this technique in combination with the k nearest neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces.
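A compact sketch of the weight-plus-mask genome idea, assuming truncation selection, one-point crossover, and cross-validated k-NN accuracy as fitness; the population size, rates, and dataset are placeholders:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)
n_feat = X.shape[1]

def fitness(genome):
    """Genome = per-feature weights followed by a soft binary mask."""
    w, mask = genome[:n_feat], genome[n_feat:] > 0.5
    if not mask.any():
        return 0.0
    Xw = X[:, mask] * w[mask]
    return cross_val_score(KNeighborsClassifier(n_neighbors=3), Xw, y, cv=3).mean()

pop = rng.random((30, 2 * n_feat))
for gen in range(40):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-10:]]          # truncation selection
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, 2 * n_feat)            # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        mut = rng.random(2 * n_feat) < 0.05          # pointwise mutation
        child[mut] = rng.random(mut.sum())
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(g) for g in pop])]
print("selected features:", np.flatnonzero(best[n_feat:] > 0.5))
```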
6.
We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.
7.
This paper describes the usage of dimensionality reduction techniques for computer facial animation. Techniques such as Principal Components Analysis (PCA), the Expectation-Maximization (EM) algorithm for PCA, Multidimensional Scaling (MDS), and Locally Linear Embedding (LLE) are compared for the purpose of facial animation of different emotions. The experimental results on our facial animation data demonstrate the usefulness of dimensionality reduction techniques for both space and time reduction. In particular, the EM-PCA algorithm performed especially well on our dataset, with a negligible error of only 1-2%.
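The EM algorithm for PCA is standard enough (Roweis' formulation) to sketch directly; the data here are random stand-ins for facial-animation frames, so the reconstruction error will not match the 1-2% reported in the abstract:

```python
import numpy as np

def em_pca(X, k, n_iter=100, seed=0):
    """Roweis-style EM for PCA: alternate latent-coordinate (E) and
    basis (M) updates; attractive when frames are high-dimensional.
    X has shape (n_samples, n_dims)."""
    Xc = X - X.mean(axis=0)
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], k))            # basis (d x k)
    for _ in range(n_iter):
        Z = np.linalg.solve(W.T @ W, W.T @ Xc.T)    # E-step: latents (k x n)
        W = Xc.T @ Z.T @ np.linalg.inv(Z @ Z.T)     # M-step: new basis
    proj = Xc @ W @ np.linalg.inv(W.T @ W)          # coordinates in basis
    return proj @ W.T + X.mean(axis=0)              # reconstruction

X = np.random.default_rng(1).normal(size=(200, 60))  # stand-in for mocap frames
err = np.linalg.norm(X - em_pca(X, k=5)) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")
```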
9.
In this paper, a decomposition method for binary tensors, the generalized multi-linear model for principal component analysis (GMLPCA), is proposed. To the best of our knowledge, at present there is no other principled systematic framework for decomposition or topographic mapping of binary tensors. In the model formulation, we constrain the natural parameters of the Bernoulli distributions for each tensor element to lie in a sub-space spanned by a reduced set of basis (principal) tensors. We evaluate and compare the proposed GMLPCA technique with existing real-valued tensor decomposition methods in two scenarios: (1) in a series of controlled experiments involving synthetic data; (2) on a real-world biological dataset of DNA sub-sequences from different functional regions, with sequences represented by binary tensors. The experiments suggest that the GMLPCA model is better suited for modelling binary tensors than its real-valued counterparts. Furthermore, we extended our GMLPCA model to the semi-supervised setting by forcing the model to search for a natural parameter subspace that represents a user-specified compromise between the modelling quality and the degree of class separation.
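GMLPCA itself operates on tensors; the following is only its flattened-matrix analogue, fitting rank-constrained Bernoulli natural parameters by gradient ascent (learning rate and iteration count are arbitrary):

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def bernoulli_pca(X, k=3, lr=0.05, n_iter=500, seed=0):
    """Constrain the Bernoulli natural parameters Theta = Z @ V.T to a
    rank-k subspace and fit by gradient ascent on the log-likelihood.
    X is a binary matrix (a tensor would be unfolded/flattened first)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = rng.normal(scale=0.1, size=(n, k))   # per-sample latent coordinates
    V = rng.normal(scale=0.1, size=(d, k))   # basis ("principal") directions
    for _ in range(n_iter):
        R = X - expit(Z @ V.T)               # dL/dTheta for Bernoulli
        Z += lr * R @ V
        V += lr * R.T @ Z
    return Z, V

X = (np.random.default_rng(1).random((80, 40)) < 0.3).astype(float)
Z, V = bernoulli_pca(X)
print("latent coordinates:", Z.shape)
```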
10.
Functional Data Analysis deals with samples where a whole function is observed for each individual. A relevant case of FDA is when the observed functions are density functions. Among the particular characteristics of density functions, the most is made of the fact that they are an example of infinite-dimensional compositional data (parts of some whole which only carry relative information). Several dimensionality reduction methods for this particular type of data are compared: functional principal components analysis with or without a previous data transformation, and multidimensional scaling for different inter-density distances, one of them taking into account the compositional nature of density functions. The emphasis is on the steps previous and posterior to the application of a particular dimensionality reduction method: care must be taken in choosing the right density function transformation and/or the appropriate distance between densities before performing dimensionality reduction; subsequently the graphical representation of dimensionality reduction results must take into account that the observed objects are density functions. The different methods are applied to artificial and real data (population pyramids for 223 countries in year 2000). As a global conclusion, the use of multidimensional scaling based on compositional distance is recommended.
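A small sketch of the transform-then-reduce pipeline for densities, using the centered log-ratio (clr) transform that is standard for compositional data, followed by ordinary PCA; the discretized "pyramids" are synthetic stand-ins:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy "population pyramids": each row is a discretized density over age bins.
raw = rng.gamma(shape=2.0, scale=1.0, size=(50, 20))
dens = raw / raw.sum(axis=1, keepdims=True)

def clr(p, eps=1e-12):
    """Centered log-ratio transform: moves compositional data into
    ordinary Euclidean space before applying PCA."""
    logp = np.log(p + eps)
    return logp - logp.mean(axis=1, keepdims=True)

scores = PCA(n_components=2).fit_transform(clr(dens))
print(scores.shape)   # (50, 2): coordinates for plotting the densities
```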
11.
Automatic acoustic-based vehicle detection is a common task in security and surveillance systems. Usually, a recording device is placed in a designated area and a hardware/software system processes the sounds that are intercepted by this recording device to identify vehicles only as they pass by. We propose an algorithm suitable for online automatic detection of vehicles based on their acoustic recordings. The scheme uses dimensionality reduction methodologies such as random projections, instead of traditional signal processing methods, to extract features. It uncovers characteristic features of the recorded sounds without any assumptions about the structure of the signal. The set of features is classified by the application of PCA. The microphone is open at all times, and the algorithm filters out many background noises such as wind, footsteps, speech, and airplanes. The introduced algorithm is generic and can be applied to various signal types for solving different detection and classification problems.
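A bare-bones sketch of the feature pipeline, assuming windowed audio frames as input; `GaussianRandomProjection` stands in for the paper's random-projection stage, followed by PCA as the abstract describes (the actual detector and noise filtering are omitted):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
# Stand-in for windowed audio: 500 frames of 2048 samples each.
frames = rng.normal(size=(500, 2048))

# Random projections as assumption-free feature extraction ...
rp = GaussianRandomProjection(n_components=64, random_state=0)
features = rp.fit_transform(frames)

# ... followed by PCA applied to the projected features.
embedded = PCA(n_components=8).fit_transform(features)
print(embedded.shape)   # (500, 8): low-dimensional frame descriptors
```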
12.
A multivariate time series is one of the most important objects of research in data mining. Time and variables are two of its distinctive characteristics that add to the complexity of the algorithms applied in data mining. Reduction in the dimensionality is often regarded as an effective way to address these issues. In this paper, we propose a method based on principal component analysis (PCA) to effectively reduce the dimensionality. We call it "piecewise representation based on PCA" (PPCA), which segments multivariate time series into several sequences, calculates the covariance matrix for each of them in terms of the variables, and employs PCA to obtain the principal components of an average covariance matrix. The results of the experiments, including retained information analysis, classification, and a comparison of the central processing unit time consumption, demonstrate that the PPCA method used to reduce the dimensionality in multivariate time series is superior to the prior methods.
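The abstract spells out the PPCA steps clearly enough for a direct sketch: segment the series, average the per-segment covariance matrices, then project onto the leading eigenvectors (segment count and component count are placeholders):

```python
import numpy as np

def ppca_reduce(series, n_segments=4, n_components=2):
    """Piecewise representation based on PCA: split a multivariate
    series (T x m) into segments, average the per-segment variable
    covariance matrices, and project onto the leading eigenvectors."""
    segments = np.array_split(series, n_segments)
    avg_cov = np.mean([np.cov(seg, rowvar=False) for seg in segments], axis=0)
    vals, vecs = np.linalg.eigh(avg_cov)
    components = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return series @ components       # T x n_components

series = np.cumsum(np.random.default_rng(0).normal(size=(240, 6)), axis=0)
print(ppca_reduce(series).shape)     # (240, 2)
```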
13.
Based on a domain ontology, a method for computing word similarity is proposed that comprehensively considers the influence of multiple factors, including attributes, semantic distance, hierarchy depth, and an adjustment factor. Experiments show that the method makes full use of the hierarchical relations and attribute characteristics of concepts in the domain ontology and combines them, using inter-word similarity to extend the vector space model of texts and achieving good text classification results.
14.
A combined concept-similarity computation method based on semantic similarity and semantic relatedness is proposed. Semantic similarity takes into account semantic distance and ontology-library features, incorporating the auxiliary influence of a concept's information content, depth, density, and an asymmetry factor; semantic relatedness is considered from the aspects of direct relation, indirect relation, direct inheritance, and indirect inheritance. Experimental comparison with two traditional semantic-similarity computation methods shows that this method better distinguishes concept pairs with different relations in the ontology tree, verifying its effectiveness.
16.
Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning, and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.
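A minimal sketch of PLS-based reduction for a QE-style regression setup, with synthetic stand-ins for the translation features and quality scores; unlike PCA, the PLS components are chosen for their predictiveness of y:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
# Stand-in for QE data: 300 translations x 120 redundant features,
# with a quality score per translation.
X = rng.normal(size=(300, 120))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)

# PLS learns latent components that are predictive of y.
pls = PLSRegression(n_components=10)
pls.fit(X, y)
X_reduced = pls.transform(X)          # 300 x 10 reduced feature set
print("R^2 with 10 components:", pls.score(X, y))
```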
17.
This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process. Thus, we suggest performing this process using only the relevant features. Then, every irrelevant feature is added to the learned model to obtain an explanatory model for the original database, which is our primary goal. A simple and, thus, efficient measure to assess the relevance of the features for the learning process is presented. Additionally, the form of this measure allows us to calculate a relevance threshold to automatically identify the relevant features. The experimental results reported for synthetic and real-world databases show the ability of our proposal to distinguish between relevant and irrelevant features and to accelerate learning, while still obtaining good explanatory models for the original database.
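A simple sketch of the relevance idea, assuming mean absolute correlation as the measure and its mean as the automatic threshold; the paper derives its own measure and threshold, so treat both choices here as placeholders:

```python
import numpy as np

def relevance_split(X):
    """Score each feature by its mean absolute correlation with the
    remaining features; features below an automatic threshold (here,
    the mean score) are treated as irrelevant for structure learning."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    scores = corr.mean(axis=1)
    return scores >= scores.mean()

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(500, 3)),   # correlated block
               rng.normal(size=(500, 3))])               # independent noise
print("relevant mask:", relevance_split(X))
```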
18.
Through a study of three traditional models for computing the semantic similarity of concepts with reference to domain ontologies, and in view of the strengths and weaknesses of these three models as well as the particular properties of domain ontologies, an improved model for computing concept semantic similarity based on a domain ontology is proposed. Experimental results show that, by quantitatively analyzing the similarity between the concepts and properties described by ontology term construction, the model can guide concept-set expansion and result ranking in semantic queries based on domain knowledge ontologies, providing an effective quantification of the semantic relations between concepts.
19.
The recent trends in collecting huge and diverse datasets have created a great challenge in data analysis. One of the characteristics of these gigantic datasets is that they often have significant amounts of redundancy. The use of very large multi-dimensional data will result in more noise, redundant data, and the possibility of unconnected data entities. To efficiently manipulate data represented in a high-dimensional space and to address the impact of redundant dimensions on the final results, we propose a new technique for dimensionality reduction using Copulas and the LU-decomposition (forward substitution) method. The proposed method compares favorably with existing approaches on real-world datasets taken from a machine learning repository (Diabetes, Waveform, two versions of Human Activity Recognition based on Smartphone, and Thyroid) in terms of dimensionality reduction and efficiency, evaluated using statistical and classification measures.
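The abstract gives few algorithmic details, so the following is a speculative reading, not the authors' method: estimate a Gaussian-copula correlation from ranks, factor it triangularly (Cholesky, the symmetric special case of LU), and decorrelate via forward substitution; a near-identity output correlation flags which dimensions were redundant:

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.stats import norm, spearmanr

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[1, .9, .1], [.9, 1, .1], [.1, .1, 1]], size=1000)

# Gaussian-copula correlation estimated from ranks (robust to marginals).
rho_s = spearmanr(X).correlation
R = 2 * np.sin(np.pi * rho_s / 6)

# Cholesky is the symmetric LU factorization; forward substitution
# with the lower-triangular factor decorrelates the copula scores.
L = np.linalg.cholesky(R)
U = (np.argsort(np.argsort(X, axis=0), axis=0) + 0.5) / len(X)  # ranks -> (0,1)
Z = norm.ppf(U)
Z_white = solve_triangular(L, Z.T, lower=True).T

# Near-identity correlation indicates the redundancy has been removed.
print(np.round(np.corrcoef(Z_white, rowvar=False), 2))
```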