Found 20 similar documents. Search time: 15 ms.
1.
Claudio Ceruti Simone Bassis Alessandro Rozza Gabriele Lombardi Elena Casiraghi Paola Campadelli 《Pattern recognition》2014
In the past decade the development of automatic intrinsic dimensionality estimators has gained considerable attention due to its relevance in several application fields. However, most of the proposed solutions prove not to be robust on noisy datasets, and provide unreliable results when the intrinsic dimensionality of the input dataset is high and the manifold where the points are assumed to lie is nonlinearly embedded in a higher-dimensional space. In this paper we propose a novel intrinsic dimensionality estimator (DANCo) and its faster variant (FastDANCo), which exploit the information conveyed both by the normalized nearest-neighbor distances and by the angles computed on pairs of neighboring points. The effectiveness and robustness of the proposed algorithms are assessed by experiments on synthetic and real datasets, by comparative evaluation against state-of-the-art methodologies, and by significance tests.
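As a rough illustration of the two ingredients DANCo combines, the sketch below (not the authors' implementation, and omitting the KL-divergence fitting stage) computes the normalized nearest-neighbor distances and the pairwise angles among each point's neighbors; the neighborhood size k is an arbitrary choice.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def danco_statistics(X, k=10):
    # k+1 neighbors because each point is its own nearest neighbor
    dists, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    norm_dists = dists[:, 1] / dists[:, -1]   # first-NN distance normalized by the k-th
    angles = []
    for i in range(len(X)):
        v = X[idx[i, 1:]] - X[i]              # vectors from point i to its k neighbors
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        cos = np.clip(v @ v.T, -1.0, 1.0)
        angles.append(np.arccos(cos[np.triu_indices(k, 1)]))
    return norm_dists, np.concatenate(angles)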
2.
An evaluation of intrinsic dimensionality estimators (total citations: 4; self-citations: 0; citations by others: 4)
Verveer P.J. Duin R.P.W. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(1):81-86
The intrinsic dimensionality of a data set may be useful for understanding the properties of classifiers applied to it and thereby for the selection of an optimal classifier. In this paper the authors compare the algorithms for two estimators of the intrinsic dimensionality of a given data set and extend their capabilities. One algorithm is based on the local eigenvalues of the covariance matrix in several small regions of the feature space. The other estimates the intrinsic dimensionality from the distribution of the distances from an arbitrary data vector to a selection of its neighbors. The characteristics of the two estimators are investigated and the results are compared. It is found that both can be applied successfully, but that they might fail in certain cases. The estimators are compared and illustrated using data generated from chromosome banding profiles.
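A minimal sketch of the first, local-eigenvalue approach (the thresholding rule and region selection below are assumptions, not the authors' exact algorithm): estimate a covariance matrix in several small neighborhoods, count the dominant eigenvalues in each, and average.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_eigenvalue_id(X, n_neighbors=20, n_regions=50, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    nbrs = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    centers = X[rng.choice(len(X), size=n_regions, replace=False)]
    _, idx = nbrs.kneighbors(centers)
    estimates = []
    for region in idx:
        cov = np.cov(X[region], rowvar=False)
        evals = np.sort(np.linalg.eigvalsh(cov))[::-1]
        # count eigenvalues larger than a fraction alpha of the largest one
        estimates.append(int(np.sum(evals > alpha * evals[0])))
    return float(np.mean(estimates))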
3.
Information retrieval today is much more challenging than traditional small-document retrieval. The main difference is the importance of correlations between related concepts in complex data structures. As collections of data grow and contain more entries, they require more complex relationships, links, and groupings between individual entries. This paper introduces two novel methods for estimating data intrinsic dimensionality based on the singular value decomposition (SVD). The average standard estimator (ASE) and the multi-criteria decision weighted model are used to estimate matrix intrinsic dimensionality for large document collections. The multi-criteria weighted model calculates the sum of weighted values of the matrix dimensions that demonstrated the best performance over all possible dimensions [1]. ASE estimates the level of significance of the singular values resulting from the singular value decomposition. ASE assumes that variables with deep relations have sufficient correlation and that only relationships with high singular values are significant and should be maintained [1]. Experimental results indicate that ASE improves precision and relative relevance for the MEDLINE document collection by 10.2% and 12.9% respectively, compared to percentage-of-variance dimensionality estimation. Results based on testing three document collections over all possible dimensions using selected performance measures indicate that ASE improves matrix intrinsic dimensionality estimation by including the effect of both the magnitude of decrease of the singular values and random noise distractors. The multi-criteria weighted model with dimensionality reduction provides a more efficient implementation for information retrieval than a full-rank model.
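As an illustration of SVD-based rank selection in this spirit (the retention rule below, keeping singular values above their mean, is a stand-in assumption, not the paper's ASE formula):

import numpy as np

def svd_rank_estimate(A):
    s = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order
    return int(np.sum(s > s.mean()))        # stand-in significance rule: above-mean values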
4.
Recently, a great deal of research work has been devoted to the development of algorithms to estimate the intrinsic dimensionality (id) of a given dataset, that is, the minimum number of parameters needed to represent the data without information loss. id estimation is important for the following reasons: the capacity and the generalization capability of discriminant methods depend on it; id is necessary information for any dimensionality reduction technique; in neural network design the number of hidden units in the encoding middle layer should be chosen according to the id of the data; and the id value is strongly related to the model order in a time series, which is crucial for obtaining reliable time series predictions. Although many estimation techniques have been proposed in the literature, most of them fail on noisy data, or compute underestimated values when the id is sufficiently high. In this paper, after reviewing some of the most important id estimators related to our work, we provide a theoretical motivation of the bias that causes the underestimation effect, and we present two id estimators based on the statistical properties of manifold neighborhoods, which have been developed in order to reduce this effect. We exhaustively evaluate the proposed techniques on synthetic and real datasets, employing an objective evaluation measure to compare their performance with those achieved by state-of-the-art algorithms; the results show that the proposed methods are promising, and produce reliable estimates also in the difficult case of datasets drawn from non-linearly embedded manifolds, characterized by high id.
5.
Laurent Amsaleg Oussama Chelly Teddy Furon Stéphane Girard Michael E. Houle Ken-ichi Kawarabayashi Michael Nett 《Data mining and knowledge discovery》2018,32(6):1768-1805
This paper is concerned with the estimation of a local measure of intrinsic dimensionality (ID) recently proposed by Houle. The local model can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. This form of intrinsic dimensionality can be particularly useful in search, classification, outlier detection, and other contexts in machine learning, databases, and data mining, as it has been shown to be equivalent to a measure of the discriminative power of similarity functions. Several estimators of local ID are proposed and analyzed based on extreme value theory, using maximum likelihood estimation, the method of moments, probability weighted moments, and regularly varying functions. An experimental evaluation is also provided, using both real and artificial data.
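For concreteness, a minimal sketch of the maximum-likelihood member of this family, in its Hill-type form: given the distances r_1 <= ... <= r_k from a query point to its k nearest neighbors, the local ID estimate is -1 / mean(log(r_i / r_k)). The choice of k is left to the caller.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_local_id(X, query, k=20):
    # assumes the query point itself is not in X, so all distances are positive
    dists, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(query.reshape(1, -1))
    r = dists[0]                                  # r[0] <= ... <= r[k-1]
    return -1.0 / np.mean(np.log(r[:-1] / r[-1]))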
6.
7.
The white noise deconvolution or input white noise estimation problem has important applications in oil seismic exploration, communication, and signal processing. By the modern time series analysis method, based on the autoregressive moving average (ARMA) innovation model, a new information fusion white noise deconvolution estimator is presented for general multisensor systems with different local dynamic models and correlated noises. It can handle the input white noise fused filtering, prediction, and smoothing problems, and it is applicable to systems with colored measurement noises. It is locally optimal and globally suboptimal. The accuracy of the fuser is higher than that of each local white noise estimator. In order to compute the optimal weights, a formula for computing the local estimation error cross-covariances is given. A Monte Carlo simulation example for a system with Bernoulli-Gaussian input white noise shows its effectiveness and performance.
8.
Many real-world problems are dynamic, requiring optimization algorithms that can continuously track changing optima over time. This paper proposes an improved differential evolution algorithm that uses the notion of the near-neighbor effect to determine an individual's neighborhood, for tracking multiple optima in dynamic environments. A new mutation strategy using the near-neighbor effect is also presented. It creates individuals by utilizing the stored memory point in the neighborhood, and by utilizing the differential vector produced by the 'near-neighbor-superior' and 'near-neighbor-inferior' individuals. Taking inspiration from the biological immune system, an immune-system-based scheme is presented for rapidly detecting and responding to environmental changes. In addition, a difference-related multidirectional amplification scheme is presented to integrate valuable information from different dimensions for effectively and rapidly finding the promising optimum in the search space. Experiments on dynamic scenarios created by the typical dynamic test instance, the moving peak problem, demonstrate that the near-neighbor and immune-system-based differential evolution algorithm (NIDE) is effective in dealing with dynamic optimization functions.
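A hedged sketch of a near-neighbor mutation operator in this spirit (the paper's NIDE operator and memory scheme are more elaborate; fitness maximization and the neighborhood size k are assumptions):

import numpy as np

def near_neighbor_mutation(pop, fitness, i, F=0.5, k=5):
    # fitness is assumed to be maximized
    d = np.linalg.norm(pop - pop[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]              # k nearest neighbors of individual i
    best = nbrs[np.argmax(fitness[nbrs])]      # 'near-neighbor-superior'
    worst = nbrs[np.argmin(fitness[nbrs])]     # 'near-neighbor-inferior'
    return pop[nbrs[0]] + F * (pop[best] - pop[worst])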
9.
This paper presents a review of the fundamental performance limitations and design tradeoffs of feedback control systems, ranging from classical performance tradeoff issues to more recent information-theoretic analyses, and from conventional feedback systems to networked control systems. It attempts to document some of the key achievements in more than seventy years of intellectual inquiry into control performance limitations, as embodied by the timeless contributions of Bode known as the Bode integral relations.
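The centerpiece of such studies is Bode's sensitivity integral, which in its standard form (stated for an open loop L of relative degree at least two, with unstable open-loop poles p_k and sensitivity function S) reads:

\int_{0}^{\infty} \ln \lvert S(j\omega) \rvert \, d\omega = \pi \sum_{k} \operatorname{Re}(p_k),
\qquad S(s) = \frac{1}{1 + L(s)}

For a stable open loop the right-hand side is zero, so any sensitivity reduction over one frequency band must be paid for by amplification over another (the "waterbed" effect).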
10.
Support vector machines (SVMs) have achieved great success in multi-class classification. However, as the dimension increases, irrelevant or redundant features may degrade the generalization performance of SVM classifiers, which makes dimensionality reduction (DR) indispensable for high-dimensional data. At present, most DR algorithms either reduce all data points to the same dimension for multi-class datasets, or search for the local latent dimension of each class, but they neglect the fact that different class pairs also have different local latent dimensions. In this paper, we propose an adaptive class pairwise dimensionality reduction algorithm (ACPDR) to improve the generalization performance of multi-class SVM classifiers. In the proposed algorithm, on the one hand, different class pairs are reduced to different dimensions; on the other hand, a tabu strategy is adopted to adaptively select a suitable embedding dimension. Five popular DR algorithms are employed in our experiments, and the numerical results on some benchmark multi-class datasets show that, compared with traditional DR algorithms, the proposed ACPDR can improve the generalization performance of multi-class SVM classifiers, and also verify that it is reasonable to consider that different class pairs have different local dimensions.
11.
An adaptive freeway traffic state estimator (total citations: 1; self-citations: 0; citations by others: 1)
Yibing Wang Markos Papageorgiou Albert Messmer Pierluigi Coppola Athina Tzimitsi Agostino Nuzzolo 《Automatica》2009,45(1):10-24
Real-data testing results of a real-time nonlinear freeway traffic state estimator are presented with a particular focus on its adaptive features. The pursued general approach to the real-time adaptive estimation of complete traffic state in freeway stretches or networks is based on stochastic nonlinear macroscopic traffic flow modeling and extended Kalman filtering. One major innovative aspect of the estimator is the real-time joint estimation of traffic flow variables (flows, mean speeds, and densities) and some important model parameters (free speed, critical density, and capacity), which leads to four significant features of the traffic state estimator: (i) avoidance of prior model calibration; (ii) automatic adaptation to changing external conditions (e.g. weather and lighting conditions, traffic composition, control measures); (iii) enabling of incident alarms; (iv) enabling of detector fault alarms. The purpose of the reported real-data testing is, first, to demonstrate feature (i) by investigating some basic properties of the estimator and, second, to explore some adaptive capabilities of the estimator that enable features (ii)-(iv). The achieved testing results are quite satisfactory and promising for further work and field applications.
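A generic sketch of the joint state-and-parameter estimation idea (state augmentation with random-walk parameter dynamics; f_traffic, F_jac, H, and the noise covariances are placeholders, not the paper's traffic flow model):

import numpy as np

def ekf_step(x_aug, P, z, f_traffic, F_jac, H, Q, R):
    # x_aug stacks traffic states (flows, speeds, densities) with model
    # parameters (free speed, critical density, capacity); f_traffic must
    # propagate the states and pass the parameters through as a random walk.
    x_pred = f_traffic(x_aug)
    F = F_jac(x_aug)                     # Jacobian of the augmented dynamics
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_aug)) - K @ H) @ P_pred
    return x_new, P_new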
12.
13.
We prove that the standard nonparametric mean estimator for judgment post-stratification is inadmissible under squared error loss within a certain class of linear estimators. We derive alternate estimators that are admissible in this class, and we show that one of them is always better than the standard estimator. The reduction in mean squared error from using this alternate estimator can be as large as 10% for small set sizes and small sample sizes.
14.
A linear time-invariant optimal estimator for systems with an uncertain output matrix is considered. The uncertainty of the output matrix is represented by a set of multiple parameters, and the optimal estimator is obtained by minimizing the expected cost function of the estimation error. When the number of measured variables is small, the order of the optimal estimator does not depend on the number of models, and meeting the computational demand becomes quite feasible.
15.
W.K. Wong 《Pattern recognition》2012,45(4):1511-1523
How to define sparse affinity weight matrices is still an open problem in existing manifold learning algorithms. In this paper, we propose a novel unsupervised learning method called Non-negative Sparseness Preserving Embedding (NSPE) for linear dimensionality reduction. Differing from manifold-learning-based subspace learning methods such as Locality Preserving Projections (LPP), Neighborhood Preserving Embedding (NPE) and the recently proposed sparse-representation-based Sparsity Preserving Projections (SPP), NSPE preserves the non-negative sparse reconstruction relationships in the low-dimensional subspace. Another novelty of NSPE is the sparseness constraint, which is added directly to control the non-negative sparse representation coefficients. This yields a model closer to the way the active neurons of area V1 of the primate visual cortex process information. Although labels are not used in the training steps, the non-negative sparse representation can still discover latent discriminant information and thus provides better measure coefficients and significant discriminant ability for feature extraction. Moreover, NSPE is more efficient than the recently proposed sparse-representation-based SPP algorithm. Comprehensive comparisons and extensive experiments show that NSPE performs competitively against unsupervised learning algorithms such as classical PCA and the state-of-the-art techniques LPP, NPE and SPP.
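A hedged sketch of the non-negative sparse reconstruction weights that such a method preserves (the projection-learning step is omitted; the l1 penalty weight alpha is an arbitrary choice, and an l1-penalized least-squares coder stands in for the paper's exact coding scheme):

import numpy as np
from sklearn.linear_model import Lasso

def nonneg_sparse_weights(X, i, alpha=0.01):
    others = np.delete(X, i, axis=0)
    # positive=True enforces non-negativity; alpha controls the sparseness
    coder = Lasso(alpha=alpha, positive=True, fit_intercept=False, max_iter=5000)
    coder.fit(others.T, X[i])     # columns of the design matrix are the other samples
    return coder.coef_            # sparse non-negative reconstruction weights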
16.
An improved Hoschek intrinsic parametrization (total citations: 3; self-citations: 0; citations by others: 3)
Smoothing a set of points p_i with a B-spline curve is a common CAGD application, which remains an open problem due to the choice of the parameter values. J. Hoschek proposed one of the first iterative solutions, called intrinsic parametrization. This idea has been improved several times by introducing different parameter corrections. This paper presents a new improvement of Hoschek's method that provides better results with a higher speed of convergence. Examples are given and compared with the different approaches.
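A simplified tangential parameter correction of the kind underlying intrinsic parametrization (the paper's improved correction differs): project each point's residual onto the curve tangent and shift its parameter accordingly.

import numpy as np

def correct_parameters(points, t, curve, curve_deriv):
    # curve(t) and curve_deriv(t) evaluate the B-spline and its derivative
    t_new = np.empty_like(t)
    for j, (p, tj) in enumerate(zip(points, t)):
        c, dc = curve(tj), curve_deriv(tj)
        # shift the parameter by the residual's projection onto the tangent
        t_new[j] = tj + np.dot(p - c, dc) / np.dot(dc, dc)
    return t_new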
17.
By combining the Histogram of Oriented Gradients (HOG), which is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid, with the Local Gabor Binary Pattern Histogram Sequence (LGBPHS), which concatenates the histograms of all the local regions of all the local Gabor magnitude binary pattern maps, as a feature set, we propose a novel human detection feature. We employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, to project the feature onto a much lower-dimensional subspace (9 dimensions, reduced from the original of over 12000). We test the new feature on the INRIA person dataset using a linear SVM, and it yields an error rate of 1.35% with a false negative (FN) rate of 0.46% and a false positive (FP) rate of 0.89%, while the error rate of HOG is 7.11% (FN 4.09%, FP 3.02%) and that of LGBPHS is 13.55% (FN 4.94%, FP 8.61%).
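A hedged sketch of the HOG half of such a detector (the LGBPHS features and the PLS projection are omitted; the parameters follow common Dalal-Triggs settings and are not necessarily the paper's):

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def train_hog_detector(windows, labels):
    # windows: grayscale 128x64 detection windows; labels: 1 = person, 0 = background
    feats = [hog(w, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2), block_norm='L2-Hys') for w in windows]
    clf = LinearSVC(C=0.01)
    clf.fit(np.asarray(feats), labels)
    return clf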
18.
Principal Component Analysis is one of the most widely applied dimensionality reduction techniques for process monitoring and fault diagnosis in industrial processes. This work proposes a procedure based on the discriminant information contained in the principal components to determine the most significant ones for fault separability. The Tennessee Eastman Process industrial benchmark is used to illustrate the effectiveness of the proposal. The use of statistical hypothesis tests as a separability measure between multiple faults is proposed for the selection of the principal components. The classifier profile concept is introduced for comparison purposes. Results show an improvement in the classification process compared with traditional techniques and stepwise selection: either better classification for a fixed number of components, or a smaller number of components required to reach a prescribed error rate. In addition, the computational advantage is demonstrated.
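An illustrative sketch of ranking principal components by class separability rather than by explained variance (the ANOVA F-test below is a stand-in assumption for the hypothesis tests used in the paper):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif

def discriminant_components(X, fault_labels, n_keep=10):
    scores = PCA().fit_transform(X)          # scores on all principal components
    F, _ = f_classif(scores, fault_labels)   # per-component class-separability score
    return np.argsort(F)[::-1][:n_keep]      # indices of the most discriminant PCs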
19.
Sang-Woon Kim 《Pattern recognition letters》2011,32(6):816-823
This paper presents an empirical evaluation of methods for reducing the dimensionality of dissimilarity spaces for optimizing dissimilarity-based classifications (DBCs). One problem of DBCs is the high dimensionality of the dissimilarity spaces. To address this problem, two kinds of solutions have been proposed in the literature: prototype selection (PS) based methods and dimension reduction (DR) based methods. Although PS-based and DR-based methods have been explored separately by many researchers, little analysis has been done comparing the two. Therefore, this paper aims to find a suitable method for optimizing DBCs through a comparative study. Our empirical evaluation, obtained with the two approaches on an artificial database and three real-life benchmark databases, demonstrates that DR-based methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA) based methods, generally improve the classification accuracies more than PS-based methods. In particular, the experimental results demonstrate that PCA is more useful for well-represented data sets, while LDA is more helpful for small-sample-size problems.
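A minimal sketch of a DR-based DBC pipeline under this setup (Euclidean distance stands in for a problem-specific dissimilarity measure; the 1-NN classifier and the 95% variance threshold are arbitrary choices):

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def dbc_with_dr(X_train, y_train, X_test, use_lda=False):
    D_train = pairwise_distances(X_train, X_train)   # dissimilarity-space representation
    D_test = pairwise_distances(X_test, X_train)
    reducer = LinearDiscriminantAnalysis() if use_lda else PCA(n_components=0.95)
    clf = make_pipeline(reducer, KNeighborsClassifier(n_neighbors=1))
    clf.fit(D_train, y_train)
    return clf.predict(D_test)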
20.
Heng Tao Shen Xiaofang Zhou Aoying Zhou 《The VLDB Journal: The International Journal on Very Large Data Bases》2007,16(2):219-234
The notorious "dimensionality curse" is a well-known phenomenon for any multi-dimensional index attempting to scale up to high dimensions. One well-known approach to overcoming the degradation in performance with respect to increasing dimensions is to reduce the dimensionality of the original dataset before constructing the index. However, identifying the correlation among the dimensions and effectively reducing them are challenging tasks. In this paper, we present an adaptive Multi-level Mahalanobis-based Dimensionality Reduction (MMDR) technique for high-dimensional indexing. Our MMDR technique has four notable features compared to existing methods. First, it discovers elliptical clusters for more effective dimensionality reduction by using only the low-dimensional subspaces. Second, data points in the different axis systems are indexed using a single B+-tree. Third, our technique is highly scalable in terms of data size and dimension. Finally, it is also dynamic and adaptive to insertions. An extensive performance study was conducted using both real and synthetic datasets, and the results show that our technique not only achieves higher precision, but also enables queries to be processed efficiently.
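A hedged sketch of the multi-level idea only (not the MMDR algorithm itself): discover elliptical clusters, then reduce each cluster in its own local axis system. The cluster count and target dimension are arbitrary assumptions, and the B+-tree indexing step is omitted.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

def clusterwise_reduce(X, n_clusters=4, n_dims=8):
    # full covariances give elliptical clusters; each cluster is assumed to
    # have more than n_dims members so the local PCA is well defined
    labels = GaussianMixture(n_components=n_clusters, covariance_type='full',
                             random_state=0).fit_predict(X)
    reduced = {c: PCA(n_components=n_dims).fit_transform(X[labels == c])
               for c in range(n_clusters)}
    return labels, reduced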