Similar Documents
20 similar documents retrieved.
1.
A central issue in dimension reduction is choosing a sensible number of dimensions to be retained. This work demonstrates the surprising result that the maximum likelihood criterion is asymptotically consistent for determining the intrinsic dimension of a dataset in an isotropic version of probabilistic principal component analysis (PPCA). Numerical experiments on simulated and real datasets show that the maximum likelihood criterion can actually be used in practice and outperforms existing intrinsic dimension selection criteria in various situations. The paper also exhibits the limits of the maximum likelihood criterion, leading to a recommendation of the AIC criterion in specific situations. A useful application of this work would be the automatic selection of intrinsic dimensions in mixtures of isotropic PPCA for classification.
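As a rough illustration of the kind of criterion at stake, the sketch below computes the PPCA profile log-likelihood for each candidate dimension from the eigenvalues of the sample covariance (the closed-form ML solution of standard PPCA) and applies an AIC penalty. The paper's isotropic variant and its consistency result are not reproduced here, and the parameter count is an assumption of this sketch.

```python
import numpy as np

def ppca_profile_loglik(X):
    """PPCA profile log-likelihood for each candidate dimension q = 1..d-1.

    Uses the closed-form ML solution of standard PPCA: keep the top-q
    eigenvalues, set the noise variance to the mean of the rest.
    """
    n, d = X.shape
    lam = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    ll = np.empty(d - 1)
    for q in range(1, d):
        sigma2 = lam[q:].mean()                      # ML noise variance
        ll[q - 1] = -0.5 * n * (d * np.log(2 * np.pi)
                                + np.log(lam[:q]).sum()
                                + (d - q) * np.log(sigma2) + d)
    return ll

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10))   # intrinsic dim 3
X += 0.1 * rng.normal(size=X.shape)
ll = ppca_profile_loglik(X)
# assumed free-parameter count of PPCA: loadings minus rotations, plus noise
n_free = np.array([10 * q - q * (q - 1) // 2 + 1 for q in range(1, 10)])
print(1 + int(np.argmin(-2 * ll + 2 * n_free)))            # AIC choice of q
```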

2.
This paper proposes a model-structure optimization method based on discriminative criteria, which adjusts how the Gaussian mixture components of the acoustic model are allocated across the states of an HMM-based automatic speech recognition system. By optimizing the selected criterion, the acoustic model either achieves better recognition performance with the same number of parameters, or needs fewer parameters while maintaining comparable performance. Compared with traditional model-structure optimization criteria based on likelihood and complexity penalties, the discriminative approach improves the discriminative power of the model more directly and therefore yields better recognition results. Experimental results on a Chinese continuous digit-string recognition task oriented towards embedded systems show that model-structure optimization based on the maximum mutual information (MMI) criterion outperforms the traditional methods based on model likelihood and complexity.
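A minimal sketch of the allocation idea, not the paper's exact MMI-based estimator: given a hypothetical per-state table of marginal gains in a discriminative objective, a fixed budget of Gaussian components is distributed greedily to whichever state benefits most.

```python
import numpy as np

def allocate_gaussians(gain, total, min_per_state=1):
    """Greedy allocation of a fixed Gaussian budget across HMM states.

    gain[s][m] is a hypothetical estimate of the improvement in the
    discriminative objective (e.g. MMI) from growing state s from m to
    m + 1 mixture components.
    """
    n_states = len(gain)
    counts = np.full(n_states, min_per_state)
    for _ in range(total - min_per_state * n_states):
        marginal = [gain[s][counts[s]] for s in range(n_states)]
        counts[int(np.argmax(marginal))] += 1      # grow the best state
    return counts

rng = np.random.default_rng(1)
gain = rng.exponential(size=(20, 200)) / (1 + np.arange(200))  # diminishing returns
print(allocate_gaussians(gain, total=160))
```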

3.
A Feature Selection Algorithm Based on the Log-Likelihood Ratio
Lin Sen, Tang Fagen. Computer Engineering (《计算机工程》), 2009, 35(19): 56-58, 6.
To address the problems of feature selection algorithms in text classification systems based on the vector space model, this paper proposes a feature selection algorithm based on the log-likelihood ratio. The log-likelihood ratio statistic is introduced so that the positive influence of rare events on classification results is taken into account while their negative influence is kept well under control. Using KNN classification, the log-likelihood ratio algorithm is compared with typical feature selection algorithms; experimental results show that it achieves good performance.
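A minimal sketch of the statistic involved, assuming the usual 2x2 term/class contingency table (the paper's exact treatment of rare events may differ):

```python
import numpy as np

def log_likelihood_ratio(n11, n10, n01, n00):
    """G^2 statistic for a term/class 2x2 contingency table.

    n11: documents in the class containing the term, n10: documents in
    the class without it; n01/n00: the same counts outside the class.
    """
    table = np.array([[n11, n10], [n01, n00]], dtype=float)
    expected = np.outer(table.sum(1), table.sum(0)) / table.sum()
    mask = table > 0                        # 0 * ln(0) is taken as 0
    return 2.0 * (table[mask] * np.log(table[mask] / expected[mask])).sum()

# Rank all candidate terms by the statistic and keep the top k as the
# feature set for the KNN classifier.
print(log_likelihood_ratio(30, 70, 10, 890))
```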

4.
This paper provides an alternative formulation of the conditional correlation structure for fitting the multivariate GARCH model. A special case is the multivariate ARCH model with random coefficients. Its coherence structure is derived from the correlations between the random coefficients, which play an important role in describing the heteroscedastic features of interest. The parameter estimation problem can be solved by maximum likelihood estimation, and model selection is performed via the likelihood ratio test. We consider three real applications: (1) the spot and forward rates of the Deutsche Mark against the US dollar; (2) exchange rates of the Deutsche Mark and Japanese Yen against the US dollar; (3) the Hang Seng index and SES index.
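For the model-selection step, a likelihood ratio test between nested fits reduces to a chi-square tail probability; a minimal sketch (the log-likelihood values and degrees of freedom below are illustrative, not from the paper):

```python
from scipy.stats import chi2

def lr_test(loglik_null, loglik_alt, df):
    """LR test of a restricted model against a nested, richer one;
    df = number of extra parameters (e.g. random-coefficient correlations)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2.sf(stat, df)

stat, p = lr_test(-1520.3, -1512.8, df=3)    # hypothetical values
print(f"LR = {stat:.2f}, p = {p:.4f}")       # small p favors the richer model
```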

5.
Mixtures of factor analyzers have been receiving wide interest in statistics as a tool for performing clustering and dimension reduction simultaneously. In this model it is assumed that, within each component, the data are generated according to a factor model, so the number of parameters on which the covariance matrices depend is reduced. Several estimation methods have been proposed for this model, both in the classical and in the Bayesian framework; however, a direct maximum likelihood procedure had not previously been developed. This paper solves the direct estimation problem, which simultaneously allows one to derive the information matrix for mixtures of factor analyzers. The effectiveness of the proposed procedure is shown in a simulation study and on a toy example.
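The objective that such a direct procedure maximizes can be written down compactly; a sketch assuming given parameters (all names below are hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mfa_loglik(X, weights, means, loadings, psis):
    """Log-likelihood of a mixture of factor analyzers.

    Component k has the constrained covariance
    Lambda_k Lambda_k^T + diag(psi_k), which is what reduces the
    parameter count relative to full-covariance Gaussian mixtures.
    """
    dens = np.zeros((X.shape[0], len(weights)))
    for k, (w, mu, lam, psi) in enumerate(zip(weights, means, loadings, psis)):
        cov = lam @ lam.T + np.diag(psi)          # (d, q) @ (q, d) + diagonal
        dens[:, k] = w * multivariate_normal.pdf(X, mean=mu, cov=cov)
    return float(np.log(dens.sum(axis=1)).sum())

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
print(mfa_loglik(X, [0.5, 0.5],
                 [np.zeros(5), np.ones(5)],
                 [rng.normal(size=(5, 2)) for _ in range(2)],
                 [np.ones(5), np.ones(5)]))
```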

6.
With the wide applications of Gaussian mixture clustering, e.g., in semantic video classification [H. Luo, J. Fan, J. Xiao, X. Zhu, Semantic principal video shot classification via mixture Gaussian, in: Proceedings of the 2003 International Conference on Multimedia and Expo, vol. 2, 2003, pp. 189-192], it is a nontrivial task to select the useful features in Gaussian mixture clustering without class labels. This paper, therefore, proposes a new feature selection method, through which not only the most relevant features are identified, but the redundant features are also eliminated so that the smallest relevant feature subset can be found. We integrate this method with our recently proposed Gaussian mixture clustering approach, namely rival penalized expectation-maximization (RPEM) algorithm [Y.M. Cheung, A rival penalized EM algorithm towards maximizing weighted likelihood for density mixture clustering with automatic model selection, in: Proceedings of the 17th International Conference on Pattern Recognition, 2004, pp. 633-636; Y.M. Cheung, Maximum weighted likelihood via rival penalized EM for density mixture clustering with automatic model selection, IEEE Trans. Knowl. Data Eng. 17(6) (2005) 750-761], which is able to determine the number of components (i.e., the model order selection) in a Gaussian mixture automatically. Subsequently, the data clustering, model selection, and the feature selection are all performed in a single learning process. Experimental results have shown the efficacy of the proposed approach.
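A generic sketch of the two-stage idea (relevance ranking plus redundancy elimination); the relevance score here is just an input placeholder, whereas the paper derives it inside RPEM clustering itself:

```python
import numpy as np

def select_features(X, relevance, redundancy_threshold=0.9):
    """Keep features in decreasing relevance order, dropping any
    candidate highly correlated with a feature already kept, so the
    smallest relevant subset survives."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(relevance)[::-1]:
        if all(corr[j, k] < redundancy_threshold for k in kept):
            kept.append(int(j))
    return kept

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=200)])  # redundant copy
print(select_features(X, relevance=[0.9, 0.5, 0.4, 0.1, 0.8]))   # drops feature 4
```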

7.
8.
M. C. Decision Support Systems, 2005, 38(4): 539-555.
Predictive maintenance programs (PMPs) can provide significant advantages in quality, safety, availability, and cost reduction in industrial plants. Nevertheless, their implementation involves several decision-making processes, such as the selection of the most suitable diagnostic techniques. A wrong decision can lead to the failure and abandonment of the predictive maintenance program, with consequent economic losses, since setting up such a program is a strategic decision. In this article, a model is proposed for the decision making involved in selecting the diagnostic techniques and instrumentation of a predictive maintenance program. The model combines tools from operational research, namely the analytic hierarchy process (AHP) and factor analysis (FA). It has been tested on screw compressors with integrated lubricant and vibration analyses.
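A minimal sketch of the AHP side of such a model, assuming a hypothetical pairwise comparison matrix over three diagnostic techniques (Saaty's principal-eigenvector weights and consistency ratio; the FA stage is not shown):

```python
import numpy as np

SAATY_RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}  # random indices

def ahp_weights(A):
    """Priorities from a pairwise comparison matrix via the principal
    eigenvector, plus Saaty's consistency ratio (CR < 0.1 is the usual
    acceptability threshold)."""
    vals, vecs = np.linalg.eig(A)
    i = int(np.argmax(vals.real))
    w = np.abs(vecs[:, i].real)
    w /= w.sum()
    n = A.shape[0]
    ci = (vals[i].real - n) / (n - 1)
    return w, ci / SAATY_RI[n]

# Hypothetical comparison of three diagnostic techniques, e.g. vibration
# analysis vs. lubricant analysis vs. thermography.
A = np.array([[1, 3, 5],
              [1/3, 1, 2],
              [1/5, 1/2, 1]])
w, cr = ahp_weights(A)
print(w.round(3), round(cr, 3))
```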

9.
We present a novel subspace modeling and selection approach for noisy speech recognition. In subspace modeling, we develop a factor analysis (FA) representation of noisy speech, which is a generalization of a signal subspace (SS) representation. Using FA, noisy speech is represented by the extracted common factors, the factor loading matrix, and the specific factors. The observation space of noisy speech is accordingly partitioned into a principal subspace, containing speech and noise, and a minor subspace, containing residual speech and residual noise. We minimize the energies of speech distortion in the principal subspace as well as in the minor subspace so as to estimate clean speech with residual information. Importantly, we explore optimal subspace selection by solving hypothesis test problems. We test the equality of eigenvalues in the minor subspace to select the subspace dimension. To fulfill the FA spirit, we also examine the hypothesis of uncorrelated specific factors/residual speech. The subspace can be partitioned according to a consistent confidence level towards rejecting the null hypothesis. Optimal solutions are realized through likelihood ratio tests, which arrive at approximate chi-square distributions as test statistics. In experiments on the Aurora2 database, the FA model significantly outperforms the SS model for speech enhancement and recognition. Subspace selection via testing the correlation of residual speech achieves higher recognition accuracies than testing the equality of eigenvalues in the minor subspace.
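A sketch of the eigenvalue-equality test in the minor subspace, using the standard likelihood-ratio sphericity statistic with its approximate chi-square reference (a simplification of the paper's procedure; the residual-correlation test is not shown):

```python
import numpy as np
from scipy.stats import chi2

def minor_subspace_test(eigvals, q, n):
    """LR test that the trailing d-q eigenvalues are equal (pure noise).

    Rejecting means the minor subspace still carries structure, so the
    principal-subspace dimension q should grow.
    """
    tail = np.sort(eigvals)[::-1][q:]
    m = tail.size
    stat = n * (m * np.log(tail.mean()) - np.log(tail).sum())
    df = (m + 2) * (m - 1) // 2
    return stat, chi2.sf(stat, df)

def choose_dimension(eigvals, n, alpha=0.05):
    # smallest q whose minor subspace passes the sphericity test
    for q in range(1, len(eigvals) - 1):
        _, p = minor_subspace_test(eigvals, q, n)
        if p > alpha:
            return q
    return len(eigvals) - 1

rng = np.random.default_rng(4)
n = 400
X = np.column_stack([rng.normal(scale=s, size=n) for s in (4, 3, 2, 1, 1, 1)])
eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(choose_dimension(eig, n))   # expect about 3
```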

10.
Wu Yahui, Liu Gang, Guo Jun. Acta Automatica Sinica (《自动化学报》), 2009, 35(5): 551-555.
Traditional acoustic model training algorithms such as maximum likelihood estimation (MLE) consider only the model itself during training and ignore the interactions between models. Discriminative training algorithms were proposed to further improve recognition performance. Building on minimum phone error (MPE) discriminative training, this paper proposes a model combination algorithm based on the degree of confusion between models: for single-mixture-component models, the MLE and MPE models are combined with weights determined by the inter-model confusion; for multi-component models, a model selection algorithm is proposed to obtain the new model parameters. Experiments show that, compared with the MPE algorithm, the proposed method reduces the recognition error rate by about 4% relative in the single-component case and by about 3% relative in the multi-component case.
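For the single-component case, a minimal sketch of the weighted combination, assuming a per-state confusion measure in [0, 1] (the measure itself and the multi-component selection step are beyond this sketch):

```python
import numpy as np

def combine_models(mu_mle, mu_mpe, confusion):
    """Per-state interpolation of MLE and MPE Gaussian means.

    confusion[s] close to 1 means state s is easily confused, so it
    leans on the discriminative (MPE) estimate; unambiguous states
    keep the smoother MLE estimate.
    """
    w = np.clip(np.asarray(confusion), 0.0, 1.0)[:, None]
    return w * mu_mpe + (1.0 - w) * mu_mle

mu_mle = np.zeros((3, 2))
mu_mpe = np.ones((3, 2))
print(combine_models(mu_mle, mu_mpe, [0.1, 0.5, 0.9]))
```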

11.
Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised regression on high-dimensional data. We propose an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Furthermore, we show how a recent generalization of the LLE algorithm for correspondence learning can be cast into the GF framework, which obviates the need to choose a representation dimensionality.
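A sketch of the harmonic-function core of GF-based semi-supervised regression (the entropy-based active learning and ML model selection pieces are not shown; the affinity matrix W is assumed given):

```python
import numpy as np

def gaussian_field_regression(W, y_labeled, labeled_idx):
    """Harmonic solution on a Gaussian field / graph Laplacian.

    W is a symmetric affinity matrix over all points; the unlabeled
    predictions minimize the field energy given the labeled values:
    f_u = -L_uu^{-1} L_ul y_l.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    u = np.setdiff1d(np.arange(n), labeled_idx)
    f_u = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, labeled_idx)] @ y_labeled)
    preds = np.empty(n)
    preds[labeled_idx] = y_labeled
    preds[u] = f_u
    return preds

rng = np.random.default_rng(5)
pts = rng.uniform(size=(30, 2))
d2 = ((pts[:, None] - pts[None]) ** 2).sum(-1)
W = np.exp(-d2 / 0.1)
np.fill_diagonal(W, 0.0)
labeled = np.array([0, 1, 2, 3, 4])
print(gaussian_field_regression(W, pts[labeled].sum(1), labeled)[:8])
```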

12.
LVCSR systems are usually based on continuous density HMMs, which are typically implemented using Gaussian mixture distributions. Such statistical modeling systems tend to operate slower than real time, largely because of the heavy computational overhead of the likelihood evaluation. The objective of our research is to investigate approximate methods that can substantially reduce the computational cost of likelihood evaluation without noticeably degrading recognition accuracy. In this paper, the most common techniques to speed up the likelihood computation are classified into three categories, namely machine optimization, model optimization, and algorithm optimization. Each category is surveyed and summarized by describing and analyzing the basic ideas of the corresponding techniques. The distribution of the numerical values of the Gaussian mixtures within a GMM is evaluated and analyzed to show that the computation of some Gaussians is unnecessary and can thus be eliminated. Two commonly used techniques for likelihood approximation, namely VQ-based Gaussian selection and partial distance elimination, are analyzed in detail. Based on these analyses, a fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed. DGS is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during the likelihood computation. In principle, DGS is an extension of both partial distance elimination and best mixture prediction, and it does not require additional memory for the storage of Gaussian shortlists. The DGS algorithm has been implemented by modifying the likelihood computation procedure in the HTK 3.4 system. Experimental results on the TIMIT and WSJ0 corpora indicate that this approach can speed up the likelihood computation significantly without introducing appreciable additional recognition error.
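A sketch of one ingredient, partial distance elimination for diagonal-covariance Gaussians (the dynamic shortlist bookkeeping of DGS itself is not reproduced):

```python
import numpy as np

def best_gaussian_pde(x, means, inv_vars, log_consts):
    """Partial distance elimination over diagonal Gaussians.

    Accumulates the negated log-likelihood dimension by dimension and
    abandons a Gaussian as soon as it already exceeds the best score,
    so most components are only partially evaluated.
    """
    best_score, best_k = np.inf, -1
    for k in range(means.shape[0]):
        score = log_consts[k]                      # -log of the normalizer
        for d in range(x.size):
            score += 0.5 * inv_vars[k, d] * (x[d] - means[k, d]) ** 2
            if score >= best_score:                # early abandon
                break
        else:                                      # survived all dimensions
            best_score, best_k = score, k
    return best_k, -best_score

rng = np.random.default_rng(6)
d, M = 39, 16                                      # typical MFCC dim, mixtures
means = rng.normal(size=(M, d))
variances = rng.uniform(0.5, 2.0, size=(M, d))
log_consts = 0.5 * (d * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
x = rng.normal(size=d)
print(best_gaussian_pde(x, means, 1.0 / variances, log_consts))
```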

13.
Li Mao, Zhou Zhigang, Wang Tao. Computer Science (《计算机科学》), 2019, 46(1): 138-142.
Sparse code multiple access (SCMA), a non-orthogonal multiple access technique, supports overloaded communication under limited spectrum resources and can significantly improve spectral efficiency. Thanks to the sparsity of SCMA codebooks, the message passing algorithm (MPA) has become the classical multi-user detection algorithm. Although conventional MPA achieves a bit error ratio (BER) performance close to that of maximum likelihood decoding, the complexity of its exponential operations remains high. Accordingly, this paper designs a confidence-based dynamic edge-selection update scheme to avoid unnecessary node computations. In each iteration, the stability of the beliefs passed from function nodes to variable nodes in the factor graph is used to decide dynamically whether a node update is needed. Simulation results show that the dynamic edge-selection scheme significantly reduces the complexity of the algorithm while striking a good balance with BER.
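A generic skeleton of the confidence-based edge-skipping idea on a factor graph (the SCMA-specific message formulas are not reproduced; `update_edge` is a placeholder supplied by the caller):

```python
import numpy as np

def mpa_dynamic_edges(update_edge, edges, n_iter=8, tol=1e-3):
    """Message passing with dynamic edge selection.

    update_edge(e, msgs) must return the new function-to-variable
    message (a probability vector) for edge e. An edge whose message
    has stabilized is frozen and skipped in later iterations, which is
    where the complexity saving over plain MPA comes from.
    """
    msgs = {e: None for e in edges}
    active = set(edges)
    for _ in range(n_iter):
        for e in list(active):
            new = np.asarray(update_edge(e, msgs))
            if msgs[e] is not None and np.max(np.abs(new - msgs[e])) < tol:
                active.discard(e)          # belief is stable: freeze edge
            msgs[e] = new
        if not active:
            break
    return msgs

# Toy usage: messages that contract toward a fixed point converge and
# their edges are frozen early.
edges = [(f, v) for f in range(4) for v in range(3)]
target = {e: np.array([0.3, 0.7]) for e in edges}
msgs = mpa_dynamic_edges(
    lambda e, m: target[e] if m[e] is None else 0.5 * (m[e] + target[e]),
    edges)
```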

14.
An incremental finite mixture model is proposed for extracting the multi-target states in the sequential Monte Carlo implementation of the probability hypothesis density (PHD) filter. The model is built incrementally, with mixture components inserted one at a time, and the multi-target states are estimated under the maximum likelihood criterion. For a mixture with a given number of components, the expectation-maximization (EM) algorithm is applied to obtain the maximum likelihood parameter estimates. When a new component is inserted, the parameters of the existing mixture are kept fixed, and the component to insert is again chosen from a candidate set under the maximum likelihood criterion. The component-insertion step and the EM parameter-fitting step alternate until the number of mixture components reaches the PHD filter's estimate of the number of targets. A k-d tree is used to generate the candidate set of new components. The incremental finite mixture model ties the growth of the component count to the growth of the particle set's likelihood, which helps the search move step by step toward the maximum likelihood solution. Simulation results show that PHD-filter state extraction based on the incremental finite mixture model outperforms existing state extraction algorithms in multi-target tracking.
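A simplified stand-in for the extraction step, assuming the PHD filter's particle set, weights, and target-number estimate are given (unlike the paper, this refits a full mixture rather than inserting components incrementally with earlier parameters frozen):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def extract_states(particles, weights, n_targets, seed=0):
    """Fit an n_targets-component Gaussian mixture to the weighted
    particle set and read the multi-target state estimates off the
    component means."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    gmm = GaussianMixture(n_components=n_targets, n_init=3,
                          random_state=seed).fit(particles[idx])
    return gmm.means_

rng = np.random.default_rng(7)
particles = np.vstack([rng.normal(loc=c, scale=0.3, size=(300, 2))
                       for c in ([0, 0], [5, 5], [-4, 3])])
weights = np.full(len(particles), 1.0 / len(particles))
print(extract_states(particles, weights, n_targets=3).round(1))
```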

15.
Selecting the order of an autoregression when the parameters of the model are estimated with least-squares algorithms (LSA) is a well-researched topic. This type of approach implicitly assumes that the analyzed time series is stationary, which is rarely true in practical applications. It has long been known that, for nonstationary signals, it is recommended to employ forgetting factor least-squares algorithms (FF-LSA) instead of LSA. This makes it necessary to modify the selection criteria originally designed for LSA so that they become compatible with FF-LSA. Sequentially normalized maximum likelihood (SNML), one of the newest model selection criteria, has been modified independently by two groups of researchers for use in conjunction with FF-LSA. As the proposals from the two groups have not been compared in the previous literature, we conduct a theoretical and empirical study to clarify the relationship between the existing solutions. As part of our study, we also investigate some possibilities for further modifying the criteria. Based on our findings, we provide guidance that can potentially be useful for practitioners.
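A sketch of the forgetting-factor recursion underlying FF-LSA, here for an AR(p) predictor (the initialization constants and the forgetting factor are illustrative choices):

```python
import numpy as np

def ff_rls(y, order, lam=0.98, delta=100.0):
    """Forgetting-factor recursive least squares for an AR(order) model.

    lam < 1 discounts old samples so the fit tracks nonstationarity;
    lam = 1 recovers the ordinary growing-window LSA.
    """
    theta = np.zeros(order)
    P = delta * np.eye(order)
    for t in range(order, len(y)):
        phi = y[t - order:t][::-1]                # regressor of past samples
        k = P @ phi / (lam + phi @ P @ phi)       # gain vector
        err = y[t] - phi @ theta                  # one-step prediction error
        theta = theta + k * err
        P = (P - np.outer(k, phi @ P)) / lam
    return theta

rng = np.random.default_rng(8)
y = np.zeros(2000)
for t in range(2, 2000):                          # slowly drifting AR(2) process
    a = 1.5 - 0.3 * t / 2000
    y[t] = a * y[t - 1] - 0.7 * y[t - 2] + rng.normal()
print(ff_rls(y, order=2, lam=0.98))               # tracks the late-time coefficients
```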

16.
The assumption of proportional hazards (PH), fundamental to the Cox PH model, sometimes may not hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the cumulative hazard function, taking a form similar to the Cox PH model with the extension that the baseline cumulative hazard function is raised to a power function. Our model allows for interaction between covariates and the baseline hazard, and for the two-sample problem it includes the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach cannot be applied here to estimate the model parameters. We use the full likelihood approach, via a cubic B-spline approximation for the baseline hazard, to estimate the model parameters. A semi-automatic procedure for knot selection based on Akaike's information criterion is developed. We illustrate the applicability of our approach using real-life data.
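A sketch of AIC-driven knot selection, illustrated on a least-squares cubic B-spline fit (the paper applies the same selection loop inside the full survival likelihood for the baseline hazard; the knot placement rule and parameter count here are assumptions of this sketch):

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def select_knots_aic(x, y, max_interior=8):
    """Pick the number of interior knots by minimizing a Gaussian AIC
    for a least-squares cubic B-spline fit."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (np.inf, None)
    for m in range(1, max_interior + 1):
        # interior knots at equally spaced quantiles of x
        t = np.quantile(x, np.linspace(0, 1, m + 2)[1:-1])
        spl = LSQUnivariateSpline(x, y, t, k=3)
        rss = float(spl.get_residual())
        n_params = m + 4                           # cubic B-spline coefficients
        aic = len(x) * np.log(rss / len(x)) + 2 * n_params
        if aic < best[0]:
            best = (aic, spl)
    return best[1]

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 10, size=300))
y = np.sin(x) + 0.2 * rng.normal(size=300)
spl = select_knots_aic(x, y)
print(len(spl.get_knots()) - 2, "interior knots selected")  # minus the endpoints
```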

17.
18.
In this paper the ensemble of independent factor analyzers (EIFA) is proposed. This new statistical model assumes that each data point is generated by the sum of the outputs of independently activated factor analyzers. A maximum likelihood (ML) estimation algorithm for the parameters is derived using a Monte Carlo EM algorithm with a Gibbs sampler. The EIFA model is applied to natural image data. As learning progresses, the independent factor analyzers develop into feature detectors that resemble complex cells in mammalian visual systems. Although this result is similar to the previous one obtained by independent subspace analysis, we observe the emergence of complex cells from natural images in a more general framework of models, including overcomplete models that allow additive noise in the observables.
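A sketch of the generative side of the model under assumed activation probabilities (the Monte Carlo EM estimation with Gibbs sampling is not shown; all parameter names are hypothetical):

```python
import numpy as np

def sample_eifa(n, loadings, means, p_active, noise_std, seed=0):
    """Draw samples from the EIFA generative model: each observation is
    the sum of the outputs of the factor analyzers that happen to be
    active, plus additive observation noise."""
    rng = np.random.default_rng(seed)
    d = loadings[0].shape[0]
    X = rng.normal(scale=noise_std, size=(n, d))   # observation noise
    for lam, mu, p in zip(loadings, means, p_active):
        active = rng.random(n) < p                 # independent activation
        z = rng.normal(size=(n, lam.shape[1]))     # latent factors
        X += active[:, None] * (z @ lam.T + mu)
    return X

lams = [np.array([[1.0], [0.5], [0.0]]), np.array([[0.0], [1.0], [1.0]])]
X = sample_eifa(500, lams, [np.zeros(3), np.zeros(3)], [0.6, 0.6], 0.1)
print(X.shape)
```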

19.
A FORTRAN program is described for maximum likelihood estimation within the Generalized F family of distributions. It can be used to estimate regression parameters in a log-linear model for censored survival times with covariates, for which the error distribution may have a great variety of shapes, including most distributions of current use in biostatistics. The optimization is performed by an algorithm based on the generalized reduced gradient method. A stepwise variable search algorithm for covariate selection is included in the program. Output features include: model selection criteria, standard errors of parameter estimates, quantile and survival rates with their standard errors, residuals and several plots. An example based on data from Princess Margaret Hospital, Toronto, is discussed to illustrate the program's capabilities.

20.
The use of the multinomial logit model is typically restricted to applications with few predictors, because in high-dimensional settings maximum likelihood estimates tend to deteriorate. A sparsity-inducing penalty is proposed that accounts for the special structure of multinomial models by penalizing the parameters that are linked to one variable in a grouped way. It is devised to handle general multinomial logit models with a combination of global predictors and those that are specific to the response categories. A proximal gradient algorithm is used that efficiently computes stable estimates. Adaptive weights and a refitting procedure are incorporated to improve variable selection and predictive performance. The effectiveness of the proposed method is demonstrated by simulation studies and an application to the modeling of party choice of voters in Germany.
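A sketch of the proximal machinery involved, assuming B stores one row of category-specific coefficients per predictor (adaptive weights included; the gradient of the multinomial log-likelihood is left to the caller):

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of the group-lasso penalty t * ||v||_2: shrinks
    the whole block of parameters linked to one variable toward zero
    together, so the variable drops out of every category at once."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def proximal_gradient_step(B, grad, step, lam, weights):
    """One proximal gradient step for a group-penalized multinomial
    logit; B[j] holds the coefficients of predictor j across response
    categories and weights[j] is its adaptive penalty weight."""
    B = B - step * grad
    for j in range(B.shape[0]):
        B[j] = group_soft_threshold(B[j], step * lam * weights[j])
    return B

B = np.array([[0.05, -0.04, 0.02],     # weak predictor: whole row zeroed
              [1.20, -0.80, 0.50]])
grad = np.zeros_like(B)
print(proximal_gradient_step(B.copy(), grad, step=1.0, lam=0.1,
                             weights=np.ones(2)))
```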
