Similar literature
 Found 20 similar documents; search took 125 ms
1.
2.
The accuracy of head pose estimation is important for many computer vision applications such as face recognition, driver attention detection and human-computer interaction. Most appearance-based head pose estimation methods extract low-dimensional face appearance features in statistical subspaces that represent the underlying geometric structure of the pose space. However, how to effectively represent the appearance-based subspace face for head pose estimation remains an open problem. To address it, this paper proposes a head pose estimation approach based on the Lie Algebrized Gaussians (LAG) feature to model the pose characteristics. LAG is built on Gaussian Mixture Models (GMM); it not only models the distribution of local appearance features but also captures the Lie group manifold structure of the feature space. Moreover, to preserve multi-resolution structure information, LAG is computed on many subregions of the image. These properties enable LAG to model the structure of the subspace face effectively, which gives it strong discriminative ability for head pose estimation. After representing the subspace face using LAG, we treat head pose estimation as a classification problem. A Support Vector Machine (SVM) classifier based on within-class covariance normalization (WCCN) is employed to achieve robust performance, since WCCN reduces the within-class variability of the same pose. Extensive experimental analysis and comparison with both traditional and state-of-the-art algorithms on two challenging benchmarks demonstrate the effectiveness of our approach.

3.
Discriminative approaches for human pose estimation model the functional mapping, or conditional distribution, between image features and 3D poses. Learning such multi-modal models in high-dimensional spaces, however, is challenging with limited training data and often results in over-fitting and poor generalization. To address these issues, Latent Variable Models (LVMs) have been introduced. Shared LVMs learn a low-dimensional representation of common causes that give rise to both the image features and the 3D pose. Discovering the shared manifold structure can, however, be challenging in itself. In addition, shared LVMs are often non-parametric, requiring the model representation to be a function of the training set size. We present a parametric framework that addresses these shortcomings. In particular, we jointly learn latent spaces for both image features and 3D poses by maximizing the non-linear dependencies in the projected latent space while preserving local structure in the original space; we then learn a multi-modal conditional density between these two low-dimensional spaces in the form of Gaussian Mixture Regression. With this model we can address over-fitting and generalization, since the data is denser in the learned latent space, and we avoid the need to learn a shared manifold for the data. We quantitatively compare the performance of the proposed method to several state-of-the-art alternatives and show that our method gives competitive performance.

4.
Pattern Recognition Letters, 2002, 23(1-3): 103-111
This paper proposes a method for recognizing numeral characters based on the PCA (Principal Component Analysis) mixture model. The proposed method is motivated by the idea that classification accuracy is improved by modeling each class as a mixture of several components and by performing the classification in a compact and decorrelated feature space. To realize this idea, each numeral class is partitioned into several clusters, and each cluster's density is estimated by a Gaussian distribution function in the PCA-transformed space. Parameter estimation is performed by an iterative EM (Expectation Maximization) algorithm, and the model order is selected by a fast sub-optimal validation scheme. The proposed method is also computationally efficient, because the optimal feature components for a cluster are determined by sequentially eliminating insignificant features, which is possible because the feature components in the PCA-transformed space are ordered by significance. Simulation results show that the proposed recognition method outperforms other methods such as the k-NN (Nearest Neighbor) method, a single PCA model, and the ICA (Independent Component Analysis) mixture model in terms of recognition accuracy.

5.
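Current deep neural networks for human pose estimation sample the feature map at fixed locations and therefore cannot model geometric transformations of the human pose; when person instances vary in scale, pose, or camera angle, the networks generalize poorly. This paper therefore proposes a multi-person human pose estimation method based on deformable convolution. Exploiting the strong ability of deformable convolution to model geometric transformations of targets, a feature extraction module is designed that maintains detection accuracy when the geometry of human keypoints changes. To further improve network performance, a pre-trained residual network is used. The loss is computed between the model's predictions and ground truth generated by a two-dimensional Gaussian model, and the model is trained iteratively, so that human keypoints can be detected effectively under complex conditions such as changes in camera viewpoint, attached objects, and person scale. Experiments show that the proposed model effectively improves the accuracy of human keypoint detection.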
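The abstract above mentions ground truth generated by a two-dimensional Gaussian model. As a rough illustration only, the following sketch builds such a heatmap for one keypoint and compares a prediction against it. The function names, the grid size, the value of `sigma`, and the use of mean squared error are all assumptions made for this example; the abstract does not specify them.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Return a height x width grid that peaks at 1.0 on the keypoint (cx, cy).

    Hypothetical helper: the grid size and sigma are illustrative choices.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse_loss(pred, target):
    """Mean squared error between two equally sized grids (assumed loss)."""
    n = len(target) * len(target[0])
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / n


# Ground-truth heatmap for a keypoint at column 3, row 2 of a 7 x 5 grid.
target = gaussian_heatmap(7, 5, cx=3, cy=2, sigma=1.0)

# A prediction that is zero everywhere incurs a positive loss.
blank = [[0.0] * 7 for _ in range(5)]
loss = mse_loss(blank, target)
```

In a real network one heatmap of this kind would typically be produced per keypoint, and the loss would be minimized by gradient descent rather than evaluated once.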

6.
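Research on a robust statistical model for feature discovery in remote sensing imagery   Cited by: 1 (self-citations: 1, citations by others: 0)
The Gaussian mixture density decomposition (GMDD) model is a hierarchical clustering model based on robust statistical theory. GMDD first assumes that the feature space is composed of a mixture of Gaussian distributions. It then uses an optimization algorithm to find the feature distribution in the feature space that matches this assumption and separates it out step by step, until the entire feature space has been decomposed into a set of distributions of mixed feature patterns.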

7.
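The Gaussian mixture density decomposition (GMDD) model is a hierarchical clustering method based on robust statistical theory. Its distribution model assumes that the feature space is composed of a mixture of Gaussian distributions; an optimization algorithm is then used to find the feature distribution in the feature space that best matches this assumption, and that distribution is separated from the feature space step by step until the entire feature space has been decomposed into a set of mixture density distributions of feature patterns. Compared with traditional statistical clustering, the main advantages of GMDD are that the number of feature classes is not fixed in advance, it is highly robust to interference, parameter estimation does not depend on initial values, and the variability of the density distributions is taken into account. The paper gives a preliminary discussion of a GMDD-based model and method for feature estimation in remote sensing imagery (GIFEM) and proposes a GMDD optimization model based on a genetic algorithm.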

8.
Data sparseness, or overfitting, is a serious problem in natural language processing with machine learning methods. This is true even for the maximum entropy (ME) method, whose flexible modeling capability has alleviated data sparseness more successfully than other probabilistic models in many NLP tasks. With the ME method, the model is usually estimated so that it completely satisfies the equality constraints on feature expectations. Complete satisfaction, however, leads to undesirable overfitting, especially for sparse features, since constraints derived from a limited amount of training data are always uncertain. To control overfitting in ME estimation, we propose the use of box-type inequality constraints, where equality can be violated up to certain predefined levels that reflect this uncertainty. The derived models, inequality ME models, in effect have regularized estimation with L1 norm penalties of bounded parameters. Most importantly, this regularized estimation enables the model parameters to become sparse. This can be thought of as automatic feature selection, which is expected to improve generalization performance further. We evaluate the inequality ME models on text categorization datasets and demonstrate their advantages over standard ME estimation, similarly motivated Gaussian MAP estimation of ME models, and support vector machines (SVMs), which are among the state-of-the-art methods for text categorization.
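The abstract above says that L1 norm penalties let model parameters become sparse. The sketch below is a generic illustration of that effect, not the inequality ME estimation itself: it applies the soft-thresholding operator, which is the exact minimizer of 0.5 * (w - v)^2 + lam * |w| for a single parameter. The function name, the example weights, and the value of `lam` are hypothetical.

```python
def soft_threshold(v, lam):
    """Minimizer of 0.5 * (w - v) ** 2 + lam * abs(w) over w.

    Values with magnitude at most lam are set exactly to zero;
    larger values are shrunk toward zero by lam.
    """
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


# Hypothetical unregularized weights; two of them are close to zero.
weights = [0.9, -0.05, 0.02, -1.4, 0.3]

# With lam = 0.1 the two small weights become exactly zero,
# which is the "automatic feature selection" effect in miniature.
shrunk = [soft_threshold(w, 0.1) for w in weights]
num_zero = sum(1 for w in shrunk if w == 0.0)
```

A Gaussian (squared L2) penalty, by contrast, only scales weights toward zero and does not in general set any of them exactly to zero.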

9.
AI is remarkably successful and outperforms human experts in certain tasks, even in complex domains such as medicine. Humans, on the other hand, are experts at multi-modal thinking and can embed new inputs almost instantly into a conceptual knowledge space shaped by experience. In many fields the aim is to build systems capable of explaining themselves and of engaging in interactive what-if questions. Such questions, called counterfactuals, are becoming important in the rising field of explainable AI (xAI). Our central hypothesis is that using conceptual knowledge as a guiding model of reality will help to train more explainable, more robust and less biased machine learning models, ideally able to learn from fewer data. One important aspect in the medical domain is that various modalities contribute to one single result. Our main question is “How can we construct a multi-modal feature representation space (spanning images, text, genomics data) using knowledge bases as an initial connector for the development of novel explanation interface techniques?”. In this paper we argue for using Graph Neural Networks as a method of choice, enabling information fusion for multi-modal causability (causability, not to be confused with causality, is the measurable extent to which an explanation to a human expert achieves a specified level of causal understanding). The aim of this paper is to motivate the international xAI community to work further on multi-modal embeddings and interactive explainability, to lay the foundations for effective future human-AI interfaces. We emphasize that Graph Neural Networks play a major role for multi-modal causability, since causal links between features can be defined directly using graph structures.

10.
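Starting from the nonlinear manifold structure of the acoustic feature space of speech signals, and using the principle of compressive sensing on manifolds, a new acoustic model for speech recognition is constructed. The feature space is divided into multiple local regions, and each local region is approximated by a low-dimensional factor analysis model, yielding a mixture of factor analyzers. The observation vectors of context-dependent states are constrained to lie on this nonlinear low-dimensional manifold structure, from which the observation probability model is derived. As a result, each state is determined by a weight vector subject to a sparsity constraint and several low-dimensional local factor vectors that follow the standard normal distribution. The paper gives a criterion for determining the latent dimensionality of the local regions and an iterative algorithm for estimating the model parameters. Continuous speech recognition experiments on the RM corpus show that, compared with the conventional Gaussian mixture model (GMM) and the subspace Gaussian mixture model (SGMM), the new acoustic model achieves relative reductions in average word error rate (WER) on the test set of 33.1% and 9.2%, respectively.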

11.
An automatic method to combine several local surrogate models is presented. This method is intended to build accurate and smooth approximations of discontinuous functions that are to be used in structural optimization problems. It relies strongly on the Expectation-Maximization (EM) algorithm for Gaussian mixture models (GMM). For the purpose of regression, the inputs are clustered together with their output values by estimating the parameters of the joint distribution. A local expert (linear, quadratic, artificial neural network, or moving least squares) is then built on each cluster. Lastly, the local experts are combined using the Gaussian mixture model parameters found by the EM algorithm to obtain a global model. The method is tested on both mathematical test cases and an engineering optimization problem from aeronautics and is found to improve the accuracy of the approximation.
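The combination step described above can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the paper's implementation: the mixture parameters are hard-coded instead of being fitted by EM, the local experts are linear, the gating uses only a Gaussian mixture over the input, and the global model is assumed to be the responsibility-weighted sum of the experts. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, components):
    """Posterior probability of each mixture component given the input x.

    components is a list of (weight, mean, variance) tuples.
    """
    dens = [w * normal_pdf(x, m, v) for (w, m, v) in components]
    total = sum(dens)
    return [d / total for d in dens]


def predict(x, components, experts):
    """Global model: local linear experts blended by the responsibilities.

    experts is a list of (slope, intercept) pairs, one per component.
    """
    r = responsibilities(x, components)
    return sum(rk * (a * x + b) for rk, (a, b) in zip(r, experts))


# Two clusters centered at x = -2 and x = 2 (assumed, not EM-fitted).
components = [(0.5, -2.0, 0.25), (0.5, 2.0, 0.25)]
# One linear expert per cluster: y = x on the left, y = -x + 5 on the right.
experts = [(1.0, 0.0), (-1.0, 5.0)]

left = predict(-2.0, components, experts)   # dominated by the first expert
middle = predict(0.0, components, experts)  # equal blend of both experts
```

Near a cluster center the prediction follows that cluster's expert almost exactly, and between clusters the responsibilities blend the experts smoothly, which is how a smooth global approximation of a function with a jump can arise.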

12.
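To address the limited ability of single-modality features to discriminate between action classes, an action classification method is proposed based on the fusion of multi-modal visual features from RGB-D video and exemplar-based multiple kernel extreme learning (Exemplars-MKL-ELM). First, robust dense motion-pose features are extracted using skeleton surface fitting and dense trajectories; sparsified histogram of oriented principal components features are computed, in which the normal planes of the dense point cloud capture the 3D geometry of the human body; and 3D histogram of gradients features are extracted that embed appearance texture in the spatio-temporal neighborhood of the body joints. Then, a multiple kernel extreme learning machine with a radius-margin constraint is used to fuse the multi-modal visual features, and a contrast-data method is used to mine a set of representative exemplars for each action class. Finally, for each sample, the fused visual features are combined with the obtained exemplar sets, and actions are classified hierarchically using the Exemplars-MKL-ELM model and a greedy prediction scheme. Experiments show that the proposed method performs well in both classification accuracy and computational efficiency.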

13.
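When an object's appearance changes rapidly, most weak learners fail to capture the object's new feature distribution, which causes tracking to fail. To address this problem, a Gaussian-weighted online multi-classifier boosting algorithm is proposed. The algorithm defines one weak classifier for each domain problem; each weak classifier consists of a simple visual feature and a threshold. A Gaussian weighting function is introduced to weigh the contribution of each weak classifier on a particular sample, and tracking performance is improved through joint learning of multiple classifiers. During tracking, the online multi-classifier can estimate the object's pose while localizing it, and it can successfully learn a multi-modal appearance model, so the object can be tracked even when its appearance changes rapidly. Experimental results show that, after training on a relatively short sequence, the proposed algorithm has an average tracking error rate of 12.8%, a clear improvement in tracking performance.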

14.
We describe the use of support vector machines (SVMs) for continuous speech recognition by incorporating them in segmental minimum Bayes risk decoding. Lattice cutting is used to convert the Automatic Speech Recognition search space into sequences of smaller recognition problems. SVMs are then trained as discriminative models over each of these problems and used in a rescoring framework. We pose the estimation of a posterior distribution over hypotheses in these regions of acoustic confusion as a logistic regression problem. We also show that GiniSVMs can be used as an approximation technique to estimate the parameters of the logistic regression problem. On a small vocabulary recognition task we show that the use of GiniSVMs can improve the performance of a well-trained hidden Markov model system trained under the Maximum Mutual Information criterion. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We discuss the problems that we expect to encounter in extending this approach to large vocabulary continuous speech recognition and describe an initial investigation of constrained estimation techniques to derive feature spaces for SVMs.

15.
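Gaussian process regression (GPR) is a widely used regression method that can be applied to human pose estimation, a problem in which both the input and the output are multivariate. Computational complexity is an important consideration in Gaussian process regression, and sparse approximation algorithms are the usual way to reduce it. Among sparse algorithms, the fully independent training conditional (FITC) method is one of the more advanced, and it is mostly used for regression problems in which the input variables are fully independent of one another. Noise in the input variables is another important consideration in Gaussian process regression. Noise in the test inputs can be handled by moment matching, while noise in the training inputs can be handled by converting it into output noise, which yields higher accuracy. Building on these algorithms, this paper proposes a sparse Gaussian process algorithm with noisy inputs and applies it to human pose estimation. The datasets used in the experiments come from earlier work by many researchers; the input is either images taken from video sequences or image information obtained by feature extraction, and the output is the 3D human pose. Compared with other algorithms, the proposed algorithm achieves satisfactory results in accuracy, running time, and stability.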

16.
We develop a method for the estimation of articulated pose, such as that of the human body or the human hand, from a single (monocular) image. Pose estimation is formulated as a statistical inference problem, where the goal is to find a posterior probability distribution over poses as well as a maximum a posteriori (MAP) estimate. The method combines two modeling approaches, one discriminative and the other generative. The discriminative model consists of a set of mapping functions that are constructed automatically from a labeled training set of body poses and their respective image features. The discriminative formulation allows for modeling ambiguous, one-to-many mappings (through the use of multi-modal distributions) that may yield multiple valid articulated pose hypotheses from a single image. The generative model is defined in terms of a computer graphics rendering of poses. While the generative model offers an accurate way to relate observed (image features) and hidden (body pose) random variables, it is difficult to use directly in pose estimation, since inference is computationally intractable. In contrast, inference with the discriminative model is tractable, but considerably less accurate for the problem of interest. A combined discriminative/generative formulation is derived that leverages the complementary strengths of both models in a principled framework for articulated pose inference. Two efficient MAP pose estimation algorithms are derived from this formulation; the first is deterministic and the second non-deterministic. Performance of the framework is quantitatively evaluated in estimating articulated pose of both the human hand and human body. Most of this work was done while the first author was with Boston University.

17.
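Because existing face pose estimation methods are easily affected by self-occlusion, an improved ASM algorithm is used to extract facial feature points, and geometric statistical knowledge of face shape is used to estimate the depth values of the feature points. A sparse face model is built from the main facial feature points; after the face pose has been approximately estimated from the relevant feature points, the 3D spatial pose of the face is estimated precisely by least squares. Experimental results show that the method still gives good estimates under self-occlusion and achieves good pose estimation accuracy compared with similar methods.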

18.
Traditional knowledge graph (KG) representation learning focuses on the link information between entities, and the effectiveness of learning is influenced by the complexity of the KG. In a multi-modal knowledge graph (MKG), the introduction of a considerable amount of information from other modalities (such as images and texts) further increases the complexity of the KG, which degrades the effectiveness of representation learning. To solve this problem, this study proposes the multi-modal knowledge graph representation learning via multi-head self-attention (MKGRL-MS) model, which improves the effectiveness of link prediction by adding rich multi-modal information to the entity. We first generate a single-modal feature vector corresponding to each entity. Then, we use multi-head self-attention to obtain the degree of attention given to the different modal features of entities during semantic synthesis. In this manner, we learn the multi-modal feature representation of entities. The new knowledge representation is the sum of the traditional knowledge representation and an entity’s multi-modal feature representation. We train our model on top of two existing models and on two different datasets and verify its versatility and effectiveness on the link prediction task.

19.
Robust 3-D-3-D pose estimation   Cited by: 1 (self-citations: 0, citations by others: 1)
The correspondence focuses on robust 3-D-3-D pose estimation, especially multiple pose estimation. The robust 3-D-3-D multiple pose estimation problem is formulated as a series of general regressions over a data set that successively decreases in size, with each regression relating to one particular pose of interest. Since the first few regressions may carry a severely contaminated Gaussian error noise model, the MF-estimator (Zhuang et al., 1992) is used to solve each regression for each pose of interest. Extensive computer experiments with both real imagery and simulated data are conducted, and the results are promising. Three distinctive features of the MF-estimator are theoretically discussed and experimentally demonstrated: it is highly robust in the sense that it is not much affected by a possibly large portion of outliers or incorrect matches, as long as the minimum number of inliers necessary to give a unique solution is provided; it is made virtually independent of initial guesses; and it is computationally reasonable and admits an efficient parallel implementation.

20.
We introduce Gaussian process dynamical models (GPDM) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, and a map from the latent space to an observation space. We marginalize out the model parameters in closed form, using Gaussian process priors for both the dynamics and the observation mappings. This results in a non-parametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.
