首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 937 毫秒
1.
This paper describes an approach to human action recognition based on a probabilistic optimization model of body parts using hidden Markov model (HMM). Our method is able to distinguish between similar actions by only considering the body parts having major contribution to the actions, for example, legs for walking, jogging and running; arms for boxing, waving and clapping. We apply HMMs to model the stochastic movement of the body parts for action recognition. The HMM construction uses an ensemble of body‐part detectors, followed by grouping of part detections, to perform human identification. Three example‐based body‐part detectors are trained to detect three components of the human body: the head, legs and arms. These detectors cope with viewpoint changes and self‐occlusions through the use of ten sub‐classifiers that detect body parts over a specific range of viewpoints. Each sub‐classifier is a support vector machine trained on features selected for the discriminative power for each particular part/viewpoint combination. Grouping of these detections is performed using a simple geometric constraint model that yields a viewpoint‐invariant human detector. We test our approach on three publicly available action datasets: the KTH dataset, Weizmann dataset and HumanEva dataset. Our results illustrate that with a simple and compact representation we can achieve robust recognition of human actions comparable to the most complex, state‐of‐the‐art methods.  相似文献   

2.
The role of gesture recognition is significant in areas like human‐computer interaction, sign language, virtual reality, machine vision, etc. Among various gestures of the human body, hand gestures play a major role to communicate nonverbally with the computer. As the hand gesture is a continuous pattern with respect to time, the hidden Markov model (HMM) is found to be the most suitable pattern recognition tool, which can be modeled using the hand gesture parameters. The HMM considers the speeded up robust feature features of hand gesture and uses them to train and test the system. Conventionally, the Viterbi algorithm has been used for training process in HMM by discovering the shortest decoded path in the state diagram. The recursiveness of the Viterbi algorithm leads to computational complexity during the execution process. In order to reduce the complexity, the state sequence analysis approach is proposed for training the hand gesture model, which provides a better recognition rate and accuracy than that of the Viterbi algorithm. The performance of the proposed approach is explored in the context of pattern recognition with the Cambridge hand gesture data set.  相似文献   

3.
Action recognition is usually a central problem for many practical applications, such as video annotations, video surveillance and human computer interaction. Most action recognition approaches are based on localized spatio-temporal features that can vary significantly when the viewpoint changes. However, their performance rapidly drops when the viewpoints of the training and testing data are different. In this paper, we propose a transfer learning framework for view-invariant action recognition by the way of sharing image stitching feature among different views. Experimental results on multi-view action recognition IXMAS dataset demonstrate that our method produces remarkably good results and outperforms baseline methods.  相似文献   

4.
基于扩展C型HMM人脸表情识别   总被引:1,自引:0,他引:1  
隐马尔科夫模型(HMM)能够很好地对时间和空间建模,在对动态的表情序列进行识别时HMM取得了很好的识别效果。但是传统的HMM训练算法基于最大似然准则,在该准则下训练的HMM表情序列模型识别能力有限。针对这一不足,通过增加状态中心参数C对HMM模型进行了扩展,然后在此基础上使用状态空间上隐射算法来建立模型。试验结果表明所建立的扩展C型HMM模型和相应的算法提高了识别能力。  相似文献   

5.
In previous research on human machine interaction, parameters or templates of gestures are always learnt from training samples first and then a certain kind of matching is conducted. For these training-required methods, a small number of training samples always result in poor or user-independent performance, while a large quantity of training samples lead to time-consuming and laborious sample collection processes. In this paper, a high-performance training-free approach for hand gesture recognition with accelerometer is proposed. First, we determine the underlining space for gesture generation with the physical meaning of acceleration direction. Then, the template of each gesture in the underlining space can be generated from the gesture trails, which are frequently provided in the instructions of gesture recognition devices. Thus, during the gesture template generation process, the algorithm does not require training samples any more and fulfills training-free gesture recognition. After that, a feature extraction method, which transforms the original acceleration sequence into a sequence of more user-invariant features in the underlining space, and a more robust template matching method, which is based on dynamic programming, are presented to finish the gesture recognition process and enhance the system performance. Our algorithm is tested in a 28-user experiment with 2,240 gesture samples and this training-free algorithm shows better performance than the traditional training-required algorithms of Hidden Markov Model (HMM) and Dynamic Time Warping (DTW).  相似文献   

6.
This paper investigates the modelling of the interframe dependence in a hidden Markov model (HMM) for speech recognition. First, a new observation model, assuming dependence on multiple previous frames, is proposed. This model represents such a dependence structure with a weighted mixture of a set of first-order conditional Gaussian densities, each mixture component accounting for a specific conditional frame. Next, an optimization in choosing the conditional frames/segment is performed in both training and recognition, thereby helping to remove the mismatch of the conditional segments due to different observation histories. An EM (Expectation–Maximization) iteration algorithm is developed for the estimation of the model parameters and for the optimization over the dependence structure. Experimental comparisons on a speaker-independent E-set database show that the new model, without optimization on the dependence structure, achieves better performance than the standard HMM, the bigram HMM and the linear-predictive HMM, all in comparable or smaller parameter sizes. The optimization over the dependence structure leads to further improvement in the performance.  相似文献   

7.
This paper proposes a view-invariant gait recognition algorithm, which builds a unique view invariant model taking advantage of the dimensionality reduction provided by the Direct Linear Discriminant Analysis (DLDA). Proposed scheme is able to reduce the under-sampling problem (USP) that appears usually when the number of training samples is much smaller than the dimension of the feature space. Proposed approach uses the Gait Energy Images (GEIs) and DLDA to create a view invariant model that is able to determine with high accuracy the identity of the person under analysis independently of incoming angles. Evaluation results show that the proposed scheme provides a recognition performance quite independent of the view angles and higher accuracy compared with other previously proposed gait recognition methods, in terms of computational complexity and recognition accuracy.  相似文献   

8.
The hidden Markov model (HMM) inversion algorithm, based on either the gradient search or the Baum-Welch reestimation of input speech features, is proposed and applied to the robust speech recognition tasks under general types of mismatch conditions. This algorithm stems from the gradient-based inversion algorithm of an artificial neural network (ANN) by viewing an HMM as a special type of ANN. Given input speech features s, the forward training of an HMM finds the model parameters lambda subject to an optimization criterion. On the other hand, the inversion of an HMM finds speech features, s, subject to an optimization criterion with given model parameters lambda. The gradient-based HMM inversion and the Baum-Welch HMM inversion algorithms can be successfully integrated with the model space optimization techniques, such as the robust MINIMAX technique, to compensate the mismatch in the joint model and feature space. The joint space mismatch compensation technique achieves better performance than the single space, i.e. either the model space or the feature space alone, mismatch compensation techniques. It is also demonstrated that approximately 10-dB signal-to-noise ratio (SNR) gain is obtained in the low SNR environments when the joint model and feature space mismatch compensation technique is used.  相似文献   

9.
Human action recognition is a challenging task due to significant intra-class variations, occlusion, and background clutter. Most of the existing work use the action models based on statistic learning algorithms for classification. To achieve good performance on recognition, a large amount of the labeled samples are therefore required to train the sophisticated action models. However, collecting labeled samples is labor-intensive. To tackle this problem, we propose a boosted multi-class semi-supervised learning algorithm in which the co-EM algorithm is adopted to leverage the information from unlabeled data. Three key issues are addressed in this paper. Firstly, we formulate the action recognition in a multi-class semi-supervised learning problem to deal with the insufficient labeled data and high computational expense. Secondly, boosted co-EM is employed for the semi-supervised model construction. To overcome the high dimensional feature space, weighted multiple discriminant analysis (WMDA) is used to project the features into low dimensional subspaces in which the Gaussian mixture models (GMM) are trained and boosting scheme is used to integrate the subspace models. Thirdly, we present the upper bound of the training error in multi-class framework, which is able to guide the novel classifier construction. In theory, the proposed solution is proved to minimize this upper error bound. Experimental results have shown good performance on public datasets.  相似文献   

10.
One serious difficulty in the deployment of wideband speech recognition systems for new tasks is the expense in both time and cost of obtaining sufficient training data. A more economical approach is to collect telephone speech and then restrict the application to operate at the telephone bandwidth. However, this generally results in suboptimal performance compared to a wideband recognition system. In this paper, we propose a novel expectation-maximization (EM) algorithm in which wideband acoustic models are trained using a small amount of wideband speech and a larger amount of narrowband speech. We show how this algorithm can be incorporated into the existing training schemes of hidden Markov model (HMM) speech recognizers. Experiments performed using wideband speech and telephone speech demonstrate that the proposed mixed-bandwidth training algorithm results in significant improvements in recognition accuracy over conventional training strategies when the amount of wideband data is limited  相似文献   

11.
This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to explore the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is inherent in common factors. Those features corresponding to the same factor are generated by the identical HMM state. Accordingly, the multiple Markov chains are adopted to characterize the variation trends in different dimensions of cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of standard HMM topology, namely, that all features of a speech frame perform the same state emission. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM) where the streaming was empirically determined. To reduce the number of factor loading matrices in FA, we evaluated the similarity between individual matrices to find the optimal solution to parameter clustering of FA models. A new decoding algorithm was presented to perform FASHMM speech recognition. FASHMM carries out the streamed Markov chains for a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method reduced the recognition error rates significantly when compared with the standard HMM and SFHMM methods.  相似文献   

12.
This paper presents an improved method based on single trial EEG data for the online classification of motor imagery tasks for brain-computer interface (BCI) applications. The ultimate goal of this research is the development of a novel classification method that can be used to control an interactive robot agent platform via a BCI system. The proposed classification process is an adaptive learning method based on an optimization process of the hidden Markov model (HMM), which is, in turn, based on meta-heuristic algorithms. We utilize an optimized strategy for the HMM in the training phase of time-series EEG data during motor imagery-related mental tasks. However, this process raises important issues of model interpretation and complexity control. With these issues in mind, we explore the possibility of using a harmony search algorithm that is flexible and thus allows the elimination of tedious parameter assignment efforts to optimize the HMM parameter configuration. In this paper, we illustrate a sequential data analysis simulation, and we evaluate the optimized HMM. The performance results of the proposed BCI experiment show that the optimized HMM classifier is more capable of classifying EEG datasets than ordinary HMM during motor imagery tasks.  相似文献   

13.
动作识别使得机器能够对人体动作的意图进行判别理解,进而实现高效的人机交互。提出一种肢体角度模型,实现在三维空间中对人体动作进行表示,该模型具有一定的不变性,计算复杂度低。针对传统的基于混合高斯的隐马尔可夫模型(GMM-HMM)的动作识别,提出深度置信网络模型(DBN)和隐马尔可夫模型相结合的动作识别模型,构建了一种非线性的基于条件限制玻尔兹曼机(CRBM)的DBN深度学习模型,深层次结构使其建模能力更强,且能够结合历史信息建模,更适用于动作识别。实验表明该算法具有较高的识别结果。  相似文献   

14.
View Invariance for Human Action Recognition   总被引:4,自引:0,他引:4  
This paper presents an approach for viewpoint invariant human action recognition, an area that has received scant attention so far, relative to the overall body of work in human action recognition. It has been established previously that there exist no invariants for 3D to 2D projection. However, there exist a wealth of techniques in 2D invariance that can be used to advantage in 3D to 2D projection. We exploit these techniques and model actions in terms of view-invariant canonical body poses and trajectories in 2D invariance space, leading to a simple and effective way to represent and recognize human actions from a general viewpoint. We first evaluate the approach theoretically and show why a straightforward application of the 2D invariance idea will not work. We describe strategies designed to overcome inherent problems in the straightforward approach and outline the recognition algorithm. We then present results on 2D projections of publicly available human motion capture data as well on manually segmented real image sequences. In addition to robustness to viewpoint change, the approach is robust enough to handle different people, minor variabilities in a given action, and the speed of aciton (and hence, frame-rate) while encoding sufficient distinction among actions. This work was done when the author was a graduate student in the Department of Computer Science and was partially supported by the NSF Grant ECS-02-5475. The author is curently with Siemens Corporate Research, Princeton, NJ. Dr. Chellappa is with the Department of Electrical and Computer Engineering.  相似文献   

15.
This paper proposes a fuzzy qualitative approach to vision-based human motion analysis with an emphasis on human motion recognition. It achieves feasible computational cost for human motion recognition by combining fuzzy qualitative robot kinematics with human motion tracking and recognition algorithms. First, a data-quantization process is proposed to relax the computational complexity suffered from visual tracking algorithms. Second, a novel human motion representation, i.e., qualitative normalized template, is developed in terms of the fuzzy qualitative robot kinematics framework to effectively represent human motion. The human skeleton is modeled as a complex kinematic chain, and its motion is represented by a series of such models in terms of time. Finally, experiment results are provided to demonstrate the effectiveness of the proposed method. An empirical comparison with conventional hidden Markov model (HMM) and fuzzy HMM (FHMM) shows that the proposed approach consistently outperforms both HMMs in human motion recognition.   相似文献   

16.
一种改进的隐马尔可夫模型在语音识别中的应用   总被引:1,自引:0,他引:1  
提出了一种新的马尔可夫模型——异步隐马尔可夫模型.该模型针对噪音环境下语音识别过程中出现丢失帧的情况,通过增加新的隐藏时间标示变量Ck,估计出实际观察值对应的状态序列,实现对不规则或者不完整采样数据的建模.详细介绍了适合异步HMM的前后向算法以及用于训练的EM算法,并且对转移矩阵的计算进行了优化.最后通过实验仿真,分别使用经典HMM和异步HMM对相同的随机抽取帧的语音数据进行识别,识别结果显示在抽取帧相同情况下异步HMM比经典HMM的识别错误率低.  相似文献   

17.
In this paper, we propose a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of Gaussian mixture continuous density hidden Markov model (CDHMM) in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective functions including maximum mutual information (MMI), minimum classification error (MCE), minimum phone error (MPE)/minimum word error (MWE), etc. In this method, discriminative training of HMM is first cast as a constrained optimization problem, where Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based upon the idea of line search, we show that a simple formula of HMM parameters can be found by constraining the KLD between HMM of two successive iterations in an quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights. We have investigated the proposed CLS approach on several benchmark speech recognition databases, including TIDIGITS, Resource Management (RM), and Switchboard. Experimental results show that the new CLS optimization method consistently outperforms the conventional EBW method in both recognition performance and convergence behavior.  相似文献   

18.
为了解决语音信号中帧与帧之间的重叠,提高语音信号的自适应能力,本文提出基于隐马尔可夫(HMM)与遗传算法神经网络改进的语音识别系统.该改进方法主要利用小波神经网络对Mel频率倒谱系数(MFCC)进行训练,然后利用HMM对语音信号进行时序建模,计算出语音对HMM的输出概率的评分,结果作为遗传神经网络的输入,即得语音的分类识别信息.实验结果表明,改进的语音识别系统比单纯的HMM有更好的噪声鲁棒性,提高了语音识别系统的性能.  相似文献   

19.
We present a novel confidence- and margin-based discriminative training approach for model adaptation of a hidden Markov model (HMM)-based handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood (ML) trained HMM systems and try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria are used to train writer-independent handwriting models. For model adaptation during decoding, an unsupervised confidence-based discriminative training on a word and frame level within a two-pass decoding process is proposed. The proposed methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by 33% relative compared to a ML trained baseline system. On the large-vocabulary line recognition task of the IAM English handwriting database, the word error rate is decreased by 25% relative.  相似文献   

20.
一种基于改进CP网络与HMM相结合的混合音素识别方法   总被引:2,自引:0,他引:2  
提出了一种基于改进对偶传播(CP)神经网络与隐驰尔可夫模型(HMM)相结合的混合音素识别方法.这一方法的特点是用一个具有有指导学习矢量量化(LVQ)和动态节点分配等特性的改进的CP网络生成离散HMM音素识别系统中的码书。因此,用这一方法构造的混合音素识别系统中的码书实际上是一个由有指导LVQ算法训练的具有很强分类能力的高性能分类器,这就意味着在用HMM对语音信号进行建模之前,由码书产生的观测序列中  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号