Similar Literature
Found 20 similar documents (search time: 78 ms)
1.
This paper presents a comparative study of two machine learning techniques for recognizing handwritten Arabic words: hidden Markov models (HMMs) and dynamic Bayesian networks (DBNs). The proposed work is divided into three stages: preprocessing, feature extraction, and classification. Preprocessing includes baseline estimation, normalization, and segmentation. In the second stage, features are extracted from each normalized word; a new set of features for handwritten Arabic words is proposed, based on a sliding window moving across the mirrored word image. The third stage performs classification and recognition using HMMs and DBNs. To validate the techniques, extensive experiments were conducted on the IFN/ENIT database, which contains 32,492 Arabic words. Experimental results and quantitative evaluations showed that the HMM outperforms the DBN, with a higher recognition rate and lower complexity.
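
The sliding-window stage described above can be sketched roughly as follows. This is a minimal illustration, not the feature set proposed in the paper: the two features (ink density and vertical centre of gravity), the function names, and the toy image are all assumptions made for the example.

```python
def mirror(image):
    """Flip a binary image left-to-right (each row is a list of 0/1 pixels)."""
    return [row[::-1] for row in image]


def sliding_window_features(image, width=2, step=1):
    """Return one (ink density, vertical centre of gravity) pair per window.

    Both features are illustrative placeholders, not the paper's features.
    """
    height = len(image)
    n_cols = len(image[0])
    features = []
    for start in range(0, n_cols - width + 1, step):
        # Row index of every foreground pixel inside the current window.
        ink_rows = [
            r
            for r in range(height)
            for c in range(start, start + width)
            if image[r][c]
        ]
        density = len(ink_rows) / (height * width)
        centre = sum(ink_rows) / len(ink_rows) / height if ink_rows else 0.0
        features.append((density, centre))
    return features


# Hypothetical 3x4 binary word image (1 = ink).
word = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
frames = sliding_window_features(mirror(word), width=2, step=1)
```

Each element of `frames` would then serve as one observation in the sequence passed to the HMM or DBN classifier.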

2.
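Automatic Bank Bill Recognition Based on the HMM Method   Total citations: 2 (self-citations: 0, citations by others: 2)
Hidden Markov models (HMMs) are used to study the recognition of amounts on bank bills, written both in capital (Chinese numeral) form and in digit form. The main contributions are a novel character segmentation algorithm and the design of HMM training and recognition algorithms. In the HMM system, frequently occurring handwritten miswritten characters and homophones are treated as separate character classes; in addition, a new parameter smoothing method is proposed for HMM training. Experimental results show that the method is feasible in practice and has good application prospects for automatic bank bill recognition.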

3.
In this paper, we present a novel segmentation-free Arabic handwriting recognition system based on hidden Markov models (HMMs). Two main contributions are introduced: a new technique for dividing the image into nonuniform horizontal segments to extract the features, and a new technique for handling character skew by fusing multiple HMMs. Moreover, two enhancements are introduced: the pre-processing method and feature extraction using concavity space. The proposed system first pre-processes the input image by setting the thickness of the input word to three pixels and fixing the spacing between the different parts of the word. The input image is divided into a constant number of nonuniform horizontal segments depending on the distribution of the foreground pixels. A set of robust features representing the gradient of the foreground pixels is extracted using sliding windows. The input image is decomposed into several images representing the vertical, horizontal, left diagonal and right diagonal edges in the image. A set of robust features representing the densities of the foreground pixels in the various edge images is extracted using sliding windows. The proposed system builds character HMMs and learns word HMMs using embedded training. Besides the vertical sliding window, two slanted sliding windows are used to extract the features. Three different HMMs are used: one for the vertical sliding window and two for the slanted windows. A fusion scheme is used to combine the three HMMs. The proposed system is very promising and outperforms all the other Arabic handwriting recognition systems reported in the literature.

4.
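In recent years, speech recognition based on dynamic Bayesian networks (DBNs) has attracted growing attention because DBNs are more interpretable, factorizable, and extensible than traditional hidden Markov models (HMMs). However, current research on DBN-based speech recognition focuses mainly on isolated speech recognition; its frameworks and recognition algorithms for continuous speech recognition are still far less mature and flexible than those for HMMs. To improve the flexibility and extensibility of DBN-based continuous speech recognition, the token passing model, which solves these problems well in HMM-based continuous speech recognition, is modified to suit DBNs. On the basis of this model, a basic framework for DBN-based continuous speech recognition is proposed, together with a new recognition algorithm that is independent of the upper-level language model. The paper also introduces DTK, a toolkit developed by the authors on this framework for continuous speech recognition and other time-series systems.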

5.
In this paper we consider two related problems in hidden Markov models (HMMs): first, how the various parameters of an HMM actually contribute to predictions of state sequences and to spatio-temporal pattern recognition; second, how the HMM parameters (and the associated HMM topology) can be updated to improve performance. These issues are examined in the context of four different experimental settings, ranging from pure simulations to observed data. The results clearly demonstrate the benefits of applying some critical tests to the model parameters before using the model as a predictor or as a spatio-temporal pattern recognition technique.

6.
Traditional statistical models for speech recognition have mostly been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). This paper focuses on a new framework for speech recognition using maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping. This model therefore allows for the potential combination of many different types of features, which need not be statistically independent of each other. In this paper, a specific kind of direct model, the maximum entropy Markov model (MEMM), is studied. Even with conventional acoustic features, the approach already shows promising results for phone level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as stand-alone acoustic models. Preliminary results combining the MEMM scores with HMM and language model scores show modest improvements over the best HMM speech recognizer.

7.
Motion trajectories provide rich spatio-temporal information about an object's activity. The trajectory information can be obtained using a tracking algorithm on data streams available from a range of devices including motion sensors, video cameras, haptic devices, etc. Developing view-invariant activity recognition algorithms based on this high dimensional cue is an extremely challenging task. This paper presents efficient activity recognition algorithms using a novel view-invariant representation of trajectories. Towards this end, we derive two affine-invariant representations for motion trajectories based on curvature scale space (CSS) and centroid distance function (CDF). The properties of these schemes facilitate the design of efficient recognition algorithms based on hidden Markov models (HMMs). In the CSS-based representation, maxima of curvature zero crossings at increasing levels of smoothness are extracted to mark the location and extent of concavities in the curvature. The sequences of these CSS maxima are then modeled by continuous density HMMs. For the case of CDF, we first segment the trajectory into subtrajectories using the CDF-based representation. These subtrajectories are then represented by their Principal Component Analysis (PCA) coefficients. The sequences of these PCA coefficients from subtrajectories are then modeled by continuous density HMMs. Different classes of object motions are modeled by one continuous HMM per class, where state PDFs are represented by GMMs. Experiments using a database of around 1750 complex trajectories (obtained from UCI-KDD data archives) subdivided into five different classes are reported.

8.
We present a glove-based hand gesture recognition system using hidden Markov models (HMMs) for recognizing the unconstrained 3D trajectory gestures of operators in a remote work environment. A Polhemus sensor attached to a PinchGlove is employed to obtain a sequence of 3D positions of a hand trajectory. The direct use of 3D data provides more naturalness in generating gestures, thereby avoiding some of the constraints usually imposed to prevent performance degradation when trajectory data are projected into a specific 2D plane. We use two kinds of HMMs according to the basic units to be modeled: gesture-based HMMs and stroke-based HMMs. The decomposition of gestures into more primitive strokes is quite attractive, since concatenating stroke-based HMMs in turn makes it possible to construct a new set of gesture-based HMMs. Any deterioration in performance and reliability arising from decomposition can be remedied by a fine-tuned relearning process for such composite HMMs. We also propose an efficient method of estimating a variable threshold of reliability for an HMM, which is found to be useful in rejecting unreliable patterns. In recognition experiments on 16 types of gestures defined for remote work, the fine-tuned composite HMM achieves the best performance, a 96.88% recognition rate, and also the highest reliability.

9.
In this paper we present a multiple classifier system (MCS) for on-line handwriting recognition. The MCS combines several individual recognition systems based on hidden Markov models (HMMs) and bidirectional long short-term memory networks (BLSTM). Besides using two different recognition architectures (HMM and BLSTM), we use various feature sets based on on-line and off-line features to obtain diverse recognizers. Furthermore, we generate a number of different neural network recognizers by changing the initialization parameters. To combine the word sequences output by the recognizers, we incrementally align these sequences using the recognizer output voting error reduction framework (ROVER). For deriving the final decision, different voting strategies are applied. The best combination ensemble has a recognition rate of 84.13%, which is significantly higher than the 83.64% achieved if only one recognition architecture (HMM or BLSTM) is used for the combination, and even remarkably higher than the 81.26% achieved by the best individual classifier. To demonstrate the high performance of the classification system, the results are compared with two widely used commercial recognizers from Microsoft and Vision Objects.

10.
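Application of Hidden Markov Models to Off-line Handwritten Chinese Character Recognition   Total citations: 6 (self-citations: 1, citations by others: 6)
A new method for off-line handwritten Chinese character recognition is introduced, based on hidden Markov models (HMMs). The method builds eight HMMs for each Chinese character and combines the outputs of the eight classifiers with equal weights to obtain the recognition result. Practice shows that the method is feasible.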

11.
The article presents an application of hidden Markov models (HMMs) for pattern recognition on genome sequences. We apply HMMs to identify genes encoding the variant surface glycoprotein (VSG) in the genomes of Trypanosoma brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are the causative agents of sleeping sickness and of several diseases in domestic and wild animals. The parasites have a peculiar strategy for evading the host's immune system, which consists in periodically changing their predominant cellular surface protein (VSG). The motivation for using pattern recognition methods to identify these genes, instead of traditional homology-based ones, is that the levels of sequence identity (amino acid and DNA sequence) amongst these genes are often below what is considered reliable for those methods. Among pattern recognition approaches, HMMs are particularly suitable for this problem because they handle the determination of gene edges more naturally. We evaluate the performance of the model using different numbers of states in the Markov model, as well as several performance metrics. The model is applied using public genomic data. Our empirical results show that the VSG genes of T. brucei can be safely identified (high sensitivity and low rate of false positives) using HMMs.

12.
During the last decade, the most significant advances in the field of continuous speech recognition (CSR) have arisen from the use of hidden Markov models (HMMs) for acoustic modeling. These models address one of the major issues for CSR: simultaneous modeling of temporal and frequency distortions in the speech signal. In the HMM, the temporal dimension is managed through an oriented state graph, each state accounting for the local frequency distortions through a probability density function. In this study, an improvement in HMM performance is expected from the introduction of a very effective non-parametric probability density function estimate: the k-nearest neighbors (k-nn) estimate. First, experiments on a short-term speech spectrum identification task are performed to compare the k-nn estimate and the widespread estimate based on mixtures of Gaussian functions. Then the adaptations implied by the integration of the k-nn estimate in an HMM-based recognition system are developed. An optimal training protocol is obtained based on the introduction of membership coefficients in the HMM parameters. The membership coefficients measure the degree of association between a reference acoustic vector and an HMM state. The training procedure uses the expectation-maximization (EM) algorithm applied to the membership coefficient estimation. Its convergence is shown according to the maximum likelihood criterion. This study leads to the development of a baseline k-nn/HMM recognition system, which is evaluated on the TIMIT speech database. Further improvements of the k-nn/HMM system are finally sought through the introduction of temporal information into the representation space (delta coefficients) and the adaptation of the references (mainly, gender modeling and contextual modeling).
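
For readers unfamiliar with the k-nn density estimate mentioned above, the textbook form is p(x) ≈ k / (N · V), where N is the number of reference samples and V is the volume of the smallest ball around x containing its k nearest neighbors. The sketch below shows the one-dimensional case only; the function name and sample values are assumptions for illustration and do not reproduce the paper's k-nn/HMM system.

```python
def knn_density(x, samples, k):
    """Textbook 1-D k-nearest-neighbor density estimate: k / (N * V).

    V is the length of the interval centred on x that reaches the k-th
    nearest sample, i.e. twice the distance to that sample.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    distances = sorted(abs(x - s) for s in samples)
    radius = distances[k - 1]
    if radius == 0.0:
        raise ValueError("k-th neighbor coincides with x; estimate is unbounded")
    volume = 2.0 * radius
    return k / (len(samples) * volume)


# Hypothetical reference samples on a line.
samples = [0.0, 1.0, 2.0, 3.0, 4.0]
estimate = knn_density(2.0, samples, k=3)
```

Unlike a Gaussian mixture, this estimate has no parameters to fit beyond k; the reference samples themselves define the density.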

13.
We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which realizes realistic facial animations driven by speaker independent continuous speech. Different from hidden Markov model (HMM)-based animation approaches that use a single-state chain, we use CHMMs to explicitly model the subtle characteristics of audio-visual speech, e.g., the asynchrony, temporal dependency (synchrony), and different speech classes between the two modalities. We derive an expectation maximization (EM)-based A/V conversion algorithm for the CHMMs, which converts acoustic speech into decent facial animation parameters. We also present a video-realistic speech animation system. The system transforms the facial animation parameters to a mouth animation sequence, refines the animation with a performance refinement process, and finally stitches the animated mouth with a background facial sequence seamlessly. We have compared the animation performance of the CHMM with the HMMs, the multi-stream HMMs and the factorial HMMs both objectively and subjectively. Results show that the CHMMs achieve superior animation performance. The ph-vi-CHMM system, which adopts different state variables (phoneme states and viseme states) in the audio and visual modalities, performs the best. The proposed approach indicates that explicitly modelling audio-visual speech is promising for speech animation.

14.
While hidden Markov models (HMMs) have been successful in many speech recognition tasks, performance on conversational speech is somewhat weaker, arguably due in part to the greater variation in the timing of articulatory events. Loosely coupled or factorial HMMs (FHMMs) represent a family of models that have more flexibility for modeling such variation in speech, but there are tradeoffs to be studied in terms of computation and potential added confusability. This paper investigates two specific instances, Mixed-Memory and Parameter-Tied FHMMs, that can both be thought of as loosely coupled HMMs for modelling multiple time series. The Parameter-Tied FHMM, introduced here, has a potential advantage for speech modelling since it allows a left-to-right topology across the product state space. Experimental results on the ISOLET task show both models are feasible for speech recognition; TI-DIGITS recognition results show the Parameter-Tied FHMM is competitive with Multiband Models. State occupancy and pruning analyses show trends related to asynchrony that hold across the different models.

15.
A speech recognition system extracts the textual information present in speech. In the present work, a speaker-independent isolated word recognition system has been developed for Kannada, one of the south Indian languages. For European languages such as English, a large amount of research has been carried out on speech recognition. For Indian languages such as Kannada, however, significantly less work has been reported, and no standard speech corpus is readily available. In the present study, a speech database has been developed by recording utterances from a regional Kannada news corpus spoken by different speakers. The speech recognition system has been implemented using the Hidden Markov Tool Kit. Two separate pronunciation dictionaries, phone-based and syllable-based, are built in order to design and evaluate phone-level and syllable-level sub-word acoustic models. Experiments have been carried out and the results analyzed by varying the number of Gaussian mixtures in each state of the monophone hidden Markov model (HMM). Context-dependent triphone HMMs have also been built for the same Kannada speech corpus, and the recognition accuracies are compared. Mel frequency cepstral coefficients, along with their first and second derivative coefficients, are computed in acoustic front-end processing and used as feature vectors. Overall word recognition accuracies of 60.2 % and 74.35 % have been obtained for the monophone and triphone models, respectively. The study shows a good improvement in the accuracy of the isolated-word Kannada speech recognition system using triphone HMMs compared with monophone HMMs.

16.
This paper proposes and evaluates a new statistical discrimination measure for hidden Markov models (HMMs) extending the notion of divergence, a measure of average discrimination information originally defined for two probability density functions. Similar distance measures have been proposed for the case of HMMs, but those have focused primarily on the stationary behavior of the models. However, in speech recognition applications, the transient aspects of the models have a principal role in the discrimination process and, consequently, capturing this information is crucial in the formulation of any discrimination indicator. This paper proposes the notion of average divergence distance (ADD) as a statistical discrimination measure between two HMMs, considering the transient behavior of these models. This paper provides an analytical formulation of the proposed discrimination measure, a justification of its definition based on the Viterbi decoding approach, and a formal proof that this quantity is well defined for a left-to-right HMM topology with a final nonemitting state, a standard model for basic acoustic units in automatic speech recognition (ASR) systems. Using experiments based on this discrimination measure, it is shown that ADD provides a coherent way to evaluate the discrimination dissimilarity between acoustic models.
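
For reference, the classical divergence between two probability density functions, which the abstract above takes as its starting point, is usually written in the symmetric form below. The symbols f_1 and f_2 are notation chosen here for illustration; the paper's own ADD definition for HMMs is not reproduced.

```latex
J(f_1, f_2)
  = \int \bigl[ f_1(x) - f_2(x) \bigr] \ln \frac{f_1(x)}{f_2(x)} \, dx
  = D_{\mathrm{KL}}(f_1 \,\|\, f_2) + D_{\mathrm{KL}}(f_2 \,\|\, f_1)
```

The second equality follows by splitting the integrand into its f_1-weighted and f_2-weighted parts, each of which is a Kullback-Leibler divergence.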

17.
The performance of an automatic facial expression recognition system can be significantly improved by modeling the reliability of different streams of facial expression information utilizing multistream hidden Markov models (HMMs). In this paper, we present an automatic multistream HMM facial expression recognition system and analyze its performance. The proposed system utilizes facial animation parameters (FAPs), supported by the MPEG-4 standard, as features for facial expression classification. Specifically, the FAPs describing the movement of the outer-lip contours and eyebrows are used as observations. Experiments are first performed employing single-stream HMMs under several different scenarios, utilizing outer-lip and eyebrow FAPs individually and jointly. A multistream HMM approach is proposed for introducing facial expression and FAP group dependent stream reliability weights. The stream weights are determined based on the facial expression recognition results obtained when FAP streams are utilized individually. The proposed multistream HMM facial expression system, which utilizes stream reliability weights, achieves a relative reduction of the facial expression recognition error of 44% compared to the single-stream HMM system.
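
One common way to apply stream reliability weights in a multistream model is to form a weighted sum of the per-stream log-likelihoods and pick the class with the highest total. The sketch below assumes that form; the function names, weights, and scores are hypothetical, and the paper's actual weighting scheme may differ in detail.

```python
def weighted_stream_score(log_likelihoods, weights):
    """Weighted sum of per-stream log-likelihoods (one weight per stream)."""
    if len(log_likelihoods) != len(weights):
        raise ValueError("one weight per stream is required")
    return sum(w * ll for w, ll in zip(weights, log_likelihoods))


def classify(stream_scores, stream_weights):
    """Return the expression label with the highest weighted score."""
    return max(
        stream_scores,
        key=lambda label: weighted_stream_score(
            stream_scores[label], stream_weights[label]
        ),
    )


# Hypothetical log-likelihoods for (outer-lip stream, eyebrow stream).
stream_scores = {"joy": (-10.0, -20.0), "anger": (-12.0, -8.0)}
# Hypothetical expression-dependent reliability weights for the same streams.
stream_weights = {"joy": (0.7, 0.3), "anger": (0.4, 0.6)}
best = classify(stream_scores, stream_weights)
```

Making the weights depend on the expression, as in this toy example, lets a stream count for more on the expressions where it has proved more reliable.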

18.
Factorial Hidden Markov Models   Total citations: 15 (self-citations: 0, citations by others: 15)
Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a single discrete variable: the hidden state. We discuss a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. We describe an exact algorithm for inferring the posterior probabilities of the hidden state variables given the observations, and relate it to the forward–backward algorithm for HMMs and to algorithms for more general graphical models. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or variational methods. Within the variational framework, we present a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model. Empirical comparisons suggest that these approximations are efficient and provide accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach's chorales and show that factorial HMMs can capture statistical structure in this data set which an unconstrained HMM cannot.
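
To make the intractability argument above concrete, the sketch below implements the forward pass of the forward–backward algorithm for an ordinary discrete HMM, plus a helper that counts the joint states of a factorial model. All names and the toy parameters are assumptions for illustration; this is not the exact or variational algorithm from the paper.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Forward pass for a discrete HMM: returns P(observations).

    initial[i]       : probability of starting in state i
    transition[i][j] : probability of moving from state i to state j
    emission[i][o]   : probability of emitting symbol o in state i
    """
    n_states = len(initial)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[j][obs]
            * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


def joint_state_count(states_per_chain, n_chains):
    """Number of joint hidden states when n_chains chains are flattened."""
    return states_per_chain ** n_chains


# Hypothetical two-state, two-symbol HMM.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(initial, transition, emission, [0, 1])
```

Each forward step costs on the order of the squared number of states, so flattening, say, 5 chains of 10 states each into `joint_state_count(10, 5)` joint states makes the exact recursion impractical, which is the motivation for the approximations the abstract describes.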

19.
This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. To this end, we apply a support vector machine (SVM) as an estimator of posterior probabilities within the standard hidden Markov model (HMM) framework. We describe a new approach that categorises Arabic vowels into long and short vowels, applied in the labeling phase of speech signals. Using this new labeling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and than the hybrid multi-layer perceptron (MLP)/HMM system. The results obtained for the triphone-based Arabic speech recognition system are 64.68 % with HMMs, 72.39 % with MLP/HMM and 74.01 % with the SVM/HMM hybrid model. The WER obtained for continuous speech recognition by the three systems confirms the performance of SVM/HMM, which achieves the lowest average over 4 tested speakers, 11.42 %.

20.
This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to explore the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is inherent in common factors. Those features corresponding to the same factor are generated by the identical HMM state. Accordingly, the multiple Markov chains are adopted to characterize the variation trends in different dimensions of cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of standard HMM topology, namely, that all features of a speech frame perform the same state emission. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM) where the streaming was empirically determined. To reduce the number of factor loading matrices in FA, we evaluated the similarity between individual matrices to find the optimal solution to parameter clustering of FA models. A new decoding algorithm was presented to perform FASHMM speech recognition. FASHMM carries out the streamed Markov chains for a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method reduced the recognition error rates significantly when compared with the standard HMM and SFHMM methods.
