Similar Documents
20 similar documents found (search time: 31 ms)
1.
Speaker recognition performance in emotional talking environments is not as high as in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as classifiers in this work. The approach has been tested on our collected emotional speech database, which comprises six emotions. The results show that speaker identification performance based on both gender and emotion cues is higher than that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22%, 4.45%, and 19.56%, respectively. This work also shows that the optimum speaker identification performance occurs when the classifiers are completely biased towards suprasegmental models, with no impact of acoustic models, in the emotional talking environments. The achieved average speaker identification performance based on the proposed approach falls within 2.35% of that obtained in subjective evaluation by human judges.

2.
This paper addresses the formulation of a new speaker identification approach that employs knowledge of the emotional content of speaker information. The proposed approach is based on a two-stage recognizer that combines and integrates an emotion recognizer and a speaker recognizer into one recognizer. The approach employs both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. In the experiments, six emotions are considered: neutral, angry, sad, happy, disgust, and fear. Our results show that the average speaker identification performance of the proposed two-stage recognizer is 79.92%, a significant improvement over a one-stage recognizer, whose identification performance is 71.58%. The results obtained with the proposed approach are close to those achieved in subjective evaluation by human listeners.
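The two-stage idea can be sketched with toy scores (hypothetical speakers, emotions, and log-likelihoods, not the paper's models): the first stage picks the most likely emotion, and the second identifies the speaker using only the models trained for that emotion.

```python
# Toy sketch of a two-stage emotion-then-speaker recognizer.
# utterance_scores maps each emotion to {speaker: log_likelihood} under
# that emotion's speaker models; all numbers below are hypothetical.

def identify(utterance_scores):
    # Stage 1: pick the emotion whose best-scoring speaker model is highest.
    emotion = max(utterance_scores,
                  key=lambda e: max(utterance_scores[e].values()))
    # Stage 2: pick the best speaker within that emotion's models.
    speakers = utterance_scores[emotion]
    speaker = max(speakers, key=speakers.get)
    return emotion, speaker

scores = {
    "angry":   {"alice": -120.0, "bob": -150.0},
    "neutral": {"alice": -180.0, "bob": -95.0},
}
print(identify(scores))  # -> ('neutral', 'bob')
```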

3.
Speaker recognition systems perform almost ideally in neutral talking environments; however, they perform poorly in emotional talking environments. This research is devoted to enhancing the low performance of text-independent, emotion-dependent speaker identification in emotional talking environments by employing Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) as classifiers. This work has been tested on our speech database, which comprises 50 speakers talking in six different emotional states: neutral, angry, sad, happy, disgust, and fear. Our results show that the average speaker identification performance in these talking environments based on CSPHMM2s is 81.50%, an improvement of 5.61%, 3.39%, and 3.06% over First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s), respectively. Our results based on subjective evaluation by human judges fall within 2.26% of those obtained with CSPHMM2s.

4.
The recognition of the emotional state of speakers is a multi-disciplinary research area that has received great interest in recent years. One of its most important goals is to improve voice-based human–machine interaction. Several works in this domain use prosodic features or spectral characteristics of the speech signal, with neural networks, Gaussian mixtures, and other standard classifiers. Usually, no acoustic interpretation of the types of errors is given. In this paper, the spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models, and Multilayer Perceptrons are tested. These classifiers have been evaluated with different configurations and input features in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method for seven emotions, based on spectral and prosodic information, improves performance over the standard classifiers and fixed feature sets.

5.
Face Recognition Based on the PCA Algorithm
This paper introduces the Hidden Markov Eigenface Model (HMEM), formed by integrating Probabilistic Principal Component Analysis (PPCA) with a discrete-space Markov model (SL-HMM), so that it combines the properties of both. Face recognition experiments on the ORL database show that the model offers a clear performance advantage.
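The eigenface idea underlying PCA-based face recognition can be sketched as follows (random vectors stand in for ORL face images; this is plain PCA plus nearest neighbour, not the HMEM model itself):

```python
import numpy as np

# Eigenface-style PCA sketch: project face vectors onto the top principal
# components and classify by nearest neighbour in the reduced space.
rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 64))        # 20 "face" vectors of dimension 64
mean = faces.mean(axis=0)
centered = faces - mean

# Principal components = right singular vectors of the centered data.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:5]                      # keep the top 5 "eigenfaces"

projected = centered @ components.T      # shape (20, 5)
probe = centered[3] @ components.T       # project a known face as a probe
nearest = int(np.argmin(np.linalg.norm(projected - probe, axis=1)))
print(nearest)  # -> 3 (the probe matches its own training image)
```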

6.
We propose a general framework to combine multiple sequence classifiers working on different sequence representations of a given input. The framework, based on Multi-Stream Hidden Markov Models (MS-HMMs), allows the combination of multiple HMMs operating on partially asynchronous information streams. The combination may operate at different levels of modeling, from the feature level to the post-processing level. The framework is applied to on-line handwritten word recognition by combining temporal and spatial representations of the signal. Different combination schemes are compared experimentally on isolated character recognition and word recognition tasks using the UNIPEN international database. Received: 16 August 2002, Accepted: 21 November 2002, Published online: 6 June 2003

7.
A speech recognition system extracts the textual information present in speech. In the present work, a speaker-independent isolated word recognition system has been developed for Kannada, a south Indian language. For European languages such as English, a large amount of speech recognition research has been carried out, but significantly less work has been reported for Indian languages such as Kannada, and no standard speech corpus is readily available. In the present study, a speech database has been developed by recording utterances of a regional Kannada news corpus from different speakers. The speech recognition system has been implemented using the Hidden Markov Model Toolkit (HTK). Two separate pronunciation dictionaries, phone-based and syllable-based, are built in order to design and evaluate phone-level and syllable-level sub-word acoustic models. Experiments have been carried out and results analyzed by varying the number of Gaussian mixtures in each state of the monophone Hidden Markov Model (HMM). Context-dependent triphone HMMs have also been built for the same Kannada speech corpus, and the recognition accuracies are comparatively analyzed. Mel-frequency cepstral coefficients, along with their first and second derivative coefficients, are used as feature vectors and are computed in acoustic front-end processing. Overall word recognition accuracies of 60.2% and 74.35% have been obtained for the monophone and triphone models, respectively. The study shows a good improvement in the accuracy of isolated-word Kannada speech recognition using triphone HMMs compared with monophone HMMs.
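Appending first and second derivatives to the cepstral coefficients, as in the front-end above, can be sketched as follows (the symmetric-difference delta here is a simplification of HTK's regression formula, and the MFCCs are random placeholders, not real speech features):

```python
import numpy as np

# Append first and second time-derivatives (deltas) to a cepstral
# feature sequence, as commonly done in HMM front-ends.

def deltas(feats):
    """Two-frame symmetric difference: d[t] = (x[t+1] - x[t-1]) / 2."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

rng = np.random.default_rng(1)
mfcc = rng.normal(size=(100, 13))          # 100 frames, 13 coefficients
d1 = deltas(mfcc)                          # first derivative (delta)
d2 = deltas(d1)                            # second derivative (delta-delta)
features = np.hstack([mfcc, d1, d2])
print(features.shape)  # -> (100, 39)
```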

8.
To recognize speech, handwriting, or sign language, many hybrid approaches have been proposed that combine Dynamic Time Warping (DTW) or Hidden Markov Models (HMMs) with discriminative classifiers. However, all these methods rely directly on the likelihood models of DTW/HMM. We hypothesize that time warping and classification should be separated because of conflicting likelihood-modelling demands. To overcome these restrictions, we propose to use Statistical DTW (SDTW) only for time warping, while classifying the warped features with a different method. Two novel statistical classifiers are proposed (CDFD and Q-DFFM), both using a selection of discriminative features (DF), and are shown to outperform HMM and SDTW. However, we found that combining the likelihoods of multiple models in a second classification stage degrades the performance of the proposed classifiers, while improving performance with HMM and SDTW. A proof-of-concept experiment, combining DFFM mappings of multiple SDTW models with SDTW likelihoods, shows that hybrid classification can provide significant improvement over SDTW for model combining as well. Although recognition is mainly based on 3D hand motion features, these results can be expected to generalize to recognition with more detailed measurements such as hand/body pose and facial expression.
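For reference, the classic DTW recurrence that such hybrids build on can be written in a few lines (this is plain DTW on 1-D sequences, not the paper's statistical variant):

```python
# Classic dynamic time warping: minimal alignment cost between two
# 1-D sequences under the standard insertion/deletion/match recurrence.

def dtw(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # -> 0.0 (perfect warped alignment)
```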

9.
This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. We apply Support Vector Machines (SVMs) as estimators of posterior probabilities within standard Hidden Markov Models (HMMs). We also describe a new approach to categorising Arabic vowels into long and short vowels, applied in the labeling phase of the speech signals. Using this new labeling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and the Multi-Layer Perceptron (MLP)/HMM hybrid system. The results obtained for triphone-based Arabic speech recognition are 64.68% with HMMs, 72.39% with MLP/HMM, and 74.01% with the SVM/HMM hybrid model. The Word Error Rate (WER) obtained for continuous speech recognition by the three systems confirms the performance of SVM/HMM, which achieves the lowest average WER of 11.42% over the 4 tested speakers.
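In hybrid recognizers of this kind, the discriminative model's posteriors are typically converted into scaled likelihoods before HMM decoding, via p(frame | state) ∝ p(state | frame) / p(state). A minimal sketch of that conversion (toy numbers; the priors and posteriors below are hypothetical, not the paper's values):

```python
# Hybrid-style emission sketch: divide classifier posteriors by state
# priors to obtain scaled likelihoods usable as HMM emission scores.

def scaled_likelihoods(posteriors, priors):
    return [p / q for p, q in zip(posteriors, priors)]

posteriors = [0.7, 0.2, 0.1]   # classifier output for one frame (toy)
priors = [0.5, 0.3, 0.2]       # state priors from training alignment (toy)
likes = scaled_likelihoods(posteriors, priors)
print(likes)  # first state is boosted relative to its prior, last is damped
```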

10.
Automatic analysis of head gestures and facial expressions is a challenging research area with significant applications in human-computer interfaces. We develop a face and head gesture detector for video streams. The detector follows the facial-landmark paradigm, using both appearance and configuration information of the landmarks. First, we accurately detect and track facial landmarks using adaptive templates, a Kalman predictor, and subspace regularization. The trajectories (time series) of facial landmark positions over the course of a head gesture or facial expression are then converted into various discriminative features, which can be landmark coordinate time series, facial geometric features, or patches on expressive regions of the face. We comparatively evaluate two feature sequence classifiers, Hidden Markov Models (HMMs) and Hidden Conditional Random Fields (HCRFs), and various feature subspace classifiers, namely ICA (Independent Component Analysis) and NMF (Non-negative Matrix Factorization), on the spatiotemporal data. We achieve 87.3% correct gesture classification on a seven-gesture test database, and performance reaches 98.2% correct detection under a fusion scheme. Promising and competitive results are also achieved on classification of naturally occurring gesture clips of the LIlir TwoTalk Corpus.
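The Kalman-predictor step of the landmark tracker can be illustrated with a one-dimensional toy filter (the process and measurement noise constants q and r below are hypothetical, not the paper's settings):

```python
# One-dimensional Kalman filter sketch: predict-then-update cycle used to
# smooth a single landmark coordinate across video frames.

def kalman_1d(measurements, q=1e-3, r=0.25):
    x, p = measurements[0], 1.0        # initial state estimate and variance
    out = []
    for z in measurements:
        p = p + q                      # predict (static motion model)
        k = p / (p + r)                # Kalman gain
        x = x + k * (z - x)            # update with measurement z
        p = (1 - k) * p
        out.append(x)
    return out

smoothed = kalman_1d([10.0, 10.4, 9.8, 10.2, 10.1])
print(len(smoothed))  # one smoothed estimate per frame
```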

11.
This research work disseminates the efforts towards developing and evaluating TAMEEM V1.0, a state-of-the-art pure Modern Standard Arabic (MSA), automatic, continuous, speaker-independent, text-independent speech recognizer built using a high proportion of the spoken data of the phonetically rich and balanced MSA speech corpus. The corpus contains recordings of native Arabic speakers from 11 Arab countries representing the Levant, Gulf, and Africa regions of the Arab World, amounting to about 45.30 hours of speech data. The recordings contain about 39.28 hours of 367 phonetically rich and balanced sentences, used for training the TAMEEM V1.0 recognizer, and another 6.02 hours of 48 further sentences used for testing, which are mostly text-independent and foreign to the training sentences. TAMEEM V1.0 is developed using the Carnegie Mellon University (CMU) Sphinx 3 tools in order to evaluate the speech corpus; the speech engine uses three-emitting-state Continuous Density Hidden Markov Models for triphone-based acoustic models, and the language model contains unigrams, bigrams, and trigrams. Using three different testing data sets, this work obtained a 7.64% average Word Error Rate (WER) for the speaker-dependent, text-independent data set; 2.22% average WER for the speaker-independent, text-dependent data set; and 7.82% average WER for the speaker-independent, text-independent data set.
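The n-gram language model mentioned above assigns probabilities from counts; a toy maximum-likelihood bigram sketch on an unrelated English corpus (the TAMEEM corpus itself is not reproduced here):

```python
from collections import Counter

# Toy bigram language model: maximum-likelihood estimates from counts,
# P(w2 | w1) = count(w1 w2) / count(w1). No smoothing, unlike a real LM.

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1]

# 2 of the 3 occurrences of "the" are followed by "cat".
print(p_bigram("the", "cat"))
```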

12.
Recognition of Arabic (Indian) bank check digits using Log-Gabor filters
In this paper we present a technique for the automatic recognition of Arabic (Indian) bank check digits based on features extracted using Log-Gabor filters. The digits are classified using K-Nearest Neighbor (K-NN), Hidden Markov Model (HMM), and Support Vector Machine (SVM) classifiers. Extensive experiments are conducted on the CENPARMI data, a database of Arabic (Indian) digits extracted from real bank checks, consisting of 7390 samples for training and 3035 samples for testing. The data is normalized to a height of 64 pixels, maintaining the aspect ratio. Log-Gabor filters with several scales and orientations are used, and the filtered images are segmented into different region sizes for feature extraction. Recognition rates of 98.95%, 98.75%, 98.62%, 97.21%, and 94.43% are achieved with the SVM, 1-NN, 3-NN, HMM, and NM classifiers, respectively. These results significantly outperform published work using the same database. The misclassified digits were evaluated subjectively, and the results indicate that human subjects misclassified one third of these digits. The experimental results, including the subjective evaluation of misclassified digits, indicate the effectiveness of the selected Log-Gabor filter parameters, the implemented image segmentation technique, and the extracted features for practical recognition of Arabic (Indian) digits.
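The K-NN classifier used above can be sketched in a few lines (synthetic 2-D feature vectors stand in for the Log-Gabor responses; the real features are much higher-dimensional):

```python
import numpy as np

# Tiny k-nearest-neighbour classifier: label a query vector by majority
# vote among the k closest training vectors (Euclidean distance).

def knn_predict(train_x, train_y, x, k=3):
    dist = np.linalg.norm(train_x - x, axis=1)
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

train_x = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_x, train_y, np.array([0.05, 0.1]), k=3))  # -> 0
```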

13.
Research on an Embedded Speech Interaction System for a Humanoid Robot
陈斌, 郭大勇, 施克仁. 《机器人》 (Robot), 2003, 25(5): 452-455
This paper presents an embedded speech interaction system for a humanoid robot. The system uses high-quality speech acquisition and speech output modules, built around a high-performance digital signal processor (DSP), the TMS320VC5402. The HMM speech recognition engine uses LPC cepstral coefficients and their delta components as the speech feature representation, and an improved Baum-Welch re-estimation algorithm trains the speech templates from multiple observation sequences. Comparative experiments were also conducted on how different feature representations affect recognition results. The peripheral control software handles presentation of recognition results and communication with a host computer. The system achieves good results on speaker-independent, isolated-word recognition with a 200-word vocabulary.
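The sequence likelihood that Baum-Welch re-estimation maximizes is computed with the HMM forward algorithm; a minimal sketch on a hypothetical two-state, two-symbol model (all parameters below are illustrative, not the robot system's):

```python
# HMM forward algorithm: total probability of an observation sequence.
# pi: initial state probs, A: transition matrix, B: emission probs,
# obs: list of observation symbol indices.

def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(n)) * B[t][o]
                 for t in range(n)]
    return sum(alpha)

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]          # 2 states x 2 observation symbols
print(round(forward(pi, A, B, [0, 1, 0]), 4))  # -> 0.1089
```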

14.
This paper investigates the contribution of formants and prosodic features, such as pitch and energy, to Arabic speech recognition under real-life conditions. Our speech recognition system, based on Hidden Markov Models (HMMs), is implemented with the HTK toolkit. The front-end of the system combines conventional Mel-Frequency Cepstral Coefficients (MFCCs) with prosodic information and formants. The experiments are performed on the ARADIGIT corpus, a database of spoken Arabic words. The results show that the resulting multivariate feature vectors lead, in noisy environments, to a significant improvement of up to 27% in word accuracy relative to that obtained with the state-of-the-art MFCC-based system.

15.
16.
This paper proposes HMCRF, a new human action recognition method based on conditional random fields and Hidden Markov Models (HMMs). Existing action recognition methods mostly use HMMs or their variants, whose most prominent limitation is the requirement that observations be mutually independent. Conditional models easily represent contextual dependencies, allow efficient and exact inference via dynamic programming, and can be trained by convex optimization. Applying conditional graphical models to action recognition, extensive experiments show that the proposed method clearly outperforms linear-chain CRFs and HMMs in recognition accuracy.

17.
This paper describes an NN/HMM hybrid system for Chinese place-name recognition that automatically detects and rejects out-of-vocabulary words. The trained HMM-based models include keyword models, filler models, and "anti-keyword" models. The recognizer's output is verified by feeding HMM-based statistical features to a neural network, whose output decides whether the input is an out-of-vocabulary word. A baseline system based on the traditional N-best method was built, and three network topologies were tested: a feed-forward back-propagation network, an Elman back-propagation network, and a trainable cascade-forward back-propagation network. Experimental results show that the feed-forward back-propagation network performs best, reducing the average error rate by 54.4% compared with the baseline.

18.
This work investigates and analyzes speaker identification in both unbiased and biased emotional talking environments based on a classifier called Suprasegmental Hidden Markov Models (SPHMMs). The first talking environment is unbiased towards any emotion, while the second is biased towards different emotions. Each talking environment comprises six distinct emotions: neutral, angry, sad, happy, disgust, and fear. The investigation and analysis show that speaker identification performance in the biased talking environment is superior to that in the unbiased one. The obtained results are close to those achieved in subjective assessment by human judges.

19.
This paper proposes a method for recognizing abnormal behavior at ATMs, combining improved Hu moments with a Hidden Markov Model. Improved Hu-moment features of the moving target are extracted from video sequences of users depositing or withdrawing cash in front of the ATM; the Baum-Welch algorithm trains a Hidden Markov Model on normal user behavior; finally, abnormal behavior is detected from the probability the model assigns to test sample sequences. Simulation experiments in Matlab on videos of simulated ATM user behavior show that the method achieves a high recognition rate for user behavior in front of the ATM.
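A toy sketch of the first Hu moment invariant, the kind of shape descriptor such a system feeds to the HMM (this is the standard first invariant eta20 + eta02, not the paper's improved variant):

```python
import numpy as np

# First Hu moment invariant on a binary silhouette: translation-invariant
# because it is built from central moments, normalized for scale.

def hu1(img):
    ys, xs = np.nonzero(img)
    m00 = float(len(xs))                 # area (pixel count)
    xbar, ybar = xs.mean(), ys.mean()
    mu20 = ((xs - xbar) ** 2).sum()      # second-order central moments
    mu02 = ((ys - ybar) ** 2).sum()
    # normalization mu_pq / m00^((p+q)/2 + 1); for p+q=2 this is m00^2
    return (mu20 + mu02) / m00 ** 2

a = np.zeros((8, 8)); a[2:4, 2:4] = 1    # 2x2 square silhouette
b = np.zeros((8, 8)); b[5:7, 3:5] = 1    # same square, translated
print(hu1(a), hu1(b))  # equal values: invariant to translation
```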

20.
This paper presents an experimental study of an agent system with multimodal interfaces for a smart office environment. The system is built upon multimodal interfaces: recognition modules for both speech and pen-mouse gestures, and identification modules for both face and fingerprint. As essential modules, speech recognition and synthesis are used for virtual interaction between user and system. In this study, a real-time speech recognizer based on a Hidden Markov Network (HM-Net) was incorporated into the proposed system. In addition, identification techniques based on both face and fingerprint were adopted to provide a specific user with user-customized interaction and security in the office environment. In evaluation, the results showed that the proposed system was easy to use and would prove useful in a smart office environment, even though the performance of the speech recognizer was not satisfactory, mainly due to noisy environments.
