Similar literature
 20 similar documents found (search time: 890 ms)
1.
In speech recognition research, the variety of languages means that a corresponding speech recognition system must be constructed for each language. A dialect speech recognition system in particular must handle many special words and colloquial features. In addition, dialect speech data is very scarce, so constructing a dialect speech recognition system is difficult. This paper constructs a speech recognition system for Sichuan dialect by combining a hidden Markov model (HMM) and a deep long short-term memory (LSTM) network. We created a Sichuan dialect dataset and implemented a speech recognition system for it using the HMM-LSTM architecture. Compared with the deep neural network (DNN), the LSTM network can overcome the problem that the DNN only captures the context of a fixed number of information items. Moreover, to identify polyphones and specially pronounced vocabulary in Sichuan dialect accurately, we collect all the characters in the dataset and their common phoneme sequences to form a lexicon. Finally, this system yields an 11.34% character error rate on the Sichuan dialect evaluation dataset. As far as we know, this is the best performance reported for this corpus at present.
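
For readers unfamiliar with the metric reported in entry 1: character error rate (CER) is conventionally the character-level edit distance between the recognizer output and the reference transcript, divided by the reference length. The sketch below is a generic Python illustration of that convention, not the authors' scoring code; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping one DP row at a time."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = (substitutions + deletions + insertions) / reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `character_error_rate("abcd", "abxd")` is 0.25: one substitution over four reference characters.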

2.
A log-index weighted cepstral distance measure is proposed and tested in speaker-independent and speaker-dependent isolated word recognition systems using statistical techniques. The weights for the cepstral coefficients of this measure equal the logarithm of the corresponding indices. The experimental results show that this measure works better than other weighted Euclidean cepstral distance measures on three speech databases. The error rate obtained using this measure is about 1.8 percent on average across the three databases, which is a 25% reduction from that obtained using other measures, and a 40% reduction from that obtained using the Log Likelihood Ratio (LLR) measure. The experimental results also show that this distance measure works well in both speaker-dependent and speaker-independent speech recognition systems.
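
The weighting described in entry 2 can be sketched as a weighted Euclidean distance whose k-th weight is log(k). The abstract does not say whether the weight multiplies the squared difference or the difference itself, nor whether indexing starts at 1; the Python sketch below assumes 1-based indices and weights applied to the squared differences, and the function name is hypothetical.

```python
import math


def log_index_weighted_distance(c1, c2):
    """Weighted Euclidean distance between two cepstral vectors.

    Assumption: coefficient k (1-based) gets weight log(k), applied to the
    squared difference. Under this assumption the first coefficient has
    weight log(1) = 0 and does not contribute.
    """
    if len(c1) != len(c2):
        raise ValueError("cepstral vectors must have the same length")
    total = 0.0
    for k, (a, b) in enumerate(zip(c1, c2), start=1):
        total += math.log(k) * (a - b) ** 2
    return math.sqrt(total)
```

One consequence of this reading is that higher-order coefficients are emphasized and the first coefficient is ignored entirely.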

3.
Functional paralanguage includes considerable emotion information, and it is insensitive to speaker changes. To improve emotion recognition accuracy under speaker-independent conditions, a fusion method combining functional paralanguage features with accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. Using this method, functional paralanguages such as laughter, crying, and sighing are used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database including six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is put forward to recognize speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions is over 67% in the speaker-independent condition using the functional paralanguage features.

4.
In Mandarin all-syllable recognition, many insertion errors occur due to the influence of non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually does not work as well as expected because duration is sensitive to speech rate. To address this problem, this paper proposes a novel context-dependent duration distribution normalized by speech rate and applies it to a speech recognition system based on an improved Hidden Markov Model (HMM) framework. To realize this algorithm, the authors employ a new method to estimate the speech rate of a sentence, then compute the duration probability combined with the speech rate, and finally apply this duration information in the post-processing stage. With little change to the recognition process or resource demand, the duration model is incorporated efficiently into the system. The experimental results indicate that the syllable error rates decrease significantly on two different speech corpora. For insertions in particular, the error rate falls by about sixty to eighty percent.

5.
In this paper, a learning and recognition approach is proposed for univariate time series composed of output measurements of general nonlinear dynamical systems. Firstly, a class of dynamical systems in the canonical form is derived to describe the univariate time series by introducing coordinate transformation. An observer-based deterministic learning technique is then adopted to achieve dynamical modeling of the associated transformed systems of the training univariate time series, and the modeling results in the form of radial basis function network (RBFN) models are stored in a pattern library. Subsequently, multiple observer-based dynamical estimators containing the RBFN models in the pattern library are constructed for a test univariate time series, and a recognition decision scheme is proposed by the derived recognition indicator. On this basis, more concise recognition conditions are provided, which is beneficial for verifying the recognition results. Finally, simulation studies on the Rossler system and aero-engine stall warning verify the effectiveness of the proposed approach.

6.
The shapes of speakers' vocal organs change under their different emotional states, which leads to the deviation of the emotional acoustic space of short-time features from the neutral acoustic space and thereby the degradation of the speaker recognition performance. Features deviating greatly from the neutral acoustic space are considered as mismatched features, and they negatively affect speaker recognition systems. Emotion variation produces different feature deformations for different phonemes, so it is reasonable to build a finer model to detect mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition (phoneme classes, Gaussian mixture model (GMM) tokenizer, and probabilistic GMM tokenizer) are proposed to replace phoneme recognition. We propose feature pruning and feature regulation methods to process the mismatched features to improve speaker recognition performance. As for the feature regulation method, a strategy of maximizing the between-class distance and minimizing the within-class distance is adopted to train the transformation matrix to regulate the mismatched features. Experiments conducted on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, compared with the baseline GMM-UBM (universal background model) algorithm. Also, corresponding IR increases of 2.09% and 3.32% can be obtained with our methods when applied to the state-of-the-art algorithm i-vector.

7.
An intelligent wheelchair is devised, which is controlled by a coordinated mechanism based on a brain-computer interface (BCI) and speech recognition. By performing appropriate activities, users can navigate the wheelchair with four steering behaviors (start, stop, turn left, and turn right). Five healthy subjects participated in an indoor experiment. The results demonstrate the efficiency of the coordinated control mechanism with satisfactory path and time optimality ratios, and show that speech recognition is a fast and accurate supplement for BCI-based control systems. The proposed intelligent wheelchair is especially suitable for patients suffering from paralysis (especially those with aphasia) who can learn to pronounce only a single sound (e.g., "ah").

8.
In this paper, we propose two modifications for speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition, and some of these correlations are lost when the frequency domain is divided into sub-bands. Consequently, we propose a particularly redundant parallel architecture in which most of the correlations are kept. Second, in the classical spectrum calculation, the log transformation used to modify the power spectrum is generally applied after the filter bank. We show that performing this transformation before the filter bank is more advantageous in our case. For recognition, the Gaussian mixture model (GMM) algorithm is adopted. Experiments on speech corrupted by noise show that this approach adapts better to noisy environments than a conventional device, especially when pruning of some recognizers is performed.

9.
Probabilistic Latent Semantic Analysis (PLSA) has proven effective in information retrieval and speech recognition. In this paper, we modify the calculation procedure of the estimation algorithm, which substantially reduces the memory requirements. In addition, a parallelization approach enables models to be built in less time. Next, we examine data segmentation for PLSA adaptation. Most meetings cover a number of topics, so we divide each meeting automatically and fit PLSA models to the segments. The experiment showed an improvement in recognition performance.

10.
Phase Correlation Based Iris Image Registration Model   Cited by: 1 (self-citations: 0, citations by others: 1)
Iris recognition is one of the most reliable personal identification methods. In iris recognition systems, image registration is an important component. Accurately registering iris images leads to a higher recognition rate for an iris recognition system. This paper proposes a phase correlation based method for iris image registration with sub-pixel accuracy. Compared with existing methods, it is insensitive to image intensity and can compensate to a certain extent for the non-linear iris deformation caused by pupil movement. Experimental results show that the proposed algorithm has an encouraging performance.
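
As background for entry 10, basic phase correlation estimates the translation between two images from the phase of their normalized cross-power spectrum. The Python sketch below recovers integer circular shifts only; it does not reproduce the paper's sub-pixel refinement or its handling of iris deformation. It assumes NumPy is available, and the function name is hypothetical.

```python
import numpy as np


def phase_correlation_shift(ref, moved):
    """Estimate integer (dy, dx) such that moved ~= np.roll(ref, (dy, dx), axis=(0, 1))."""
    f_ref = np.fft.fft2(ref)
    f_mov = np.fft.fft2(moved)
    # Normalized cross-power spectrum: keep only the phase difference,
    # which is why the method is insensitive to overall intensity scaling.
    cross = f_mov * np.conj(f_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)  # guard against division by zero
    corr = np.fft.ifft2(cross).real
    # The correlation surface peaks at the (circular) displacement.
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks past the midpoint to negative shifts.
    return tuple(int(p if p <= n // 2 else p - n) for p, n in zip(peak, corr.shape))
```

A quick check is to shift a random image with `np.roll(ref, (3, -5), axis=(0, 1))` and confirm that the function returns `(3, -5)`.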

11.
Speaker variability is known to have an adverse impact on speech systems that process linguistic content, such as speech and language recognition. However, speech production changes in individuals due to stress and emotions have a similarly detrimental effect on the task of speaker recognition, as they introduce mismatch with the speaker models typically trained on modal speech. The focus of this study is on the analysis of stress-induced variations in speech and the design of an automatic stress level assessment scheme that could be used in directing stress-dependent acoustic models or normalization strategies. Current stress detection methods typically employ a binary decision based on whether the speaker is or is not under stress. In reality, the amount of stress in individuals varies and can change gradually. Using speech and biometric data collected in a real-world, variable-stress level law enforcement training scenario, this study considers two methods for stress level assessment. The first approach uses a nearest neighbor clustering scheme at the vowel token and sentence levels to classify speech data into three levels of stress. The second approach employs Euclidean distance metrics within the multi-dimensional feature space to provide real-time stress level tracking capability. Evaluations on audio data confirmed by biometric readings show both methods to be effective in assessment of stress level within a speaker (average accuracy of 55.6% in a 3-way classification task). In addition, the impact of high-level stress on in-set speaker recognition is evaluated and shown to reduce the accuracy from 91.7% (low/mid stress) to 21.4% (high level stress).

12.
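Shen Guangzhong (申广忠). Microcomputer Information (《微计算机信息》), 2007, 23(12): 251-252
At present, research on Mongolian speech recognition is still essentially a blank, so the research and development of a Mongolian speech recognition system is of great significance. Establishing the language model is one of the most important steps in a speech recognition system. Based on the author's own practice, this paper uses experiments to determine a suitable language model for a Mongolian large-vocabulary speech recognition system.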

13.
In this paper, a sinusoidal model has been proposed for characterization and classification of different stress classes (emotions) in a speech signal. Frequency, amplitude and phase features of the sinusoidal model are analyzed and used as input features to a stressed speech recognition system. The performances of sinusoidal model features are evaluated for recognition of different stress classes with a vector-quantization classifier and a hidden Markov model classifier. To find the effectiveness of these features for recognition of different emotions in different languages, speech signals are recorded and tested in two languages, Telugu (an Indian language) and English. Average stressed speech index values are proposed for comparing differences between stress classes in a speech signal. Results show that sinusoidal model features are successful in characterizing different stress classes in a speech signal. Sinusoidal features perform better compared to the linear prediction and cepstral features in recognizing the emotions in a speech signal.

14.
Recently, increasing attention has been directed to the study of the emotional content of speech signals, and hence, many systems have been proposed to identify the emotional content of a spoken utterance. This paper is a survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system. The first one is the choice of suitable features for speech representation. The second issue is the design of an appropriate classification scheme and the third issue is the proper preparation of an emotional speech database for evaluating system performance. Conclusions about the performance and limitations of current speech emotion recognition systems are discussed in the last section of this survey. This section also suggests possible ways of improving speech emotion recognition systems.

15.
While speech recognition technology has long held the potential for improving the effectiveness of military operations, it has only been within the last several years that speech systems have enabled the realization of that potential. Commercial speech recognition technology developments aimed at improving robustness for automotive and cellular phone applications have capabilities that can be exploited in various military systems. This paper discusses the results of two research efforts directed toward applying commercial-off-the-shelf speech recognition technology in the military domain. The first effort discussed is the development and evaluation of a speech recognition interface to the Theater Air Planning system responsible for the generation of air tasking orders in a military Air Operations Center. The second effort examined the utility of speech versus conventional manual input for tasks performed by operators in an unmanned aerial vehicle control station simulator. Both efforts clearly demonstrate the military benefits obtainable from the proper application of speech technology.

16.
Peacocke, R.D.; Graf, D.H. Computer, 1990, 23(8): 26-33
Five approaches that can be used to control and simplify the speech recognition task are examined. They entail the use of isolated words, speaker-dependent systems, limited vocabulary size, a tightly constrained grammar, and quiet and controlled environmental conditions. The five components of a speech recognition system are described: a speech capture device, a digital signal processing module, preprocessed signal storage, reference speech patterns, and a pattern-matching algorithm. Current speech recognition systems are reviewed and categorized. Speaker recognition approaches and systems are also discussed.

17.
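Advances in Speech Information Processing   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper reviews advances in speech information processing, with particular attention to the current state of Chinese speech processing. Speech information processing covers speech recognition, speaker recognition, speech synthesis, and speech perception computing, among other areas. Recognition of accented and casually pronounced speech strongly supports applications such as language learning and spoken-proficiency assessment. Improving recognition accuracy in the presence of cross-channel conditions, environmental noise, multiple speakers, short utterances, and time-varying speech is a research focus in speaker recognition. Speech synthesis mainly concentrates on multilingual synthesis, emotional speech synthesis, and visual speech synthesis. Speech perception computing has pursued research on speech audiometry, noise suppression algorithms, frequency-response compensation methods for hearing aids, and speech signal enhancement algorithms. Effectively combining speech processing technology with language and networks has promoted more harmonious human-machine speech interaction.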

18.
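Zhou Ping (周萍), Tang Lizhen (唐李珍). Computer Engineering (《计算机工程》), 2011, 37(2): 169-171
For speaker recognition systems with short training speech, a recognition algorithm based on decision-level fusion is proposed. During recognition, the speech signal is processed by empirical mode decomposition; speech feature sequences are extracted from the resulting intrinsic mode function components and matched separately; and a decision-level fusion algorithm combines these matching results with the result of conventional standalone recognition to produce the final output. Signal decomposition allows the test speech signal to be recognized repeatedly, while decision-level fusion optimizes the recognition result, so that the system's recognition rate is maintained even with short training speech. Experimental results show that the algorithm outperforms conventional methods in short-training-speech recognition systems.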

19.
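Robustness of Speech Recognition and the Establishment of a Speech Database   Cited by: 1 (self-citations: 0, citations by others: 1)
Chinese speech recognition has made great progress over the past decade or so; some systems are now in practical use and have begun to be commercialized. However, some systems have poor robustness, so this problem will become a major task for future speech recognition research. To this end, we built a Chinese speech database suited to research on speech recognition robustness, and describe in detail its composition, characteristics, and experimental results.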

20.
Nowadays, several computational techniques for speech recognition have been proposed. These techniques represent an important improvement in real-time applications where a speaker interacts with speech recognition systems. Although researchers have proposed many methods, none of them solves the high false alarm problem that arises when far-field speakers interfere in a human-machine conversation. This paper presents a two-class (speech and non-speech) decision-tree based approach for combining new speech pulse features in a VAD (Voice Activity Detector) for rejecting far-field speech in speech recognition systems. This decision tree is applied over the speech pulses obtained by a baseline VAD composed of a frame feature extractor, an HMM-based (Hidden Markov Model) segmentation module and a pulse detector. The paper also presents a detailed analysis of a large number of features for discriminating between close and far-field speech. The detection error obtained with the proposed VAD is the lowest compared to other well-known VADs.
