Similar Articles
20 similar articles found (search time: 15 ms)
1.
张永锋  田勇  张阳 《声学技术》2015,34(1):51-53
Noise-robust continuous speech recognition is an important research area in Mandarin continuous speech recognition. By measuring the spectral stability between consecutive speech frames, continuous speech is segmented into units; each segment, regardless of its duration, is then transformed into a time-independent, fixed-size spectral-space feature, and recognition is performed by matching against a template library. The new spectral-space feature is independent of speech duration and shows good robustness to noise. Good recognition results were obtained in a speaker-dependent continuous speech recognition test system.
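The stability-based segmentation and duration-independent feature described above can be sketched as follows; this is a hypothetical illustration in which the distance measure, the threshold, and the number of time slices are assumptions, not values from the paper:

```python
def frame_distance(a, b):
    """Euclidean distance between two spectral frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def segment_by_stability(frames, threshold):
    """Cut the frame sequence wherever adjacent spectra differ sharply."""
    segments, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if frame_distance(prev, cur) > threshold:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

def fixed_size_feature(segment, n_slices=4):
    """Average frames within n_slices equal time slices, producing a
    fixed-size feature regardless of the segment's duration."""
    k = len(segment)
    bounds = [round(i * k / n_slices) for i in range(n_slices + 1)]
    feature = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = segment[lo:hi] or segment[-1:]  # guard very short segments
        dim = len(chunk[0])
        feature.extend(sum(f[d] for f in chunk) / len(chunk) for d in range(dim))
    return feature
```

Two stable spectral regions separated by a jump then yield two segments, each mapped to the same-sized feature vector whatever its length.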

2.
N USHA RANI  P N GIRIJA 《Sadhana》2012,37(6):747-761
Speech is one of the most important communication channels among people, and speech recognition occupies a prominent place in human–machine communication. Several factors affect the accuracy of a speech recognition system, and despite much effort to improve accuracy, current systems still produce erroneous output. Telugu is one of the most widely spoken south Indian languages. In the proposed Telugu speech recognition system, errors obtained from the decoder are analysed to improve performance. The static pronunciation dictionary plays a key role in recognition accuracy, so modifications are made to the dictionary used by the decoder. These modifications reduce the number of confusion pairs, which improves recognition performance; language model scores also vary with the modification. The hit rate increases considerably, and false alarms change as the pronunciation dictionary is modified. Variations are observed in different error measures, such as F-measure, error rate and Word Error Rate (WER), when the proposed method is applied.

3.
Abstract

Mandarin Chinese is a tonal language in which every syllable carries a tone that has lexical meaning, so tone recognition is very important for Mandarin speech. This paper presents a method for continuous-speech tone recognition. Context-dependent discrete hidden Markov models (HMMs) are used that take into account the tones of the syllables on both sides, and special effort was made to select the minimum number of key context-dependent models given the characteristics of the tones. The results indicate that a total of 23 context-dependent models can describe the complicated tone behavior of all 175 possible tone concatenation conditions in continuous speech, so that the required training data are reduced to a minimum and the recognition process is simplified significantly. The best achievable recognition rate is 83.55%.

4.
Automatic recognition of human emotions in a continuous dialog model remains challenging, because a speaker's utterance may include several sentences that do not all carry a single emotion. Little work has been reported on standalone speech emotion recognition (SER) systems designed for continuous speech. In the recent decade, various effective SER systems have been proposed for discrete speech, i.e., short speech phrases, and it would be helpful if these systems could also recognize emotions from continuous speech. However, applying them directly to continuous speech degrades recognition performance, owing to the mismatch between training data (from discrete speech) and testing data (from continuous speech). The problem may be resolved by enhancing an existing discrete-speech SER system. Thus, in this work the authors' existing SER system for multilingual and mixed-lingual discrete speech is enhanced by enriching the cepstral feature set with bi-spectral speech features and a unique functional set of Mel frequency cepstral coefficient features derived from a sine filter bank. Data augmentation is applied to combat the skewness of the SER system toward certain emotions, and classification is performed with a random forest. The enhanced system predicts emotions from continuous speech using a uniform segmentation method. Owing to data scarcity, audio samples of discrete speech from the SAVEE database, which contains recordings in English, are concatenated to produce multi-emotional speech samples. Anger, fear, sadness and neutral, emotions that are vital during the initial investigation of mentally disordered individuals, are selected to build six categories of multi-emotional samples. Experimental results demonstrate the suitability of the proposed method for recognizing emotions from continuous speech as well as from discrete speech.
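The uniform segmentation step can be illustrated with a minimal sketch; the `classify` callable is a hypothetical stand-in for the enhanced discrete-speech SER model, and the segment length is arbitrary:

```python
def segment_emotions(samples, seg_len, classify):
    """Split a continuous recording into uniform fixed-length segments
    and label each one with the supplied per-segment classifier."""
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    return [classify(seg) for seg in segments if seg]

# Toy classifier: stands in for the real SER model, labelling by mean sign.
label = lambda seg: "anger" if sum(seg) / len(seg) > 0 else "neutral"
```

A recording whose emotional content changes midway is then returned as a sequence of per-segment labels rather than a single label for the whole utterance.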

5.
Abstract

This paper presents a semi-automatic phonetic labeling method for processing the MAT (Mandarin Across Taiwan) speech database. MAT speech data are collected over telephone networks, and each utterance has been transcribed into Chinese characters and Pinyin symbols. The proposed phonetic labeling method marks the syllable and sub-syllable boundaries in an utterance, and phonetic symbols are assigned to each segmented syllable. Segmentation is accomplished using hidden Markov models (HMMs) and Viterbi decoding, and the accuracy of syllable segmentation is checked by measuring the syllable length and the distance of a syllable from its state models. The experimental results show that the proposed labeling method achieves segmentation accuracy of around 90% for an allowed tolerance of 16 ms.
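Segmentation accuracy under a boundary tolerance such as the 16 ms above might be computed as follows; the matching rule (any hypothesized boundary within the tolerance of a reference boundary counts as a hit) is an assumption, since the paper does not spell out its exact scoring procedure:

```python
def boundary_accuracy(ref, hyp, tol=0.016):
    """Fraction of reference boundaries (in seconds) matched by at
    least one hypothesized boundary within +/- tol seconds."""
    hits = sum(1 for r in ref if any(abs(r - h) <= tol for h in hyp))
    return hits / len(ref)
```

With a 16 ms tolerance, a boundary placed 20 ms away from its reference counts as a miss while one 10 ms away counts as a hit.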

6.
7.
Connected-digit speech recognition is important in many applications, such as automated banking, catalogue dialing and automatic data entry. This paper presents an optimized speaker-independent connected-digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and continuous-density Hidden Markov Models (HMMs) for recognition, with the Viterbi algorithm used for decoding. The training database contains utterances from 21 speakers aged 20 to 40 years, recorded in a normal office environment, with each speaker asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
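The Viterbi decoding used by the recognizer can be sketched for a toy discrete HMM; the states, transition and emission probabilities below are illustrative, not the paper's trained models:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path for an observation
    sequence (toy version, no log-space scaling)."""
    # lattice column 0: probability and best path ending in each state
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        prev, cur = V[-1], {}
        for s in states:
            prob, path = max((prev[p][0] * trans_p[p][s], prev[p][1])
                             for p in states)
            cur[s] = (prob * emit_p[s][o], path + [s])
        V.append(cur)
    return max(V[-1].values(), key=lambda t: t[0])[1]
```

Real connected-digit decoders work in log space over concatenated digit models, but the dynamic-programming recursion is the same.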

8.
9.
This paper describes a natural language interface system for controlling a space teleoperated manipulator with spoken natural language. The system is built around isolated-word speech recognition, with an emphasis on reliability and practicality. During development, several factors were taken into account: the auditory perception characteristics of human speech interaction, the one-character-one-syllable property of Chinese, the fact that the acoustic model used in recognition need not strictly match its linguistic model, and the influence of environmental noise on system performance. The transition segment plus the final (vowel) segment is proposed as the recognition unit, and a multi-level recognition strategy is adopted. Recognition and simulation experiments show that the system achieves the expected performance.

10.
李涛  曹辉  郭乐乐 《声学技术》2018,37(4):367-371
To improve the performance of continuous speech recognition systems, a deep auto-encoder neural network is applied to speech feature extraction. Sparse auto-encoders are stacked to form a deep auto-encoder (DAE), and the essential features of the speech signal are extracted in two steps, pre-training and fine-tuning. Context-dependent triphone models are used, and the phone error rate serves as the criterion of system performance. Simulation results show that, compared with traditional Mel-Frequency Cepstral Coefficient (MFCC) features and optimized MFCC features, the deep features extracted by the deep auto-encoder are superior.
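A toy linear autoencoder illustrates the reconstruct-the-input training at the heart of each layer; the real DAE stacks sparse autoencoders with nonlinearities, sparsity penalties, pre-training and supervised fine-tuning, none of which are shown here:

```python
import random

def train_tiny_autoencoder(data, lr=0.05, epochs=100, seed=0):
    """Linear autoencoder with a single hidden unit and no biases,
    trained by SGD on mean squared reconstruction error. Returns the
    loss before and after training."""
    rng = random.Random(seed)
    dim = len(data[0])
    enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # dim -> 1
    dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # 1 -> dim

    def mse():
        total = 0.0
        for x in data:
            h = sum(w * xi for w, xi in zip(enc, x))
            total += sum((d * h - xi) ** 2 for d, xi in zip(dec, x))
        return total / len(data)

    before = mse()
    for _ in range(epochs):
        for x in data:
            h = sum(w * xi for w, xi in zip(enc, x))
            err = [d * h - xi for d, xi in zip(dec, x)]      # y - x
            g_h = sum(2 * e * d for e, d in zip(err, dec))   # dL/dh
            for i in range(dim):
                dec[i] -= lr * 2 * err[i] * h
                enc[i] -= lr * g_h * x[i]
    return before, mse()
```

On data lying along one direction, the single hidden unit suffices and the reconstruction error drops, which is the "essential feature" idea in miniature.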

11.
A New Latent Semantic Analysis Language Model   Cited by: 1 (self-citations: 0, citations by others: 1)
A clustering-based method is proposed for fast quantized representation of words, from which a prediction confidence for the latent semantic analysis (LSA) language model is derived. Combined with a trigram model through a newly proposed geometrically weighted static interpolation, a new LSA language model is constructed and applied to Mandarin speech recognition. Experiments show that it outperforms the traditional SVD-based LSA language model in both efficiency and performance; compared with the trigram model, the recognition error rate drops by about 3.6%–7.1% relative. The effective quantized representation of words also offers a new way to further improve the performance of LSA language models.
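Geometrically weighted static interpolation with an n-gram model can be sketched as follows; the weight value and the renormalization over a closed vocabulary are illustrative assumptions, not details from the paper:

```python
def geometric_interpolate(p_ngram, p_lsa, lam=0.5):
    """Combine two word distributions as p_ngram^lam * p_lsa^(1-lam),
    renormalized so the result is again a probability distribution.
    `lam` is the static n-gram weight."""
    combined = {w: (p_ngram[w] ** lam) * (p_lsa[w] ** (1 - lam))
                for w in p_ngram}
    z = sum(combined.values())
    return {w: p / z for w, p in combined.items()}
```

With `lam=1.0` the trigram distribution is recovered unchanged; intermediate weights pull the prediction toward words the LSA component considers semantically likely.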

12.
13.
The classical vector space retrieval model, when applied directly to Mandarin spoken-document retrieval based on syllable lattices, cannot effectively distinguish the correct syllable recognition candidates in the lattice from the incorrect ones, nor fully exploit the multi-level information the lattice contains. To address this, a retrieval method based on an adjacent-syllable posterior probability matrix of the spoken document is proposed. The matrix serves as the document index, and the posterior probability that a query is contained in a spoken document is computed and used to measure the relevance between the query and the document. As a reliable confidence measure, the posterior probability effectively distinguishes correct from incorrect syllable candidates, and computing posteriors in the lattice makes full use of the multi-level information in the recognition results. Spoken-document retrieval experiments show that, compared with the vector-space-model-based method, the proposed method significantly improves retrieval performance and is an effective approach to Mandarin spoken-document retrieval.
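A simplified reading of the relevance score: treat the index as a map from adjacent syllable pairs to posterior probabilities and combine entries along the query. Multiplying the pairwise posteriors is an assumption made for illustration, not the paper's exact formula:

```python
def query_posterior(matrix, query):
    """Approximate posterior that `query` (a syllable sequence) occurs
    in a document, as the product of the document's adjacent-syllable
    posteriors; unseen pairs contribute probability zero."""
    p = 1.0
    for a, b in zip(query, query[1:]):
        p *= matrix.get((a, b), 0.0)
    return p
```

A query whose syllable pairs all appear with high posterior scores near 1, while a query containing any pair absent from the lattice scores 0.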

14.
沈彩凤  俞一彪 《声学技术》2013,32(4):305-311
A new tone evaluation algorithm for continuous speech is proposed, applicable to tone assessment in computer-assisted language learning systems and Putonghua proficiency tests. Since tones in continuous speech are affected by their context, Gaussian Mixture Models (GMMs) are built on tri-syllable units; within a tri-syllable, the consonant portions are filled in by spline interpolation of the tone contour to capture the pitch-transition information between syllables, and the Fujisaki model is used to remove sentence intonation and speaker characteristics so that only the tone features of the F0 contour are modeled. Experimental results show that, compared with traditional methods, the tri-syllable spline interpolation and the improved Fujisaki-based features raise the agreement between machine and human scores on the test set by 8.75% and 14.09%, respectively.

15.
Oral movement is closely related to people's eating patterns. This paper monitors eating patterns by analyzing and recognizing oral movement states, in order to guide dietary habits. Borrowing the ideas and methods of speech recognition technology, the bone-conducted sounds produced by oral movement are analyzed and recognized, and the traditional hidden Markov model is adopted to improve recognition efficiency. A bone-conducted sound recognition system is built on hidden Markov models: before recognition, the signal is framed and windowed, Mel-frequency cepstral coefficients are extracted, and the models are trained; during recognition, the model in the template library that best matches the test audio signal is found, and its output is taken as the final recognition result. The method achieves a recognition rate of 84%, and the experimental results show that it is feasible.
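The framing-and-windowing step that precedes MFCC extraction can be sketched as follows; the frame length, hop size and Hamming window are standard front-end choices, not values taken from the paper:

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames and apply a Hamming
    window to each -- the pre-processing before MFCC extraction."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

Each windowed frame would then be passed through an FFT, Mel filter bank, log and DCT to obtain the cepstral coefficients used to train the HMMs.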

16.
Abstract

This paper presents a novel framework for voice conversion based on sub-syllable spectral block clustering transformation functions. The speech signal is first transformed to a spectrum by the Fast Fourier Transform. A sonority measure, computed from the energy concentration among frequency components, is used to extract sub-syllable segments from input utterances. According to the syllable structure of Mandarin, HMM-based syllable clustering is used to handle the variety among syllables. Dynamic programming aligns the spectral blocks of the parallel corpus so that each spectral unit of the source speaker is mapped to a unit of the target speaker, under the constraint that mapped units belong to the same sub-syllable and the same sub-band of the Mel-scale filter bank. A content-based image retrieval algorithm is employed to find the target spectral block in the transformation phase. Experimental results show that the proposed spectral block transformation is effective for voice conversion, and that discrimination with regard to speaker identification is better than with traditional approaches. However, some noise remains, especially in the high-frequency components, which reduces the signal quality obtained in the transformation phase, because the converted speech is not smooth.

17.
To address the low recognition rate caused by the inability to model the key spatiotemporal dependencies in speech emotion recognition, a speech emotion recognition algorithm based on self-attention spatiotemporal features is proposed, using a bilinear convolutional neural network, a long short-term memory network and a multi-head attention mechanism to automatically learn the best spatiotemporal representation of the speech signal. First, the log-Mel (log…

18.
Among the components of a Tibetan syllable, the base letter plays a central role: once its position is identified accurately, the other components can be obtained from the length of the syllable, and the base letter can serve as the anchor for checking whether the combination of the other components in the syllable conforms to the rules. By studying the component-combination grammar of Tibetan syllables, this paper proposes an error-detection algorithm for Tibetan syllables based on base-letter identification. The method requires neither a Tibetan syllable dictionary nor the support of a large-scale corpus. Experiments show that the method automatically detects errors in Tibetan syllables with an accuracy of 97%.

19.
This paper presents a skull stripping method that segments the brain from MRI human head scans using a multi-seeded region growing technique. The proposed method has two stages: in Stage 1, the brain in the middle slice is segmented; in Stage 2, the brains in the remaining slices are segmented. In each stage, the method first identifies a rough brain mask, and the fine brain region within it is then segmented using the multi-seeded region growing approach. Multiple seed points are selected automatically based on the intensity profiles of grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF) in the brain image. The proposed brain segmentation method using multi-seeded region growing (BSMRG) was validated on 100 volumes of T1-, T2- and PD-weighted MR brain images obtained from the Internet Brain Segmentation Repository (IBSR), LONI and the Whole Brain Atlas (WBA). The best Dice (D) value of 0.971 and Jaccard (J) value of 0.944 were recorded by the proposed BSMRG method on the IBSR dataset; for the LONI dataset, the best values of D = 0.979 and J = 0.960 were obtained on the sagittally oriented images. The method was tested on brain images of all types and orientations, and produced better and more stable results than the existing methods Brain Extraction Tool (BET), Brain Surface Extraction (BSE), Watershed Algorithm (WAT), Hybrid Watershed Algorithm (HWA) and Skull Stripping using Graph Cuts (GCUT).
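A 2-D toy version of multi-seeded region growing illustrates the core idea; the real method works on 3-D volumes with intensity ranges derived from the GM/WM/CSF profiles, whereas the interval used here is purely illustrative:

```python
from collections import deque

def region_grow(image, seeds, low, high):
    """Grow a region from several seed pixels, accepting 4-connected
    neighbours whose intensity lies in [low, high]."""
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque(seeds)
    for r, c in seeds:
        mask[r][c] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                    and low <= image[nr][nc] <= high):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask
```

Starting from seeds inside the target tissue, the grown mask stops at the bright boundary pixels, which is how the fine brain region is carved out of the rough mask.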

20.
This paper presents a handwritten document recognition system based on the convolutional neural network technique. Handwritten document recognition is rapidly attracting researchers' attention due to its promise as an assistive technology for visually impaired users; it is also helpful for automatic data entry systems. For the proposed system, a dataset of English handwritten character images was prepared. The system was trained on a large set of sample data and tested on images of user-defined handwritten documents, and multiple experiments produced very good recognition results. The system first performs image pre-processing to prepare data for training with a convolutional neural network; the input document is then segmented into lines, words and characters, with character segmentation accuracy reaching 86%. The segmented characters are then passed to a convolutional neural network for recognition. The recognition and segmentation techniques proposed in this paper provide highly accurate results on the given dataset: the convolutional neural network reaches a training accuracy of 93%, with validation accuracy slightly lower at 90.42%.
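The core operation of such a network can be sketched as a valid 2-D cross-correlation (which is what "convolution" means in most CNN libraries); this is a generic illustration, not the paper's architecture:

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of `image` with `kernel`: each output
    cell is the elementwise product-sum of the kernel over one patch."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]
```

Stacking such filters with nonlinearities and pooling, then training the filter weights by backpropagation, yields the character recognizer described above.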

Copyright©北京勤云科技发展有限公司  京ICP备09084417号