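Design of a Multi-Purpose Chinese Dialect Speech Database   Total citations: 1 (self-citations: 0; by others: 1)
A multi-purpose Chinese dialect speech database was built for research in speaker information processing, dialect-specific word recognition, speech recognition, and related fields. A total of 106 hours of speech was collected over multiple channels, covering the seven major Chinese dialect regions, and the data were preprocessed. On this basis, design standards and an implementation plan for Chinese dialect databases are proposed, which should help promote the construction of Chinese speech corpora, dialect corpora in particular.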

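In continuous speech recognition systems, complex environments (variability in both speakers and ambient noise) cause a mismatch between training and test data that lowers recognition accuracy. To address this, a speech recognition algorithm based on an adaptive deep neural network is proposed. An improved regularized adaptation criterion is combined with feature-space adaptation of the deep neural network to improve data matching; speaker identity vectors (i-vectors) are fused with noise-aware training to cope with speaker and noise variation; and the classification function of the conventional DNN output layer is modified to ensure intra-class compactness and inter-class separation. In tests on the TIMIT English speech dataset and a Microsoft Chinese speech dataset with several kinds of background noise added, the proposed algorithm reduced the word error rate by 5.151% and 3.113% relative to the widely used GMM-HMM and conventional DNN acoustic models, respectively, improving the model's generalization and robustness to some extent.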

Action recognition is an important and still very challenging research topic in video analysis. Effective recognition relies on learning a good representation of both spatial information (for appearance) and temporal information (for motion). These two kinds of information are highly correlated but have quite different properties, so both chaining independent models (e.g., CNN-LSTM) and direct unbiased co-modeling (e.g., 3DCNN) give unsatisfactory results. Moreover, deep learning models for this task have traditionally taken only 8 or 16 consecutive frames as input, which makes it hard to extract discriminative motion features. In this work, we propose a novel network structure called ResLNet (Deep Residual LSTM network), which can take longer inputs (e.g., 64 frames) and lets convolutions collaborate with LSTM more effectively under the residual structure to learn better spatial-temporal representations; the proposed embedded variable-stride convolution avoids the cost of extra computation. The superiority of this proposal and its ablation study are shown on the three most popular benchmark datasets: Kinetics, HMDB51, and UCF101. The proposed network could be adopted for various features, such as RGB and optical flow. Owing to the limited computation power of our experimental equipment and the real-time requirement, the proposed network is tested on RGB only and shows great performance.

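Noise-Robust Speech Recognition with Wavelet Networks and RBF Networks   Total citations: 1 (self-citations: 0; by others: 1)
To address the poor performance of current speech recognition systems in noisy environments, a wavelet neural network is used to combine the good time-frequency localization of the wavelet transform with the strong classification and identification ability of RBF neural networks. A wavelet-RBF neural network structure is built in which wavelet bases replace the activation functions of the RBF network, and a fully supervised training algorithm is used to implement a noise-robust speech recognition system based on the wavelet-RBF network. Experimental results show that the system recognizes better than the RBF network and is more robust, especially in noisy environments.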

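To address the low recognition rate of noisy mask speech, a speech enhancement algorithm is applied to suppress noise in the mask speech and raise the signal-to-noise ratio. For the enhancement stage, an improved Wiener filtering method is proposed: a spectral entropy method detects speech and non-speech frames in order to update the noise power spectrum, and a parameter is introduced to control the gain function. Mel-frequency cepstral coefficients (MFCC) of the mask speech signal are extracted as feature parameters. A convolutional neural network (CNN) is used for training and recognition, with local response normalization (LRN) applied after each pooling layer for optimization. Experimental results show that the recognition system substantially improves the recognition rate of noisy mask speech.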
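
The noise-update step described above can be illustrated with a minimal sketch. It assumes the common convention that a flat, noise-like power spectrum has high spectral entropy while a speech frame has lower entropy; the threshold, the smoothing factor `alpha`, and both function names are hypothetical and are not taken from the paper.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a frame's normalized power spectrum."""
    total = sum(power_spectrum)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for p in power_spectrum:
        q = p / total  # treat the normalized spectrum as a probability mass
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy


def update_noise_psd(noise_psd, frame_psd, entropy_threshold, alpha=0.9):
    """Update the noise power spectrum only on frames judged speech-free.

    Assumption: entropy below the threshold marks a speech frame, in which
    case the previous noise estimate is kept unchanged.
    """
    if spectral_entropy(frame_psd) < entropy_threshold:
        return list(noise_psd)
    # Non-speech frame: recursive (exponential) smoothing of the estimate.
    return [alpha * n + (1.0 - alpha) * f for n, f in zip(noise_psd, frame_psd)]
```

In a full Wiener filter, the updated noise estimate would then feed the gain function; that part is omitted here because the abstract does not give its form.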

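Application of an Improved T-S Fuzzy Neural Network to Speech Recognition   Total citations: 3 (self-citations: 1; by others: 3)
An improved T-S fuzzy neural network algorithm with a four-layer network structure is presented. A compensation factor that depends on the input dimension is added to the membership degree, which makes the network applicable to speech recognition systems and resolves the rule explosion caused by an excessively large input dimension. Experimental results show that the improved T-S fuzzy neural network can be applied to speech recognition systems, and that it achieves a higher recognition rate than the RBF network with good robustness.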

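Noise-Robust Speech Recognition Based on RBF Neural Networks   Total citations: 1 (self-citations: 0; by others: 1)
To address the poor performance of current speech recognition systems in noisy environments, the best-approximation property and fast training of RBF neural networks are exploited, and a noise-robust speech recognition system based on an RBF neural network is implemented with two training algorithms, clustering and fully supervised. In the clustering algorithm, the hidden layer is trained by K-means clustering and the output layer is learned by linear least squares. In the fully supervised algorithm, all parameters are adjusted by gradient descent; it is a supervised learning algorithm and can select well-performing parameters. Experiments show that, at different signal-to-noise ratios, the fully supervised algorithm achieves a higher recognition rate than the clustering algorithm.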

In this paper, we present an on-line learning neural network model, the Dynamic Recognition Neural Network (DRNN), for real-time speech recognition. The DRNN's accumulative learning property makes it well suited to real-time speech recognition with on-line learning. A comparison between the DRNN and the Hidden Markov Model (HMM) shows that the former has lower computational complexity than the latter in both training and recognition. Encouraging results are obtained when the DRNN is tested on a BUPT digit database (Mandarin) and on the on-line learning of twenty isolated English computer command words.

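Research Progress in Speech Emotion Recognition*   Total citations: 4 (self-citations: 1; by others: 4)
The components of a speech emotion recognition system are introduced first. The survey then focuses on the state of research on emotional features and recognition algorithms: it analyzes the main speech emotion features, describes representative speech emotion recognition algorithms and hybrid models, and compares them. Finally, possible trends in speech emotion recognition technology are pointed out.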

庄志豪  傅洪亮  陶华伟  杨静  谢跃  赵力 《计算机应用研究》2021,38(11):3279-3282,3348
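To address differences in data distribution between corpora, a cross-corpus speech emotion recognition algorithm based on deep autoencoder subdomain adaptation is proposed. First, two deep autoencoders obtain highly representative low-dimensional emotional features for the source domain and the target domain, respectively. Next, a subdomain adaptation module based on LMMD (local maximum mean discrepancy) aligns the feature distributions of the source and target domains within the low-dimensional spaces of the different emotion categories. Finally, the model is trained in a supervised manner on labeled source-domain data. With eNTERFACE as the source domain and Berlin as the target domain, the proposed algorithm improves cross-corpus recognition accuracy by 5.26% to 19.73% over other algorithms; with Berlin as the source domain and eNTERFACE as the target domain, the improvement is 7.34% to 8.18%. The proposed method therefore extracts emotional features shared across corpora effectively and improves cross-corpus speech emotion recognition performance.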

The two- or three-layered neural networks (2LNN, 3LNN), which originated from stereovision neural networks, are applied to speech recognition. To accommodate sequential data flow, we consider a window through which the new acoustic data enter and from which the final neural activities are output. Inside the window, a recurrent neural network develops neural activity toward a stable point. The process is called winner-take-all (WTA) with cooperation and competition. The resulting neural activities clearly showed recognition of continuous speech of a word. The string of phonemes obtained is compared with reference words by using a dynamic programming method. The resulting recognition rate was 96.7% for 100 words spoken by nine male speakers, compared with 97.9% by a hidden Markov model (HMM) with three states and a single Gaussian distribution. These results, which are close to those of HMM, seem important because the architecture of the neural network is very simple, and the number of parameters in the neural net equations is small and fixed. This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000.
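
The abstract does not say which dynamic programming method compares the phoneme string with the reference words. As a minimal sketch, assuming a plain Levenshtein (edit) distance with unit costs, the matching step could look like this; `edit_distance`, `closest_word`, and the lexicon layout are hypothetical names for illustration.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def closest_word(phonemes, lexicon):
    """Return the lexicon word whose reference phonemes are nearest."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))
```

For example, `closest_word(["o", "i", "d", "a"], {"oita": ["o", "i", "t", "a"], "osaka": ["o", "s", "a", "k", "a"]})` selects `"oita"`, since only one substitution separates the two strings.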

Neural Network Speech Recognition Based on the LM Algorithm   Total citations: 2 (self-citations: 0; by others: 2)
葛玲  贾志成  夏克文  王霞 《计算机工程与设计》2006,27(14):2534-2536,2539
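Because the standard BP algorithm used in speech recognition trains slowly and easily falls into local minima, a neural network speech recognition method based on the stable and fast Levenberg-Marquardt (LM) algorithm is proposed. It comprises speech signal preprocessing, feature extraction, optimized design of the network structure, network training, and speech recognition. The number of hidden-layer nodes is chosen by the golden-section search method. Simulation experiments show that the LM algorithm markedly increases network training speed and reduces training time, outperforming the standard BP algorithm.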
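
The abstract states only that the hidden-layer size is chosen by golden-section search. A minimal sketch of that idea over an integer range follows; it assumes the error is strictly unimodal in the node count, and `error_fn` is a hypothetical stand-in for "train the network with n hidden nodes and return its validation error". The authors' exact procedure may differ.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # golden ratio conjugate, about 0.618


def golden_section_hidden_nodes(error_fn, low, high):
    """Return the integer in [low, high] that minimizes error_fn.

    Assumes error_fn is strictly unimodal on the interval. Results are
    cached so each candidate size is evaluated (trained) at most once.
    """
    cache = {}

    def f(n):
        if n not in cache:
            cache[n] = error_fn(n)
        return cache[n]

    while high - low > 2:
        step = int(round(INV_PHI * (high - low)))
        x1, x2 = high - step, low + step  # the two interior probe points
        if x1 >= x2:  # integer rounding can make the probes coincide
            x1 = x2 - 1
        if f(x1) < f(x2):
            high = x2  # the minimum lies in [low, x2]
        else:
            low = x1   # the minimum lies in [x1, high]
    return min(range(low, high + 1), key=f)
```

The appeal of this search is that each probe stands for a full training run, so shrinking the interval by roughly 0.618 per step needs far fewer runs than trying every candidate size.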

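To solve the problem that the basis-function centers and radii of a conventional radial basis function (RBF) neural network are initialized randomly in speech recognition tasks, and drawing on the hierarchical way the human brain processes speech, an unsupervised pre-training scheme that initializes the network parameters from a large amount of unlabeled data is proposed in place of conventional random initialization. A deep autoencoder network is used as the acoustic model for speech recognition, and the noise robustness of speaker-independent, small-vocabulary isolated-word recognition is analyzed under Mel frequency cepstrum coefficients (MFCC) and Gammatone frequency cepstrum coefficients (GFCC), the latter based on Gammatone auditory filters. Experimental results show that, with MFCC features, the deep autoencoder network is more noise-robust than the RBF neural network; compared with the classic MFCC features, GFCC features give a relative improvement of 1.87% in average recognition rate with the deep autoencoder network.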

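A large-vocabulary continuous speech recognition system consists of multiple components, and recognition errors are influenced by many factors. System developers need to analyze the different causes of errors. Based on the basic theory of speech recognition, the principle for classifying and analyzing errors is given: recognition errors are divided by cause into four categories, namely decoding errors, acoustic model errors, language model errors, and combined acoustic and language errors, and the classified errors are analyzed statistically. Experiments show that classifying and analyzing recognition errors provides a reference for improving the system.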

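A Survey of Noise-Robust Speech Recognition*   Total citations: 3 (self-citations: 1; by others: 2)
Existing noise-robust speech recognition techniques for noisy environments are discussed, and the main problems in noise-robust speech recognition research are described. Following the structure of a speech recognition system, the techniques are classified and summarized by signal space, feature space, and model space, and the characteristics, implementation, and application in speech recognition of each robust technique are analyzed. Finally, directions for further research are outlined.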

Spelling speech recognition can be applied for several purposes, including enhancement of speech recognition systems and implementation of name retrieval systems. This paper presents a Thai spelling analysis to develop a Thai spelling speech recognizer. The Thai phonetic characteristics, alphabet system and spelling methods have been analyzed. As a training resource, two alternative corpora, a small spelling speech corpus and an existing large continuous speech corpus, are used to train hidden Markov models (HMMs). Their recognition results are then compared with each other. To solve the problem of the difference in utterance speed between spelling utterances and continuous speech utterances, the adjustment of utterance speed has been taken into account. Two alternative language models, bigram and trigram, are used to investigate the performance of spelling speech recognition. Our approach achieves up to 98.0% letter correction rate, 97.9% letter accuracy and 82.8% utterance correction rate when the language model is trained based on trigram and the acoustic model is trained from the small spelling speech corpus with eight Gaussian mixtures.
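
The abstract reports a letter correction rate and a letter accuracy without defining them. Under the convention widely used in HMM toolkits (an assumption here, since the paper's own definitions are not given), with N reference letters, D deletions, S substitutions, and I insertions, the two measures differ only in whether insertions are penalized:

```latex
\[
\text{Correct} = \frac{N - D - S}{N} \times 100\%,
\qquad
\text{Accuracy} = \frac{N - D - S - I}{N} \times 100\%
\]
```

Under this reading, accuracy can never exceed the correct rate, which is consistent with the 97.9% and 98.0% figures quoted above.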

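Speech Recognition Based on a Genetic Algorithm and Wavelet Neural Networks   Total citations: 1 (self-citations: 0; by others: 1)
The wavelet neural network (WNN) algorithm easily falls into local minima, converges slowly, and has weak global search ability, whereas the genetic algorithm (GA) offers highly parallel, randomized, adaptive search and global optimization. The two are therefore combined into a hybrid algorithm for training neural networks, the GA-WNN algorithm. Simulation results show that the algorithm effectively shortens recognition time and improves both network training speed and the speech recognition rate.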

柏财通  崔翛龙  郑会吉  李爱 《计算机应用》2022,42(10):3217-3223
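To address the rising cost of labeling neural network training data and the way noise interference hinders improvement of speech recognition performance, a training algorithm for a robust speech recognition model based on self-supervised knowledge transfer is proposed. First, three hand-crafted features are extracted from the raw speech samples in the preprocessing stage. Then, in the training stage, the high-level features produced by the feature extraction network are passed through three shallow networks to fit the hand-crafted features extracted during preprocessing. At the same time, the feature extraction front end and the speech recognition back end are cross-trained and their loss functions are combined. Finally, gradient backpropagation lets the feature extraction network learn to extract high-level features that better support denoised speech recognition, which achieves transfer of hand-crafted knowledge and denoising while using the training data efficiently. In a military equipment control scenario, the method was tested on noise-added versions of three open-source Chinese speech recognition datasets, THCHS-30, AISHELL-1, and ST-CMDS, and on a dataset of military equipment control commands. Experimental results show that the proposed training algorithm can reduce the word error rate to 0.12; it not only trains a robust speech recognition model but also improves the utilization of training samples through self-supervised knowledge transfer, and can complete equipment control tasks.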

In Mandarin all-syllable recognition, many insertion errors occur due to the influence of non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually does not work as well as expected because duration is sensitive to speech rate. To address this problem, a novel context-dependent duration distribution normalized by speech rate is proposed in this paper and applied to a speech recognition system based on an improved Hidden Markov Model (HMM) framework. To realize this algorithm, the authors employ a new method to estimate the speech rate of a sentence, then compute the duration probability combined with the speech rate, and finally apply this duration information in the post-processing stage. With little change in the recognition process and resource demand, the duration model is adopted efficiently in the system. The experimental results indicate that the syllable error rates decrease significantly on two different speech corpora. For insertions in particular, the error rates fall by about sixty to eighty percent.

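A Chinese Dialect Identification System Based on Gaussian Mixture Models   Total citations: 1 (self-citations: 0; by others: 1)
A Chinese dialect identification system based on Gaussian mixture models is built, a method for estimating the model parameters is given, and the influence of the feature parameters and the number of Gaussian mixtures on identification is discussed. Experimental results show that the system achieves an average identification rate of 84.17% on three different dialects from the same province.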
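
The decision rule of a GMM-based identifier can be sketched as follows: score the utterance against one mixture model per dialect and pick the highest log-likelihood. This is a minimal illustration under simplifying assumptions: features are one-dimensional (real systems typically use multi-dimensional cepstral vectors), the models are already trained, and the function names and model layout are hypothetical rather than taken from the paper.

```python
import math


def log_gaussian(x, mean, variance):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods under a mixture.

    gmm is a list of (weight, mean, variance) tuples. The log-sum-exp
    trick avoids underflow when all component densities are tiny.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian(x, mu, var) for w, mu, var in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_dialect(frames, models):
    """Return the name of the dialect model with the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))
```

Increasing the number of mixture components per model, one of the factors the abstract discusses, corresponds to adding tuples to each `gmm` list.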

