Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Using a speech recognition system over telephone channels, or with different microphones, requires channel equalisation. In speech recognition, the speech model provides a bank of statistical information that can be used in the channel identification and equalisation process. The authors consider HMM-based channel equalisation and present results demonstrating that substantial improvement can be obtained through the equalisation process. An alternative method for speech recognition is to use a feature set that is more robust to channel distortion. Channel distortions result in an amplitude tilt of the speech cepstrum, and therefore differential cepstral features provide a measure of immunity to channel distortions. In particular, the cepstral-time feature matrix, in addition to providing a framework for representing speech dynamics, can be made robust to channel distortions. The authors present results demonstrating that a major advantage of cepstral-time matrices is their channel-insensitive character.
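
The immunity of differential cepstral features can be illustrated with a small sketch. It assumes a simplified model that the abstract does not spell out: a fixed channel adds the same constant offset to every cepstral frame, so frame-to-frame differences cancel it. The function names and the toy values are hypothetical and are not the authors' implementation.

```python
def add_channel(frames, offset):
    """Simulate a fixed channel as a constant additive offset per cepstral coefficient (assumed model)."""
    return [[c + o for c, o in zip(frame, offset)] for frame in frames]

def delta(frames):
    """First-order difference between consecutive cepstral frames."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]

# Toy cepstral frames (3 frames, 3 coefficients each) and a toy channel offset.
clean = [[1.0, 0.5, -0.2], [1.2, 0.4, -0.1], [0.9, 0.7, 0.0]]
channel = [0.3, -0.1, 0.05]
distorted = add_channel(clean, channel)

# Largest disagreement between the differential features of the clean
# and the channel-distorted frames; it should be numerically zero.
max_diff = max(abs(x - y)
               for d_clean, d_dist in zip(delta(clean), delta(distorted))
               for x, y in zip(d_clean, d_dist))
```

The static frames differ by the full channel offset, while the differential features agree up to rounding error, which is the property the abstract relies on.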

2.
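孙林慧, 张蒙, 梁文清. 《信号处理》 (Journal of Signal Processing), 2022, 38(12): 2519-2531
In practical speech separation, the gender combination of the speakers in a mixed utterance is usually unknown, and separating directly with a universal model gives poor results. To improve separation, this paper proposes a gender-combination discrimination model based on a convolutional neural network and support vector machine (CNN-SVM) that determines whether the two speakers in the mixture are male-male, male-female, or female-female, so that the separation model for the corresponding gender combination can be selected. To compensate for the limited gender-combination information carried by a traditional single feature, the paper proposes a strategy for mining deep fused features, so that the classification features carry more information about the gender-combination class. The proposed single-channel speech separation method first uses a CNN to extract deep features from Mel-frequency cepstral coefficients and from filter-bank features and fuses the two deep features into the gender-combination classification feature; it then uses an SVM to identify the gender combination of the mixed speech; finally, it selects the deep neural network/convolutional neural network (DNN/CNN) model for that gender combination to perform the separation. Experimental results show that, compared with traditional single features, the proposed deep fused feature effectively improves the recognition rate of the gender combination of mixed speech, and that the proposed separation method outperforms the universal separation model on the perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR) metrics.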

3.
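李洪伟, 马琳, 李海峰. 《信号处理》 (Journal of Signal Processing), 2023, 39(4): 639-648
Speech is the most important tool by which humans express thoughts and communicate feelings, and an important part of human culture. Speech emotion recognition, a key topic in affective computing, has become an international research focus and is receiving growing attention. Neuroscience research has shown that the brain is the material basis for generating and regulating emotion. Research on speech emotion should therefore not consider the speech signal alone but should also incorporate brain activity signals to achieve more accurate emotion recognition. Based on this idea, this paper proposes a speech feature extraction method based on kernel canonical correlation analysis (KCCA). The method maps speech features and electroencephalogram (EEG) features into a high-dimensional Hilbert space and computes their maximum correlation coefficient. KCCA projects the speech features in that space onto the direction of maximum correlation with the EEG features, yielding speech features that contain EEG information. The method thus incorporates emotion-related EEG information into speech emotion feature extraction, and the resulting features represent emotion more accurately. In theory the method also transfers well: when the extracted EEG features are sufficiently accurate and representative, the projection vectors obtained by KCCA modeling are general and can be applied directly to a new speech emotion dataset without collecting and processing the corresponding EEG signals again. Experimental results on a self-built speech emotion database and the public speech emotion database MSP-IMPROV show that speech emotion classification using the projected speech features outperforms classification using the original audio features...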

4.
The speech signal is decomposed through adapted local trigonometric transforms. The decomposed signal is divided into M uniform sub-bands for each subinterval, and the energy of each sub-band is used as a speech feature. This feature is applied to vector quantisation and the hidden Markov model. The new speech feature shows a slightly better recognition rate than the cepstrum for speaker-independent speech recognition. It also shows a lower standard deviation between speakers than the cepstrum does.

5.
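The fundamental frequency of speech (also called pitch, pitch period, or F0) and its pattern of variation are important characteristics of the speech signal, with important applications in speech emotion recognition and voiceprint recognition. Extracting the fundamental frequency has long been a difficult problem in speech signal processing, which is a major reason why F0 features have not been widely used in speech recognition and similar applications; accurate and efficient pitch extraction is therefore of great significance. Building on the normalized autocorrelation function and combining it with the cepstrum method, this paper proposes an improved F0 extraction algorithm based on normalized autocorrelation. Experiments show that the method achieves good results in F0 extraction.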
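
As a rough illustration of the normalized autocorrelation step only, the sketch below picks the lag with the highest normalized autocorrelation inside an assumed pitch search range. The cepstral refinement described in the abstract is omitted, and the function names, the search range, and the synthetic test tone are assumptions made for this example rather than details taken from the paper.

```python
import math

def normalized_autocorr(x, lag):
    """Normalized autocorrelation of x at the given lag; values lie in [-1, 1]."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    e1 = sum(x[i] ** 2 for i in range(n))
    e2 = sum(x[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(e1 * e2)
    return num / denom if denom > 0 else 0.0

def estimate_f0(x, fs, fmin, fmax):
    """Return fs / lag for the lag in [fs/fmax, fs/fmin] with the largest normalized autocorrelation."""
    lo = int(fs / fmax)
    hi = int(fs / fmin)
    best_lag = max(range(lo, hi + 1), key=lambda lag: normalized_autocorr(x, lag))
    return fs / best_lag

# Synthetic 200 Hz tone sampled at 8 kHz (one period = 40 samples).
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]

# The search range is kept narrower than one octave below the true pitch,
# because a plain peak picker cannot tell a period from its multiples.
f0 = estimate_f0(frame, fs, fmin=120.0, fmax=400.0)
```

With a wider search range, lags at multiples of the true period score almost as high as the period itself, which is one reason practical extractors add a second cue such as the cepstrum.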

6.
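A method for recognizing speech altered by stress, based on nonlinear features   Total citations: 2, self-citations: 1, citations by others: 1
王玉伟, 张磊, 韩纪庆. 《信号处理》 (Journal of Signal Processing), 2002, 18(5): 484-486
Considering the nonlinear nature of how altered speech is produced, this paper proposes a method for recognizing speech altered by stress that is based on cepstral features derived from the Teager energy operator (TEO). The speech signal is first split into 21 signals in different frequency bands; the TEO energy is then computed, followed by a logarithm and a discrete cosine transform. In small-vocabulary, speaker-dependent recognition experiments on speech collected in a flight simulator, the nonlinear TEO-based cepstral features effectively improve the recognition of altered speech, raising the recognition rate by 11.3% over the conventional MFCC-based method.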
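
The energy step can be sketched with the standard discrete Teager energy operator, psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The example below covers only that operator; the 21-band split, the logarithm, and the discrete cosine transform from the abstract are not reproduced, and the function name and test signal are hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator applied to the interior samples of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure sinusoid A*sin(omega*n) the operator returns the constant
# (A*sin(omega))**2, so it tracks amplitude and frequency together.
A, omega = 2.0, 0.3
signal = [A * math.sin(omega * n) for n in range(50)]
teo = teager_energy(signal)
expected = (A * math.sin(omega)) ** 2
```

Because the output depends on both amplitude and frequency, the operator responds to changes in how the signal is produced, which is the kind of nonlinear cue the abstract appeals to.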

7.
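A cell-phone source identification method based on features of silent speech segments   Total citations: 1, self-citations: 0, citations by others: 1
Cell-phone source identification has become an important and active problem in multimedia forensics. This paper proposes a cell-phone source identification method based on features of the silent segments of speech. The method first obtains the silent segments with an adaptive endpoint detection algorithm; it then uses the mean of the Mel-frequency spectral coefficients (MFC) of the silent segments as the classification feature; finally, it performs feature selection with the CfsSubsetEval evaluator of the WEKA platform under a BestFirst search and identifies the source phone with a support vector machine (SVM). The experiments classify 23 mainstream phone models, and the results show that the proposed feature classifies well, with average identification accuracies of 99.23% on the TIMIT database and 99.00% on the self-built CKC-SD database. In comparisons with MFC features and Mel-frequency cepstral coefficient (MFCC) features taken from speech segments, the proposed feature performs better.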

8.
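Endpoint detection of noisy speech based on cepstral features   Total citations: 44, self-citations: 0, citations by others: 44
胡光锐, 韦晓东. 《电子学报》 (Acta Electronica Sinica), 2000, 28(10): 95-97
One cause of recognition errors in speech recognition systems is inaccurate endpoint detection. At high signal-to-noise ratios, locating speech endpoints correctly is not difficult. Most practical speech recognition systems, however, must work at low signal-to-noise ratios, where conventional endpoint detection methods, such as energy-based detection, do not work effectively. This paper uses cepstral features to detect speech endpoints and proposes two algorithms for noisy speech: the first uses cepstral distance in place of short-time energy as the quantity compared against the decision threshold, and the second improves hidden Markov model (HMM)-based speech detection so that it adapts to changes in the noise. Experimental results show that the method achieves highly accurate endpoint detection for noisy speech.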
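
The first algorithm can be sketched roughly as follows, under several assumptions that the abstract does not state: the cepstral frames are already computed, the first few frames contain only noise, the distance takes a commonly used form, (10 / ln 10) * sqrt((c0 - c0')^2 + 2 * sum_k (ck - ck')^2), and the threshold is fixed. All names and values are hypothetical.

```python
import math

def cepstral_distance(c, ref):
    """A commonly used cepstral distance in dB (assumed form, not confirmed by the abstract)."""
    d0 = (c[0] - ref[0]) ** 2
    rest = sum((a - b) ** 2 for a, b in zip(c[1:], ref[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(d0 + 2.0 * rest)

def detect_speech(frames, noise_frames=3, threshold=1.0):
    """Flag frames whose distance to the estimated noise cepstrum exceeds the threshold."""
    dim = len(frames[0])
    # Noise cepstrum estimated as the mean of the leading frames (assumed noise-only).
    noise = [sum(f[k] for f in frames[:noise_frames]) / noise_frames
             for k in range(dim)]
    return [cepstral_distance(f, noise) > threshold for f in frames]

# Toy two-coefficient cepstral frames: three noise frames, two speech-like frames, one noise frame.
frames = [[0.10, 0.02], [0.12, 0.01], [0.11, 0.03],
          [0.90, 0.40], [1.10, 0.50], [0.10, 0.02]]
flags = detect_speech(frames)
```

Endpoints would then be read off as the first and last flagged frames; a practical detector would also smooth the flags over time and update the noise estimate.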

9.
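A new method for sub-component analysis and effectiveness evaluation of speech feature parameters   Total citations: 2, self-citations: 0, citations by others: 2
Speech signals carry two main kinds of characteristics, semantic content and speaker individuality, and extracting and strengthening them effectively is very important for speech recognition and speaker recognition. This paper proposes the 4S method for analyzing the semantic and speaker-individuality sub-components of speech feature parameters and evaluating their effectiveness: it analyzes the proportions of the semantic and individuality components and uses quantitative indicators to judge how effective a feature parameter is for speech recognition and speaker recognition. Applying 4S analysis to the commonly used feature parameters LPC, LPCC, and MFCC shows that all of them contain more semantic information; the proportion factor LIR of semantic to speaker-individuality features is 1.30, 1.44, and 1.61, respectively, and the effectiveness of the three parameters for both speech recognition and speaker recognition increases in that order.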

10.
This letter proposes an effective and robust speech feature extraction method based on statistical analysis of Pitch Frequency Distributions (PFD) for speaker identification. Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but it does not perform well for speaker identification on its own, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of taking only one kind of feature into account. Experimental results indicate that the hybrid approach gives a marked improvement for text-independent speaker identification in noisy environments corrupted by AWGN.

11.
In this article we have reviewed a wide variety of techniques based on the identification of missing spectral features that have proved effective in reducing the error rates of automatic speech recognition systems. These approaches have been conspicuously effective in ameliorating the effects of transient maskers such as impulsive noise or background music. We described two broad classes of missing feature algorithms: feature-vector imputation algorithms (which restore unreliable components of incoming feature vectors) and classifier-modification algorithms (which dynamically reconfigure the classifier itself to cope with the effects of unreliable feature components). We reviewed the mathematics of four major missing feature techniques: the feature-imputation techniques of cluster-based reconstruction and covariance-based reconstruction, and the classifier-modification methods of class-conditional imputation and marginalization. We also discussed the ways in which the common feature extraction procedures of cepstral analysis, temporal-difference features, and mean subtraction can be handled by speech recognition systems that make use of missing feature techniques. We concluded with a discussion of a small number of selected experimental results. These results confirm the effectiveness of all types of missing feature approaches discussed in ameliorating the effects of both stationary and transient noise, as well as the particular effectiveness of both soft masks and fragment decoding.

12.
In this paper, a low-power, low-voltage speech processing system is presented. The system is intended to be used in remote speech recognition applications where feature extraction is performed on the terminal and high-complexity recognition tasks are moved to a remote server accessed through a radio link. The proposed system is based on a CMOS feature extraction chip for speech recognition that computes 15 cepstrum parameters every 8 ms and dissipates 30 μW at a 0.9-V supply. Single-cell battery operation is achieved. Processing relies on a novel feature extraction algorithm using 1-bit A/D conversion of the input speech signal. The chip has been implemented as a gate array in a standard 0.5-μm, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 μJ. It achieves recognition rates over 98% in isolated-word speech recognition tasks.

13.

Most automatic speech recognition (ASR) systems are trained with neutral speech, and their performance is degraded by emotional content in the speech. Recognizing these emotions in human speech is considered a crucial aspect of human-machine interaction. In the first stage, combined spectral and differenced prosody features are used for emotion recognition. Emotion recognition does not by itself improve the performance of an ASR system; instead, based on the emotion recognized from the input speech, the corresponding adapted emotive ASR model is selected for evaluation in the second stage. This adapted emotive ASR model is built from the existing neutral speech and from emotive speech generated synthetically with a prosody modification method. This work studies the importance of an emotion recognition block at the front end together with emotive speech adaptation of the ASR system models. Speech samples from the IIIT-H Telugu speech corpus were used to build the large-vocabulary ASR systems, and emotional speech samples from the IITKGP-SESC Telugu corpus were used for evaluation. The adapted emotive speech models yielded better performance than the existing neutral speech models.


14.
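A study of an endpoint detection method for noisy speech signals   Total citations: 2, self-citations: 1, citations by others: 1
Endpoint detection is an important step in speech recognition. At low signal-to-noise ratios, traditional endpoint detection methods do not work effectively, which lowers the system's recognition rate. This paper therefore proposes a more effective endpoint detection algorithm based on LPC Mel-cepstral features, an improvement on the LPC-distance method. Experiments show that the algorithm accurately detects the speech signal at low signal-to-noise ratios. A comparison of three endpoint detection algorithms shows that the algorithm based on LPC Mel-cepstral features has the higher detection accuracy.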

15.
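Principles and techniques of speech signal dereverberation   Total citations: 1, self-citations: 0, citations by others: 1
Speech dereverberation has important applications in communications, speech recognition, and related areas. This paper reviews research developments and methods in China and abroad, explains the reverberation process and the principle of cepstrum-based dereverberation, and briefly introduces dereverberation that combines a microphone array with the cepstrum method.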

16.
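A nonlinear search algorithm for mutual information estimation of speech signals and its application to recognition   Total citations: 6, self-citations: 0, citations by others: 6
Speech recognition based on mutual information theory takes into account both the time-varying distribution and the statistical distribution of the speech signal. It effectively increases the cohesion of patterns within a class and reduces the coupling between patterns of different classes. In recognition experiments and practical applications it shows good recognition accuracy and high runtime efficiency, and it is better suited than other methods to speech recognition on embedded systems. This paper proposes a nonlinear search algorithm for mutual information estimation that effectively handles the nonlinear fluctuation of the time-varying distribution of speech signals and further improves the accuracy of mutual-information matching of speech patterns.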

17.
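Using a personal digital assistant (PDA) running the embedded operating system PocketPC as the experimental platform, this work studies a customizable voice dialer based on speaker-independent voice command recognition. Given the PDA's limited storage and computing power, and aiming to control the search space strictly and speed up decoding without sacrificing performance, it proposes a dynamically adjusted histogram pruning strategy that tunes the pruning width in real time from the score differences between search paths, together with a lookup-table method for accelerating likelihood computation. Measures adopted after experimental verification, such as using lower-dimensional features and acoustic modeling with extended initials and finals, effectively address these constraints. Experiments on a real PDA show that with a vocabulary of 200 personal names the recognition accuracy reaches 98.70%, recognition is about 80 times faster than a reference system using the standard algorithm, and about 30% of the search memory is saved.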

18.
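简志华, 杨震. 《信号处理》 (Journal of Signal Processing), 2007, 23(3): 383-387
This paper proposes GMCSM, an improved cepstral-domain feature compensation algorithm. Based on the time-varying nature of speech signals, GMCSM models the variance of the speech signal with a generalized autoregressive conditional heteroscedasticity (GARCH) model. Experimental data show that, compared with the conventional cepstral subtraction methods CSM and MEMCSM, GMCSM compensates more effectively for the distortion of cepstral feature parameters caused by additive noise and reduces the recognition error rate; its advantage is especially pronounced at low signal-to-noise ratios.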
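
To make the variance model concrete, the sketch below runs the textbook GARCH(1,1) recursion, sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]. The model order, the parameter values, and the function name are assumptions made for illustration; the abstract does not give them, and this is not the GMCSM algorithm itself.

```python
def garch_variance(residuals, omega, alpha, beta):
    """Conditional variance sequence of a GARCH(1,1) model, one value per residual."""
    # Start from the unconditional variance omega / (1 - alpha - beta),
    # which exists only when alpha + beta < 1.
    var = [omega / (1.0 - alpha - beta)]
    # Each new variance depends on the previous squared residual and the previous variance.
    for eps in residuals[:-1]:
        var.append(omega + alpha * eps ** 2 + beta * var[-1])
    return var

# Toy residual sequence with hypothetical parameters.
variances = garch_variance([1.0, 0.0, 2.0], omega=0.1, alpha=0.2, beta=0.7)
```

A large residual raises the variance predicted for the next step and the effect decays at a rate set by beta, which is how such a model follows a signal whose variance changes over time.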

19.
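A speech recognition system based on a self-organizing neural network is built. The speech signal is preprocessed; its linear prediction coefficients, linear prediction cepstral coefficients, and Mel cepstral coefficients are extracted; and a recognition decision model based on the self-organizing neural network is established. The classification and clustering ability of the self-organizing network is analyzed in depth and improved: strengthened training and a threshold function effectively determine which class the boundary neurons belong to and yield a reasonable partition of the output pattern classes. The results verify that self-organizing neural networks are suitable for isolated-word speech recognition and are fast and structurally simple. MATLAB simulations show a speech recognition rate of 96%.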

20.
Video action recognition is an important topic in computer vision. Most existing methods use CNN-based models and capture multiple modalities of image features from the videos, such as static frames, dynamic images, and optical flow features. However, these mainstream features contain much static information, including object and background information, and the motion information of the action itself is not distinguished and strengthened. In this work, a new kind of motion feature without static information is proposed for video action recognition. We propose a quantization of motion network based on the bag-of-feature method to learn significant and discriminative motion features. In the learned feature map, the object and background information is filtered out, even if the background is moving in the video. The motion feature is therefore complementary to the static image feature and to the static information in the dynamic image and optical flow. A multi-stream classifier is built with the proposed motion feature and other features, and action recognition performance improves compared with other state-of-the-art methods.
