Similar literature
1.
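Speech quality evaluation falls broadly into two categories: subjective and objective. Because quality is ultimately determined by human perception, speech systems rely mainly on subjective evaluation. However, subjective evaluation is time-consuming and labor-intensive, and it is affected by the test conditions and by the testers' own subjective factors, which reduces the reliability of the results. To address these drawbacks, an implementation scheme is designed for test equipment that evaluates speech quality objectively. The equipment is based on the E1 interface and uses the ITU-T P.862 PESQ algorithm model.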

2.
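This paper uses two objective distortion measures, MFCC and COH, to study the objective evaluation of jamming effects on voice communication. The results show that each of the two measures correlates well with subjective scores when applied in specific environments. It can therefore be concluded that both measures are effective, to a certain extent, for evaluating jamming effects on voice communication.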
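Several abstracts in this list judge an objective measure by how well it correlates with subjective scores. As a minimal sketch of that check, assuming the Pearson coefficient is the correlation used (the abstracts do not say which one), the following computes it for two lists. The MOS and distortion values are invented for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

# Hypothetical data: larger distortion should go with lower MOS.
mos = [4.5, 4.0, 3.2, 2.5, 1.8]
distortion = [0.10, 0.25, 0.50, 0.80, 1.10]
r = pearson(mos, distortion)  # strongly negative for this toy data
```

A distortion measure is expected to correlate negatively with MOS, so it is the magnitude of the coefficient that indicates agreement.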

3.
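This paper proposes a new objective speech quality assessment method based on a GMM and non-uniform linear prediction cepstral coefficients (NLPC). First, the Bark bilinear transform (BBT) is used to warp the linear spectrum, so that the warped spectrum matches the non-uniform characteristics of human auditory perception. The NLPC are then computed by linear prediction on the non-uniform spectrum. The NLPC extracted from reference speech are used to train a Gaussian mixture model, which yields a reference model of the reference speech. A consistency measure is obtained from the reference model and the NLPC vectors of the distorted speech. Finally, multivariate adaptive regression splines are used to establish the mapping between subjective MOS and the consistency measure, giving an objective prediction model for MOS that is then used for objective speech quality evaluation. Experiments show that the proposed algorithm outperforms the algorithm in the ITU-T P.563 standard.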

4.
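Optimization analysis of objective evaluation templates for speech jamming effects   Cited by: 1 (self-citations: 0; citations by others: 1)
Objective evaluation of voice communication quality is an important topic in current acoustics research. After introducing the general methods of speech quality evaluation, the paper first uses the least-squares method to build a voice communication quality evaluation template based on the Bark spectral measure. Following a subjective and objective analysis of this template, a BP network modeling method is introduced, which greatly reduces the error of the earlier least-squares approach in modeling the objective evaluation of speech jamming effects. A speech jamming evaluation model based on a BP neural network is established and verified with experimental data.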
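The least-squares template mentioned above can be pictured as fitting a curve that maps an objective distance to MOS. The following is a minimal sketch that assumes a straight-line template (the abstract does not give the actual functional form), with invented data points; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical (distance, MOS) pairs lying exactly on MOS = 5 - 2 * distance.
distances = [0.0, 0.5, 1.0, 1.5, 2.0]
mos = [5.0, 4.0, 3.0, 2.0, 1.0]
slope, intercept = fit_line(distances, mos)
predicted = slope * 0.75 + intercept  # template applied to a new distance
```

A neural network such as the BP model in the abstract replaces this fixed line with a learned nonlinear mapping, which is where the reported error reduction would come from.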

5.
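鄢田云, 云霞, 靳蕃, 朱庆军. 《电子学报》 (Acta Electronica Sinica), 2004, 32(8): 1282-1285
For continuous Chinese speech, this paper proposes a new method for output-based objective speech quality evaluation that uses radial basis function neural networks (RBFNNs), named RBFOBSQ (Output-Based Speech Quality Using RBFNNs). The method uses the Mel cepstrum to extract feature parameters from the speech signal under test at the output of the speech system, and then uses an RBF neural network to perform the nonlinear mapping from those feature parameters to subjective MOS. The mapped value is the objective quality result, which depends only on the output signal. Its correlation with subjective MOS exceeds 0.92 on training-set samples and 0.88 on test-set samples.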
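The nonlinear mapping from a feature vector to a MOS estimate can be sketched as the forward pass of an RBF network. This sketch assumes Gaussian basis functions and a linear output layer, a common RBF form that the abstract does not confirm. The centers, widths, and weights are hypothetical and would normally come from training.

```python
import math

def rbf_predict(x, centers, widths, weights, bias=0.0):
    """Forward pass of an RBF network with Gaussian basis functions.

    x       : feature vector (e.g. Mel-cepstral features of one utterance)
    centers : list of center vectors, one per hidden unit
    widths  : list of Gaussian widths (sigma), one per hidden unit
    weights : list of output weights, one per hidden unit
    """
    out = bias
    for c, sigma, w in zip(centers, widths, weights):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += w * math.exp(-dist2 / (2.0 * sigma * sigma))
    return out

# Hypothetical two-unit network over 2-dimensional features.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [1.5, 2.5]
score_at_center = rbf_predict([1.0, 1.0], centers, widths, weights, bias=1.0)
```

Each hidden unit responds most strongly when the input is near its center, so the output is a locally weighted blend of the trained weights.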

6.
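Single-ended objective evaluation of VoIP speech quality based on flow characteristics and truth degree   Cited by: 1 (self-citations: 0; citations by others: 1)
成卫青, 龚俭, 丁伟. 《通信学报》 (Journal on Communications), 2008, 29(4): 30-39
A non-intrusive, single-ended objective evaluation method, FSPAV, is proposed that predicts VoIP perceived quality of service (PQoS) using only IP flow characteristics. Its key element is the definition of three flow-characteristic metrics related to user perception. The method needs neither synchronized clocks nor parsing of application protocols; the metrics are computed solely by monitoring the IP packets, received by the user's host, that carry the remote user's voice data. Using an individual truth-degree measure, the measured values of the three metrics for a call segment are mapped to a single objective call-quality score, and the computation also yields how good or bad each characteristic is. Multiple voice-call experiments were carried out over the Internet with the VoIP software QQ and Skype. The results show a fairly high correlation between subjective and objective scores, demonstrating the effectiveness of the method.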

7.
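In the objective evaluation of communication jamming effects, objective measures such as Mel frequency cepstrum coefficients (MFCC) and linear prediction cepstrum coefficients (LPCC) are generally used to represent how severely the communication is jammed. These measures, however, suffer from poor robustness: a given measure is effective under some conditions but may fail completely under others. To address this, random forest (RF) is used to fuse several well-performing objective measures into a new evaluation system, and the consistency of its fit with subjective evaluation is used as the criterion for judging system performance. The new system is verified with measured data from ultrashort-wave voice communication jamming. The results show that it performs better than any single objective measure, and that overfitting can be effectively avoided by randomly selecting the training samples and randomly selecting the feature dimensions of each sample.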

8.
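A one-step strategy for objective speech quality evaluation   Cited by: 6 (self-citations: 0; citations by others: 6)
付强, 易克初, 田斌, 张知易. 《电子学报》 (Acta Electronica Sinica), 2001, 29(7): 885-887
This paper proposes a new method for objective speech quality evaluation based on a one-step strategy. Using the multidimensional nonlinear mapping capability of feedforward neural networks, it merges the two traditional steps (computing the average distortion measure, then mapping the average distortion measure to MOS) into a single step. The method can fully reflect the perceptual characteristics of the human auditory system and is simple to compute. In the statistical sense it is also a consistent estimate of MOS. Experiments on the correlation between subjective and objective results show that the correlation coefficient of the one-step method reaches about 0.95, clearly better than the two-step method.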

9.
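陈静, 赵凌伟. 《无线电工程》 (Radio Engineering), 2012, 42(10): 13-15, 19
Starting from a brief analysis of the principle of the wavelet transform, the paper studies a method for objective speech quality evaluation that combines the wavelet transform with the Mel cepstrum (MFCC) method, and gives the principle and computation flow of wavelet Mel-cepstrum speech quality evaluation. The MFCC method and the wavelet Mel-cepstrum algorithm are each used to compute the distortion distance between the original and the jammed speech files. The distortion distances are then correlated with subjective test results to obtain the correlation coefficient and variance for each algorithm. The comparison shows that the wavelet Mel-cepstrum method considerably improves on objective evaluation methods based on cepstral-domain parameters.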
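The distortion distance between an original and a jammed recording can be sketched as a frame-by-frame distance between cepstral vectors, averaged over the utterance. This sketch assumes a Euclidean distance and time-aligned frames of equal count; the abstract specifies neither, and the coefficient values are invented. Feature extraction (MFCC or wavelet Mel cepstrum) is taken as already done.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two cepstral vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("cepstral vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_cepstral_distance(ref_frames, test_frames):
    """Average per-frame distance between aligned reference and test frames."""
    if not ref_frames or len(ref_frames) != len(test_frames):
        raise ValueError("need the same non-zero number of frames in both signals")
    total = sum(frame_distance(r, t) for r, t in zip(ref_frames, test_frames))
    return total / len(ref_frames)

# Hypothetical 2-coefficient cepstra for three frames.
original = [[1.0, 0.5], [0.8, 0.4], [0.9, 0.3]]
jammed = [[1.0, 0.5], [3.8, 4.4], [0.9, 0.3]]
distance = mean_cepstral_distance(original, jammed)
```

Only the second frame differs here, by a vector of length 5, so the average over three frames is 5/3.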

10.
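Speech intelligibility is an important index for evaluating the acoustic quality of halls used for speech, and in engineering practice objective speech intelligibility parameters are commonly used to predict and evaluate hall acoustics. The most widely used of these parameters are the Speech Transmission Index (STI) and the Speech Intelligibility Index (SII). Both are single-channel models developed for monaural listening or for binaural listening at equal loudness in both ears, and neither accounts for the advantage that binaural listening gives to speech intelligibility. The paper therefore first outlines the single-channel objective parameters and their limitations, then reviews the factors that affect binaural speech intelligibility, analyzes the advantages and limitations of three classes of binaural objective parameters, and finally analyzes the problems that remain to be solved in applying them.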

11.
As the use of voice response systems employing synthetic speech becomes more widespread in consumer products, industrial and military applications, and aids for the handicapped, it will be necessary to develop reliable methods of comparing different synthesis systems and of assessing how human observers perceive and respond to the speech generated by these systems. The selection of a specific voice response system for a particular application depends on a wide variety of factors, only one of which is the inherent intelligibility of the speech generated by the synthesis routines. In this paper, we describe the results of several studies that applied measures of phoneme intelligibility, word recognition, and comprehension to assess the perception of synthetic speech. Several techniques were used to compare performance of different synthesis systems with natural speech and to learn more about how humans perceive synthetic speech generated by rule. Our findings suggest that the perception of synthetic speech depends on an interaction of several factors including the acoustic-phonetic properties of the speech signal, the requirements of the perceptual task, and the previous experience of the listener. Differences in perception between natural speech and high-quality synthetic speech appear to be related to the redundancy of the acoustic-phonetic information encoded in the speech signal.

12.
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.

13.
Neural networks for statistical recognition of continuous speech   Cited by: 4 (self-citations: 0; citations by others: 4)
In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently, simple speech recognition systems (using context-independent phone models) based on this approach have been shown in controlled tests to be both effective in terms of accuracy (i.e., comparable to or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.
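In hybrid HMM/ANN systems, a network trained as a classifier outputs state posteriors, and a common way to turn these into emission scores for the HMM is to divide each posterior by the prior of its state, which gives a likelihood scaled by a state-independent factor. The sketch below shows that conversion under the assumption that this is the scheme meant by "estimate emission probabilities"; the numbers are invented.

```python
def scaled_likelihoods(posteriors, priors):
    """Convert state posteriors P(q|x) to scaled likelihoods p(x|q) / p(x).

    By Bayes' rule, p(x|q) / p(x) = P(q|x) / P(q), so dividing by the state
    prior gives emission scores usable in Viterbi decoding.
    """
    if len(posteriors) != len(priors):
        raise ValueError("need one prior per state")
    if any(p <= 0 for p in priors):
        raise ValueError("state priors must be positive")
    return [post / prior for post, prior in zip(posteriors, priors)]

# Hypothetical network output for one frame over three phone states,
# and state priors estimated from training-set frame counts.
frame_posteriors = [0.6, 0.3, 0.1]
state_priors = [0.5, 0.25, 0.25]
emission_scores = scaled_likelihoods(frame_posteriors, state_priors)
```

The missing factor p(x) is the same for every state in a frame, so it does not change which state sequence the decoder prefers.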

14.
A wide variety of speech recognition distortion measures have been proposed and tested, including some especially effective ones. It is shown that there is a general framework, based on the concepts of information theory, linking most of these measures. The distortion measure between any two speech spectra can be defined in terms of the distortions between the associated probability distributions. This general framework defines three broad families of distortion measures for speech recognition and provides a consistent way of combining the energy and the spectral information of a phonetic event. In addition, the cepstral-domain representation for several distortion measures is derived, allowing comparison of these measures in a domain that also yields convenient equations for their practical implementation.
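One way to picture a distortion defined through probability distributions is to normalize each power spectrum so it sums to one and then compare the two with a divergence. The sketch below uses the symmetric Kullback-Leibler divergence as an illustrative choice; it is not claimed to be one of the three families in the paper. Normalization discards overall energy, so this example covers spectral shape only.

```python
import math

def normalize(spectrum):
    """Scale a strictly positive power spectrum so that it sums to 1."""
    if not spectrum or any(s <= 0 for s in spectrum):
        raise ValueError("spectrum must be non-empty and strictly positive")
    total = sum(spectrum)
    return [s / total for s in spectrum]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(spec_a, spec_b):
    """Symmetric KL divergence between two normalized power spectra."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same number of bins")
    p = normalize(spec_a)
    q = normalize(spec_b)
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical 4-bin power spectra.
flat = [1.0, 1.0, 1.0, 1.0]
tilted = [4.0, 2.0, 1.0, 1.0]
d = symmetric_kl(flat, tilted)
```

Because of the normalization, scaling either spectrum by a constant leaves the result unchanged, which is why energy would have to be handled by a separate term.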

15.
This paper describes a method to select a suitable feature for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a portion of frequency components, cepstrum, mel-cepstrum, energy, and their time differences of speech waveforms as their speech features. However, these systems never have good performance if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable the selected feature is. To solve this problem, it is essential to analyze the feature itself, and measure how good the feature itself is. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of the class-irrelevant variation based on Shannon's information theory. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The result shows that, among these features, the mel-scaled FFT is the best feature for speech recognition based on the proposed measure.
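The notion of class-related information can be illustrated with the mutual information between a quantized feature and the class label. This sketch assumes that reading of the measure, which the abstract does not spell out, and it assumes the feature has already been quantized to discrete symbols. The data are invented.

```python
import math
from collections import Counter

def mutual_information(features, labels):
    """Mutual information I(F; C) in bits between discrete features and labels."""
    n = len(features)
    if n == 0 or n != len(labels):
        raise ValueError("need the same non-zero number of features and labels")
    joint = Counter(zip(features, labels))
    feat_counts = Counter(features)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        p_f = feat_counts[f] / n
        p_c = label_counts[c] / n
        mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi

# Hypothetical quantized feature values for four frames of two phone classes.
labels = ["aa", "aa", "iy", "iy"]
informative = ["lo", "lo", "hi", "hi"]    # perfectly predicts the class
uninformative = ["lo", "hi", "lo", "hi"]  # independent of the class
mi_good = mutual_information(informative, labels)
mi_bad = mutual_information(uninformative, labels)
```

With two equally likely classes, a feature that identifies the class exactly carries 1 bit, and a feature independent of the class carries 0 bits.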

16.
The well-known (minimum phase) prediction error filter, which is based on the correlation function (second-order statistics), has been widely used to deconvolve seismic signals as well as speech signals. However, it is found that phase distortion remains in the predictive deconvolved signals, since the source wavelet and the vocal tract filter can be non-minimum phase. In this paper, we apply two cumulant (higher-order statistics) based inverse filter criteria, which can simultaneously remove the amplitude distortion and the phase distortion of a non-minimum phase system, to deconvolving seismic signals and speech signals. Some simulation results and experimental results with real speech data are provided to demonstrate these two criteria.

17.
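欧世峰, 赵晓晖. 《电子学报》 (Acta Electronica Sinica), 2007, 35(10): 2007-2013
By examining the probability distribution of clean speech components and the statistical correlation between adjacent components, the paper gives a new statistical model of speech signals in the adaptive Karhunen-Loève transform (KLT) domain. Based on this signal model, and using maximum a posteriori (MAP) estimation theory, it proposes a new single-channel speech enhancement algorithm. The algorithm takes full account of the correlation between speech components at adjacent time instants in the KLT domain. Under a Gaussian model assumption for the signal, it incorporates this correlation into the MAP estimator in the form of a joint probability density function to obtain an estimate of the clean speech components. The algorithm is structurally simple and easy to implement, and it effectively avoids the shortcomings of traditional algorithms in estimating speech components. Simulation results show that the algorithm achieves good speech enhancement in both objective and subjective tests.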

18.
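Forward and backward hidden Markov models and their application to continuous speech recognition   Cited by: 1 (self-citations: 0; citations by others: 1)
Addressing the forward and backward dependencies that objectively exist in speech signals, this paper explicitly proposes using conditional probability to describe these forward and backward Markov dependencies quantitatively, and proposes forward and backward hidden Markov models (HMMs) to represent them. Experiments show that speech recognition using only the backward dependencies can also achieve considerable recognition performance. For two different recognition tasks, isolated words and continuous speech, the paper then studies methods that use both kinds of dependency information at once, and proposes a new search algorithm for continuous speech recognition: forward-backward half-and-half hybrid search. This method combines the forward and backward dependency information effectively by using the intermediate results of a forward Viterbi search based on the forward HMM and a backward Viterbi search based on the backward HMM. Experiments show that the hybrid search consistently outperforms unidirectional search using either kind of dependency information alone.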
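The forward Viterbi search that the hybrid method builds on can be sketched as a standard log-domain dynamic program over HMM states. This is the textbook algorithm with an invented two-state model, not the paper's half-and-half hybrid search, whose details the abstract does not give.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log probability."""
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    backpointers = []
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        best.append(scores)
        backpointers.append(ptr)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a dict of probabilities (possibly nested) to log probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical two-state model with two observation symbols.
states = ["A", "B"]
log_start = to_log({"A": 0.9, "B": 0.1})
log_trans = to_log({"A": {"A": 0.8, "B": 0.2}, "B": {"A": 0.2, "B": 0.8}})
log_emit = to_log({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}})
path, log_prob = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
```

A backward search could plausibly reuse the same routine on the time-reversed observation sequence with a separately trained backward model, though that is an assumption about how the two passes relate.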

19.
While several proactive acoustic feedback (Larsen-effect) cancellation schemes have been presented for speech applications with short acoustic feedback paths as encountered in hearing aids, these schemes fail with the long impulse responses inherent to, for instance, public address systems. We derive a new prediction error method (PEM)-based scheme (referred to as PEM-AFROW) which identifies both the acoustic feedback path and the nonstationary speech source model. A cascade of a short- and a long-term predictor removes the coloring and periodicity in voiced speech segments, which account for the unwanted correlation between the loudspeaker signal and the speech source signal. The predictors calculate row operations which are applied to prewhiten the speech source signal, resulting in a least squares system that is solved recursively by means of normalized least mean square or recursive least squares algorithms. Simulations show that this approach is indeed superior to earlier approaches whenever long acoustic channels are dealt with.
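The recursive solver named at the end of the abstract, normalized least mean square (NLMS), can be sketched as plain system identification of a short FIR path. This sketch leaves out the prewhitening predictors and the closed feedback loop that PEM-AFROW is designed for; the path coefficients, step size, and signal are invented.

```python
import random

def fir(x, h):
    """Filter signal x with FIR coefficients h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Estimate FIR coefficients w so that filtering x with w approximates d."""
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                            # a priori error
        norm = sum(b * b for b in buf) + eps  # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w

# Hypothetical 3-tap path driven by white noise, with no measurement noise.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3, 0.1]
d = fir(x, true_path)
w = nlms_identify(x, d, taps=3)
```

With a white input the estimate converges to the true path. Voiced speech is strongly colored and correlated with the source signal, which is the problem the prewhitening in the abstract is meant to remove.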

20.
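李洪伟, 马琳, 李海峰. 《信号处理》 (Journal of Signal Processing), 2023, 39(4): 639-648
Speech is the most important tool humans have for expressing thoughts and exchanging feelings, and it is an important part of human culture. Speech emotion recognition, an important topic in affective computing, has become an international research focus and is attracting growing attention. Neuroscience research has shown that the brain is the material basis for producing and regulating emotion. Research on speech emotion should therefore not consider the speech signal alone, but should also bring brain activity signals into speech emotion recognition to achieve higher accuracy. Based on this idea, the paper proposes a speech feature extraction method based on kernel canonical correlation analysis (KCCA). The method maps speech features and electroencephalogram (EEG) features into a high-dimensional Hilbert space and computes the maximum correlation coefficient between them. In that space, KCCA projects the speech features onto the direction of maximum correlation with the EEG features, which finally yields speech features that contain EEG information. The method brings EEG information related to speech emotion into speech emotion feature extraction, so the resulting features represent emotion more accurately. In theory the method also transfers well: when the extracted EEG features are sufficiently accurate and representative, the projection vectors obtained from KCCA modeling are general and can be applied directly to new speech emotion datasets, with no need to collect and compute the corresponding EEG signals again. Experimental results on a self-built speech emotion database and on the public speech emotion database MSP-IMPROV show that speech emotion classification using the projected speech features outperforms classification using the original audio features...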
