
1.

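王霞 马俊晖 王光艳 张艳 《计算机工程与应用》(Computer Engineering and Applications) 2017,53(19):114-117

The performance of quality evaluation algorithms for coded speech is well established, but these algorithms are not necessarily applicable to mask speech. This paper discusses the applicability of speech quality evaluation algorithms to air speech and mask speech in different noise environments. Noisy and enhanced speech at several signal-to-noise ratios (SNRs) is evaluated with subjective opinion scores and three objective measures: segmental SNR, modified Bark spectral distortion (MBSD), and perceptual evaluation of speech quality (PESQ). The applicability of each objective measure is judged by its consistency with the subjective evaluation. The enhancement algorithms are Wiener filtering and the log-spectral-amplitude minimum mean square error method (LSA-MMSE); the noise types are pink noise and ocean-wave noise. Simulation results show that the applicability of a speech quality evaluation algorithm depends on the speech type, the SNR, the background noise, and the enhancement algorithm. In pink noise, PESQ is not suitable for evaluating air speech enhanced by Wiener filtering, and MBSD is suitable only for evaluating mask speech enhanced by LSA-MMSE. In ocean-wave noise, PESQ is suitable for evaluating mask speech, whereas MBSD is not.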
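Of the three objective measures named in this abstract, segmental SNR is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: the function name `segmental_snr`, the frame length, and the clamping range of -10 dB to 35 dB are assumptions chosen for illustration (clamping per-frame values to a fixed range is a common convention, but the abstract does not state which settings were used).

```python
import math

def segmental_snr(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNR values (dB) between a clean and a processed signal.

    frame_len, floor_db and ceil_db are illustrative assumptions.
    """
    n = min(len(clean), len(processed))
    frame_snrs = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        if signal_energy == 0.0:
            continue  # skip silent reference frames
        if noise_energy == 0.0:
            snr = ceil_db  # identical frames: use the upper clamp
        else:
            snr = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)

clean = [math.sin(0.1 * i) for i in range(1024)]
scaled = [0.9 * x for x in clean]  # error is 10% of the amplitude in every frame
print(round(segmental_snr(clean, scaled), 3))
```

Because the error signal here is one tenth of the clean amplitude in every frame, each frame contributes 20 dB and so does the mean.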

2.

Computational auditory models in predicting noise reduction performance for wideband telephony applications

Nazanin Pourmand Vijay Parsa Angela Weaver 《International Journal of Speech Technology》2013,16(4):363-379

The performance of several noise reduction algorithms intended for wideband telephony was evaluated both subjectively and objectively. The chosen algorithms were based on statistical modeling, spectral subtraction, Wiener filtering, or subspace modeling principles. A customized wideband noise reduction database was created, containing speech samples corrupted by three types of background noise at three SNR levels, along with their enhanced versions. The overall quality of the speech samples in the database was subsequently rated by a group of listeners with normal hearing. Comprehensive statistical analyses were performed to assess the reliability of the subjective data and the performance of the noise reduction algorithms across varied noisy conditions. The subjective quality ratings were then used to investigate the performance of several auditory model-based objective quality metrics. Key results from these investigations include: (a) there was a high degree of inter- and intra-subject reliability in the subjective ratings, (b) noise reduction algorithms enhance speech quality for only a subset of the noise conditions, and (c) auditory model-based metrics perform similarly in predicting speech quality ratings when the speech quality scores pertaining to a particular noise condition are averaged.

3.

Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice

《IEEE transactions on audio, speech, and language processing》2006,14(6):2006-2013

It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments. This means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Previously, estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed. However, their estimation accuracy has not been verified for the case in which a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We therefore evaluated the effectiveness of these distortion measures in experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that in each case the distortion measure correlates well with the word accuracy when the estimators used are optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator, optimized for all the noise reduction algorithms, is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we have proposed the use of an artificial voice of several seconds' duration instead of a large amount of real speech and confirmed that a relatively accurate estimate can be obtained with the artificial voice.

4.

Noisy speech enhancement based on an adaptive threshold and a modified hard thresholding function in wavelet packet domain

Tahsina Farah Sanam Celia Shahnaz 《Digital Signal Processing》2013,23(3):941-951

This paper proposes a speech enhancement approach that statistically determines an adaptive threshold using the Teager energy operated wavelet packet (WP) coefficients of noisy speech. The obtained threshold is applied to the WP coefficients of the noisy speech through a modified hard thresholding function. Extensive simulations in the presence of different noises indicate that this new method is very effective at removing both white noise and colored noise from speech, resulting in enhanced speech of better quality. Several standard objective measures and subjective observations show that the proposed method outperforms recent state-of-the-art thresholding-based approaches from high to low SNR levels.
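The two building blocks named in this abstract can be sketched as follows. This is an illustration only: `teager_energy` implements the standard discrete Teager energy operator, and `hard_threshold` is the conventional hard thresholding rule. The paper's statistically derived adaptive threshold and its modified thresholding function are not specified in the abstract and are not reproduced here; both function names are hypothetical.

```python
def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returned for interior samples only, so the output has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def hard_threshold(coeffs, thr):
    """Conventional hard thresholding: keep a coefficient only if |c| > thr.

    The paper uses a modified version of this rule; this is the plain form.
    """
    return [c if abs(c) > thr else 0.0 for c in coeffs]

print(teager_energy([1.0, 2.0, 3.0, 4.0]))      # [1.0, 1.0]
print(hard_threshold([0.5, -2.0, 1.0], 1.0))    # [0.0, -2.0, 0.0]
```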

5.

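Wavelet packet speech enhancement based on a new threshold function and an adaptive threshold

刘冲冲 邹翔 周正仙 《计算机应用研究》(Application Research of Computers) 2017,34(11)

To address the severe speech distortion left by traditional wavelet packet speech enhancement algorithms, this paper proposes a wavelet packet speech enhancement algorithm based on an adaptive threshold and a new threshold function. In the wavelet packet domain, the algorithm windows the noisy speech into frames and computes the speech presence probability of each frame from the cross-correlation of the fast Fourier transform (FFT) power spectra of adjacent frames. The speech presence probability is then used to adjust the traditional universal wavelet packet threshold so that the threshold is larger in non-speech frames and smaller in speech frames. This adaptive adjustment removes as much noise as possible while preserving as much speech as possible, which reduces speech distortion. The paper also designs a new threshold function that overcomes the discontinuity of the traditional hard threshold function and the constant bias introduced by the soft threshold function, further reducing speech distortion. Extensive simulations were run with speech and noise from the TIMIT and NOISEX-92 databases. Both subjective and objective evaluations show that the proposed algorithm enhances speech better than two existing algorithms, with less distortion and better listening quality.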
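The idea of a threshold that shrinks in speech frames and grows in non-speech frames can be sketched as below. This is a hedged illustration under stated assumptions: `universal_threshold` is the standard universal threshold sigma * sqrt(2 ln N), `soft_threshold` is the conventional soft rule whose constant bias the abstract mentions, and `adapted_threshold` uses a hypothetical linear scaling by the speech presence probability. The paper's actual adjustment rule and its new threshold function are not given in the abstract.

```python
import math

def universal_threshold(sigma, n):
    """Standard universal threshold: sigma * sqrt(2 * ln(n))."""
    return sigma * math.sqrt(2.0 * math.log(n))

def adapted_threshold(base_thr, speech_prob, min_scale=0.2):
    """Hypothetical linear rule: full threshold when speech_prob = 0,
    min_scale * base_thr when speech_prob = 1."""
    scale = 1.0 - (1.0 - min_scale) * speech_prob
    return base_thr * scale

def soft_threshold(c, thr):
    """Conventional soft thresholding: shrink |c| by thr, zero below thr."""
    if abs(c) <= thr:
        return 0.0
    return math.copysign(abs(c) - thr, c)

base = universal_threshold(sigma=1.0, n=256)
for p in (0.0, 0.5, 1.0):
    print(p, round(adapted_threshold(base, p), 3))
```

The loop prints a threshold that decreases as the speech presence probability rises, which is the qualitative behavior the abstract describes.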

6.

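Evaluation of the intelligibility of enhanced speech

马建芬 张雪英 《计算机工程与应用》(Computer Engineering and Applications) 2012,48(32):5-8

This paper proposes an objective intelligibility measure that correlates highly with subjective evaluation. Traditional intelligibility measures based on the frequency-domain segmental SNR correlate poorly with subjective evaluation because they do not compute spectral attenuation distortion and spectral amplification distortion separately. To overcome this shortcoming, the enhanced speech is decomposed into three parts: attenuation distortion, amplification distortion with a gain below 6.02 dB, and amplification distortion with a gain above 6.02 dB. The frequency-domain SNR of each part is computed separately, and multiple linear regression combines the three distortion values so that their correlation with subjective intelligibility scores is maximized. Experiments show that the correlation between the sentence intelligibility predicted by this method and the subjective evaluation reaches 0.91.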
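The three-way split described in this abstract can be sketched as follows. A gain of 6.02 dB corresponds to doubling the spectral magnitude (20 log10 2 ≈ 6.02), so the split compares each enhanced magnitude with the clean magnitude and with twice the clean magnitude. The function names, the handling of bins that sit exactly on a boundary, and the per-region SNR formula are assumptions for illustration; the regression step that combines the three values is omitted.

```python
import math

def split_by_distortion(clean_mag, enhanced_mag):
    """Assign each spectral bin index to one of three distortion regions."""
    regions = {"attenuation": [], "amplification_le_6dB": [], "amplification_gt_6dB": []}
    for k, (c, e) in enumerate(zip(clean_mag, enhanced_mag)):
        if e <= c:
            regions["attenuation"].append(k)            # enhanced not larger than clean
        elif e <= 2.0 * c:
            regions["amplification_le_6dB"].append(k)   # gain of at most 6.02 dB
        else:
            regions["amplification_gt_6dB"].append(k)   # gain above 6.02 dB
    return regions

def region_snr_db(clean_mag, enhanced_mag, bins):
    """Frequency-domain SNR (dB) restricted to the given bins (assumed formula)."""
    signal = sum(clean_mag[k] ** 2 for k in bins)
    error = sum((clean_mag[k] - enhanced_mag[k]) ** 2 for k in bins)
    if not bins or signal == 0.0 or error == 0.0:
        return None  # undefined for empty or error-free regions
    return 10.0 * math.log10(signal / error)

clean = [1.0, 1.0, 1.0]
enhanced = [0.5, 1.5, 3.0]
regions = split_by_distortion(clean, enhanced)
for name, bins in regions.items():
    print(name, bins, region_snr_db(clean, enhanced, bins))
```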

7.

Spectral difference for statistical model-based speech enhancement in speech recognition

Soojeong Lee Joon-Hyuk Chang 《Multimedia Tools and Applications》2017,76(23):24917-24929

In this paper, we propose a statistical model-based speech enhancement technique using the spectral difference scheme for speech recognition in virtual reality. In the analysis step, two principal parameters, the weighting parameter in the decision-directed (DD) method and the long-term smoothing parameter in noise estimation, are uniquely determined as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which are specific to different spectral differences, are estimated based on the composite measure, which is a relevant criterion in terms of speech quality. An efficient mapping function is also presented to provide an index into the metric table associated with the spectral difference so that operating points can be determined according to various noise conditions for the on-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis from the metric table of the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm yields better performance than conventional algorithms.
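The weighting parameter mentioned here is the one in the decision-directed a priori SNR estimate, whose widely used form can be sketched as below. The function name and the default `alpha=0.98` are assumptions for illustration; the paper selects the weighting per frame from its metric table, which is not reproduced here.

```python
def decision_directed_snr(prev_clean_power, noisy_power, noise_power, alpha=0.98):
    """Decision-directed a priori SNR estimate for one frequency bin.

    xi = alpha * |A_prev|^2 / lambda_d + (1 - alpha) * max(gamma - 1, 0)
    where gamma = noisy_power / noise_power is the a posteriori SNR.
    alpha is the weighting parameter; 0.98 is an illustrative default.
    """
    gamma = noisy_power / noise_power
    return alpha * prev_clean_power / noise_power + (1.0 - alpha) * max(gamma - 1.0, 0.0)

# A smaller alpha follows the current frame more closely,
# a larger alpha leans on the previous frame's estimate.
print(decision_directed_snr(4.0, 3.0, 1.0, alpha=0.5))  # 3.0
print(decision_directed_snr(4.0, 3.0, 1.0, alpha=1.0))  # 4.0
```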

8.

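Research on a speech quality evaluation algorithm based on the Bark spectrum

包晓刚 胡剑凌 徐盛 《数据采集与处理》(Journal of Data Acquisition and Processing) 2004,19(1):16-20

This paper proposes an objective algorithm for estimating subjective speech quality. The algorithm computes the distortion between the original and reconstructed speech on the basis of the Bark spectrum and takes into account the effect of weak-speech frames and noise frames on the quality estimate. A speech quality formula that combines the Bark spectral distortion with the ratio of weak-speech and noise frames is given, and its results are compared with the mean opinion score (MOS). Numerical experiments show that the proposed improved Bark spectral distortion measure (IBSD) correlates strongly with MOS, can objectively evaluate the subjective quality of speech signals, and is applicable to a wide range of speech coding and speech communication systems.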
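As background for this abstract, the sketch below shows a common Hz-to-Bark approximation and the basic Bark spectral distortion on which such measures build. It is not the IBSD formula, which the abstract does not give: the weighting by weak-speech and noise frames is omitted, the per-frame Bark spectra are assumed to be already computed, and the function names are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-style approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_spectral_distortion(ref_frames, test_frames):
    """Basic BSD: summed squared difference of Bark spectra over all frames,
    normalized by the summed energy of the reference Bark spectra.

    Each frame is a list of per-band values (assumed precomputed).
    """
    num = 0.0
    den = 0.0
    for ref, test in zip(ref_frames, test_frames):
        num += sum((r - t) ** 2 for r, t in zip(ref, test))
        den += sum(r * r for r in ref)
    return num / den

print(round(hz_to_bark(1000.0), 2))
print(bark_spectral_distortion([[1.0, 2.0]], [[1.0, 1.0]]))
```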

9.

Reduction of residual noise based on eigencomponent filtering for speech enhancement

Kewen Huang Yimin Liu Yuanquan Hong 《International Journal of Speech Technology》2018,21(4):877-886

In this paper, the residual noise of corrupted speech observations is further suppressed by eigencomponent filtering (an eigencomponent being an eigenvalue and its corresponding eigenvector). Three related algorithms are proposed to obtain the core eigencomponents that most strongly affect the enhancement quality of speech fragments, using joint diagonalization of the clean speech and noise covariance matrices. In addition, a generalized inverse matrix transform is introduced to recover the enhanced speech signal, which addresses the loss of matrix invertibility after the eigencomponents are filtered. Experimental results show that the proposed methods work better than many other methods under various conditions in terms of both noise reduction and speech distortion.

10.

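Subspace speech enhancement with a controlled Lagrange multiplier

崔晓 《计算机工程与应用》(Computer Engineering and Applications) 2014,(9):182-185,257

Frequency-domain constrained subspace speech enhancement uses a fixed Lagrange multiplier when constructing the enhancement matrix, which leaves residual musical noise while reducing speech distortion and improving intelligibility. To address this, an algorithm with a variable Lagrange multiplier is proposed. It exploits the auditory property that stronger frequency components mask noise: through a transformation between the frequency-domain masking threshold and the subspace eigenvalues, a variable controls the subspace Lagrange multiplier used to compute the diagonal matrix of the gain function. Comparative experiments and listening tests show that speech enhanced by the proposed algorithm has a substantially higher SNR and noticeably better perceived quality.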

11.

Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics

Suman Senapati 《International Journal of Speech Technology》2013,16(4):439-459

This paper investigates the problem of speech enhancement when only a single microphone is used and the statistics of the interfering noise and speech are not available a priori. It thus seeks to address a pitfall of many current enhancement techniques and looks towards a system that would have application in the real world. The paper focuses on a Log Gabor Wavelet (LGW) based Long Term Squared Spectral Amplitude estimator using the Maximum a Posteriori (MAP) criterion. To begin with, a long-term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian Bivariate Model is developed to suppress the background noise. This work also introduces an inter-scale dependency between the coefficients and their parents through a circularly symmetric probability density function related to the family of Spherically Invariant Random Processes (SIRPs). The corresponding joint estimator is derived by MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which gives a closed-form solution. Consideration of a speech presence uncertainty (SPU) estimator is another contribution to the proposed estimator. The main contributions of this paper are therefore: (i) the combination of LGW, SIRPs, and SPU for background noise reduction; (ii) LGW and Long Term Cepstral Mean Subtraction to reduce the effects of both the telephone channel and handsets; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived by MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which gives a closed-form solution; and (v) a refinement of the magnitude estimates by scaling them by the SPU probability.
Extensive comparisons are made between the proposed and existing speech enhancement algorithms on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations of the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical model based, and Wiener type. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) values than the existing speech enhancement algorithms. For the SSNR measure, the proposed methods show an improvement of 2 dB over existing methods for almost every noise source. For the MOS measure, the proposed methods also improve on existing methods for almost every noise source. The proposed methods thus aim to enhance speech quality and intelligibility at the same time.

12.

Analysis and Comparison of Multichannel Noise Reduction Methods in a Common Framework

Yiteng Huang Benesty J. Jingdong Chen 《IEEE transactions on audio, speech, and language processing》2008,16(5):957-968

Noise reduction for speech enhancement is a useful technique, but in general it is a challenging problem. While a single-channel algorithm is easy to use in practice, it inevitably introduces distortion into the desired speech signal while reducing noise. Today, the explosive growth in computational power and the continuous drop in the cost and size of acoustic electric transducers are driving interest in employing multiple microphones in speech processing systems. This opens new opportunities for noise reduction. In this paper, we present an analysis of three multichannel noise reduction algorithms, namely Wiener filter, subspace, and spatial-temporal prediction, in a common framework. We intend to investigate whether it is possible for multichannel noise reduction algorithms to reduce noise without speech distortion. Finally, we verify what we learn from the theoretical analyses through simulations using real impulse responses measured in the varechoic chamber at Bell Labs.

13.

An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion

《Computer Speech and Language》2014,28(6):1269-1286

This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.

14.

A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments

Navneet Upadhyay Abhijit Karmakar 《International Journal of Speech Technology》2014,17(2):117-132

In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband separately to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score) and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
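The continuous noise estimation step described here, in which the noise power is updated by smoothing the noisy power with an SNR-controlled factor, might look roughly like the sketch below. The sigmoid mapping and its `slope` and `midpoint` values are assumptions for illustration; the abstract says only that the smoothing parameter is a function of the estimated SNR.

```python
import math

def smoothing_factor(post_snr, slope=2.0, midpoint=1.5):
    """Hypothetical sigmoid mapping from a posteriori SNR to a factor in (0, 1).

    High SNR (speech likely present) gives a factor near 1, which freezes the
    noise estimate; low SNR gives a factor near 0, which tracks the input.
    """
    return 1.0 / (1.0 + math.exp(-slope * (post_snr - midpoint)))

def update_noise_power(prev_noise, noisy_power, slope=2.0, midpoint=1.5):
    """One recursive update of the subband noise power (prev_noise must be > 0)."""
    post_snr = noisy_power / prev_noise
    alpha = smoothing_factor(post_snr, slope, midpoint)
    return alpha * prev_noise + (1.0 - alpha) * noisy_power

noise = 1.0
for frame_power in (1.1, 0.9, 50.0, 1.0):  # third frame contains a loud speech burst
    noise = update_noise_power(noise, frame_power)
    print(round(noise, 4))
```

In this run the loud third frame barely moves the estimate, while the quieter frames pull it toward their own power, without any explicit silence detector.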

15.

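IMCRA algorithm based on an improved spectral smoothing strategy and its application to speech enhancement

张建伟 陶亮 周健 王华彬 《计算机工程与应用》(Computer Engineering and Applications) 2017,53(1):153-157

Noise spectrum estimation plays an important role in single-channel speech enhancement. To improve the ability of a noise spectrum estimator to estimate and update the noise, the recursive averaging parameter of the improved minima controlled recursive averaging (IMCRA) noise spectrum estimator is modified with reference to the minimum statistics (MS) algorithm, and the minimum of the smoothed power spectrum is refined by first-order recursion. Spectral subtraction is used to denoise the noisy speech, and the algorithms are evaluated both objectively and subjectively by comparing the segmental SNR (segSNR), PESQ score, and MOS of speech before and after enhancement for different noise types and SNRs. Experimental results show that the proposed method tracks changes in the noise better and improves speech quality.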
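Once a noise power spectrum estimate is available, the spectral subtraction step mentioned in this abstract can be sketched per frequency bin as follows. This is the generic power spectral subtraction rule with a spectral floor, not the authors' exact configuration; the parameter names and defaults are assumptions, and the IMCRA noise estimator itself is not shown.

```python
def spectral_subtract(noisy_power, noise_power, over_subtraction=1.0, floor=0.01):
    """Power spectral subtraction for one frame.

    Subtracts the (optionally scaled) noise power from each bin and keeps at
    least floor * noise_power so that no bin becomes negative.
    """
    return [
        max(y - over_subtraction * n, floor * n)
        for y, n in zip(noisy_power, noise_power)
    ]

# First bin: speech dominates, so the noise power is subtracted.
# Second bin: noise estimate exceeds the input, so the floor applies.
print(spectral_subtract([4.0, 1.0], [1.0, 2.0]))
```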

16.

On using multivariate polynomial regression model with spectral difference for statistical model-based speech enhancement

《Journal of Systems Architecture》2016

In this paper, we propose a statistical model-based speech enhancement technique using multivariate polynomial regression (MPR) based on a spectral difference scheme. In the analysis step, three principal parameters, the weighting parameter in the decision-directed (DD) method, the long-term smoothing parameter for the noise estimation, and the control parameter of the minimum gain value, are estimated as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which are specific to different spectral differences, are estimated based on the composite measure, which is a relevant criterion in terms of speech quality. We then apply the MPR technique, incorporating the spectral differences as independent variables, to estimate the optimal operating points. The MPR technique offers an effective scheme for representing the complex nonlinear input-output relationship between the optimal operating points and the spectral differences so that operating points can be determined according to various noise conditions in the off-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis through the regression according to the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm yields better performance than conventional algorithms.

17.

Speech Enhancement Based on Generalized Minimum Mean Square Error Estimators and Masking Properties of the Auditory System

《IEEE transactions on audio, speech, and language processing》2006,14(6):2049-2063

In this paper, the family of conditional minimum mean square error (MMSE) spectral estimators that take the form $(E(X_p^{\alpha} \mid X_p + D_p))^{1/\alpha}$ is studied, where $X_p$ is the clean speech spectrum and $D_p$ is the noise spectrum, resulting in a Generalized MMSE estimator (GMMSE). The degree of noise suppression versus musical tone artifacts of these estimators is studied. The tradeoffs in the selection of $\alpha$, across noise spectral structure and signal-to-noise ratio (SNR) level, are also considered. Members of this family of estimators include the Ephraim–Malah (EM) amplitude estimator and, for high SNRs, the Wiener filter. It is shown that the colorless residual noise observed in the EM estimator is a characteristic of this general family of estimators. An application of these estimators in an auditory enhancement scheme using the masking threshold of the human auditory system is formulated, resulting in the GMMSE-auditory masking threshold (AMT) enhancement method. Finally, a detailed evaluation of the proposed algorithms is performed over the phonetically balanced TIMIT database and the National Gallery of the Spoken Word (NGSW) audio archive using subjective and objective speech quality measures. Results from a detailed phoneme-based objective quality analysis show that the proposed GMMSE-AMT outperforms the MMSE and log-MMSE enhancement methods.

18.

Objective Speech Quality Estimation for Analog Mobile Channels: Problems and Solutions

Tibor Fegyó Máté Szarvas Péter Tatai Géza Gordos 《International Journal of Speech Technology》2000,3(3-4):277-287

This article describes an objective speech quality assessment method developed for the Hungarian NMT-450 cellular mobile telephone system. The method is based on a psychoacoustic front end followed by a cognitive modeling component. Special problems of the NMT system, such as handover (HOV), the effects of automatic gain control (AGC), and the intrusion of signaling noise, are addressed in the cognitive module. Correlation between the subjective and objective quality measures is maximized by finding a transformation that linearizes their relationship. A correlation of 0.95 is achieved on an independent test set between the subjective speech quality and the proposed objective quality measure.
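The effect of a linearizing transformation on the correlation between objective and subjective scores can be illustrated with a toy example. The data and the base-10 logarithm used as the mapping are invented for illustration and have no connection to the paper's measurements or to the transformation the authors actually found.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up scores: the objective measure grows exponentially with the rating.
objective = [1.0, 10.0, 100.0, 1000.0]
subjective = [1.0, 2.0, 3.0, 4.0]

raw = pearson(objective, subjective)
linearized = pearson([math.log10(v) for v in objective], subjective)
print(round(raw, 3), round(linearized, 3))
```

The monotone mapping leaves the ranking of the samples unchanged but raises the linear correlation, which is the reason such a transformation is fitted before the correlation is reported.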

19.

A Multiresolution Model of Auditory Excitation Pattern and Its Application to Objective Evaluation of Perceived Speech Quality

《IEEE transactions on audio, speech, and language processing》2006,14(6):1912-1923

This paper proposes a multiresolution model of the auditory excitation pattern and applies it to the problem of objective evaluation of subjective wideband speech quality. The model uses the wavelet packet transform for time-frequency decomposition of the input signal. The selection of the wavelet packet tree is based on an optimality criterion formulated to minimize a cost function based on the critical band structure. The models of the different auditory phenomena are reformulated for the multiresolution framework. These include duration-dependent outer and middle ear weighting, multiresolution spectral spreading, and multiresolution temporal smearing. As an application, the excitation pattern is used to define an objective measure of the auditory distortion of a distorted speech signal compared to the undistorted one. The performance of this objective measure in predicting the subjective mean opinion score (MOS) is evaluated with a database of wideband speech signals degraded by various kinds of NOISEX-92 noise and is compared with the fast Fourier transform (FFT)-based ITU-T PESQ P.862.2 algorithm. The proposed measure is found to achieve a correlation between subjective MOS and objective MOS comparable to that of PESQ P.862.2, with a trend suggesting better correlation for nonstationary degradations than for stationary ones. Further refinement of the measure for distortion types other than additive noise is anticipated.

20.

Speech enhancement using MMSE estimation under phase uncertainty

Ravikumar Kandagatla P. V. Subbaiah 《International Journal of Speech Technology》2017,20(2):373-385

Most speech enhancement algorithms process the amplitudes of speech and leave the phase of the noisy speech unprocessed, because modifying it may cause undesired artifacts. Recently, short-time Fourier transform based single-channel speech enhancement algorithms have been developed that take uncertain prior knowledge of the phase into account. This uncertain knowledge of the phase is obtained from phase reconstruction algorithms. The goal of this paper is to develop a joint minimum mean square error estimate of the complex speech coefficients given uncertain phase (CUP) information, using the Nakagami probability density function (PDF) and the gamma PDF as speech spectral amplitude priors and the generalized gamma PDF as the noise prior. Estimators of the amplitudes given uncertain phase, which use the uncertain phase only for amplitude estimation and not for phase improvement, are also developed. Experimental results show that incorporating uncertain phase information improves the quality and intelligibility of speech. Novel phase-blind estimators are also developed using the Nakagami and gamma PDFs as speech priors and the generalized gamma PDF as the noise prior. Finally, all estimators that use uncertain prior phase information are compared, and the effect of the initial phase information on the enhancement process is analyzed with the novel estimators. For the comparison of all the derived estimators, speech signals uttered by male and female speakers are taken from the TIMIT database. The proposed CUP estimators outperform the existing algorithms in terms of the objective performance measures segmental signal-to-noise ratio, phase signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility.