Similar Documents
20 similar documents found (search time: 578 ms)
1.
This paper proposes a multiresolution model of the auditory excitation pattern and applies it to the objective evaluation of subjective wideband speech quality. The model uses the wavelet packet transform for time-frequency decomposition of the input signal. The wavelet packet tree is selected with an optimality criterion that minimizes a cost function based on the critical band structure. The models of the different auditory phenomena are reformulated for the multiresolution framework, including duration-dependent outer and middle ear weighting, multiresolution spectral spreading, and multiresolution temporal smearing. As an application, the excitation pattern is used to define an objective measure of the auditory distortion of a distorted speech signal relative to the undistorted one. The ability of this measure to predict the subjective mean opinion score (MOS) is evaluated on a database of wideband speech signals degraded by various kinds of NOISEX-92 noise and compared with that of the fast Fourier transform (FFT)-based ITU-T PESQ P.862.2 algorithm. The proposed measure achieves a correlation between subjective and objective MOS comparable to that of PESQ P.862.2, with a trend suggesting better correlation for nonstationary degradations than for stationary ones. Further refinement of the measure for distortion types other than additive noise is anticipated.

2.
This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its application to robust speech recognition and to improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. It adds three novel developments to our previous LMS research. First, we address channel distortion as well as additive noise. Second, we present an improved method of modeling noise for speech estimation. Third, we present an iterative algorithm that updates the noise and channel estimates of the corpus data model. In speech recognition experiments on the Aurora 4 database, using our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In a further comparison against conventional enhancement algorithms, the LMS algorithm achieved better PESQ and segmental SNR ratings than the other methods for noisy speech enhancement.

3.
Ergonomics, 2012, 55(5): 719-736
Noise abatement in office environments often focuses on the reduction of background speech intelligibility and noise level, as attainable with frequency-specific insulation. However, only limited empirical evidence exists regarding the effects of reducing speech intelligibility on cognitive performance and subjectively perceived disturbance. Three experiments tested the impact of low background speech (35 dB(A)) of both good and poor intelligibility, in comparison to silence and highly intelligible speech not lowered in level (55 dB(A)). The disturbance impact of the latter speech condition on verbal short-term memory (n = 20) and mental arithmetic (n = 24) was significantly reduced during soft and poorly intelligible speech, but not during soft and highly intelligible speech. No effect of background speech on verbal-logical reasoning performance (n = 28) was found. Subjective disturbance ratings, however, were consistent over all three experiments with, for example, soft and poorly intelligible speech rated as the least disturbing speech condition but still disturbing in comparison to silence. It is concluded, therefore, that a combination of objective performance tests and subjective ratings is desirable for the comprehensive evaluation of acoustic office environments and their alterations.

4.
Evaluation of Objective Quality Measures for Speech Enhancement (total citations: 1; self-citations: 0; citations by others: 1)
In this paper, we evaluate the performance of several objective measures in predicting the quality of noisy speech enhanced by noise suppression algorithms. The evaluation covered a wide range of distortions, introduced by four types of real-world noise at two signal-to-noise ratio (SNR) levels and by four classes of speech enhancement algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener algorithms. The subjective quality ratings were obtained using the ITU-T P.835 methodology, which is designed to evaluate the quality of enhanced speech along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the correlations of several objective measures with these three subjective rating scales. Several new composite objective measures are also proposed by combining the individual objective measures using nonparametric and parametric regression analysis techniques.
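The parametric variant of such composite measures amounts to regressing subjective ratings on several objective scores. The sketch below assumes ordinary least squares, hypothetical measure columns, and synthetic noise-free ratings; it is not the paper's fitted model or coefficient set.

```python
import numpy as np


def fit_composite(objective, mos):
    """Fit mos ~ b0 + sum_k b_k * objective[:, k] by ordinary least squares.

    objective: (n_files, n_measures) objective scores (hypothetical columns,
               e.g. PESQ and a spectral distance); mos: (n_files,) ratings.
    Returns the coefficient vector [b0, b1, ..., bK].
    """
    X = np.column_stack([np.ones(len(mos)), objective])
    coef, *_ = np.linalg.lstsq(X, mos, rcond=None)
    return coef


def predict_composite(coef, objective):
    """Apply fitted coefficients to new objective scores."""
    X = np.column_stack([np.ones(objective.shape[0]), objective])
    return X @ coef


# Synthetic example: ratings generated exactly from two made-up measures.
rng = np.random.default_rng(0)
obj = rng.uniform(0.0, 4.0, size=(50, 2))
mos = 1.0 + 0.6 * obj[:, 0] - 0.2 * obj[:, 1]
coef = fit_composite(obj, mos)
pred = predict_composite(coef, obj)
```

With real listening-test data, the fit would be judged by its correlation with held-out ratings rather than by exact recovery of coefficients.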

5.
In this paper, we propose a statistical model-based speech enhancement technique that uses a spectral difference scheme, for speech recognition in virtual reality. In the analysis step, two principal parameters, the weighting parameter of the decision-directed (DD) method and the long-term smoothing parameter of the noise estimation, are uniquely determined as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which are specific to each spectral difference, are estimated using the composite measure, a relevant criterion of speech quality. An efficient mapping function is also presented that provides an index into the metric table associated with the spectral difference, so that operating points can be determined for various noise conditions in the on-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis from the metric table of the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm performs better than conventional algorithms.
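The weighting parameter tuned in this scheme belongs to the standard decision-directed (DD) a priori SNR recursion. A minimal sketch, assuming a known noise PSD, a Wiener gain, and an illustrative SNR floor; the paper's spectral-difference table and its noise smoothing parameter are not modeled:

```python
import numpy as np


def dd_wiener(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimation followed by a Wiener gain.

    noisy_power: (n_frames, n_bins) array of |Y(l, k)|^2.
    noise_power: (n_bins,) noise PSD; assumed known here (no noise tracking).
    alpha: DD weighting parameter, the quantity the paper tunes per condition.
    xi_min: illustrative floor on the a priori SNR (an assumption).
    Returns the (n_frames, n_bins) gain matrix.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    gains = np.empty_like(noisy_power)
    # |A_hat(l-1, k)|^2: clean power estimate of the previous frame, zero before frame 0
    prev_clean_power = np.zeros(noisy_power.shape[1])
    for frame in range(noisy_power.shape[0]):
        gamma = noisy_power[frame] / noise_power  # a posteriori SNR
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
        xi = np.maximum(xi, xi_min)
        g = xi / (1.0 + xi)  # Wiener gain
        gains[frame] = g
        prev_clean_power = g ** 2 * noisy_power[frame]  # feeds the next frame
    return gains
```

Values of alpha close to 1 smooth the a priori SNR over time, which suppresses musical noise at the cost of some smearing at speech onsets; this trade-off is why it can pay to adapt the parameter to the noise condition.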

6.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for high-quality noise reduction algorithms capable of operating at very low signal-to-noise ratios. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and that its overall performance is superior to several competitive methods.
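The Bark spreading energy step can be illustrated on plain band energies. This sketch assumes Zwicker and Terhardt's Hz-to-Bark approximation and Schroeder's spreading function, which are common textbook choices; the paper derives its energies from perceptual wavelet packet coefficients and may use a different spreading model.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band rate (Bark)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee Bark minus masker Bark.

    Slopes tend to -10 dB/Bark upward (dz > 0) and -25 dB/Bark downward.
    """
    x = np.asarray(dz, dtype=float) + 0.474
    return 15.81 + 7.5 * x - 17.5 * np.sqrt(1.0 + x ** 2)


def bark_spread_energy(band_energy, band_centers_bark):
    """E_spread[i] = sum_j SF(z_i - z_j) * E[j], with SF converted to linear power."""
    z = np.asarray(band_centers_bark, dtype=float)
    dz = z[:, None] - z[None, :]  # rows: maskee band i, columns: masker band j
    sf_lin = 10.0 ** (spreading_db(dz) / 10.0)
    return sf_lin @ np.asarray(band_energy, dtype=float)
```

A masking threshold would then be derived from the spread energy by subtracting an offset that depends on tonality; that step is omitted here.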

8.
This paper investigates speech enhancement when only a single microphone is used and the statistics of the interfering noise and of speech are not available a priori. It thus addresses a pitfall of many current enhancement techniques and works towards a system applicable in the real world. The paper focuses on a Log Gabor Wavelet (LGW)-based long-term squared spectral amplitude estimator using the maximum a posteriori (MAP) criterion. First, a long-term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian bivariate model is developed to suppress the background noise. This work also models the inter-scale dependency between the coefficients and their parents with a circularly symmetric probability density function related to the family of spherically invariant random processes (SIRPs), and the corresponding joint estimator is derived from MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which yields a closed-form solution. Incorporating a speech presence uncertainty (SPU) estimator is a further contribution. The main contributions of this paper are therefore: (i) the combination of LGW, SIRPs, and SPU for background noise reduction; (ii) LGW with long-term cepstral mean subtraction to reduce the effects of both telephone channels and handsets; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived from MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which gives a closed-form solution; and (v) refinement of the magnitude estimates by scaling them with the SPU probability.
Extensive comparisons were made between the proposed and existing speech enhancement algorithms on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations of the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type. Experimental results show that the proposed estimator yields a larger improvement in segmental SNR (SSNR), lower log area ratio (LAR) and weighted spectral slope (WSS) distortion, and higher perceptual evaluation of speech quality (PESQ) and mean opinion score (MOS) than the existing speech enhancement algorithms. For SSNR, the proposed methods improve on existing methods by 2 dB for almost every noise source; for MOS, they also improve on existing methods for almost every noise source. The proposed methods therefore aim to enhance both speech quality and intelligibility.
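Segmental SNR, the measure behind the reported 2 dB gain, averages frame-level SNRs. A minimal sketch, assuming non-overlapping frames and the commonly used clipping range of -10 to 35 dB; the paper's frame length and limits are not stated in the abstract:

```python
import numpy as np


def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean of per-frame SNRs in dB, each clipped to [lo, hi].

    Assumptions: non-overlapping frames, time-aligned signals, illustrative limits.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        err = s - processed[i * frame_len:(i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        snrs.append(np.clip(snr, lo, hi))  # limits the influence of silent/perfect frames
    return float(np.mean(snrs))
```

The SSNR improvement of an enhancer is then `segmental_snr(clean, enhanced) - segmental_snr(clean, noisy)`.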

9.
Most speech enhancement algorithms assume that speech and noise are both Gaussian in the discrete cosine transform (DCT) domain. To further enhance noisy speech in the DCT domain, we consider multiple statistical distributions (i.e., Gaussian, Laplacian, and Gamma) as a set of candidates for modeling noise and speech. We first use a goodness-of-fit (GOF) test to measure how far the assumed model deviates from the actual distribution for each DCT component of noisy speech. Our evaluations show that the best candidate for each frequency bin depends on the signal-to-noise ratio (SNR) and the power spectral flatness measure (PSFM). In particular, since the PSFM is strongly related to the best statistical fit, we employ a simple recursive estimate of the PSFM for model selection. The proposed speech enhancement algorithm employs a soft estimate of the speech absence probability (SAP) separately for each frequency bin according to the selected distribution. Both objective and subjective tests are performed to evaluate the proposed algorithms on a large speech database for various SNR values and types of background noise. Our evaluations show that the proposed soft decision scheme based on multiple statistical modeling or the PSFM further improves speech quality compared with recent methods in a number of subjective and objective tests.
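The PSFM is the ratio of the geometric to the arithmetic mean of a power spectrum: close to 1 for flat, noise-like frames and close to 0 for peaky, voiced ones. The first-order recursive smoothing and its factor `beta` below are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np


def psfm(power_spectrum, eps=1e-12):
    """Power spectral flatness: geometric mean / arithmetic mean of one frame."""
    p = np.asarray(power_spectrum, dtype=float) + eps  # eps avoids log(0)
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))


def smoothed_psfm(frames_power, beta=0.9):
    """Recursively smoothed PSFM over frames (assumed first-order recursion)."""
    out = []
    state = None
    for p in frames_power:
        value = psfm(p)
        state = value if state is None else beta * state + (1.0 - beta) * value
        out.append(state)
    return np.array(out)
```

The smoothed value would then index the choice among Gaussian, Laplacian, and Gamma models per bin; that mapping is specific to the paper and is not reproduced.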

10.
Public telephone systems transmit speech across a limited frequency range of about 300–3400 Hz, called narrowband (NB), which significantly reduces the quality and intelligibility of speech. This paper proposes a novel, fully backward-compatible method for bandwidth extension of NB speech. The method uses a magnitude-spectrum data hiding technique to provide a perceptually better wideband speech signal. Code-excited linear prediction parameters are extracted from a downsampled, frequency-shifted version of the high-frequency components of the speech signal above the NB range; these parameters are spread using pseudo-noise codes and embedded in the low-amplitude high-frequency regions of the magnitude spectrum of the NB speech signal. The embedded information is extracted at the receiving end to reconstruct the wideband speech signal. Theoretical and simulation analyses show that the proposed method is robust to quantization and channel noise. Comparison category rating listening tests and log spectral distortion tests show that the reconstructed wideband signal offers much better speech quality than some existing speech bandwidth extension methods that employ data hiding.
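Spreading hidden bits with a pseudo-noise (PN) code lets the receiver recover them by correlation even when the carrier is perturbed. A minimal baseband sketch, assuming a random ±1 chip sequence of hypothetical length 31 and additive Gaussian noise; embedding the chips into the narrowband magnitude spectrum, as the paper does, is not shown:

```python
import numpy as np


def spread_bits(bits, pn):
    """Map bits {0, 1} to symbols {-1, +1} and multiply each by the full PN chip sequence."""
    symbols = 2 * np.asarray(bits) - 1
    return (symbols[:, None] * pn[None, :]).ravel()


def despread_bits(chips, pn):
    """Correlate each chip block with the PN code and decide by the sign of the result."""
    blocks = np.asarray(chips, dtype=float).reshape(-1, len(pn))
    return (blocks @ pn > 0).astype(int)


rng = np.random.default_rng(1)
pn = rng.choice([-1.0, 1.0], size=31)   # hypothetical PN code shared by both ends
bits = rng.integers(0, 2, size=16)      # stand-in for quantized CELP parameters
received = spread_bits(bits, pn) + 0.5 * rng.standard_normal(16 * 31)
decoded = despread_bits(received, pn)
```

Longer codes raise the correlation peak relative to the noise (processing gain), at the price of more hidden chips per parameter bit.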

11.
Given the demand for high-quality wideband speech codecs, several standardization activities have been conducted, resulting in the selection of a wideband speech codec called adaptive multi-rate wideband (AMR-WB). AMR-WB uses the algebraic code-excited linear prediction (ACELP) technique, and most of the complexity of the ACELP structure comes from the codebook search. In this paper, a new codebook search method is proposed based on the behavior of the backward filtered target signal d(n) introduced in ITU-T Recommendation G.722.2. To optimize the proposed scheme, five optimization algorithms are investigated: a modified genetic algorithm, particle swarm optimization with dynamic inertia weight, bee colony optimization (BCO), modified differential evolution, and the imperialist competition algorithm. Experimental results show that the proposed method reduces the number of codebook search operations by up to 59 percent compared with ITU-T Recommendation G.722.2. Meanwhile, the BCO-based codebook search scheme has better convergence speed, without significant degradation in quality metrics such as segmental signal-to-noise ratio, mean opinion score, and perceptual evaluation of speech quality, when used in an AMR-WB speech codec.
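In ACELP coders the backward filtered target is the correlation between the target signal x(n) and the impulse response h(n) of the weighted synthesis filter, d(n) = sum over i = n..N-1 of x(i) h(i - n), with N the subframe length. The sketch computes it directly and via `np.correlate`; the candidate-selection helper is a hypothetical illustration of how |d(n)| can guide a pulse search, not the paper's optimizer-based method.

```python
import numpy as np


def _prepare(x, h):
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    if len(h) < len(x):
        raise ValueError("impulse response must be at least one subframe long")
    return x, h[: len(x)]


def backward_filtered_target(x, h):
    """d(n) = sum_{i=n}^{N-1} x(i) * h(i - n) for n = 0..N-1 (N = subframe length)."""
    x, h = _prepare(x, h)
    n_len = len(x)
    return np.array([np.dot(x[n:], h[: n_len - n]) for n in range(n_len)])


def backward_filtered_target_fast(x, h):
    """Same quantity from NumPy's cross-correlation: 'full' output at lags 0..N-1."""
    x, h = _prepare(x, h)
    return np.correlate(x, h, mode="full")[len(x) - 1:]


def top_candidate_positions(d, k):
    """Hypothetical helper: the k positions with the largest |d(n)|, strongest first."""
    return np.argsort(-np.abs(d), kind="stable")[:k]
```

In a full search, the sign of d(n) at each chosen position would also fix the pulse sign, which is one reason d(n) is a natural guide for reducing search effort.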

12.
This paper proposes two new no-reference image quality metrics that state-of-the-art image/video denoising algorithms can adopt for automatic denoising. The first metric is based on the assumption that the noise should be independent of the original image. A direct measurement of this dependence is, however, impractical because of the relatively low accuracy of existing denoising methods, so the proposed metric treats homogeneous regions and highly structured regions separately. This metric is, however, stable only when the noise level is relatively low. Most denoising algorithms reduce noise by (weighted) averaging of repeated noisy measurements. A second metric is therefore proposed for high-level noise, based on the fact that more noisy measurements are required as the noise level increases, so the number of measurements needed before convergence is related to the quality of the noisy image. This patch-matching-based metric iteratively finds and adds noisy image measurements for averaging until there is no visible difference between two successively averaged images. Both metrics are evaluated on the LIVE2 (Sheikh et al. in LIVE image quality assessment database release 2: 2013) and TID2013 (Ponomarenko et al. in Color image database tid2013: Peculiarities and preliminary results: 2005) data sets using the standard Spearman and Kendall rank-order correlation coefficients (ROCC), showing that they agree better with subjective scores than current state-of-the-art no-reference metrics. Quantitative evaluation on synthetic noisy images at different noise levels also demonstrates consistently higher performance than state-of-the-art no-reference metrics when used for image denoising.
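Both rank-order correlation coefficients compare only the orderings produced by a metric and by subjective scores. A dependency-free sketch, assuming no tied values (Spearman's closed-form formula and Kendall's tau-a are exact only in that case):

```python
def ranks(values):
    """1-based ranks, assuming all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / (n (n - 1) / 2)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))
```

A no-reference metric that tracks quality well yields values near +1, or near -1 if it scores distortion rather than quality.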

13.
A new method for the objective assessment and prediction of perceived audio quality is introduced. It extends the speech quality measure $q_C$ introduced by Hansen and Kollmeier and is based on a psychoacoustically validated, quantitative model of the “effective” peripheral auditory processing by Dau et al. To evaluate the audio quality of a given distorted signal relative to a corresponding high-quality reference signal, the auditory model is used to compute “internal representations” of the signals, which are partly assimilated to account for assumed cognitive aspects. The linear cross-correlation coefficient of the assimilated internal representations is the perceptual similarity measure (PSM). PSM correlates well with subjective quality ratings when different types of audio signals are considered separately, whereas signal-independent quality prediction is more accurate with a second quality measure, $PSM_t$, defined as the fifth percentile of the sequence of instantaneous audio quality values $PSM(t)$. The new measures were evaluated using a large database of subjective listening tests originally carried out on behalf of the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) to evaluate various low bit-rate audio codecs. Additional tests were carried out with data unknown during the development phase of the model. Except for linear distortions, the new method shows higher prediction accuracy than ITU-R Recommendation BS.1387 (“PEAQ”) for the tested data.
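Once internal representations are available, PSM and $PSM_t$ reduce to a correlation and a percentile. The sketch assumes the representations are already computed (the auditory model and the assimilation step are not reproduced) and stored as time-by-channel arrays; `frame_len` is a hypothetical parameter:

```python
import numpy as np


def psm(ref_rep, test_rep):
    """Linear (Pearson) cross-correlation coefficient of two flattened representations."""
    a = np.ravel(ref_rep).astype(float)
    b = np.ravel(test_rep).astype(float)
    return float(np.corrcoef(a, b)[0, 1])


def psm_t(ref_rep, test_rep, frame_len):
    """Fifth percentile of frame-wise PSM(t) along axis 0 (time); also returns PSM(t).

    Assumes every frame has non-zero variance, otherwise the correlation is undefined.
    """
    n_frames = ref_rep.shape[0] // frame_len
    values = np.array([
        psm(ref_rep[i * frame_len:(i + 1) * frame_len],
            test_rep[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ])
    return float(np.percentile(values, 5)), values
```

Taking a low percentile rather than the mean lets a few badly degraded frames dominate the score, which mirrors how listeners tend to judge short, severe artifacts.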

14.
In this paper, we study the family of conditional minimum mean square error (MMSE) spectral estimators of the form $\left(E\left[X_p^{\alpha} \mid X_p + D_p\right]\right)^{1/\alpha}$, where $X_p$ is the clean speech spectrum and $D_p$ is the noise spectrum, resulting in a generalized MMSE (GMMSE) estimator. The trade-off between the degree of noise suppression and musical tone artifacts of these estimators is studied, and the choice of $\alpha$ across noise spectral structure and signal-to-noise ratio (SNR) level is considered. Members of this family include the Ephraim–Malah (EM) amplitude estimator and, at high SNRs, the Wiener filter. It is shown that the colorless residual noise observed with the EM estimator is a characteristic of this general family of estimators. An application of these estimators in an auditory enhancement scheme using the masking threshold of the human auditory system is formulated, resulting in the GMMSE auditory masking threshold (GMMSE-AMT) enhancement method. Finally, a detailed evaluation of the proposed algorithms is performed on the phonetically balanced TIMIT database and the National Gallery of the Spoken Word (NGSW) audio archive using subjective and objective speech quality measures. A detailed phoneme-based objective quality analysis shows that the proposed GMMSE-AMT outperforms the MMSE and log-MMSE enhancement methods.
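Under Gaussian models for speech and noise, the alpha-th moment estimator has a known closed form, $\hat{A} = \left[\Gamma(\alpha/2 + 1)\, M(-\alpha/2; 1; -v)\right]^{1/\alpha} \frac{\sqrt{v}}{\gamma} R$ with $v = \xi\gamma/(1+\xi)$, where $\xi$ and $\gamma$ are the a priori and a posteriori SNRs, $R$ is the noisy magnitude, and $M$ is the confluent hypergeometric function. The sketch below evaluates that gain, assuming this Gaussian formulation (the paper's exact derivation may differ); $\alpha = 1$ gives the EM amplitude estimator and $\alpha = 2$ the square root of the posterior power.

```python
import math


def hyp1f1_series(a, b, z, tol=1e-12, max_terms=10000):
    """Confluent hypergeometric M(a; b; z) by its power series (intended for z >= 0)."""
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def gmmse_gain(xi, gamma, alpha):
    """Gain G such that A_hat = G * R for (E[A^alpha | Y])^(1/alpha).

    Assumes Gaussian speech/noise, gamma > 0, and moderate v (roughly v < 700,
    beyond which exp(-v) underflows). Kummer's transformation
    M(-alpha/2; 1; -v) = exp(-v) * M(1 + alpha/2; 1; v) keeps the series positive.
    """
    v = xi * gamma / (1.0 + xi)
    m = math.exp(-v) * hyp1f1_series(1.0 + alpha / 2.0, 1.0, v)
    return (math.gamma(alpha / 2.0 + 1.0) * m) ** (1.0 / alpha) * math.sqrt(v) / gamma
```

At high SNR both alpha = 1 and alpha = 2 approach the Wiener gain xi / (1 + xi), consistent with the abstract's remark about the Wiener filter.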

15.
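Objective: Because of the attenuation and scattering of light in water and the absorption and reflection of light by microorganisms, underwater images commonly suffer from color casts, blur, uneven illumination, and low contrast. Researchers have proposed many different underwater image enhancement algorithms to address these problems. To investigate the performance of existing underwater image enhancement algorithms and whether objective image quality assessment methods are suitable for evaluating underwater images, this paper conducts a large-scale subjective experiment comparing different underwater image enhancement algorithms on a real underwater image dataset and tests the accuracy of existing image quality assessment methods when applied to underwater images. Method: A real underwater image dataset was constructed, containing 100 raw underwater images and the corresponding 1,000 images enhanced by 10 mainstream underwater image enhancement methods. Subjective quality assessment of the underwater images was carried out with a pairwise comparison strategy, and the subjective results were further analyzed through consistency analysis, convergence analysis, and significance testing. Finally, 10 existing mainstream no-reference image quality assessment methods were tested on the dataset to examine their performance on real underwater images. Result: In the consistency analysis, the subjective scores in the dataset showed a relatively high Kendall coefficient of agreement of 0.41; the convergence analysis showed that the numbers of collected votes and images were sufficient to obtain stable subjective scores. Together, these results indicate that the dataset is valid and reliable. In addition, current no-reference image quality assessment methods designed for natural images are not suitable for the underwater image dataset, confirming the large difference between underwater and natural images. Conclusion: The real underwater image dataset constructed in this paper provides reference and support for future research on objective quality assessment of underwater images and on underwater image enhancement algorithms. All images involved and all collected user data are publicly available on the project homepage (https://github.com/yia-yuese/RealUWIQ-dataset).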
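For pairwise-comparison data, agreement among observers is commonly summarized with Kendall's coefficient of agreement u. The abstract does not say which variant produced the value 0.41, so the following is only a sketch of the standard formula applied to made-up preference matrices:

```python
from math import comb


def kendall_u(pref):
    """Kendall's coefficient of agreement u for paired-comparison data.

    pref[i][j]: number of observers preferring item i over item j (i != j).
    Assumes every pair was judged by the same m observers, so
    pref[i][j] + pref[j][i] == m. u = 1 means complete agreement; the
    minimum is -1/(m - 1) when m is even.
    """
    n = len(pref)
    m = pref[0][1] + pref[1][0]
    s = sum(comb(pref[i][j], 2) for i in range(n) for j in range(n) if i != j)
    return 2.0 * s / (comb(m, 2) * comb(n, 2)) - 1.0
```

With 1,100 images and sparse pairwise sampling, the real computation would restrict the sums to pairs that were actually compared; this sketch assumes a complete design.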

16.
In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank with minimum mean square error short-time spectral amplitude (MMSE–STSA) estimation modified according to speech presence uncertainty. The perceptual filterbank was designed by adjusting an undecimated wavelet packet decomposition (UWPD) tree to the critical bands of the psychoacoustic model of the human auditory system. The MMSE–STSA estimator (modified according to speech presence uncertainty) was used to estimate speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual speech quality, and low computational load. The MMSE–STSA estimator relies on a priori SNR estimation, a key parameter that was computed with the “decision-directed method,” which, when correctly tuned, provides a trade-off between noise reduction and signal distortion. Experiments were conducted for various noise types, and the results of the proposed method were compared with those of other popular methods: Wiener estimation and MMSE log-spectral amplitude (MMSE–LSA) estimation in the frequency domain. To test the performance of the proposed system, three objective quality measures (SNR, segmental SNR (segSNR), and Itakura–Saito distance (ISd)) were computed for various noise types and SNRs. The experimental and objective test results confirmed the performance of the proposed speech enhancement system, which provided sufficient noise reduction with good intelligibility and perceptual quality, without considerable signal distortion or musical background noise.
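The Itakura–Saito distance listed among the objective tests compares two power spectra and is asymmetric. This sketch uses the spectral form averaged over frequency bins, which is an assumption; many toolkits instead compute it from per-frame LPC models:

```python
import numpy as np


def itakura_saito(p_ref, p_test, eps=1e-12):
    """Spectral Itakura-Saito distance: mean over bins of r - log(r) - 1, r = P_ref / P_test.

    Non-negative, zero only when the spectra match, and not symmetric in its arguments.
    """
    r = (np.asarray(p_ref, dtype=float) + eps) / (np.asarray(p_test, dtype=float) + eps)
    return float(np.mean(r - np.log(r) - 1.0))
```

Because r - log(r) - 1 grows linearly for r above 1 but only logarithmically for r below 1, the distance penalizes underestimated spectral peaks more than overestimated valleys.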

17.

Subjective experiments are considered the most reliable way to assess perceived visual quality. However, observers' opinions are highly diverse: even the same observer is often unable to repeat his or her first opinion exactly when rating a given stimulus again. In many cases, therefore, the Mean Opinion Score (MOS) alone is not sufficient to obtain accurate information about the perceived visual quality, and it is important to have a measure of how reliable and stable the observed or predicted MOS value is. For instance, the Standard deviation of the Opinions of the Subjects (SOS) could be considered a measure of reliability when quality is evaluated subjectively. However, we are not aware of any models or algorithms that objectively predict how much diversity, in terms of SOS, would be observed in subjects' opinions. In this work we observe, on the basis of a statistical analysis of several subjective experiments, that the disagreement between the quality values given by different objective video quality metrics (VQMs) can provide information on the diversity of the observers' ratings for a given processed video sequence (PVS). In light of this observation we: i) propose and validate a model for the SOS observed in a subjective experiment; ii) design and train Neural Networks (NNs) that predict the average diversity that would be observed among the subjects' ratings for a PVS, starting from a set of VQM values computed on that PVS; iii) give insights into how the same NN-based approach can be used to identify potential anomalies in the data collected in subjective experiments.
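MOS and SOS are per-stimulus statistics of the rating matrix, and the VQM disagreement the authors exploit can be summarized, for example, as the spread of several metrics' predictions. The sketch assumes the metric outputs have already been mapped to the MOS scale; this disagreement feature is an illustrative choice, and the paper's neural network predictor is not reproduced:

```python
import numpy as np


def mos_sos(ratings):
    """ratings: (n_subjects, n_pvs) array. Returns per-PVS MOS and SOS (sample std, ddof=1)."""
    r = np.asarray(ratings, dtype=float)
    return r.mean(axis=0), r.std(axis=0, ddof=1)


def vqm_disagreement(vqm_scores):
    """vqm_scores: (n_metrics, n_pvs) metric outputs, assumed already mapped to the MOS scale.

    Returns the per-PVS standard deviation across metrics, a simple disagreement feature.
    """
    return np.asarray(vqm_scores, dtype=float).std(axis=0, ddof=1)
```

Such features could be correlated with the observed SOS, or fed to a regressor, to test the observation that metric disagreement tracks rating diversity.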


18.
In this paper, we propose a statistical model-based speech enhancement technique using multivariate polynomial regression (MPR) based on a spectral difference scheme. In the analysis step, three principal parameters (the weighting parameter of the decision-directed (DD) method, the long-term smoothing parameter of the noise estimation, and the control parameter of the minimum gain value) are estimated as optimal operating points from the spectral difference under various noise conditions. These optimal operating points, which are specific to each spectral difference, are estimated using the composite measure, a relevant criterion of speech quality. We then apply the MPR technique, with the spectral differences as independent variables, to estimate the optimal operating points. MPR offers an effective way to represent the complex nonlinear input-output relationship between the optimal operating points and the spectral differences, so that operating points can be determined for various noise conditions in the off-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis through the regression according to the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm performs better than conventional algorithms.
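MPR maps spectral-difference features to an operating point with a polynomial fitted by least squares. A minimal sketch, assuming two hypothetical spectral-difference features, a degree-2 polynomial, and synthetic targets; the paper's features, degree, and fitted values are not given in the abstract:

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_features(x, degree=2):
    """x: (n_samples, n_vars). Returns columns [1, x_i, x_i*x_j, ...] up to `degree`."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones(x.shape[0])]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(x.shape[1]), d):
            cols.append(np.prod(x[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_mpr(x, y, degree=2):
    """Least-squares coefficients of the multivariate polynomial (off-line step)."""
    coef, *_ = np.linalg.lstsq(poly_features(x, degree), y, rcond=None)
    return coef


def predict_mpr(coef, x, degree=2):
    """Evaluate the fitted polynomial for new spectral differences (on-line step)."""
    return poly_features(x, degree) @ coef


# Synthetic example: two spectral-difference features -> a DD weighting parameter.
rng = np.random.default_rng(4)
sd = rng.uniform(0.0, 1.0, size=(40, 2))
alpha_dd = 0.9 + 0.05 * sd[:, 0] - 0.03 * sd[:, 1] + 0.02 * sd[:, 0] * sd[:, 1]
coef = fit_mpr(sd, alpha_dd, degree=2)
```

Compared with a lookup table, the regression gives a smooth operating point for any spectral difference, at the cost of choosing a polynomial degree that neither underfits nor oscillates.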

19.
Over the last 10 years, several noise reduction (NR) algorithms have been proposed that are combined with blind source separation techniques to separate speech and noise signals from blind noisy observations. These techniques often rely on voice activity detector (VAD) systems for an optimal solution. In this paper, we propose a new backward blind source separation (BBSS) structure that uses the input correlation properties to provide (i) high convergence rates and good tracking capabilities, which are needed because acoustic environments involve long and time-variant noise paths, and (ii) low misalignment and robustness against variations in noise type and against double-talk. The proposed algorithm enhances noisy speech signals automatically and does not need any VAD system to separate speech and noise signals. Results in terms of several objective criteria show the good performance of the proposed algorithm in comparison with state-of-the-art algorithms.

20.
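The performance of quality assessment algorithms for coded speech is well established, but these algorithms are not necessarily applicable to speech recorded inside a face mask (mask speech). This paper discusses the applicability of speech quality assessment algorithms to air speech and mask speech in different noise environments. Noisy and enhanced speech at various signal-to-noise ratios were evaluated with the subjective mean opinion score and three objective measures: segmental SNR, modified Bark spectral distortion (MBSD), and perceptual evaluation of speech quality (PESQ). The applicability of each objective method was judged by its agreement with the subjective evaluation. Wiener filtering and log-spectral amplitude minimum mean square error (LSA-MMSE) estimation were used as the enhancement algorithms, and pink noise and sea-wave noise were used as background noise. Simulation results show that the applicability of speech quality assessment algorithms depends on the speech type, SNR, background noise, and enhancement algorithm. In pink noise, PESQ is unsuitable for evaluating air speech enhanced by Wiener filtering, and MBSD is suitable only for evaluating mask speech enhanced by LSA-MMSE. In sea-wave noise, PESQ is suitable for evaluating mask speech, whereas MBSD is not.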


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417