1.

Liu Peng (刘鹏) 《计算机系统应用》 (Computer Systems & Applications) 2018,27(12):187-191

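This paper proposes a subspace speech enhancement algorithm, based on the relative root mean square (RMS) of the segmental signal-to-noise ratio (SNR), that aims at high intelligibility under low SNR. Under the adverse conditions of low SNR, most existing speech enhancement algorithms improve the quality of noisy speech but usually reduce its intelligibility at the same time. One important reason is that most of these algorithms suppress speech distortion using only the minimum mean square error (MMSE) criterion, and ignore the fact that the distortion introduced by enhancement affects the intelligibility of different types of speech segments to different degrees. To remedy this shortcoming, speech segments are classified according to the short-time SNR RMS, and the gain-matrix components of the segments in the mid-RMS class are then adjusted to reduce the effect of speech distortion on the intelligibility of the enhanced speech. Objective evaluation experiments show that the improved algorithm raises the normalized covariance metric (NCM) intelligibility scores of the enhanced speech. Subjective listening tests show that the improved algorithm does improve the intelligibility of the enhanced speech.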
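
The classification step in this abstract can be illustrated with a minimal sketch. It is a simplified stand-in and not the paper's method: frames are labelled by their RMS level relative to the overall RMS rather than by the paper's short-time SNR criterion, and the thresholds (0 dB and -10 dB), the function names, and the `adjust_gain` rule are all assumptions made for illustration.

```python
import math


def level_db(samples):
    """Mean power of a block of samples in dB (floored to avoid log of zero)."""
    power = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(power, 1e-12))


def classify_frames(signal, frame_len, mid_floor_db=-10.0):
    """Label each frame 'high', 'mid' or 'low' by its level relative to the overall RMS.

    The 0 dB and mid_floor_db thresholds are illustrative assumptions.
    """
    overall_db = level_db(signal)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        rel = level_db(signal[start:start + frame_len]) - overall_db
        if rel >= 0.0:
            labels.append("high")
        elif rel >= mid_floor_db:
            labels.append("mid")
        else:
            labels.append("low")
    return labels


def adjust_gain(gain, label, mid_scale=1.2, max_gain=1.0):
    """Hypothetical rule: relax suppression only for mid-level frames."""
    if label == "mid":
        return min(gain * mid_scale, max_gain)
    return gain
```

In a real subspace enhancer the adjustment would act on components of the gain matrix; the scalar gain here only shows where the class label enters.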

2.

A Dual-Microphone Speech Enhancement Algorithm Based on the Coherence Function (total citations: 3; self-citations: 0; citations by others: 3)

Yousefian N Loizou PC 《IEEE transactions on audio, speech, and language processing》2011,20(2):599-609

A novel dual-microphone speech enhancement technique is proposed in the present paper. The technique utilizes the coherence between the target and noise signals as a criterion for noise reduction and can be generally applied to arrays with closely-spaced microphones, where noise captured by the sensors is highly correlated. The proposed algorithm is simple to implement and requires no estimation of noise statistics. In addition, it offers the capability of coping with multiple interfering sources that might be located at different azimuths. The proposed algorithm was evaluated with normal hearing listeners using intelligibility listening tests and compared against a well-established beamforming algorithm. Results indicated large gains in speech intelligibility relative to the baseline (front microphone) algorithm in both single and multiple-noise source scenarios. The proposed algorithm was found to yield substantially higher intelligibility than that obtained by the beamforming algorithm, particularly when multiple noise sources or competing talker(s) were present. Objective quality evaluation of the proposed algorithm also indicated significant quality improvement over that obtained by the beamforming algorithm. The intelligibility and quality benefits observed with the proposed coherence-based algorithm make it a viable candidate for hearing aid and cochlear implant devices.
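
The quantity this abstract relies on, the coherence between two microphone signals, can be estimated as in the sketch below. It only shows a frame-averaged estimate of the magnitude-squared coherence; the paper's actual suppression rule is not reproduced. The frame length, the naive DFT, and the function names are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..N/2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def magnitude_squared_coherence(x, y, frame_len=32):
    """Per-bin |Sxy|^2 / (Sxx * Syy), averaged over non-overlapping frames."""
    bins = frame_len // 2 + 1
    sxx = [0.0] * bins
    syy = [0.0] * bins
    sxy = [0j] * bins
    for start in range(0, min(len(x), len(y)) - frame_len + 1, frame_len):
        fx = dft(x[start:start + frame_len])
        fy = dft(y[start:start + frame_len])
        for k in range(bins):
            sxx[k] += abs(fx[k]) ** 2
            syy[k] += abs(fy[k]) ** 2
            sxy[k] += fx[k] * fy[k].conjugate()
    # Small constant guards against division by zero in empty bins.
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k] + 1e-12) for k in range(bins)]
```

Averaging over several frames matters: with a single frame the estimate is 1 in every bin regardless of the signals.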

3.

Approaching speech intelligibility enhancement with inspiration from Lombard and Clear speaking styles

《Computer Speech and Language》2014,28(2):629-647

Lombard and Clear speech represent two acoustically and perceptually distinct speaking styles that humans employ to increase intelligibility. For Lombard speech, increased spectral energy in a band spanning the range of formants is consistent, effectively augmenting loudness, while vowel space expansion is exhibited in Clear speech, indicating greater articulation. On the other hand, analyses in the first part of this work illustrate that Clear speech does not exhibit significant spectral energy boosting, nor does the Lombard effect invoke an expansion of vowel space. Accordingly, though these two acoustic phenomena are largely attributed with the respective intelligibility gains of the styles, present analyses would suggest that they are mutually exclusive in human speech production. However, these phenomena can be used to inspire signal processing algorithms that seek to exploit and ultimately compound their respective intelligibility gains, as is explored in the second part of this work. While Lombard-inspired spectral shaping has been shown to successfully increase intelligibility, Clear speech-inspired modifications to expand vowel space are rarely explored. With this in mind, the latter part of this work focuses mainly on a novel frequency warping technique that is shown to achieve vowel space expansion. The frequency warping is then incorporated into an established Lombard-inspired Spectral Shaping method that pairs with dynamic range compression to maximize speech audibility (SSDRC). Finally, objective and subjective evaluations are presented in order to assess and compare the intelligibility gains of the different styles and their inspired modifications.

4.

Measuring the naturalness of synthetic speech

Howard C. Nusbaum Alexander L. Francis Anne S. Henly 《International Journal of Speech Technology》1995,1(1):7-19

Even the highest quality synthetic speech generated by rule sounds unlike human speech. As the intelligibility of rule-based synthetic speech improves, and the number of applications for synthetic speech increases, the naturalness of synthetic speech will become an important factor in determining its use. In order to improve this aspect of the quality of synthetic speech it is necessary to have diagnostic tests that can measure naturalness. Currently, none of the available metrics for evaluating the acceptability of synthetic speech distinguishes sufficiently between measuring overall acceptability (including naturalness) and simply measuring the ability of listeners to extract intelligible information from the signal. In this paper we propose a new methodology for measuring the naturalness of particular aspects of synthesized speech, independent of the intelligibility of the speech. Although naturalness is a multidimensional, subjective quality of speech, this methodology makes it possible to assess the separate contributions of prosodic, segmental, and source characteristics of the utterance. In two experiments, listeners reliably differentiated the naturalness of speech produced by two male talkers and two text-to-speech systems. Furthermore, they reliably differentiated between the two text-to-speech systems. The results of these experiments demonstrate that perception of naturalness is affected by information contained within the smallest part of speech, the glottal pulse, and by information contained within the prosodic structure of a syllable. These results show that this new methodology does provide a solid basis for measuring and diagnosing the naturalness of synthetic speech.

5.

Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion

《Computer Speech and Language》2014,28(2):665-686

This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM) generated synthetic speech in noise. We present a method for modifying the Mel cepstral coefficients generated by statistical parametric models that have been trained on plain speech. We update these coefficients such that the glimpse proportion – an objective measure of the intelligibility of speech in noise – increases, while keeping the speech energy fixed. An acoustic analysis reveals that the modified speech is boosted in the region 1–4 kHz, particularly for vowels, nasals and approximants. Results from listening tests employing speech-shaped noise show that the modified speech is as intelligible as a synthetic voice trained on plain speech whose duration, Mel cepstral coefficients and excitation signal parameters have been adapted to Lombard speech from the same speaker. Our proposed method does not require these additional recordings of Lombard speech. In the presence of a competing talker, both modification and adaptation of spectral coefficients give more modest gains.
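
The glimpse proportion named in this abstract can be approximated as the fraction of time-frequency cells in which speech exceeds noise by some margin. The sketch below is a simplification under stated assumptions: it works on plain power values per cell rather than on an auditory excitation pattern, and the 3 dB threshold and the function name are illustrative choices, not values taken from the paper.

```python
import math


def glimpse_proportion(speech_power, noise_power, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    speech_power and noise_power are equally shaped lists of rows
    (frames x frequency bands) holding non-negative power values.
    """
    glimpsed = 0
    total = 0
    for s_row, n_row in zip(speech_power, noise_power):
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10((s + 1e-12) / (n + 1e-12))
            if local_snr_db > threshold_db:
                glimpsed += 1
            total += 1
    return glimpsed / total if total else 0.0
```

Because the measure only counts cells above the threshold, moving fixed speech energy into bands where the noise is weaker can raise it without raising the overall level, which is the kind of trade-off the abstract describes.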

6.

The listening talker: A review of human and algorithmic context-induced modifications of speech

《Computer Speech and Language》2014,28(2):543-571

Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised – at least for some listeners – by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

8.

A Generalized Time-Frequency Subtraction Method for Robust Speech Enhancement Based on Wavelet Filter Banks Modeling of Human Auditory System (total citations: 2; self-citations: 0; citations by others: 2)

Yu Shao Chip-Hong Chang 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(4):877-889

We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.
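
The "subtraction parameters" mentioned in this abstract are easiest to see in the generic textbook form of generalized spectral subtraction, sketched below. This is not the paper's wavelet-packet method: there is no filter bank and no masking model here, and the over-subtraction factor `alpha`, spectral floor `beta`, and exponent `gamma` are fixed illustrative defaults, whereas the paper adapts its parameters from a time-frequency masking threshold.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01, gamma=2.0):
    """Generalized spectral subtraction on one frame of magnitude values.

    alpha: over-subtraction factor, beta: spectral floor,
    gamma: exponent (gamma=2 gives power subtraction).
    All defaults are illustrative assumptions.
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        difference = y ** gamma - alpha * n ** gamma
        floor = beta * n ** gamma
        # The floor keeps the result non-negative and limits musical noise.
        enhanced.append(max(difference, floor) ** (1.0 / gamma))
    return enhanced
```

A larger `alpha` removes more noise at the cost of more speech distortion, which is why an adaptive choice per time-frequency region is attractive.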

9.

Evaluation of Objective Quality Measures for Speech Enhancement (total citations: 1; self-citations: 0; citations by others: 1)

Yi Hu Loizou P.C. 《IEEE transactions on audio, speech, and language processing》2008,16(1):229-238

In this paper, we evaluate the performance of several objective measures in terms of predicting the quality of noisy speech enhanced by noise suppression algorithms. The objective measures considered a wide range of distortions introduced by four types of real-world noise at two signal-to-noise ratio levels by four classes of speech enhancement algorithms: spectral subtractive, subspace, statistical-model based, and Wiener algorithms. The subjective quality ratings were obtained using the ITU-T P.835 methodology designed to evaluate the quality of enhanced speech along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports on the evaluation of correlations of several objective measures with these three subjective rating scales. Several new composite objective measures are also proposed by combining the individual objective measures using nonparametric and parametric regression analysis techniques.
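
Evaluating an objective measure against subjective ratings, as this abstract describes, typically comes down to a correlation coefficient between the two sets of scores. The sketch below computes Pearson's correlation; the choice of Pearson's coefficient and the function name are assumptions, since the abstract does not say which correlation statistic was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the objective scores per test condition and `ys` the mean subjective ratings for the same conditions, once for each of the three rating scales.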

10.

Speech enhancement using MMSE estimation under phase uncertainty

Ravikumar Kandagatla P. V. Subbaiah 《International Journal of Speech Technology》2017,20(2):373-385

Most speech enhancement algorithms process the amplitudes of speech, but the phase of noisy speech is left unprocessed as it may cause undesired artifacts. Recently, short-time Fourier transform based single-channel speech enhancement algorithms have been developed that consider uncertain prior knowledge of phase. The uncertain knowledge of the phase is obtained from phase reconstruction algorithms. The goal of this paper is to develop a joint minimum mean square error estimate of complex speech coefficients given uncertain phase (CUP) information by considering the Nakagami probability density function (PDF) and the gamma PDF as speech spectral amplitude priors and the generalized gamma PDF as the noise prior. Estimators such as amplitudes given uncertain phase, which use the uncertain phase only for amplitude estimation and not for phase improvement, are also developed. Experimental results show that incorporating uncertain phase information improves the quality and intelligibility of speech. Novel phase-blind estimators are also developed using the Nakagami and gamma PDFs as speech priors and the generalized gamma PDF as the noise prior. Finally, all estimators using uncertain prior phase information are compared, and the effect of the initial phase information on the enhancement process is analyzed with the novel estimators. For the comparison of all derived estimators, speech signals uttered by male and female speakers are taken from the TIMIT database. The proposed CUP estimators outperform the existing algorithms in terms of the objective performance measures segmental signal-to-noise ratio, phase signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility.
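
Of the objective measures listed in this abstract, the segmental signal-to-noise ratio is simple enough to sketch. The version below averages per-frame SNR values in dB; the frame length and the clamping range of -10 dB to 35 dB are assumed here as commonly used settings and are not stated in the abstract.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB, with each frame clamped to [lo_db, hi_db]."""
    values = []
    usable = min(len(clean), len(enhanced))
    for start in range(0, usable - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal = sum(v * v for v in c)
        error = sum((a - b) ** 2 for a, b in zip(c, e))
        snr_db = 10.0 * math.log10((signal + 1e-12) / (error + 1e-12))
        values.append(min(max(snr_db, lo_db), hi_db))
    return sum(values) / len(values) if values else float("nan")
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average.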

11.

A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments

Navneet Upadhyay Abhijit Karmakar 《International Journal of Speech Technology》2014,17(2):117-132

In this paper, we propose a speech enhancement method where the front-end decomposition of the input speech is performed by temporally processing using a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree in such a manner that it matches closely the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband separately for the estimation of speech. The I-SOS uses a continuous noise estimation approach and estimates noise power from each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), perceptual evaluation of speech quality (PESQ) score), and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms the spectral subtractive-type algorithms and improves quality and intelligibility of the enhanced speech.
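
The continuous noise estimation described in this abstract, where a smoothing parameter is driven by the estimated SNR, can be sketched for a single subband as follows. The sigmoid mapping, its `slope` and `offset` values, and the function names are assumptions made for illustration; the abstract does not give the actual control function.

```python
import math


def smoothing_factor(snr_estimate, slope=1.0, offset=1.5):
    """Hypothetical sigmoid: close to 1 at high SNR, smaller near SNR of 1."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_estimate - offset)))


def track_noise(noisy_power, initial_noise=None):
    """Recursive noise power estimate for one subband, without speech detection."""
    noise = noisy_power[0] if initial_noise is None else initial_noise
    track = []
    for power in noisy_power:
        snr_estimate = power / max(noise, 1e-12)  # linear a posteriori SNR
        a = smoothing_factor(snr_estimate)
        # High SNR: a is near 1 and the estimate is held.
        # Low SNR: a drops and the estimate follows the input power.
        noise = a * noise + (1.0 - a) * power
        track.append(noise)
    return track
```

When speech raises the subband power well above the current estimate the update nearly freezes, which is what removes the need for an explicit silence detector.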

12.

Improving listeners' experience for movie playback through enhancing dialogue clarity in soundtracks

《Digital Signal Processing》2016

This paper presents a method for improving users' quality of experience through processing of movie soundtracks. The dialogue clarity enhancement algorithms were introduced for detecting dialogue in movie soundtrack mixes and then for amplifying the dialogue components. The front channel signals (left, right, center) are analyzed in the frequency domain. The selected partials in the center channel signal, which yield high disparity between left and right channels, are detected as dialogue. Subsequently, the dialogue frequency components are boosted to achieve an increased dialogue intelligibility. Techniques for reduction of artifacts in the processed signal are also introduced. It is done through smoothing in the time domain and in the frequency domain, applied to reduce unpleasant artifacts. The results of objective and subjective tests are provided, which prove that an increased dialogue intelligibility is achieved with the aid of the proposed algorithm. The algorithm is particularly applicable in mobile devices while listening in changing conditions and in the presence of noise.

13.

Synthesis of Arabic from short sound clusters

Yousif A. El-Imam 《Computer Speech and Language》2001,15(4):355

This article describes an unrestricted vocabulary text-to-speech (TTS) conversion system for the synthesis of Standard Arabic (SA) speech. The system uses short phonetic clusters that are derived from the Arabic syllables to synthesize Arabic. Basic and phonetic variants of the synthesis units are defined after qualitative and quantitative analyses of the phonetics of SA. A speech database of the synthesis units and their phonetic variations is created and the units are tested to control their segmental quality. Besides the types of synthesis unit used, their enhancement with phonetic variants, and their segmental quality control, the production of good quality speech also depends on waveform analysis and the method used to concatenate the synthesis units together. Waveform analysis is needed to condition the selected synthesis units at their junctures to produce synthesized speech of better quality. The types of speech juncture between contiguous units, the phonetic characteristics of the sounds surrounding the junctures and the concatenation artifacts occurring across the junctures are important and will be discussed. The results of waveform analysis and smoothing algorithms will be presented. The intelligibility of synthesized Arabic by a standard intelligibility test method that is adapted to suit the Arabic phonetic characteristics and scoring the results of the tests will also be dealt with.

14.

The Effect of Lexical Complexity on Intelligibility

Alexander L. Francis Howard C. Nusbaum 《International Journal of Speech Technology》1999,3(1):15-25

Most intelligibility tests are based on the use of monosyllabic test stimuli. This constraint eliminates the ability to measure the effects of lexical stress patterns, complex phonotactic organizations, and morphological complexity on intelligibility. Since these aspects of lexical structure affect speech production (e.g., by changing syllable duration), it is likely that they affect the structure of acoustic-phonetic patterns. Thus, to the extent that text-to-speech systems fail to modify acoustic-phonetic patterns appropriately in polysyllabic words, intelligibility may suffer. This means that while most standard intelligibility tests may accurately estimate the intelligibility of monosyllabic words, this estimate may not generalize as well to predict the intelligibility of words with more complex lexical structures. The present study was carried out to measure how words varying in lexical complexity differ in intelligibility. Monosyllabic, bisyllabic, and polysyllabic words were used varying in morphological complexity (monomorphemic or polymorphemic). Listeners transcribed these stimuli spoken by two human talkers and two text-to-speech systems varying in speech quality. The results indicate that lexical complexity does affect the measured intelligibility of synthetic speech and should be manipulated in order to accurately predict the performance of text-to-speech systems with unrestricted natural text.

15.

Speech energy redistribution for intelligibility improvement in noise based on a perceptual distortion measure

《Computer Speech and Language》2014,28(4):858-872

A speech pre-processing algorithm is presented that improves the speech intelligibility in noise for the near-end listener. The algorithm improves intelligibility by optimally redistributing the speech energy over time and frequency according to a perceptual distortion measure, which is based on a spectro-temporal auditory model. Since this auditory model takes into account short-time information, transients will receive more amplification than stationary vowels, which has been shown to be beneficial for intelligibility of speech in noise. The proposed method is compared to unprocessed speech and two reference methods using an intelligibility listening test. Results show that the proposed method leads to significant intelligibility gains while still preserving quality. Although one of the methods used as a reference obtained higher intelligibility gains, this happened at the cost of decreased quality. Matlab code is provided.

16.

An adaptive post-filtering method producing an artificial Lombard-like effect for intelligibility enhancement of narrowband telephone speech

《Computer Speech and Language》2014,28(2):619-628

Post-filtering can be used in mobile communications to improve the quality and intelligibility of speech. Energy reallocation with a high-pass type filter has been shown to work effectively in improving the intelligibility of speech in difficult noise conditions. This paper introduces a post-filtering algorithm that adapts to the background noise level as well as to the fundamental frequency of the speaker and models the spectral effects observed in natural Lombard speech. The introduced method and another post-filtering technique were compared to unprocessed telephone speech in subjective listening tests in terms of intelligibility and quality. The results indicate that the proposed method outperforms the reference method in difficult noise conditions.

17.

Research on a multi-channel speech noise-reduction algorithm for hearing aids

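Xi Ji (奚吉), Liang Ruiyu (梁瑞宇), Wang Guowei (王国伟), Qiu Xiaomei (仇晓梅), Ma Anjun (马安骏) 《计算机工程与应用》 (Computer Engineering and Applications) 2014,50(11):237-240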

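Wiener filtering is one of the algorithms commonly used to improve speech comprehension for hearing-impaired listeners in noisy environments. To address the large bias in noise-spectrum estimation of the conventional Wiener filter, a hearing-aid speech noise-reduction algorithm based on an improved multi-channel Wiener filter is proposed. Taking into account the characteristics of human hearing and of loudness compensation in hearing aids, the algorithm first decomposes the speech signal into multiple subband signals with a Gammatone filter bank. Speech enhancement is then performed in each subband with a Wiener filter based on a priori SNR estimation. Finally, the subband signals are recombined to obtain the enhanced speech. In addition, to improve noise-spectrum estimation in the Wiener filter, a voice activity detection algorithm based on envelope estimation is proposed and used to improve the filter's performance. Experimental results show that, compared with the conventional Wiener filter, the method suppresses residual noise more effectively and improves speech intelligibility, giving it considerable practical value.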
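
The per-subband Wiener filter based on a priori SNR estimation can be sketched as below for one subband. The decision-directed estimator and the smoothing constant `alpha=0.98` are assumptions: the abstract does not say how the a priori SNR is obtained, and the Gammatone decomposition and the envelope-based voice activity detector are not shown. The noise power sequence is taken as given.

```python
def wiener_gains(noisy_power, noise_power, alpha=0.98):
    """Wiener gain per frame for one subband, using a decision-directed a priori SNR.

    noisy_power: noisy subband power per frame.
    noise_power: noise power estimate per frame (assumed available).
    """
    gains = []
    prev_clean_power = 0.0
    for y, n in zip(noisy_power, noise_power):
        n = max(n, 1e-12)
        posterior = y / n  # a posteriori SNR
        prior = alpha * prev_clean_power / n + (1.0 - alpha) * max(posterior - 1.0, 0.0)
        gain = prior / (1.0 + prior)  # Wiener gain from the a priori SNR
        gains.append(gain)
        prev_clean_power = gain * gain * y  # power of the enhanced frame
    return gains
```

Each subband signal would be scaled frame by frame with these gains before the subbands are recombined.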

18.

An improved speech enhancement algorithm based on decorrelation and variable step size

Wang Yulin (王瑜琳), Tian Xuelong (田学隆), Gao Xueli (高雪利) 《计算机应用》 (Journal of Computer Applications) 2013,33(6):1746-1749

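Noise interference in complex environments severely degrades the quality of speech signals, so that meaning cannot be conveyed correctly, which makes speech enhancement essential. Conventional speech enhancement techniques suffer from poor adaptability and from slow convergence when the input signal is highly correlated. Combining the advantages of the variable step-size least mean square (VSSLMS) algorithm and decorrelation, an improved speech enhancement algorithm is proposed that optimizes both the step size and the update direction of the weight vector in the adaptive filter, speeding up the convergence of speech noise reduction. The algorithm also introduces continuous block-processing theory to normalize the weight vector, improving its stability in embedded-system implementations. Simulation tests show that the algorithm converges quickly, tracks well, effectively removes noise from heavily noise-corrupted speech, and improves the clarity and intelligibility of speech.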
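
The variable step-size idea in this abstract can be illustrated with a plain adaptive noise canceller whose step size grows with the squared error and shrinks as the filter converges. This is a generic VSS-LMS sketch, not the paper's algorithm: the decorrelation of the update direction and the block-based weight normalization are omitted, and the update rule and all parameter values are assumptions.

```python
def vss_lms(reference, primary, taps=4, mu0=0.01, alpha=0.97, gamma=0.01,
            mu_min=0.005, mu_max=0.05):
    """Adaptive noise canceller with an error-driven variable step size.

    reference: noise reference input.
    primary: desired signal plus a filtered version of the reference noise.
    Returns (error signal, final weights); the error is the enhanced output.
    """
    weights = [0.0] * taps
    buffer = [0.0] * taps
    mu = mu0
    errors = []
    for x, d in zip(reference, primary):
        buffer = [x] + buffer[:-1]  # newest sample first
        estimate = sum(w * b for w, b in zip(weights, buffer))
        e = d - estimate
        weights = [w + mu * e * b for w, b in zip(weights, buffer)]
        # Large errors raise the step size, small errors let it decay.
        mu = min(max(alpha * mu + gamma * e * e, mu_min), mu_max)
        errors.append(e)
    return errors, weights
```

The bounds `mu_min` and `mu_max` are sized for a reference signal of roughly unit variance; other input levels would need different bounds to stay stable.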

19.

The effect of bone conduction microphone locations on speech intelligibility and sound quality

McBride M Tran P Letowski T Patrick R 《Applied ergonomics》2011,(3):495-502

This paper presents the results of three studies of intelligibility and quality of speech recorded through a bone conduction microphone (BCM). All speech signals were captured and recorded using a Temco HG-17 BCM. Twelve locations on or close to the skull were selected for the BCM placement. In the first study, listeners evaluated the intelligibility and quality of the bone conducted speech signals presented through traditional earphones. Listeners in the second study evaluated the intelligibility and quality of signals presented through a loudspeaker. In the third study the signals were reproduced through a bone conduction headset; however, signal evaluation was limited to speech intelligibility only. In all three studies, the Forehead and Temple BCM locations yielded the highest intelligibility and quality rating scores. The Collarbone location produced the least intelligible and lowest quality signals across all tested BCM locations.

20.

Multi-band frequency compression for reducing the effects of spectral masking

P. N. Kulkarni P. C. Pandey D. S. Jangamashetti 《International Journal of Speech Technology》2007,10(4):219-227

Sensorineural hearing loss is associated with widening of auditory filter bandwidths, leading to increased spectral masking and degraded speech perception. Multi-band frequency compression can be used for reducing the effects of spectral masking. In this technique, the speech spectrum is divided into a number of analysis bands and spectral samples in each of these bands are compressed towards the band center by a constant compression factor. Implementation of the scheme with different types of frequency mappings, bandwidths, and segmentation for processing is investigated. Listening tests conducted for assessing the quality and intelligibility of the processed speech gave best results for critical bandwidth based compression using spectral segment mapping and pitch-synchronous processing.
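
The compression step described in this abstract, moving spectral samples in each band towards the band center by a constant factor, can be sketched on one frame of magnitudes. The sketch uses the simplest sample-to-sample mapping and sums magnitudes that land on the same bin; the abstract reports that spectral segment mapping gave the best results, and that variant is not shown. The band edges, the factor, and the function name are illustrative assumptions.

```python
def compress_bands(magnitudes, band_edges, factor=0.6):
    """Move each spectral sample towards its band center by a constant factor.

    magnitudes: magnitude spectrum of one frame.
    band_edges: ascending bin indices delimiting the bands, e.g. [0, 8, 16].
    Samples mapped to the same bin are summed; bins outside the bands stay zero.
    """
    compressed = [0.0] * len(magnitudes)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        center = (lo + hi - 1) / 2.0
        for k in range(lo, hi):
            target = int(round(center + factor * (k - center)))
            compressed[target] += magnitudes[k]
    return compressed
```

After compression the energy of each band is concentrated around its center, leaving gaps between bands that reduce masking by neighbouring spectral components.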