1.

Hybrid Signal-and-Link-Parametric Speech Quality Measurement for VoIP Communications

《IEEE transactions on audio, speech, and language processing》2008,16(8):1579-1589

A hybrid signal-and-link-parametric approach to speech quality measurement for voice-over-Internet protocol (VoIP) communications is described. Connection parameters are used to determine a base quality representative of the transmission link. Degradation factors, computed from perceptual features extracted from the decoded speech signal, are used to quantify distortions not captured by the connection parameters. The algorithm is tested on speech degraded by acoustic noise, temporal clippings, and noise suppression artifacts, thus simulating degradations present in wireless-VoIP tandem connections. Hybrid measurement is shown to overcome the limitations of pure link parametric and pure signal-based measurement methods, resulting in better measurement accuracy for modern VoIP communications. In addition, the proposed algorithm incurs modest computational overhead relative to pure link parametric measurement and attains up to 88% reduction in processing time relative to the ITU-T standard P.563 signal-based algorithm.

2.

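Speech enhancement algorithm based on gain dictionary lookup

Pang Liang, Chen Liang, Zhang Yipeng, Huang Qingquan 《Computer Science (计算机科学)》2015,42(10):16-19

In speech enhancement algorithms based on statistical models, different distribution models correspond to different gain functions. Because speech signals are uncertain, no single distribution function can accurately model the distributions of the speech and noise spectra, so any fixed statistical model carries some error. A speech enhancement algorithm based on gain dictionary lookup is therefore proposed. The algorithm trains gains on a speech and noise corpus under a log-spectral distortion criterion to obtain a gain dictionary whose inputs are the estimated a priori and a posteriori signal-to-noise ratios. The algorithm was evaluated with ITU-T P.862 PESQ, segmental SNR, overall SNR, and log-spectral distortion, and compared with algorithms based on Gaussian and Laplacian distribution models. Experimental results show that it outperforms the other algorithms in both stationary and non-stationary noise, and that musical noise and residual background noise are also well suppressed.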
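The lookup step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the dictionary layout, so the uniform dB grid, the function names, and the placeholder table (filled with Wiener gains rather than trained values) are all assumptions made for illustration only.

```python
import math

def quantize_db(snr_linear, lo_db=-20.0, hi_db=20.0, step_db=5.0):
    """Map a linear SNR to a bin index on a uniform dB grid.
    The grid limits and step are assumptions, not values from the paper."""
    snr_db = 10.0 * math.log10(max(snr_linear, 1e-10))
    snr_db = min(max(snr_db, lo_db), hi_db)  # clamp to the grid range
    return int(round((snr_db - lo_db) / step_db))

def lookup_gain(gain_table, prior_snr, post_snr):
    """Return the gain stored for the (a priori, a posteriori) SNR pair."""
    return gain_table[quantize_db(prior_snr)][quantize_db(post_snr)]

def placeholder_table(n_bins=9, lo_db=-20.0, step_db=5.0):
    """Hypothetical stand-in for a trained dictionary: each row holds the
    Wiener gain xi / (1 + xi) of its a priori SNR bin, repeated across the
    a posteriori axis. A trained table would vary along both axes."""
    table = []
    for i in range(n_bins):
        xi = 10.0 ** ((lo_db + i * step_db) / 10.0)
        table.append([xi / (1.0 + xi)] * n_bins)
    return table

table = placeholder_table()
gain = lookup_gain(table, prior_snr=1.0, post_snr=2.0)  # 0 dB a priori SNR
enhanced_magnitude = gain * 3.0  # apply the gain to one noisy spectral magnitude
```

In a trained system the table entries would come from minimizing log-spectral distortion over the corpus; only the indexing logic is shown here.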

3.

Evolutionary speech quality estimation in VoIP

Adil Raja R. M. A. Azad Colin Flanagan Conor Ryan 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(1):89-94

Estimating the quality of Voice over Internet Protocol (VoIP) as perceived by humans is considered a formidable task. This is partly due to the relatively large number of variables that are involved as determinants of quality. Moreover, discerning the significance of one variable over the other is difficult. In this paper a novel approach based on genetic programming (GP) is presented. It maps the effect of network traffic parameters on listeners’ perception of speech quality. The ITU-T Recommendation P.862 (PESQ) algorithm is used as a reference model in this research. The GP discovered models that provide effective VoIP quality estimation are highly correlated to ITU-T Recommendation P.862 (PESQ). They also outperform the ITU-T Recommendation P.563 in estimating the effect that packet loss has on speech quality. The GP discovered models prove suited to real-time and in vivo evaluation of VoIP calls. Additionally, they are deployable on a wide variety of hardware platforms.

4.

Low-Complexity, Nonintrusive Speech Quality Assessment

《IEEE transactions on audio, speech, and language processing》2006,14(6):1948-1956

Monitoring of speech quality in emerging heterogeneous networks is of great interest to network operators. The most efficient way to satisfy such a need is through nonintrusive, objective speech quality assessment. In this paper, we describe a low-complexity algorithm for monitoring the speech quality over a network. The features used in the proposed algorithm can be computed from commonly used speech-coding parameters. Reconstruction and perceptual transformation of the signal is not performed. The critical advantage of the approach lies in generating quality assessment ratings without explicit distortion modeling. The results from the performed experiments indicate that the proposed nonintrusive objective quality measure performs better than the ITU-T P.563 standard.

5.

Improved automatic detection of creak

John Kane Thomas Drugman Christer Gobl 《Computer Speech and Language》2013,27(4):1028-1047

This paper describes a new algorithm for automatically detecting creak in speech signals. Detection is made by utilising two new acoustic parameters which are designed to characterise creaky excitations following previous evidence in the literature combined with new insights from observations in the current work. In particular the new method focuses on features in the Linear Prediction (LP) residual signal including the presence of secondary peaks as well as prominent impulse-like excitation peaks. These parameters are used as input features to a decision tree classifier for identifying creaky regions. The algorithm was evaluated on a range of read and conversational speech databases and was shown to clearly outperform the state-of-the-art. Further experiments involving degradations of the speech signal demonstrated robustness to both white and babble noise, providing better results than the state-of-the-art down to at least 20 dB signal to noise ratio.

6.

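Acoustic model optimization algorithm for pronunciation quality evaluation

Yan Ke, Wei Si, Dai Lirong 《Journal of Chinese Information Processing (中文信息学报)》2013,27(1):98-108

In pronunciation quality evaluation research, acoustic models are traditionally trained only on data with standard pronunciation. Such models struggle to describe the non-standard pronunciation encountered in actual testing, so a mismatch between training and testing is unavoidable. To address this problem, this paper proposes an algorithm that uses data covering a wide range of pronunciations and optimizes the acoustic model under the criterion of minimizing the mean squared error between machine scores and human scores. Experiments were carried out on 3,685 recordings collected on site at the Putonghua Proficiency Test (498 for testing and 3,187 for training). The experiments show that the acoustic model for pronunciation quality evaluation obtained with the optimization algorithm has a significant advantage over the acoustic model obtained with traditional modeling.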

7.

Modelling non-stationary noise with spectral factorisation in automatic speech recognition

Antti Hurmalainen Jort F. Gemmeke Tuomas Virtanen 《Computer Speech and Language》2013,27(3):763-779

Speech recognition systems intended for everyday use must be able to cope with a large variety of noise types and levels, including highly non-stationary multi-source mixtures. This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals. To adapt the system to varying environments, noise models are acquired from the context, or learnt from the mixture itself without prior information. We also propose methods for reducing the size of the bases used for speech and noise modelling by 20–40 times for better practical applicability. We evaluate the performance of the methods both as a standalone classifier and as a signal-enhancing front-end for external recognisers. For the CHiME noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 to -6 dB, we report average keyword recognition rates up to 87.8% using a single-stream sparse classification algorithm.

8.

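Research on speech enhancement with an improved spectral subtraction algorithm

Wang Lulu, Liu Guangcan, Xia Xu 《Computer Engineering and Applications (计算机工程与应用)》2014,50(19):210-213

To address the poor noise reduction, excessive musical noise, and low speech intelligibility of basic spectral subtraction at low signal-to-noise ratios, an improved spectral subtraction algorithm is proposed. The algorithm first computes the cepstral distance of the speech signal to detect noise segments and speech segments, and replaces the statistical mean of the noise used by basic spectral subtraction with a dynamically computed noise value. It sets the subtraction coefficient dynamically according to the ratio of the cepstral distances of the current frame and the noise frame, which overcomes the drawback of a fixed subtraction coefficient in the traditional algorithm. Three methods are also used to suppress musical noise. Simulation experiments show that, at low signal-to-noise ratios, the improved spectral subtraction algorithm reduces noise effectively and improves SNR and intelligibility, achieving the goal of speech enhancement.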
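As a point of reference for the subtraction coefficient mentioned in this abstract, the following is a minimal sketch of the common over-subtraction form of power spectral subtraction with a spectral floor. It is not the paper's method: the abstract does not give the mapping from cepstral distance ratio to coefficient, so `alpha` is simply passed in per frame, and `beta` and all names are illustrative assumptions.

```python
def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Subtract a scaled noise power estimate from one frame, bin by bin.

    alpha: subtraction (over-subtraction) coefficient. The paper sets it
           dynamically per frame; here it is a plain argument (assumption).
    beta:  spectral floor as a fraction of the noise power, which limits
           the isolated residual peaks heard as musical noise.
    """
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        s = y - alpha * n                   # raw subtraction, may go negative
        enhanced.append(max(s, beta * n))   # floor instead of clipping to zero
    return enhanced

# One frame with two frequency bins: a speech-dominated bin and a noise-only bin.
frame = spectral_subtract([10.0, 1.0], [2.0, 1.0], alpha=2.0, beta=0.1)
```

A dynamic scheme would call this once per frame with a different `alpha`, larger for frames judged close to noise and smaller for frames judged speech-dominated.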

9.

Recognition of human speech phonemes using a novel fuzzy approach

《Applied Soft Computing》2007,7(3):828-839

10.

Improving speech understanding in communication headsets: Simulation of adaptive subband processing for speech in noise

《International Journal of Industrial Ergonomics》2013,43(6):526-535

Speech communication headsets are necessary for many high-noise environments to maintain interaction between individuals and facilitate safe working conditions. However, current hearing protection devices intended to protect hearing health can impede speech communication or expose persons to sound pressure levels (SPLs) that could lead to excessive noise exposure if a communication signal is presented improperly. This paper explores an adaptive subband communication algorithm, based on a delayless subband active noise reduction architecture, intended to adjust the communication channel gain to provide an appropriate speech signal power in relation to the instantaneous environmental noise power. The method monitors SPLs underneath the ear cup of a communication headset to provide a target speech signal-to-noise ratio without exceeding safe noise exposure thresholds. A series of computer simulations derived from a real-world communication headset model are used to compare the method developed with a traditional passive attenuation headset and a commercial active noise reduction design. The simulations demonstrate the ability of the adaptive subband communication algorithm to adjust automatically the speech signal gain for improved intelligibility while maintaining healthy noise exposure levels.

Relevance to industry: The electro-acoustic performance of an active speech communication headset is explored by simulation. The concept integrates a subband active noise control algorithm with an adaptive gain control structure to improve speech intelligibility in a noisy environment. The concept automatically selects appropriate communication channel gain levels without exceeding hearing damage thresholds or requiring user input, and is directly applicable to a practical device.

11.

Environment dependent noise tracking for speech enhancement

Nitish Krishnamurthy John H. L. Hansen 《International Journal of Speech Technology》2013,16(3):303-312

Numerous efforts have focused on the problem of reducing the impact of noise on the performance of various speech systems such as speech recognition, speaker recognition, and speech coding. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. This study presents an alternative viewpoint by approaching the same problem from the noise perspective. Here, a framework is developed to analyze and use the noise information available for improving performance of speech systems. The proposed framework focuses on explicitly modeling the noise and its impact on speech system performance in the context of speech enhancement. The framework is then employed for development of a novel noise tracking algorithm for achieving better speech enhancement under highly evolving noise types. The first part of this study employs a noise update rate in conjunction with a target enhancement algorithm to evaluate the need for tracking in many enhancement algorithms. It is shown that noise tracking is more beneficial in some environments than others. This is evaluated using the Log-MMSE enhancement scheme for a corpus of four noise types consisting of Babble (BAB), White Gaussian (WGN), Aircraft Cockpit (ACN), and Highway Car (CAR) using the Itakura-Saito (IS) (Gray et al. in IEEE Trans. Acoust. Speech Signal Process. 28:367–376, 1980) quality measure. A test set of 200 speech utterances from the TIMIT corpus is used for evaluations. The new Environmentally Aware Noise Tracking (EA-NT) method is shown to be superior in comparison with the contemporary noise tracking algorithms. Evaluations are performed for speech degraded using a corpus of four noise types consisting of: Babble (BAB), Machine Gun (MGN), Large Crowd (LCR), and White Gaussian (WGN). Unlike existing approaches, this study provides an effective foundation for addressing noise in speech by emphasizing noise modeling so that available resources can be used to achieve more reliable overall performance in speech systems.

12.

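A voice activity detection algorithm based on a statistical model in the EEMD domain

Wu Qiqian, Zhang Xiongwei 《Journal of Data Acquisition and Processing (数据采集与处理)》2012,27(1)

This paper proposes a voice activity detection algorithm based on a statistical model in the EEMD domain. The algorithm first decomposes the noisy speech with ensemble empirical mode decomposition (EEMD) to obtain the intrinsic mode function (IMF) components of the signal, and sums the two components most highly correlated with the original signal to form the principal component. It then decomposes the principal component in the frequency domain and introduces a statistical model to derive the EEMD-domain feature parameters. Finally, it performs voice activity detection by exploiting the difference between the EEMD-domain feature parameters of noise and speech. Experimental results show that, across different signal-to-noise ratios, the proposed algorithm outperforms commonly used VAD algorithms, with a clear advantage when the noise is strong.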

13.

HMM-Based Gain Modeling for Enhancement of Speech in Noise

David Y. Zhao W. Bastiaan Kleijn 《IEEE transactions on audio, speech, and language processing》2007,15(3):882-892

Accurate modeling and estimation of speech and noise gains facilitate good performance of speech enhancement methods using data-driven prior models. In this paper, we propose a hidden Markov model (HMM)-based speech enhancement method using explicit gain modeling. Through the introduction of stochastic gain variables, energy variation in both speech and noise is explicitly modeled in a unified framework. The speech gain models the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain helps to improve the tracking of the time-varying energy of nonstationary noise. The expectation-maximization (EM) algorithm is used to perform offline estimation of the time-invariant model parameters. The time-varying model parameters are estimated online using the recursive EM algorithm. The proposed gain modeling techniques are applied to a novel Bayesian speech estimator, and the performance of the proposed enhancement method is evaluated through objective and subjective tests. The experimental results confirm the advantage of explicit gain modeling, particularly for nonstationary noise sources.

14.

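Voice activity detection algorithm based on a statistical model in the EEMD domain

Wu Qiqian, Zhang Xiongwei 《Journal of Data Acquisition and Processing (数据采集与处理)》2012,27(1):51-56

A voice activity detection algorithm based on a statistical model in the EEMD domain is proposed. The algorithm first decomposes the noisy speech with ensemble empirical mode decomposition (EEMD) to obtain the intrinsic mode function (IMF) components of the signal, and sums the two components most highly correlated with the original signal to form the principal component. It then decomposes the principal component in the frequency domain and introduces a statistical model to derive the EEMD-domain feature parameters. Finally, it performs voice activity detection by exploiting the difference between the EEMD-domain feature parameters of noise and speech. Experimental results show that, across different signal-to-noise ratios, the proposed algorithm outperforms commonly used VAD algorithms, with a clear advantage when the noise is strong.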
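The component-selection step in this abstract (summing the two IMFs most correlated with the original signal) can be sketched as follows. The EEMD decomposition itself is not implemented here; the IMFs are assumed to be given as equal-length lists. Using the signed Pearson coefficient as the correlation measure is also an assumption, since the abstract does not name one.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(vx * vy)

def principal_component(signal, imfs, k=2):
    """Sum the k IMFs most correlated with the signal (k=2 per the abstract).
    Returns the summed component and the sorted indices of the chosen IMFs."""
    ranked = sorted(range(len(imfs)),
                    key=lambda i: pearson(signal, imfs[i]),
                    reverse=True)
    chosen = sorted(ranked[:k])
    component = [sum(imfs[i][t] for i in chosen) for t in range(len(signal))]
    return component, chosen

# Toy data: two IMFs are scaled copies of the signal, one is uncorrelated.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
imfs = [
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # uncorrelated with signal
    [0.6 * s for s in signal],
    [0.4 * s for s in signal],
]
component, chosen = principal_component(signal, imfs)
```

The later stages (frequency-domain decomposition and the statistical model) would operate on `component`.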

15.

Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion

《Computer Speech and Language》2014,28(2):665-686

This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM) generated synthetic speech in noise. We present a method for modifying the Mel cepstral coefficients generated by statistical parametric models that have been trained on plain speech. We update these coefficients such that the glimpse proportion – an objective measure of the intelligibility of speech in noise – increases, while keeping the speech energy fixed. An acoustic analysis reveals that the modified speech is boosted in the region 1–4 kHz, particularly for vowels, nasals and approximants. Results from listening tests employing speech-shaped noise show that the modified speech is as intelligible as a synthetic voice trained on plain speech whose duration, Mel cepstral coefficients and excitation signal parameters have been adapted to Lombard speech from the same speaker. Our proposed method does not require these additional recordings of Lombard speech. In the presence of a competing talker, both modification and adaptation of spectral coefficients give more modest gains.

16.

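Voice activity detection algorithm based on hidden Markov model

Li Qiang, Chen Hao, Chen Dingdang 《Journal of Computer Applications (计算机应用)》2016,36(11):3212-3216

To address the poor noise tracking of existing voice activity detection (VAD) algorithms based on the hidden Markov model (HMM), a method is proposed that uses the Baum-Welch algorithm to train on noises with different characteristics, generates the corresponding noise models, and builds a noise library. During voice activity detection, the noise models in the library are matched dynamically according to the background noise of the speech under test. In addition, to suit real-time processing of speech signals, the complexity of speech parameter extraction is reduced, and the decision threshold is improved to preserve the correlation between speech frames. The improved algorithm was tested in different noise environments and compared with the Adaptive Multi-Rate (AMR) coding standard and the G.729B standard of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). The results show that the improved algorithm effectively increases detection accuracy and noise tracking capability in real-time speech signal processing.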

17.

A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions

Jinyu Li Li Deng Dong Yu Yifan Gong Alex Acero 《Computer Speech and Language》2009,23(3):389-405

In this paper, we present our recent development of a model-domain environment robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated using multi-sources of information including a nonlinear environment-distortion model in the cepstral domain, the posterior probabilities of all the Gaussians in speech recognizer, and truncated vector Taylor series (VTS) approximation. Second, the estimated noise and channel parameters are used to adapt the static and dynamic portions (delta and delta–delta) of the HMM means and variances. This two-step algorithm enables joint compensation of both additive and convolutive distortions (JAC). The hallmark of our new approach is the use of a nonlinear, phase-sensitive model of acoustic distortion that captures phase asynchrony between clean speech and the mixing noise. In the experimental evaluation using the standard Aurora 2 task, the proposed Phase-JAC/VTS algorithm achieves 93.32% word accuracy using the clean-trained complex HMM backend as the baseline system for the unsupervised model adaptation. This represents high recognition performance on this task without discriminative training of the HMM system. The experimental results show that the phase term, which was missing in all previous HMM adaptation work, contributes significantly to the achieved high recognition accuracy.

18.

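An improved monaural mixed speech separation method

Li Peng, Guan Yong, Liu Wenju, Xu Bo 《Acta Automatica Sinica (自动化学报)》2009,35(8):1087-1093

This paper first reviews a monaural mixed speech separation method based on objective speech quality assessment and computational auditory scene analysis. That method relies on the ITU-T P.563 objective speech quality assessment standard, which has usage restrictions and a heavy computational load. To overcome these drawbacks, a monaural mixed speech separation method is proposed that replaces the P.563 algorithm with an objective speech quality assessment algorithm based on a temporal envelope representation. With almost no loss in separation performance relative to the original method, the proposed method greatly reduces the running time and resource consumption of the algorithm.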

19.

A novel fast nonstationary noise tracking approach based on MMSE spectral power estimator

《Digital Signal Processing》2019

Estimating the noise power spectral density (PSD) from the corrupted speech signal is an essential component for speech enhancement algorithms. In this paper, a novel noise PSD estimation algorithm based on minimum mean-square error (MMSE) is proposed. The noise PSD estimate is obtained by recursively smoothing the MMSE estimation of the current noise spectral power. For the noise spectral power estimation, a spectral weighting function is derived, which depends on the a priori signal-to-noise ratio (SNR). Since the speech spectral power is highly important for the a priori SNR estimate, this paper proposes an MMSE spectral power estimator incorporating speech presence uncertainty (SPU) for speech spectral power estimate to improve the a priori SNR estimate. Moreover, a bias correction factor is derived for speech spectral power estimation bias. Then, the estimated speech spectral power is used in “decision-directed” (DD) estimator of the a priori SNR to achieve fast noise tracking. Compared to three state-of-the-art approaches, i.e., minimum statistics (MS), MMSE-based approach, and speech presence probability (SPP)-based approach, experimental results show that the proposed algorithm exhibits better noise tracking capability under various nonstationary noise environments and SNR conditions. When employed in a speech enhancement system, improved speech enhancement performance in terms of segmental SNR improvements (SSNR+) and perceptual evaluation of speech quality (PESQ) can be observed.
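For readers unfamiliar with the decision-directed (DD) estimator named in this abstract, a minimal sketch of its standard form is given below. It combines the previous frame's speech power estimate with the current maximum-likelihood term max(gamma - 1, 0), where gamma is the a posteriori SNR. The smoothing constant 0.98, the floor `xi_min`, and the function name are conventional or illustrative choices, not values from the paper, and the paper's SPU-based speech power estimator and bias correction are not reproduced.

```python
def decision_directed_snr(prev_speech_power, noise_power, noisy_power,
                          alpha=0.98, xi_min=1e-3):
    """Per-bin a priori SNR estimate for the current frame.

    prev_speech_power: speech spectral power estimated in the previous frame
    noise_power:       current noise PSD estimate (must be positive)
    noisy_power:       periodogram of the current noisy frame
    alpha, xi_min:     smoothing weight and lower bound (assumed values)
    """
    xi = []
    for s_prev, n, y in zip(prev_speech_power, noise_power, noisy_power):
        gamma = y / n                    # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)  # instantaneous (ML) a priori SNR
        estimate = alpha * (s_prev / n) + (1.0 - alpha) * ml_term
        xi.append(max(estimate, xi_min))
    return xi

# One bin: previous speech power 4, noise power 2, current noisy power 6.
xi = decision_directed_snr([4.0], [2.0], [6.0], alpha=0.5)
```

A larger `alpha` gives a smoother but slower estimate, which is the trade-off that a more accurate speech power estimate is meant to ease.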

20.

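Binary masking speech enhancement algorithm based on noise estimation

Cao Longtao, Li Ruwei, Bao Changchun, Wu Shuicai 《Computer Engineering and Applications (计算机工程与应用)》2015,51(17):222-227

In non-stationary noise environments, existing hearing-aid speech enhancement algorithms leave a large amount of residual background noise and also introduce "musical noise", so the intelligibility and SNR of the enhanced speech are unsatisfactory. To address these problems, a binary masking speech enhancement algorithm based on noise estimation is proposed. It draws on the theory of human auditory perception and combines the auditory characteristics of the human ear with the working mechanism of the cochlea. The minima-controlled recursive averaging (MCRA) algorithm is used to obtain the estimated noise and a preliminarily enhanced speech signal. The estimated noise and the preliminarily enhanced speech are each filtered by a gammatone filterbank, which can simulate an artificial cochlea model, to obtain their time-frequency representations. Using the auditory masking properties of the human ear, the binary mask of the noisy speech in the time-frequency domain is computed, and the mask is then used to obtain the enhanced speech. Experimental results show that the algorithm largely removes the "musical noise" introduced by spectral subtraction and that, compared with MCRA-based spectral subtraction, the Speech Intelligibility Index (SII), Perceptual Evaluation of Speech Quality (PESQ), and signal-to-noise ratio (SNR) of the enhanced speech are all improved.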
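The masking step in this abstract can be illustrated with a minimal sketch of a binary time-frequency mask. It assumes the energies of the preliminarily enhanced speech and of the estimated noise are already available per time-frequency unit (for example after gammatone filtering and framing, which are not shown). The local-SNR criterion and the 0 dB threshold are common choices assumed here, not details taken from the paper.

```python
import math

def binary_mask(speech_energy, noise_energy, threshold_db=0.0, eps=1e-12):
    """Return a 0/1 mask with 1 where the local SNR exceeds the threshold.
    Inputs are 2-D lists indexed as [channel][frame]."""
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def apply_mask(tf_units, mask):
    """Keep the time-frequency units marked 1 and zero out the rest."""
    return [[v * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(tf_units, mask)]

# Two channels by two frames of toy energies.
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [0.5, 1.0]]
mask = binary_mask(speech, noise)
kept = apply_mask([[1.0, 2.0], [3.0, 4.0]], mask)
```

Units where speech and noise energies are equal fall exactly on the threshold and are discarded under this strict comparison.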