Similar literature
1.
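Speech quality is an important metric for evaluating communication systems. The existing perceptual evaluation of speech quality (PESQ) algorithm uses a perceptual model based on the Bark spectrum; it is computationally complex and models the frequency selectivity of the human ear inadequately. To address this, the paper proposes a new objective speech quality assessment method. It extracts feature parameters with a Gammatone filterbank, which better matches human auditory characteristics, computes the average distortion distance between the original and the degraded speech, and derives the objective mean opinion score (MOS) from the mapping between the subjective MOS and the normalized average distortion distance. Experiments show that, compared with the perceptual evaluation method, the proposed algorithm greatly reduces computational complexity while keeping a high correlation between objective and subjective MOS.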
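The abstract does not give the filterbank parameters. As a rough sketch only, Gammatone filterbanks are conventionally laid out on the equivalent rectangular bandwidth (ERB) scale of Glasberg and Moore, and the code below computes such a layout. The band count (16) and the 100 to 3400 Hz range are assumptions chosen for illustration, not values from the paper.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency fc_hz (Glasberg-Moore)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(f_lo, f_hi, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale (n_bands >= 2)."""
    e_lo, e_hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    step = (e_hi - e_lo) / (n_bands - 1)
    return [erb_rate_to_hz(e_lo + i * step) for i in range(n_bands)]

# Assumed layout: 16 bands across a telephone-like band (illustrative only).
centers = center_frequencies(100.0, 3400.0, 16)
bandwidths = [erb_bandwidth(fc) for fc in centers]
```

The spacing between neighbouring centre frequencies widens with frequency, which is the ear-like frequency selectivity the abstract refers to.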

2.
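Because of its fixed frame length, the harmonic sinusoidal speech model cannot give each harmonic its optimal resolution, and resolution determines how well speech is modeled. A wavelet multiresolution harmonic sinusoidal speech model is therefore proposed: the harmonic speech signal is decomposed by the wavelet transform into multiresolution subband signals, each subband is modeled independently with the harmonic sinusoidal model, and the modeled subband signals are summed to synthesize the output. Simulations show that the model reduces the signal reconstruction error by about two orders of magnitude and raises the MOS measured with PESQ software by about 0.3.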

3.
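To address the severe speech distortion left by conventional wavelet packet speech enhancement, this paper proposes a wavelet packet speech enhancement algorithm based on an adaptive threshold and a new threshold function. In the wavelet packet domain, the noisy speech is windowed and split into frames, and the probability that each frame contains speech is computed from the cross-correlation of the fast Fourier transform (FFT) power spectra of adjacent frames. This speech presence probability is then used to adjust the conventional universal wavelet packet threshold so that the threshold is larger in non-speech frames and smaller in speech frames. The resulting adaptive threshold removes as much noise as possible while preserving as much speech as possible, which reduces speech distortion. The paper also designs a new threshold function that overcomes the discontinuity of the conventional hard threshold function and the constant bias introduced by the soft threshold function, further reducing distortion. Extensive simulations with speech and noise from the TIMIT and NOISEX-92 databases show, in both subjective and objective evaluations, that the proposed algorithm enhances speech better than two existing algorithms, with less distortion and better perceived quality.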
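The adaptive threshold idea can be sketched as follows. The universal threshold is the standard Donoho-Johnstone form; the linear scaling rule and the `floor` parameter are hypothetical stand-ins, since the abstract does not state how the speech presence probability enters the threshold.

```python
import math

def universal_threshold(sigma, n):
    """Donoho-Johnstone universal threshold: sigma * sqrt(2 ln n)."""
    return sigma * math.sqrt(2.0 * math.log(n))

def adaptive_threshold(base, p_speech, floor=0.2):
    """Shrink the base threshold as the speech presence probability rises.

    Hypothetical linear rule: the full threshold at p_speech = 0 and
    floor * base at p_speech = 1. The paper's actual rule is not given.
    """
    p = min(max(p_speech, 0.0), 1.0)  # clamp to [0, 1]
    return base * (1.0 - (1.0 - floor) * p)

base = universal_threshold(sigma=0.1, n=256)
t_noise_frame = adaptive_threshold(base, 0.05)   # likely noise: large threshold
t_speech_frame = adaptive_threshold(base, 0.95)  # likely speech: small threshold
```

A frame judged to be mostly noise keeps a threshold close to the universal value, while a frame judged to be speech is thresholded more gently.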

4.
In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband separately to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score) and spectrograms with informal listening tests, we show that the proposed method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
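A minimal sketch of continuous noise tracking with an SNR-controlled smoothing parameter is shown below. The sigmoid mapping and its constants (`slope`, `midpoint`, `lo`, `hi`) are assumptions made for illustration; the abstract only says that the smoothing parameter is a function of the estimated SNR.

```python
import math

def smoothing_factor(snr_db, slope=0.5, midpoint=0.0, lo=0.7, hi=0.998):
    """Map an estimated SNR (dB) to a smoothing parameter in (lo, hi).

    Hypothetical sigmoid: a high SNR (speech likely) gives a value near hi,
    so the noise estimate is almost frozen; a low SNR gives a value near lo,
    so the estimate follows the input power quickly.
    """
    s = 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint)))
    return lo + (hi - lo) * s

def track_noise(power_frames, init_noise):
    """Recursively smooth the noisy power of one subband into a noise estimate."""
    noise = init_noise
    estimates = []
    for p in power_frames:
        snr_db = 10.0 * math.log10(max(p, 1e-12) / max(noise, 1e-12))
        alpha = smoothing_factor(snr_db)
        noise = alpha * noise + (1.0 - alpha) * p
        estimates.append(noise)
    return estimates

# Toy input: unit-power noise with a short, much louder burst standing in for speech.
frames = [1.0] * 50 + [100.0] * 5 + [1.0] * 50
estimates = track_noise(frames, init_noise=1.0)
```

Because the update slows down when the instantaneous SNR is high, the estimate barely rises during the burst and settles back afterwards, with no explicit silence detector involved.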

5.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

7.
Multiresolution multisensor data fusion based on wavelet theory   Total citations: 1, self-citations: 0, citations by others: 1
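The multiscale nature of the wavelet transform makes it well suited to processing multiscale signals, and it can be used for multiresolution multisensor data fusion. This paper studies the characteristics of the wavelet transform and proposes a multiresolution multisensor data fusion algorithm based on the wavelet packet transform. The algorithm does not need to treat wavelet coefficients as white noise, effectively reduces vector and matrix dimensions, lowers the computational load, and filters well. It also adopts a biorthogonal wavelet packet transform, which overcomes the reconstruction distortion that arises in multiscale filtering based on orthogonal wavelet packet transforms because orthogonal wavelets lack linear phase, and so improves filtering performance further.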

8.
In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank and minimum mean square error–short time spectral amplitude (MMSE–STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank was designed by adjusting the undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psychoacoustic model of the human auditory system. The MMSE–STSA estimation (modified according to speech presence uncertainty) was used to estimate speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual quality of speech, and low computational load. The MMSE–STSA estimator is based on a priori SNR estimation. The a priori SNR, a key parameter of the MMSE–STSA estimator, was estimated with the decision-directed method, which provides a trade-off between noise reduction and signal distortion when correctly tuned. The proposed method was compared with other popular methods, namely Wiener estimation and MMSE–log spectral amplitude (MMSE–LSA) estimation in the frequency domain. To test the performance of the proposed system, three objective quality measures (SNR, segSNR, and Itakura–Saito distance (ISd)) were computed for various noise types and SNRs. The experimental and objective quality results confirmed the performance of the proposed speech enhancement system, which provided sufficient noise reduction with good intelligibility and perceptual quality, without causing considerable signal distortion or musical background noise.
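The decision-directed update can be sketched for a single frequency bin as below. It follows the usual Ephraim-Malah form: the a priori SNR is a weighted sum of the previous frame's estimated clean amplitude (squared, over the noise power) and the current maximum-likelihood estimate max(γ − 1, 0). Three things are assumptions for illustration: the Wiener gain stands in for the MMSE–STSA gain, the first frame falls back to the maximum-likelihood estimate, and the values of `alpha` and `xi_min` are typical choices rather than the paper's settings.

```python
def decision_directed_snr(gammas, alpha=0.98, xi_min=10.0 ** (-25.0 / 10.0)):
    """Estimate the a priori SNR per frame from a posteriori SNRs (one bin).

    gammas: a posteriori SNRs |Y|^2 / lambda_d, one value per frame.
    alpha:  weight on the previous frame's estimate (assumed value).
    xi_min: lower bound on the a priori SNR (assumed value, -25 dB).
    """
    xis = []
    prev_amp_sq = None  # |A_hat(n-1)|^2 / lambda_d from the previous frame
    for gamma in gammas:
        ml = max(gamma - 1.0, 0.0)          # maximum-likelihood estimate
        if prev_amp_sq is None:
            xi = ml                         # no history yet (assumption)
        else:
            xi = alpha * prev_amp_sq + (1.0 - alpha) * ml
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)              # Wiener gain as a stand-in
        prev_amp_sq = gain * gain * gamma   # estimated clean power / noise power
        xis.append(xi)
    return xis

# One isolated noise spike among noise-only frames.
xi_spike = decision_directed_snr([1.0, 1.0, 20.0, 1.0, 1.0])
```

With `alpha` close to 1, an isolated spike in the a posteriori SNR raises the a priori SNR only slightly, which is how the method trades musical noise against responsiveness.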

9.
This paper presents an approach to studying and simulating high-band (HB) feature extraction based on linear predictive coding (LPC) and mel frequency cepstral coefficient (MFCC) techniques. The HB features are embedded into the encoded bitstream of the global system for mobile (GSM) full rate (FR) 06.10 coder using joint source coding and data hiding before transmission to the receiving terminal. At the receiver, the HB features are extracted to reproduce the HB portion of the speech; for this purpose, different excitation extension techniques are applied and their results evaluated in terms of quality (intelligibility and naturalness) and bandwidth. A MATLAB-based e-test bench is created to implement the proposed artificial bandwidth extension (ABE) coder, followed by a series of simulations that assess its performance using subjective [mean opinion score (MOS)] and objective [perceptual evaluation of speech quality (PESQ)] analysis. The results of both analyses indicate that the proposed ABE coder outperforms the legacy narrowband (NB) GSM FR coder. Compared with LPC-based parameterization in the ABE coder, MFCC parameterization gives higher speech intelligibility, as shown by its slightly better PESQ and MOS scores.

10.
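At low input signal-to-noise ratio (SNR), speech enhancement loses weak unvoiced components, which distorts the reconstructed signal. A new speech enhancement method is proposed to overcome this. The method uses wavelet packets to fit the critical bands of the speech perception model and separates unvoiced from voiced speech by subband energy. Unvoiced and voiced signals are then decomposed with 8-level and 4-level wavelet packets, respectively. Thresholds are computed with a Bark-subband wavelet packet adaptive node threshold algorithm that tracks the noise level in each Bark subband in real time, which protects the weak mid- and high-frequency components of unvoiced speech and reduces distortion. Simulated comparisons with conventional speech enhancement methods confirm that the method has a clear advantage at low input SNR, giving high output SNR and low speech distortion. Combining the method with spectral subtraction for a second enhancement pass improves the quality of the enhanced speech further.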

11.
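To address the distortion in speech enhanced by conventional soft and hard threshold denoising, a wavelet packet speech enhancement algorithm with a new threshold function is proposed, together with a new Bark-scale wavelet packet decomposition structure. Where the absolute value of a wavelet packet coefficient exceeds the given threshold, the new threshold function flexibly combines the soft and hard threshold functions. Where it is below the threshold, a nonlinear function replaces the simple zeroing of conventional threshold functions, giving a smooth transition. The new 60-band Bark-scale wavelet packet decomposition structure models the auditory perception of the human ear more closely. Simulation results show that, under Gaussian white noise and colored noise, the new algorithm raises the SNR and segmental SNR of the enhanced speech compared with conventional soft and hard threshold denoising, reduces speech distortion, and denoises more effectively.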
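For reference, the hard and soft threshold functions are sketched below, followed by one possible compromise in the spirit of the abstract. `blended_threshold` and its parameters `a` and `k` are hypothetical: the abstract does not give the paper's actual function, so this only illustrates a rule that is continuous at the threshold and does not zero small coefficients outright.

```python
import math

def hard_threshold(w, t):
    """Keep coefficients above the threshold unchanged, zero the rest (discontinuous at t)."""
    return w if abs(w) > t else 0.0

def soft_threshold(w, t):
    """Zero small coefficients and shrink the rest by t (constant bias of t)."""
    if abs(w) <= t:
        return 0.0
    return math.copysign(abs(w) - t, w)

def blended_threshold(w, t, a=0.5, k=3):
    """Hypothetical compromise between hard (a = 0) and soft (a = 1) thresholding.

    Above t: shrink by a * t, so the bias is a * t rather than t.
    Below t: a power-law roll-off replaces zeroing and meets the upper
    branch at |w| = t, so the function is continuous there.
    """
    mag = abs(w)
    if mag > t:
        out = mag - a * t
    else:
        out = (1.0 - a) * t * (mag / t) ** k
    return math.copysign(out, w)
```

Setting `a` between 0 and 1 trades the bias of soft thresholding against the jump of hard thresholding.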

12.
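A speech compression coding method based on the best wavelet packet transform and SPIHT coding is proposed. The method first applies the wavelet packet transform to the speech signal, finds the best wavelet tree, and performs dynamic bit allocation. It then compresses the transformed wavelet coefficients with an improved SPIHT algorithm, and applies entropy coding to raise the compression ratio further. Experiments show that the method achieves good reconstruction quality at fairly high compression ratios, with low computational complexity and low delay.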

13.
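Applying compressed sensing directly to speech signals usually gives poor compression efficiency. To address this, a method based on compressed sensing and the wavelet transform is proposed. The speech signal is first decomposed over multiple levels with the wavelet transform. Compressed sensing is then applied to the low-frequency wavelet coefficients, and the high-frequency coefficients are discarded. When the speech signal is reconstructed, the high-frequency coefficients are replaced by a random signal. Compared with direct compressed sensing, the wavelet-based method gives a slightly lower MOS but halves the compression rate, which shows that it can greatly improve compression efficiency.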

14.
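Noise spectrum estimation plays an important role in single-channel speech enhancement. To improve how well the estimator tracks and updates the noise, the recursive averaging parameter of the improved minima controlled recursive averaging (IMCRA) noise spectrum estimator is modified using the minimum statistics (MS) algorithm, and the minimum of the smoothed power spectrum is tracked with a first-order recursion. Noisy speech is denoised by spectral subtraction, and the algorithms are evaluated both objectively and subjectively by comparing the segmental SNR (segSNR), PESQ score, and MOS of speech before and after enhancement under different noise types and SNRs. Experimental results show that the proposed method tracks changes in the noise more closely and improves speech quality.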
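Segmental SNR, one of the objective measures named above, averages per-frame SNR values in dB. The sketch below uses non-overlapping frames; the frame length and the clamping range of −10 to 35 dB are common conventions assumed here, not values taken from the paper.

```python
import math

def seg_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        values.append(min(max(snr_db, lo), hi))
    if not values:
        raise ValueError("signal shorter than one frame")
    return sum(values) / len(values)

# Toy check: scaling a signal by 0.9 leaves an error of 0.1 * signal, i.e. 20 dB.
clean = [math.sin(0.05 * n) for n in range(1024)]
scaled = [0.9 * x for x in clean]
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average.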

15.
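To address the residual noise and speech distortion in speech enhanced by spectral subtraction based on a Gaussian distribution, a minimum mean square error (MMSE) spectral subtraction algorithm based on the Laplacian distribution is proposed. First, the noisy speech signal is split into frames and windowed, and each frame is Fourier transformed to obtain the discrete Fourier transform (DFT) coefficients of the short-time speech. Next, the log spectral energy and spectral flatness of each frame are computed to detect noise frames and update the noise estimate. Then, assuming that the speech DFT coefficients follow a Laplacian distribution, the optimal spectral subtraction coefficient is derived under the MMSE criterion and used for spectral subtraction, giving the enhanced signal spectrum. Finally, the enhanced spectrum is inverse Fourier transformed and the frames are reassembled to obtain the enhanced speech. Experimental results show that the proposed algorithm raises the SNR of the enhanced speech by 4.3 dB on average, 2 dB more than over-subtraction, and raises the average PESQ score by 10% relative to over-subtraction. The algorithm suppresses noise more strongly with less speech distortion, and improves markedly on both the SNR and PESQ measures.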
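The two frame features used for noise-frame detection can be sketched as follows. Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum: near 0 for a tonal frame and much higher for a noise-like one. The naive DFT, the periodic Hann window, and the frame length of 128 are assumptions chosen to keep the example dependency-free; the paper's detection thresholds are not given and none are invented here.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """One-sided power spectrum of a Hann-windowed frame via a naive DFT."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    x = [f * w for f, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum."""
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(power) / len(power) + eps
    return geometric / arithmetic

def log_spectral_energy(power, eps=1e-12):
    """Total spectral energy of the frame in dB."""
    return 10.0 * math.log10(sum(power) + eps)

n = 128
tone = [math.sin(2.0 * math.pi * 8 * i / n) for i in range(n)]  # tonal frame
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # noise-like frame

flat_tone = spectral_flatness(power_spectrum(tone))
flat_noise = spectral_flatness(power_spectrum(noise))
```

A detector would compare these features against thresholds and update the noise estimate only in frames classified as noise.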

16.
Voice over Internet Protocol (VoIP) is one of the fastest growing technologies in the world. In VoIP, speech signals are transmitted over the same network used for data communications. The internet is not a robust network and is subject to delay, jitter, and packet loss. It is very important to measure and monitor the quality of service (QoS) that users experience in VoIP networks; this is not an easy task and usually requires subjective tests. In this paper, we analyze three non-intrusive models that measure and monitor voice quality using Random Neural Networks (RNN). An RNN is an open queuing network with positive and negative signals. We assess voice quality from several parameters: delay, jitter, packet loss, and codec. In our approach, the Mean Opinion Score (MOS) calculated with a Perceptual Evaluation of Speech Quality (PESQ) algorithm is used to generate data for training the RNN model. We study two feed-forward models and a recurrent architecture, and find that the simple feed-forward architecture produces the most accurate results of the three.

17.
This paper proposes a modification to the transmission of the excitation codevector and its non-zero pulse sign and magnitude using a “codebook partition and label assignment” approach, which reduces the number of bits needed to transmit it over the communication channel in the legacy CS-ACELP 8 kbps speech codec. The proposed approach uses the excitation codebook structure of the forward mode of the standard G.729E 11.8 kbps codec, with two non-zero pulses per track, which avoids using two algebraic codebook structures (one for the forward mode and one for the backward mode of G.729E), together with a least significant pulse replacement approach for finding the optimized excitation codevector. The proposed modification of the legacy 8 kbps CS-ACELP (80 bits/10 ms) speech codec yields a bit rate of 10.6 kbps (106 bits/10 ms) with better objective and subjective results than the legacy 8 kbps CS-ACELP speech coder, and also avoids the codebook mode switching of the standard 11.8 kbps (G.729E) CS-ACELP speech coder. The paper also proposes reducing the number of searches for the final excitation codevector by taking the initial codevector as the final codevector, which improves speech quality relative to the output of legacy G.729 CS-ACELP at 8 kbps. Both the legacy CS-ACELP 8 kbps codec and the proposed CS-ACELP 10.6 kbps codec are implemented in MATLAB. Subjective and objective analyses of the proposed 10.6 kbps codec are carried out to evaluate its performance, and the results are cross-compared with those of legacy CS-ACELP (8 kbps) using tables and graphs. The results show that PESQ and MOS scores are comparable for each set of wave files even though bitrates are reduced. The consistency and efficiency of the proposed algorithm are assessed by calculating 95% confidence intervals for the population mean of the objective and subjective results.

18.
This paper addresses single-channel enhancement of noisy Arabic speech signals at low (negative) SNR. To this end, a binary mask thresholding function based on the coiflet5 mother wavelet transform is proposed for Arabic speech enhancement. Its effectiveness is compared with the Wiener method, spectral subtraction, log-MMSE, test-PSC, and p-mmse in the presence of babble, pink, white, f-16, and Volvo car interior noise. The noisy input speech signals are processed at input SNR levels ranging from −5 to −25 dB. The performance of the proposed method is evaluated with PESQ, SNR, and a cepstral distance measure. The results obtained with the proposed binary mask thresholding function based on the coiflet5 wavelet transform are very encouraging and show that it is more helpful for Arabic speech enhancement than the other existing methods.

19.
Dysfluency and stuttering are breaks or interruptions of normal speech, such as repetition, prolongation, or interjection of syllables, sounds, words, or phrases, and involuntary silent pauses or blocks in communication. Stuttering assessment through manual classification of speech dysfluencies is subjective, inconsistent, time-consuming, and prone to error. This paper proposes an objective evaluation of speech dysfluencies based on the wavelet packet transform with sample entropy features. Dysfluent speech signals are decomposed into six levels using the wavelet packet transform. Sample entropy (SampEn) features are extracted at every level of decomposition and used to characterize the speech dysfluencies (stuttered events). Three classifiers, namely k-nearest neighbor (kNN), a linear discriminant analysis (LDA) based classifier, and a support vector machine (SVM), are used to investigate the performance of the sample entropy features for classifying speech dysfluencies. A 10-fold cross-validation method is used to test the reliability of the classifier results. The effect of different wavelet families on classification performance is also examined. Experimental results demonstrate that the proposed features and classification algorithms give a very promising classification accuracy of 96.67% with a standard deviation of 0.37, and that the proposed method can help speech language pathologists classify speech dysfluencies.
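Sample entropy, the feature used above, can be sketched as follows. It counts pairs of length-m templates that stay within a tolerance (B) and how many of those pairs still match when extended to length m + 1 (A), then returns −ln(A/B). The defaults m = 2 and r = 0.2 times the standard deviation are common conventions assumed here; the abstract does not state the paper's settings.

```python
import math
import random

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) of sequence x, with the tolerance set to r * std(x)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    tol = r * std

    def count_matches(length):
        # Count template pairs (i < j) within tol under the Chebyshev distance.
        # Both lengths use the same n - m starting points, as SampEn requires.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= tol:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)

periodic = [0.0, 1.0] * 50                   # perfectly regular sequence
rng = random.Random(1)
irregular = [rng.random() for _ in range(200)]  # irregular sequence
```

A regular signal gives a value near zero, while an irregular one gives a larger value, which is what makes the feature useful for separating fluent from dysfluent segments.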

20.
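Speech enhancement algorithm using multitaper spectrum estimation under wavelet packet decomposition   Total citations: 1, self-citations: 0, citations by others: 1
Zha Cheng (查诚), Yang Ping (杨平), Pan Ping (潘平). Computer Engineering (《计算机工程》), 2012, 38(5): 291-292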
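Conventional spectral subtraction is a single-resolution algorithm based on the short-time Fourier transform and has large variance. A speech enhancement algorithm using multitaper spectrum estimation under wavelet packet decomposition is therefore proposed. The noisy speech is decomposed by wavelet packets into different frequency bands, multitaper spectral subtraction is carried out in each band, and the bands are reconstructed one by one with the wavelet packet to obtain the denoised speech signal. Simulation results show that the algorithm raises the SNR of noisy speech and reduces speech distortion.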
