Similar literature
1.
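In HMM-based speech synthesis, a post-filter with fixed parameters cannot adapt to spectra with different degrees of distortion, which reduces the naturalness of the synthesized speech. To address this, an improved speech synthesis algorithm based on adaptive post-filter parameters is proposed. The method adaptively selects the optimal short-term filter parameters according to the flatness of the speech spectrum in order to enhance the formant regions of the synthesized spectrum, and it uses a long-term post-filter to optimize the harmonic structure of the fundamental frequency, which reduces F0 discontinuities in the synthesized speech. Simulation results show that the method effectively reduces spectral over-smoothing, and subjective tests show that the naturalness of the synthesized speech improves.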
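
The flatness-driven parameter selection described in this abstract can be illustrated with a minimal sketch. It uses the common definition of spectral flatness (geometric mean over arithmetic mean of the power spectrum); whether the paper uses exactly this definition is not stated. The function `choose_postfilter_strength`, its `weak`/`strong` bounds, and the assumption that flatter spectra call for stronger formant emphasis are hypothetical and only serve the illustration.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum (range 0..1)."""
    values = [p + eps for p in power_spectrum]
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean

def choose_postfilter_strength(flatness, weak=0.2, strong=0.6):
    """Hypothetical mapping (an assumption, not from the paper):
    flatter, over-smoothed spectra receive stronger formant emphasis."""
    return weak + (strong - weak) * flatness

flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])        # white-like spectrum
peaky = spectral_flatness([100.0, 0.01, 0.01, 0.01])  # one dominant peak
```

A flat spectrum gives a value near 1 and a strongly peaked one a value near 0, so the measure can act as a per-frame indicator of how much smoothing the spectrum has suffered.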

2.
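Codebook design method for enhanced dual-predictive multi-stage vector quantization of speech spectral parameters   Total citations: 1, self-citations: 0, citations by others: 1
Linear predictive coding (LPC) parameters, which represent the speech spectrum, are widely used in speech coding algorithms. Very-low-bit-rate speech coding requires the spectral parameters to be coded with as few bits as possible. This paper proposes a codebook design method for enhanced dual-predictive multi-stage vector quantization (EDPMSVQ) of speech spectral parameters. This improved multi-stage vector quantization method fully exploits the short-term and long-term correlation of the spectral parameters: it adopts multi-stage vector quantization (MSVQ) with memory and uses a different prediction coefficient for each dimension of the spectral parameters. In addition, by exploiting the distinct characteristics of strong and weak correlation between the spectral parameters of adjacent speech frames, it uses two sets of predicted values, one for strong correlation and one for weak correlation, which further reduces the bit rate. The method achieves nearly "transparent" quantization of the spectral parameters at 20 bits, slightly reduces the computational complexity of quantization, and greatly reduces the required storage.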
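
The multi-stage structure underlying MSVQ can be sketched as follows. This is a memoryless illustration only: the inter-frame prediction, the two predictor sets, and the codebook training described in the abstract are omitted, and the tiny two-stage codebooks are made up for the example.

```python
def nearest(codebook, vector):
    """Index of the codevector with the smallest squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def msvq_encode(stages, vector):
    """Quantize stage by stage; each stage codes the residual left by the previous one."""
    residual = list(vector)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(stages, indices):
    """Sum the selected codevectors of all stages."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Made-up two-stage codebooks for a 2-dimensional parameter vector.
stages = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]],
]
indices = msvq_encode(stages, [5.0, 3.0])
reconstruction = msvq_decode(stages, indices)
```

Only the stage indices are transmitted, so the bit cost is the sum of the per-stage index widths, while storage grows with the sum (not the product) of the codebook sizes.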

3.
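To address the low accuracy of traditional pitch detection algorithms in noisy environments, an improved pitch detection algorithm is proposed. A variable-step-size least mean square (LMS) adaptive filter denoises the noisy speech signal. The normalized autocorrelation function (ACF), the modified average magnitude difference function (MAMDF), and the cepstrum of the denoised speech are then computed and combined in a nonlinear function that emphasizes the peak at the pitch period. Smoothing based on confidence intervals and on the F0 difference between adjacent frames reduces pitch extraction errors. Simulation results show that, compared with traditional algorithms, detection accuracy improves by at least 6.7%, and robustness at low signal-to-noise ratios also improves markedly.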
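
The idea of combining several pitch functions so that the peak at the pitch period stands out can be sketched as follows. The sketch assumes a simple ratio of the normalized autocorrelation to the plain AMDF; the paper's actual nonlinear combination, its MAMDF variant, the cepstral term, the LMS denoising, and the smoothing stage are not reproduced, and `estimate_f0` with its parameters is a hypothetical helper.

```python
import math

def nacf(frame, lag):
    """Normalized autocorrelation at `lag` (close to 1 at the pitch period)."""
    a, b = frame[:-lag], frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def amdf(frame, lag):
    """Average magnitude difference at `lag` (close to 0 at the pitch period)."""
    a, b = frame[:-lag], frame[lag:]
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, eps=1e-3):
    """Pick the lag that maximizes NACF / (AMDF + eps).
    This combination is an assumption made for illustration."""
    lags = range(int(fs / f_max), int(fs / f_min) + 1)
    best = max(lags, key=lambda k: nacf(frame, k) / (amdf(frame, k) + eps))
    return fs / best

fs = 8000
# Synthetic 100 Hz tone, 50 ms long, standing in for a voiced frame.
frame = [math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
f0 = estimate_f0(frame, fs)
```

Dividing a function that peaks at the pitch lag by one that dips there sharpens the maximum, which is the general motivation for such combinations.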

4.
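In the two-dimensional time-frequency grid, the presence or absence of speech at adjacent points is correlated, and a traditional Markov chain cannot adaptively model this two-dimensional time-frequency correlation. Based on the correlation of speech signals in the time-frequency domain, a method for estimating the speech mask with a two-dimensional correlation model is proposed. The method classifies the log power spectrum of the noisy speech signal in the time-frequency domain into speech and non-speech classes. It describes the temporal correlation of the speech signal with state transition probabilities and a forward factor in the time domain, and the spectral correlation with state transition probabilities and a neighborhood factor in the frequency domain. Through global statistical optimization, the model combines the temporal and spectral correlations. A sequential update method is given that updates the model and estimates the speech presence probability frame by frame. Given the current log power spectrum and model parameters, the speech state matrix obtained by maximizing the posterior probability serves as the optimal estimate of the speech mask. The method is compared with several existing online speech mask estimation methods, and the experimental results show that it performs better.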

5.
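Sinusoidal parameter extraction in parametric audio coding   Total citations: 1, self-citations: 0, citations by others: 1
This paper studies the cepstrum-based pitch detection algorithm used in speech coding, applies it to parametric audio coding, and proposes a cepstrum-based method for extracting sinusoidal parameters. Based on the properties of the cepstrum and the characteristics of audio signals, the method uses the correlation between neighboring frames and the energy of each harmonic to ensure that the fundamental frequency and its harmonics are extracted correctly over a wide frequency range. In addition, combining harmonic extraction with single-spectral-line extraction improves the efficiency of the algorithm.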

6.
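To suppress the severe interference of non-stationary background noise on speech processing systems, a voice activity detection algorithm based on long-term and short-term energy means is proposed. The algorithm rests on two reasonable assumptions. First, sparse decomposition over a set of latent speech components not only preserves as much of the speech information in the noisy speech as possible but also removes, to some extent, interference from non-speech noise. Second, when the sparsely decomposed speech is reconstructed, the time-domain energy of speech segments in the reconstructed signal is higher than that of non-speech segments. On this basis, the time-domain energy of the reconstructed signal is used as the audio feature. The mean short-term energy over a window centered on the current frame and covering a fixed number of adjacent frames serves as the score of the current frame, and the mean long-term energy of the current frame and a fixed number of preceding frames serves as the decision threshold. The resulting voice activity detection algorithm decides by comparing the short-term energy mean of the current frame with its long-term energy mean. Experimental results show that the algorithm effectively distinguishes speech from non-speech segments at low signal-to-noise ratios (under both stationary and non-stationary noise) and that it outperforms the likelihood ratio algorithm based on a single Gaussian distribution.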
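
The decision rule in this abstract (short-term mean as score, long-term mean as threshold) can be sketched as follows. The sketch assumes the per-frame energies of the reconstructed signal are already available; the sparse decomposition and reconstruction are not shown, and the window sizes `half_window` and `history` are hypothetical values.

```python
def mean(values):
    return sum(values) / len(values)

def detect_speech(energies, half_window=1, history=8):
    """Flag frame i as speech when its short-term mean energy exceeds
    the long-term mean energy of frame i and its predecessors."""
    decisions = []
    for i in range(len(energies)):
        lo = max(0, i - half_window)
        hi = min(len(energies), i + half_window + 1)
        score = mean(energies[lo:hi])                           # window centered on frame i
        threshold = mean(energies[max(0, i - history):i + 1])   # frame i and its past
        decisions.append(score > threshold)
    return decisions

# Made-up frame energies: quiet, then a louder burst, then quiet again.
energies = [1.0] * 10 + [10.0] * 5 + [1.0] * 10
decisions = detect_speech(energies)
```

Because the threshold follows the recent energy history, it adapts to slowly changing noise levels, while the centered short window reacts to the onset of a louder segment.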

7.
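An objective algorithm for assessing the subjective quality of speech is proposed. The algorithm computes the distortion between the original and the reconstructed speech on the basis of the Bark spectrum, and it takes into account how weak (low-level) frames and noise frames affect the quality assessment. A speech quality formula that combines Bark spectral distortion with the ratio of weak and noise frames is also given, and the computed results are compared with the mean opinion score (MOS). Numerical experiments show that the proposed enhanced Bark spectral distortion measure (IBSD) correlates strongly with MOS, can objectively assess the subjective quality of speech signals, and is suitable for a variety of speech coding and speech communication systems.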

8.
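Lin Fan, Xu Mingxing. 《计算机科学》 (Computer Science), 2006, 33(4): 164-167
This paper discusses time-domain speech segmentation algorithms and, building on previous research, proposes an improved algorithm that combines adaptive thresholds, forward and backward search, and detection of short-time impulse noise. The algorithm mainly uses short-time parameters of the speech signal and determines the thresholds required for segmentation statistically. Based on the difference in zero-crossing rate between background sound and silence, it further searches for silent frames that meet the requirements, while filtering out short-time impulse noise. Experiments show that the algorithm is accurate and robust: with an allowed error of 60 ms, the segmentation error rate on the original speech is 5.04%, and at a signal-to-noise ratio (SNR) of 2 dB or higher, the segmentation error rate on noisy speech is 10% to 20%.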
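
The short-time parameters that such time-domain segmentation relies on can be sketched as follows. The rule in `is_silence` and its two thresholds are hypothetical; the paper derives its thresholds statistically and adds forward/backward search and impulse-noise filtering, none of which is shown here.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def is_silence(frame, energy_threshold, zcr_threshold):
    """Hypothetical rule (an assumption): low energy together with a
    low zero-crossing rate marks a silent frame."""
    return (short_time_energy(frame) < energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)

quiet = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02]   # low-level, no sign changes
noisy = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]      # sign change at every sample
```

In practice both features would be computed on overlapping frames of 10 to 30 ms, and the thresholds would be estimated from the recording itself rather than fixed in advance.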

9.
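This paper studies a speech signal processing method, apparatus, and system in the field of wireless communication technology. The method acquires a speech signal and converts it into a digital coded stream, segments the coded stream in the time domain into multiple subframes, and groups a set number of consecutive subframes into a superframe. Superframe synchronization and reordering are then applied to each superframe with the subframe as the unit, which yields the scrambled coded signal. Because the speech signal is segmented in the time domain, changing the duration of each subframe and the number of subframes per superframe can increase the length and number of the reordered units, so the number of possible orderings grows greatly and practical needs can be met. In addition, unlike frequency-domain segmentation, time-domain segmentation is easy to implement and keeps the sub-slots (subframes) fully independent of one another, which improves speech communication quality.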
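
The subframe reordering within each superframe can be sketched as follows. The keyed permutation, the `key` string, and the use of Python's `random` module are assumptions made for illustration (it is not a cryptographically secure generator); the abstract does not say how the ordering is chosen, and superframe synchronization is not shown.

```python
import random

def split_subframes(bits, subframe_len):
    """Cut the coded stream into fixed-length subframes (time-domain segmentation)."""
    return [bits[i:i + subframe_len] for i in range(0, len(bits), subframe_len)]

def permutation(key, superframe_index, size):
    """Hypothetical keyed ordering of the subframe positions in one superframe."""
    order = list(range(size))
    random.Random(f"{key}:{superframe_index}").shuffle(order)
    return order

def scramble(subframes, per_superframe, key):
    out = []
    for s in range(0, len(subframes), per_superframe):
        block = subframes[s:s + per_superframe]
        order = permutation(key, s // per_superframe, len(block))
        out.extend(block[j] for j in order)
    return out

def descramble(subframes, per_superframe, key):
    out = []
    for s in range(0, len(subframes), per_superframe):
        block = subframes[s:s + per_superframe]
        order = permutation(key, s // per_superframe, len(block))
        restored = [None] * len(block)
        for position, j in enumerate(order):
            restored[j] = block[position]   # undo the reordering
        out.extend(restored)
    return out

bits = "0000111100110101" * 2
subframes = split_subframes(bits, 4)
scrambled = scramble(subframes, 4, key="demo")
```

A superframe of n subframes admits n! orderings (24 for n = 4, 40320 for n = 8), which is why increasing the number of subframes per superframe enlarges the set of possible orderings so quickly.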

10.
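In the new-generation High Efficiency Video Coding (HEVC) standard, λ-domain rate control uses an adaptive bit allocation strategy that adjusts the bit allocation parameters of the current coding unit mainly according to the rate control parameters of the corresponding reference unit in the previous reference frame. However, the coding results under this strategy do not match the characteristics of human vision. A bit allocation strategy based on spatio-temporal visual sensitivity is proposed. A Just Noticeable Difference (JND) model is used to obtain the maximum visual distortion of a frame in the spatial domain, and from these maximum distortion values a spatial sensitivity weight is computed for each coding tree unit (CTU). The changes between adjacent frames in the temporal domain give the frame-level bit allocation ratio. Bottom-up visual attention is derived from inter-frame changes, from which the temporal sensitivity weight of each CTU is computed. Combining spatial and temporal visual sensitivity yields spatio-temporal bit allocation weights, so that bits are allocated to each CTU in accordance with visual characteristics. Experimental results show that the algorithm effectively improves subjective quality under rate control while reducing bit rate fluctuation.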

11.
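To meet the real-time requirements of pitch period detection, a real-time speech pitch period detection algorithm based on the wavelet transform is proposed. The algorithm exploits the relationship between wavelet transform extrema and abrupt changes in the signal, combines the wavelet-domain waveform with the time-domain waveform, extracts the maxima of the wavelet coefficients using an adaptive reference and multiple feature parameters, and captures and detects the position of each new pitch pulse within 2.5 ms. Experiments show that the algorithm achieves good results on both speech and residual signals.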

12.
A novel and robust pitch estimation method is presented in this paper. The basic idea is to reshape the speech signal using a combination of dominant harmonic modification (DHM) and data-adaptive time-domain filtering. The noisy speech signal is filtered within the range of fundamental frequencies to obtain the pre-filtered signal (PFS). The dominant harmonic (DH) of the PFS is determined and its amplitude is enhanced. The normalized autocorrelation function (NACF) is applied to the modified signal. Then empirical mode decomposition (EMD) based data-adaptive time-domain filtering is applied to the NACF signal. Partial reconstruction is performed in the EMD domain. The pitch period is determined from the partially reconstructed signal. The experimental results show that the proposed method performs better than other recently developed methods for noisy and clean speech signals in terms of gross and fine pitch errors.

13.
Recent advances in speech coding have made wideband coding feasible at bit rates sufficient for mobile communication. Here we propose a novel hybrid harmonic Code Excited Linear Prediction (CELP) scheme for high-band coding in a band-split scalable wideband codec, where the low band (0–4 kHz) is critically subsampled and coded selectively using existing narrowband codecs such as 5.4 kbps and 6.3 kbps G.723.1, 8 kbps G.729, and 11.8 kbps G.729E. The high-band signal is divided into stationary mode (SM) and non-stationary mode (NSM) components based on its unique characteristics. In the SM portion, the high-band signal is compressed using a multi-stage coding that combines the sinusoidal model and CELP. The first coding stage applies the damping factor matching pursuit (MP) algorithm without either the Over-Lap-Add (OLA) or smoothly interpolative synthesis schemes, and the second stage uses CELP with the circular codebook. In the NSM portion, the high-band signals are coded by CELP with both pulse and circular codebooks by applying the complexity-reduced algorithm. To ensure scalability in high-band coding, two enhancement layers are used to increase the number of pulses and control the number of quantized sinusoidal parameters. This paper describes the new algorithm and discusses novel techniques for efficient bandwidth wideband speech coding and subjective quality performance. For efficient bit allocation and enhanced performance, the pitch of the high-band codec is estimated using the quantized pitch parameter in the low-band codec. An informal listening test rated the subjective speech quality as comparable to that obtainable with G.722.2 as the fullband wideband codec and G.722.2 as the highband codec, the recent standardized band-split wideband codec.

14.
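A prosody conversion method for emotional speech based on the PAD three-dimensional emotion model is proposed. Eleven typical emotions were selected, a text corpus was designed, a speech corpus was recorded, the PAD values of the speech corpus were annotated using psychological methods, and the F0 contours of emotional speech syllables were modeled with the five-degree tone model. On this basis, a Generalized Regression Neural Network (GRNN) was used to build a prosody conversion model that predicts the prosodic features of emotional speech from the PAD values of the emotion and the context parameters of the utterance, and the STRAIGHT algorithm was used to perform the conversion. Subjective evaluation shows that the 11 kinds of emotional speech converted by the proposed method achieve an average Emotional Mean Opinion Score (EMOS) of 3.6 and convey the corresponding emotions.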
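
A GRNN prediction is a Gaussian-kernel-weighted average of the training targets, which can be sketched as follows. The three-dimensional inputs stand in for PAD values and the targets for a single prosodic feature; the training points, the targets, and the smoothing width `sigma` are made up, and the context parameters used in the paper are left out.

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """Kernel-weighted average of the training targets (the GRNN output)."""
    weights = []
    for x in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed: fall back to the plain mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Made-up PAD points and made-up F0 scale factors as targets.
train_x = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
train_y = [1.0, 3.0]
at_sample = grnn_predict(train_x, train_y, (0.0, 0.0, 0.0), sigma=0.1)
midway = grnn_predict(train_x, train_y, (0.5, 0.5, 0.5), sigma=0.1)
```

The only free parameter is `sigma`: small values make the prediction follow the nearest training sample, larger values blend neighboring samples more smoothly.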

15.
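Research on a 3.2 kbps MMBE vocoder   Total citations: 1, self-citations: 0, citations by others: 1
A modified multi-band excitation (MMBE) speech compression coding algorithm is proposed for low-bit-rate communication systems. As a multi-band excitation (MBE) coder, MMBE performs pitch estimation and voiced/unvoiced decisions according to the similarity between the reconstructed and the original signal spectra, and the accuracy of the pitch estimate directly affects coder performance. The proposed MMBE algorithm adopts an improved pitch estimation algorithm as well as …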

16.
This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet filter bank and the wavelet-based pitch/tone detection algorithm. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies an SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signals. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database under different noise conditions show that the proposed algorithm delivers VAD performance superior to AMR-NB VAD Options 1 and 2 and to AMR-WB VAD.

17.
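Pitch period detection method for noisy speech based on pre-filtering and the wavelet transform   Total citations: 10, self-citations: 0, citations by others: 10
Exploiting the limited range of the pitch period and the abrupt change in the speech signal at the instant of glottal closure, a pitch period detection method based on pre-filtering and the wavelet transform is proposed. The noisy speech signal is filtered by a third-order elliptic low-pass filter, a one-level wavelet transform with a quadratic spline wavelet as the wavelet function then detects the abrupt change points in the speech signal, and the pitch period is computed from them. Experiments show that, compared with the average magnitude difference function (AMDF) and autocorrelation function (ACF) methods, the proposed method extracts the pitch period more accurately; compared with pitch period detection based on the multi-scale wavelet transform, it reduces the computational load and weakens the influence of noise and speech formants on pitch period detection.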

18.
Low-rate multimode multiband spectral coding of speech   Total citations: 1, self-citations: 0, citations by others: 1
At bit rates of 4 kbps and below, conventional time-domain algorithms such as CELP fail to retain high voice quality and robust performance against background noise as their waveform-matching ability is curtailed by the severely limited codebook space. Spectral coding, on the other hand, offers an effective parametric model, amenable to low-rate implementation. Instead of performing waveform matching, spectral coders preserve only the perceptually important spectral attributes of the speech signal. Spectral coding algorithms encompass a broad family of emerging low-rate speech coding techniques, the common goal being the representation of the short-term spectrum of input speech with a limited set of spectral parameters and the synthesis of the output speech with a set of sinusoids. Pitch, frequency-domain voicing information, and a varying number of spectral magnitudes are the usual parameters of spectral coders. In this paper, we present the enhanced multiband excitation (EMBE) coder as an illustration of this new generation of low-rate spectral coders. The distinguishing features of EMBE are: (a) signal-adaptive multimode spectral modeling and parameter quantization, (b) two-band signal-adaptive frequency-domain voicing decision, (c) a novel VQ scheme for the efficient encoding of the variable-dimension spectral magnitude vectors at low rates, and (d) multi-class selective protection of spectral parameters from channel errors. A 4 kbps implementation of the EMBE spectral coding algorithm with 2.9 kbps source coding and 1.1 kbps for channel coding was specifically designed for satellite-based communication systems, targeting good voice quality at low bit rates and robust performance against channel errors. Fundamental concepts of the EMBE spectral coding algorithm, implementation details, and performance comparisons of the 4 kbps EMBE coder with earlier coders are reported.

19.
Prosody modification involves changing the pitch and duration of speech without affecting the message and naturalness. This paper proposes a method for prosody (pitch and duration) modification using the instants of significant excitation of the vocal tract system during the production of speech. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations like onset of burst in the case of nonvoiced speech. Instants of significant excitation are computed from the linear prediction (LP) residual of speech signals by using the property of average group-delay of minimum phase signals. The modification of pitch and duration is achieved by manipulating the LP residual with the help of the knowledge of the instants of significant excitation. The modified residual is used to excite the time-varying filter, whose parameters are derived from the original speech signal. Perceptual quality of the synthesized speech is good and is without any significant distortion. The proposed method is evaluated using waveforms, spectrograms, and listening tests. The performance of the method is compared with the linear prediction pitch-synchronous overlap and add (LP-PSOLA) method, which is another method for prosody manipulation based on the modification of the LP residual. The original and the synthesized speech signals obtained by the proposed method and by the LP-PSOLA method are available for listening at http://speech.cs.iitm.ernet.in/Main/result/prosody.html.

20.
This paper proposes a modification in the transmission of the excitation codevector and its non-zero pulse sign magnitude using a “codebook partition and label assignment” approach, which in turn reduces the number of bits required to transmit it through the communication channel in the legacy CS-ACELP 8 kbps speech codec. The proposed approach uses the excitation codebook structure of the forward mode of standard G.729E 11.8 kbps with two non-zero pulses per track, which avoids the use of two algebraic codebook structures for the forward and backward modes of G.729E, with a least significant pulse replacement approach for finding the optimized excitation codevector. The proposed modification of the legacy 8 kbps CS-ACELP (80 bits/10 ms) speech codec yields a bit rate of 10.6 kbps (106 bits/10 ms) with better objective and subjective results than the legacy 8 kbps CS-ACELP speech coder, and it also avoids the switching of codebook modes of the standard 11.8 kbps (G.729E) CS-ACELP speech coder. This paper also proposes reducing the number of searches for the final codevector of the excitation structure by taking the initial codevector as the final codevector, which improves the quality of the speech compared to the output speech quality of legacy G.729 CS-ACELP working at 8 kbps. Both the legacy CS-ACELP 8 kbps speech codec and the proposed CS-ACELP 10.6 kbps codec are implemented in MATLAB. Subjective and objective analyses are carried out on the proposed CS-ACELP 10.6 kbps speech codec to evaluate its performance, and the results are then cross-compared with those of legacy CS-ACELP (8 kbps) using a set of tables and graphs. It is evident from the results that both PESQ and MOS scores are quite comparable for each set of wave files even though bitrates are reduced. Consistency and efficiency of the proposed algorithm are assured by calculating the population mean of the 95% confidence interval based on the obtained objective and subjective parameter results.
