Similar articles
20 similar articles found.
1.
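Quantization and reconstruction of phase information in WI speech coding   Total citations: 1 (self-citations: 0, citations by others: 1)
陈悦, 鲍长春. 《信号处理》 (Journal of Signal Processing), 2005, 21(1): 164-167
In low-bit-rate speech coding, the human ear is often assumed to be insensitive to phase, so the effect of phase on speech quality is ignored. The result is rough, harsh speech whose pitch may even change. A high-quality vocoder therefore cannot neglect the phase information of speech. This paper analyzes a perceptually weighted analysis-by-synthesis (A-b-S) vector quantization method for the phase spectrum, uses it to quantize the phase of the SEW in a waveform interpolation coder, and reconstructs the phase at the synthesis end by cubic polynomial interpolation. Experiments show that the method greatly improves the reconstructed speech and clearly raises its naturalness and clarity. Subjective A/B tests show that, compared with a fixed-phase method that uses the phase of an elderly male speaker and with a minimum-phase-model method that reconstructs the phase spectrum from the magnitude spectrum via the cepstrum, quantizing the phase with 4 to 6 bits significantly improves synthesized speech quality, and the improvement is most pronounced for female voices.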
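
The cubic polynomial phase interpolation named in this abstract can be illustrated with the classic sinusoidal-modeling form of the cubic phase track. This is only a sketch under assumptions: the abstract does not give the paper's formula, so the "maximally smooth" choice of the unwrapping integer `M`, the function names, and the sample values below are illustrative and not the authors' method.

```python
import numpy as np

def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Coefficients of theta(t) = theta0 + omega0*t + alpha*t**2 + beta*t**3.

    theta0/theta1 are the phases (rad) at t = 0 and t = T, omega0/omega1 the
    frequencies (rad/sample) at the same instants. Assumed textbook form.
    """
    # Unwrapping integer that keeps the interpolated phase track smooth.
    M = np.round(((theta0 + omega0 * T - theta1)
                  + (omega1 - omega0) * T / 2.0) / (2.0 * np.pi))
    d_theta = theta1 + 2.0 * np.pi * M - theta0 - omega0 * T
    d_omega = omega1 - omega0
    alpha = 3.0 * d_theta / T**2 - d_omega / T
    beta = -2.0 * d_theta / T**3 + d_omega / T**2
    return alpha, beta, int(M)

def cubic_phase(theta0, omega0, theta1, omega1, T, t):
    """Evaluate the interpolated phase at time(s) t in [0, T]."""
    alpha, beta, _ = cubic_phase_coeffs(theta0, omega0, theta1, omega1, T)
    t = np.asarray(t, dtype=float)
    return theta0 + omega0 * t + alpha * t**2 + beta * t**3

# Hypothetical endpoint values for one harmonic over an 80-sample interval.
track = cubic_phase(0.3, 0.20, 1.0, 0.25, 80, np.arange(81))
```

The cubic matches both phase and frequency at the two endpoints, which is what lets a decoder move between transmitted phase values without discontinuities.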

2.
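A 2 kb/s NMF-WI speech coding algorithm based on the Bayesian Ying-Yang machine   Total citations: 3 (self-citations: 1, citations by others: 2)
郭莉莉, 鲍长春. 《电子学报》 (Acta Electronica Sinica), 2009, 37(5): 1146-1153
This paper proposes an improved characteristic waveform (CW) decomposition algorithm based on nonnegative matrix factorization (NMF). First, the rival penalized competitive learning (RPCL) algorithm and the Bayesian Ying-Yang (BYY) harmony learning algorithm are used to determine the NMF decomposition order, which lowers coder complexity without noticeably degrading speech quality. Second, based on how the CW energy varies with the energy of the encoding matrix, a hybrid autoregressive synthesis method for the phase spectrum is proposed, which improves the naturalness of the speech. Finally, an improved low-complexity 2 kb/s NMF-WI speech coding method is developed. It uses an NMF iteration based on the K-L divergence and a faster-converging Mel-scale banded initialization of the basis vectors, and it sorts characteristic waveforms into six classes according to the statistical distribution of the pitch period. In the CW decomposition module, complexity falls by 10 MOPS and speech quality improves, matching that of a 2.16 kb/s NMF-WI speech coder that quantizes the phase spectrum with 4-bit dispersed vector quantization.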

3.
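A 2 kb/s waveform interpolation speech coding algorithm based on non-negative matrix factorization   Total citations: 1 (self-citations: 0, citations by others: 1)
张鹏, 鲍长春, 郭莉莉. 《电子学报》 (Acta Electronica Sinica), 2008, 36(4): 632-638
In waveform interpolation (WI) speech coders, decomposing and quantizing the characteristic waveform (CW) with low delay, high accuracy, and low complexity has long been a central and difficult research problem. This paper proposes decomposing the speech CW by non-negative matrix factorization (NMF). The decomposition needs only the speech signal of the current frame, so it adds no extra delay to the coder. To improve decomposition accuracy, the CWs are first classified by the maximum pitch period of their subframes and then decomposed class by class. In addition, drawing on a cochlear model, the paper proposes a banded initialization algorithm for the NMF basis vectors, which raises the CW decomposition accuracy to a level comparable to second-order singular value decomposition. To reduce the computational complexity of the WI coder, the CW alignment module of the conventional WI coder is removed, and the NMF decomposition order is set to 16 as a trade-off between computational complexity and decomposition accuracy. Finally, the encoding matrix produced by NMF is quantized with a split matrix quantization scheme. Subjective A/B tests show that the synthesized speech quality of the proposed 2 kb/s NMF-WI coder is close to that of a 2.4 kb/s SVD-WI coder. MOS tests show that its synthesized speech quality is slightly worse than that of the 2.4 kb/s MELP coder.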
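
As a rough illustration of the factorization step, the sketch below runs the standard multiplicative NMF updates for the generalized K-L divergence (the Lee-Seung form). It is an assumption-laden sketch, not the paper's algorithm: it uses random initialization instead of the banded basis-vector initialization described above, and `nmf_kl`, `kl_divergence`, and the stand-in data are hypothetical names and values.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized K-L divergence D(V || WH) for nonnegative arrays."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor nonnegative V (n x m) as W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # basis vectors (random init: assumption)
    H = rng.random((rank, m)) + eps   # encoding matrix
    for _ in range(n_iter):
        WH = W @ H + eps
        # Update the encoding matrix, then the basis, keeping both nonnegative.
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Stand-in for a matrix of CW magnitudes: 64 harmonics x 40 waveforms.
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 40))
W, H = nmf_kl(V, rank=16)   # order 16, the value quoted in the abstract
```

Because the updates are multiplicative, `W` and `H` stay nonnegative throughout, and only the current matrix `V` is needed, which is consistent with the no-extra-delay property the abstract claims.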

4.
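李靓, 鲍长春. 《信号处理》 (Journal of Signal Processing), 2004, 20(6): 545-547
In low-rate parametric speech coding, quantizing the magnitude spectrum effectively with a limited number of bits is a key problem. This paper studies the quantization of the rapidly evolving waveform (REW) magnitude in the waveform interpolation speech coding model and proposes a perceptually weighted REW magnitude quantization scheme based on vector dimension conversion and the DCT. The method reduces the number of coding bits, lowers storage and computational complexity, and improves the perceptual quality of the coded speech. Subjective listening tests show that WI speech coded with this scheme at 4 bits per frame is better than speech reconstructed with a DCT-based REW magnitude matrix quantization scheme at 10 bits per frame.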

5.
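A waveform interpolation speech coding algorithm based on the discrete cosine transform   Total citations: 2 (self-citations: 0, citations by others: 2)
刘靖宇, 鲍长春, 李如玮. 《电子学报》 (Acta Electronica Sinica), 2009, 37(7): 1599-1605
To address characteristic waveform decomposition in waveform interpolation (WI) speech coding, this paper first proposes a decomposition method based on the discrete cosine transform (DCT), which avoids the complex characteristic waveform alignment operation. Second, to address phase reconstruction in WI, it proposes an unvoiced/voiced phase decision and a voiced phase classification, which improve the reconstructed speech quality. Finally, DCT-WI vocoders at 2.0 kbps and 1.6 kbps are built. Subjective MOS scores show that the 2.0 kbps DCT-WI vocoder outperforms the 2.4 kbps MELP vocoder and that the 1.6 kbps DCT-WI vocoder also sounds good.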

6.
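This paper proposes a waveform interpolation coding scheme in which the characteristic waveform extraction rate adapts to the characteristics of the input speech frame. A fairly reliable pitch period estimation algorithm is built on the principle of maximizing the doubly weighted long-term prediction gain, combined with a forward pitch decision. The pitch period, the degree of voicing, and the flatness of the waveform surface determine the waveform extraction rate and the update rates of the SEW (Slowly Evolving Waveform) and REW (Rapidly Evolving Waveform). Experiments show that, compared with WI algorithms that use a fixed waveform extraction rate, the proposed waveform interpolation (WI) coding algorithm lowers both the average bit rate and the computational complexity to some extent, and its synthesized speech quality is clearly better than that of the 4.8 kbps CELP speech coding algorithm.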

7.
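Conventional characteristic waveform interpolation speech coding algorithms ignore the phase information of the characteristic waveform and align the characteristic waveform as a whole, which often loses high-frequency harmonic components and makes the speech sound noisy. To improve synthesized speech quality, this paper introduces multiband voiced/unvoiced flags and uses them to estimate the phase spectra of the slowly evolving and rapidly evolving waveforms in the waveform interpolation coding model. During synthesis, the characteristic waveforms are only partially aligned. On this basis, a multiband 2.4 kbit/s characteristic waveform interpolation algorithm is proposed. Compared with conventional algorithms, the new algorithm clearly improves speech clarity. Compared with the standard 2.4 kbit/s MELP algorithm, its synthesized speech quality is also slightly better.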

8.
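张鹏, 鲍长春. 《信号处理》 (Journal of Signal Processing), 2005, 21(Z1): 160-163
The decomposition and quantization of the characteristic waveform (CW) in WI coders has long been an active research topic. A conventional WI coder represents the residual signal as evolving characteristic waveforms and then uses a linear-phase, non-causal FIR low-pass filter to decompose the CW into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW), which represent the quasi-periodic and noise-like components of speech, respectively. This decomposition not only fails to remove the correlation between the SEW and REW completely but also adds one extra frame of delay. By studying existing CW decomposition methods based on singular value decomposition (SVD), this paper analyzes in depth the physical meaning of U, Σ, and V after the SVD of the CW and proposes an algorithm that effectively reduces SVD complexity.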
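
The conventional SEW/REW split described in this abstract can be sketched as a zero-phase low-pass filter applied along the evolution (time) axis of the CW surface. The sketch rests on assumptions that are not in the abstract: the tap count, the cutoff, the Hamming window, the edge padding, and the name `decompose_cw` are all illustrative choices.

```python
import numpy as np

def decompose_cw(cw, num_taps=21, cutoff=0.1):
    """Split a CW surface into SEW (slow) and REW (fast) parts.

    cw: real array of shape (num_waveforms, num_coeffs); filtering runs
    along axis 0. cutoff is in cycles per waveform index (0 < cutoff < 0.5).
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric, centered filter")
    half = num_taps // 2
    n = np.arange(num_taps) - half
    # Windowed-sinc low-pass; symmetric taps give linear (here zero) phase.
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    h /= h.sum()  # unity gain for components that do not evolve
    # Centering the filter needs `half` future waveforms: the look-ahead delay.
    padded = np.pad(cw, ((half, half), (0, 0)), mode="edge")
    sew = np.apply_along_axis(lambda x: np.convolve(x, h, mode="valid"), 0, padded)
    rew = cw - sew
    return sew, rew

# Hypothetical surface: 50 extracted waveforms with 8 coefficients each.
cw = np.random.default_rng(0).standard_normal((50, 8))
sew, rew = decompose_cw(cw)
```

The centered filter is what makes the scheme non-causal: each SEW value depends on waveforms that arrive later, which is the source of the extra delay the abstract criticizes.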

9.
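Vector quantization is a coding method that compresses the bit rate efficiently while preserving speech quality, and it can be used for both waveform coding and parametric coding. This paper discusses the application of vector quantization to parameter compression coding: a vector quantizer is designed by simulated annealing and used to compress a library of speech cepstral (cep) parameters, and the quantizer's performance is evaluated by the change in the correct speech recognition rate before and after the library is compressed. The paper proposes a simulated annealing schedule suited to speech cepstral parameters and discusses the main parameters involved, namely the perturbation range and the number of perturbations.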

10.
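朱娜娜, 鲍长春, 李靓. 《通信学报》 (Journal on Communications), 2004, 25(11): 70-76
Based on the conventional waveform interpolation speech coding model, a new 2 kbit/s speech coding scheme is proposed. At the encoder, the scheme removes the complex alignment operation of the conventional method; at the decoder, it replaces conventional linear interpolation with cubic B-spline interpolation. Only the low-frequency components of the slowly evolving waveform are quantized, while the rapidly evolving waveform is fitted with orthogonal polynomials and vector quantized using an analysis-by-synthesis technique. DRT results show that the 2 kbit/s speech coding method yields reconstructed speech with high intelligibility.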

11.
In this paper, block constrained trellis coded vector quantization (BC-TCVQ) is presented for quantizing the line spectrum frequency parameters of the wideband speech codec. Both a predictive structure and a safety-net concept are combined into BC-TCVQ to develop the predictive BC-TCVQ. The performance of this quantization is compared with that of the linear predictive coding vector quantizer used in the AMR-WB codec, demonstrating reductions in spectral distortion.

12.
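李晓明, 鲍长春. 《信号处理》 (Journal of Signal Processing), 2013, 29(10): 1274-1282
Existing single-model coders cannot provide high-quality universal coding of speech and audio signals at low to medium bit rates. To address this, the paper exploits the harmonic characteristics of speech and audio signals to build a unified coding method for both. First, empirical mode decomposition (EMD) is used to extract the harmonic components of the input signal. Second, a perceptual matching pursuit algorithm combined with sinusoidal parameter modeling extracts and quantizes the parameters of the harmonic components. Third, the residual left after harmonic quantization is quantized with dithered lattice vector quantization to improve the subjective quality of the reconstructed audio, yielding a wideband universal speech and audio coder that operates at 24 kbps and 32 kbps. Finally, the proposed algorithm is evaluated with objective PESQ/PEAQ measures and subjective A/B tests and compared with the ITU-T G.722.1 and G.722.2 coders. The results show that the proposed coder outperforms the reference coders on both speech and audio signals.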

13.
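A low-rate waveform interpolation speech coding algorithm based on singular value decomposition   Total citations: 8 (self-citations: 7, citations by others: 1)
王贵平, 鲍长春, 张鹏. 《电子学报》 (Acta Electronica Sinica), 2006, 34(1): 135-140
The waveform interpolation (WI) speech coding model is one of the most promising low-rate speech coding schemes today and is attracting growing attention because of its good performance. Building on a characteristic waveform decomposition method based on singular value decomposition (SVD), and exploiting the perceptual properties of speech, this paper divides the magnitude spectrum of the two-dimensional characteristic waveform into a basic matrix, a transition matrix, and a supplementary matrix and quantizes each with a different method, which effectively reduces computational complexity. In addition, according to the time-varying nature of speech, the three matrices are combined in three modes to represent the characteristic waveform magnitude spectrum, and a periodicity factor and an energy entropy are introduced to measure how periodic the matrices are. This solves the problem that the parameters are hard to quantize after SVD and improves coding efficiency. Subjective A/B tests show that the reconstructed speech quality of the proposed 2.4 kbps SVD-WI coder is slightly better than that of the 2.4 kbps MELP coder.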
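
For orientation, the sketch below shows a plain truncated SVD of a magnitude matrix and the share of energy the leading components keep. It does not reproduce the paper's basic/transition/supplementary split, which the abstract does not specify; `truncated_svd` and the stand-in data are hypothetical.

```python
import numpy as np

def truncated_svd(cw_mag, rank):
    """Rank-limited approximation of a magnitude matrix (waveforms x harmonics).

    Returns the approximation and the fraction of squared singular-value
    energy retained by the first `rank` components.
    """
    U, s, Vt = np.linalg.svd(cw_mag, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    energy = float(np.sum(s[:rank] ** 2) / np.sum(s ** 2))
    return approx, energy

# Stand-in data: 10 waveforms x 16 harmonics built from two underlying shapes.
rng = np.random.default_rng(0)
cw_mag = rng.random((10, 2)) @ rng.random((2, 16))
approx, energy = truncated_svd(cw_mag, rank=2)
```

When a few singular values carry most of the energy, the matrix can be described by a small number of components, which is the usual motivation for SVD-based decomposition.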

14.
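A high-quality 4 kb/s dispersed-pulse CELP speech coding algorithm   Total citations: 11 (self-citations: 0, citations by others: 11)
鲍长春. 《电子学报》 (Acta Electronica Sinica), 2003, 31(2): 309-313
This paper proposes a dispersed-pulse CELP (DP-CELP) speech coding algorithm in which the excitation vector is obtained by convolving a specially structured algebraic codebook with a dispersed pulse of fixed shape. This excitation effectively improves the reconstructed speech quality without increasing the complexity of the algebraic codebook search. Informal subjective listening tests show that the synthesized speech quality of the 4 kb/s DP-CELP algorithm is very close to that of the 6.3 kb/s speech coder in G.723.1.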

15.
We present a new class of nonlinear block codes called source-optimized channel codes (SOCCs), which are designed specifically for parametric source encoding of speech, audio, and video. In contrast to conventional channel codes, the new codes are optimized not to minimize the residual bit-error rate but to maximize the signal-to-noise ratio of the transmitted source codec parameters. The decoding of SOCCs is based not on bit-error correction but on parameter estimation. We compare SOCCs with other approaches to joint source/channel coding such as channel-optimized vector quantization, channel-constrained vector quantization, unequal error protection, and source-controlled channel decoding. In terms of performance, SOCCs show better robustness under channel mismatch conditions. For real-world applications, SOCCs are attractive because the separation of source and channel codec is preserved.

16.
We present a practical video coding algorithm for use at very low bit rates. For efficient coding at very low bit rates, it is important to intelligently allocate bits within a frame, and so a powerful variable-rate algorithm is required. We use vector quantization to encode the motion-compensated residue signal in an H.263-like framework. For a given complexity, it is well understood that structured vector quantizers perform better than unstructured and unconstrained vector quantizers. A combination of structured vector quantizers is used in our work to encode the video sequences. The proposed codec is a multistage residual vector quantizer, with transform vector quantizers in the initial stages. The transform-VQ captures the low-frequency information, using only a small portion of the bit budget, while the later stage residual VQ captures the high-frequency information, using the remaining bits. We used a strategy to adaptively refine only areas of high activity, using recursive decomposition and selective refinement in the later stages. An entropy constraint was used to modify the codebooks to allow better entropy coding of the indexes. We evaluate the performance of the proposed codec and compare it with the performance of the H.263-based codec. Experimental results show that the proposed codec delivered significantly better perceptual quality along with better quantitative performance.

17.
Design algorithms and simulation results are presented for vector quantizers for Fourier transformed data. Transforming the data prior to quantization has two potential advantages. First, each sample in the transform domain depends on many samples in the original domain. Thus, even scalar quantization in the transform domain is a form of vector quantization or block source coding in the original waveform domain and the basic coding theorems of information theory show that such block codes can provide better performance than scalar codes, even for memoryless sources. Second, vector quantization of Fourier transformed speech waveforms provides distinctly better subjective quality than ordinary vector quantization of the waveform using codes of comparable complexity. While the system is, of course, more complicated due to the need to take Fourier transforms, its envisioned application is as a coder for the output of FFT chips currently available or under development. The proposed implementation of a Fourier transform vector quantizer (FTVQ) uses a product code structure, providing different codes for different coefficient vectors corresponding to different frequency bands. This is a form of subband coding and yields a simple means of optimizing bit allocations among the subcodes. Two coding structures with corresponding distortion measures are considered: those that quantize vectors of pairs of real and imaginary coefficients and those that quantize separate vectors of magnitude and phase coefficients. Both structures yield good performance for the given complexity in comparison to waveform vector quantizers. For speech coding, a magnitude-phase FTVQ yields better subjective quality than a real-imaginary FTVQ when the rate allocation is properly chosen.

18.
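贾懋珅, 鲍长春. 《电子学报》 (Acta Electronica Sinica), 2009, 37(10): 2291-2297
Based on G.729.1, a coding standard of the ITU Telecommunication Standardization Sector (ITU-T), this paper proposes an embedded variable-rate stereo speech and audio coding method. The algorithm uses G.729.1 and an improved modulated lapped transform (MLT) coding technique to encode the mid and side information of the input signal in layers, producing a bitstream with an embedded structure. The coder handles wideband and super-wideband stereo signals, with a maximum bit rate of 48 kb/s for wideband stereo and 64 kb/s for super-wideband stereo. Experimental results show that the coding quality of the coder meets the ITU-T requirements for G.EV-VBR stereo coding.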

19.
The authors describe several adaptive block transform speech coding systems based on vector quantization of linear predictive coding (LPC) parameters. Specifically, the authors vector quantize the LPC parameters (LPCVQ) associated with each speech block and transmit the index of the code vector as overhead information. This code vector determines the short-term spectrum of the block and, in turn, can be used for optimal bit allocation among the transform coefficients. To get a better estimate of the speech spectrum, the authors also consider incorporating pitch information in the coder. In addition, entropy-coded zero-memory quantization of the transform coefficients is considered as an alternative to Lloyd-Max quantization. An adaptive BTC scheme based on LPCVQ and using entropy-coded quantizers is developed. Extensive simulations are used to evaluate the performance of this scheme.

20.
The performance of a vector quantizer can be improved by using a variable-rate code. Three variable-rate vector quantization systems are applied to speech, image, and video sources and compared to standard vector quantization and noiseless variable-rate coding approaches. The systems range from a simple and flexible tree-based vector quantizer to a high-performance, but complex, jointly optimized vector quantizer and noiseless code. The systems provide significant performance improvements for subband speech coding, predictive image coding, and motion-compensated video, but provide only marginal improvements for vector quantization of linear predictive coefficients in speech and direct vector quantization of images. Criteria are suggested for determining when variable-rate vector quantization may provide significant performance improvement over standard approaches.
