Similar literature
18 similar documents found (search time: 203 ms)
1.
薛二娟  鲍长春  李如玮 《电子学报》2010,38(7):1574-1579
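This paper studies the waveform interpolation (WI) speech coding model and parameter quantization techniques, and proposes a 1 kb/s waveform interpolation speech coding algorithm based on two-dimensional non-negative matrix factorization (2DNMF-WI). The characteristic waveform (CW) is decomposed with two-dimensional non-negative matrix factorization (2D-NMF), which compresses the dimension of the CW magnitude spectrum matrix along both rows and columns at once, so that the encoding matrix obtained after dimensionality reduction is small and easy to quantize. In addition, very-low-rate speech coding does not have enough bits to describe the coding parameters, so high-quality synthesized speech is often hard to obtain. The algorithm uses joint coding of two frames, inter-frame backward-predictive three-stage vector quantization, the discrete cosine transform (DCT), and split matrix quantization to lower the coding rate and improve speech quality. Informal subjective listening tests show that the quality of speech synthesized by the 1 kb/s 2DNMF-WI coder is slightly worse than that of the 2 kb/s NMF-WI speech coding algorithm.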

2.
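A non-equal-coefficient inter-frame predictive split vector quantization method for wideband ISF parameters   Total citations: 1, self-citations: 0, citations by others: 1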
李海婷  鲍长春 《电子学报》2008,36(6):1214-1217
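This paper proposes a new non-equal-coefficient inter-frame predictive split vector quantization scheme for quantizing ISF parameters in wideband speech coding. The scheme exploits the inter-frame correlation of ISF parameters and follows the principle of predictive split vector quantization: the ISF parameter vector to be quantized first undergoes mean removal and non-equal-coefficient inter-frame prediction, and the prediction residual of the mean-removed ISF parameters is then split-vector quantized. Experiments show that the algorithm achieves transparent quantization at 46 bits per frame, with an average spectral distortion lower than that of the ISF parameter quantization in G.722.2.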
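The pipeline in this abstract (mean removal, inter-frame prediction, split vector quantization of the residual) can be sketched roughly as follows. This is a minimal illustration, not the authors' scheme: the first-order predictor driven by the previous quantized residual, the per-component coefficients `coeffs`, the toy codebooks, and all function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def nearest(codebook: Sequence[Sequence[float]], target: Sequence[float]) -> int:
    """Index of the codeword with the smallest squared error to target."""
    def dist(cw: Sequence[float]) -> float:
        return sum((c - t) ** 2 for c, t in zip(cw, target))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode_frame(isf, mean, coeffs, prev_q_resid, codebooks, splits):
    """Return (indices, quantized residual) for one frame.

    coeffs holds one prediction coefficient per component (hypothetical
    stand-in for the "non-equal coefficients" named in the abstract).
    """
    centered = [x - m for x, m in zip(isf, mean)]             # mean removal
    predicted = [a * r for a, r in zip(coeffs, prev_q_resid)]  # inter-frame prediction
    resid = [c - p for c, p in zip(centered, predicted)]
    indices: List[int] = []
    q_resid: List[float] = []
    for (start, end), cb in zip(splits, codebooks):            # split VQ
        idx = nearest(cb, resid[start:end])
        indices.append(idx)
        q_resid.extend(cb[idx])
    return indices, q_resid


def decode_frame(indices, mean, coeffs, prev_q_resid, codebooks):
    """Rebuild the ISF vector from codebook indices; also return the residual."""
    q_resid = [v for idx, cb in zip(indices, codebooks) for v in cb[idx]]
    isf = [m + a * r + q for m, a, r, q in zip(mean, coeffs, prev_q_resid, q_resid)]
    return isf, q_resid


# Toy setup: 4 components split into two 2-dimensional sub-vectors.
MEAN = [1.0, 2.0, 3.0, 4.0]
COEFFS = [0.5, 0.4, 0.3, 0.2]
SPLITS: List[Tuple[int, int]] = [(0, 2), (2, 4)]
CODEBOOKS = [
    [[0.0, 0.0], [0.5, 0.0], [-0.5, 0.0]],
    [[0.0, 0.0], [0.0, 0.5], [0.0, -0.5]],
]
```

The total bits per frame is the sum of the index sizes of the sub-codebooks, which is how a figure such as 46 bits per frame would be budgeted across splits.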

3.
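This paper proposes a waveform interpolation coding scheme in which the characteristic waveform extraction rate adapts to the properties of the input speech frame. A fairly reliable pitch period estimation algorithm is built on the principle of maximizing a doubly weighted long-term prediction gain, combined with a forward pitch decision. The pitch period, the degree of voicing, and the flatness of the waveform surface determine the waveform extraction rate and the update rates of the SEW (Slowly Evolving Waveform) and REW (Rapidly Evolving Waveform). Experiments show that, compared with a WI algorithm that uses a fixed waveform extraction rate, the proposed waveform interpolation (WI) coding algorithm lowers both the average bit rate and the computational complexity to some degree, and its synthesized speech quality is clearly better than that of the 4.8 kbps CELP speech coding algorithm.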

4.
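Speech coding and vocoders: speech coding can generally be divided into two broad classes, waveform coding and parametric coding. Waveform coding tries, under some fidelity criterion, to minimize the number of bits used to quantize each speech sample while keeping relatively good speech quality. Because this class of techniques operates on the speech waveform, it is called "waveform coding". Waveform coding generally requires a high bit rate (e.g. 16kbp...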

5.
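A buffer control strategy based on the importance of image content   Total citations: 2, self-citations: 0, citations by others: 2
This paper proposes a buffer control strategy based on the importance of image content. In very-low-bit-rate video coding of head-and-shoulders images, the combination of quantization levels for the face region and the other regions is decided from each frame's motion estimation result and the number of available bits, with the face region given finer quantization. The strategy keeps the number of bits used by each frame close to the target number of bits while preserving the reconstructed image quality of the face region.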

6.
王军  张连海  屈丹 《通信技术》2009,42(10):204-206
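Wideband speech coding commonly uses immittance spectral frequencies (ISF) to describe the vocal tract. The ISFs are quantized with a switched classified difference-vector split vector quantization method, which builds on switched classified vector quantization and difference split vector quantization. The ISF vector is first classified according to a given codebook, and the difference vector within each class is then split-vector quantized. Experimental results show that the algorithm meets the transparent quantization requirement at 37 bits per frame, and that its codebook storage is markedly smaller than that of the switched classified split vector quantization method given by Stephen So et al.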

7.
朱娜娜  鲍长春  李靓 《通信学报》2004,25(11):70-76
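Building on the conventional waveform interpolation speech coding model, a new 2 kbit/s speech coding scheme is proposed. The scheme removes the complex alignment operation of the conventional method at the encoder and replaces the conventional linear interpolation with cubic B-spline interpolation at the decoder. Only the low-frequency components of the slowly evolving waveform are quantized, while the rapidly evolving waveform is fitted with orthogonal polynomials and vector quantized using an analysis-by-synthesis technique. DRT results show that this 2 kbit/s speech coding method yields reconstructed speech with high intelligibility.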

8.
肖强  陈亮  朱涛  黄建军 《信号处理》2011,27(4):563-568
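To achieve high-quality very-low-rate speech coding, a dimension-reducing quantization algorithm for line spectral pair (LSP) parameters based on compressed sensing theory is proposed. At the encoder, compressed sensing is used to reduce the dimension of the high-dimensional superframe LSP vector: the original LSP parameters are projected into a low-dimensional space to obtain low-dimensional measurements, which are then quantized with a split vector quantization algorithm. At the decoder, the quantized measurements are taken as known and the original high-dimensional LSP vector is reconstructed with the orthogonal matching pursuit algorithm. Experimental results show that the algorithm lowers the average spectral distortion by 0.23 dB relative to the matrix quantization scheme used in low-rate speech coding, and by 0.13 dB relative to a DCT-based dimension-reducing quantization scheme. Reducing the dimension before quantizing substantially cuts the number of bits needed for coding and the codebook storage complexity, and effectively lowers the speech coding rate. The synthesized speech has high intelligibility and naturalness; there is some distortion, but no obvious loss of perceived quality is noticeable.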
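The decoder-side reconstruction step named in this abstract, orthogonal matching pursuit, can be sketched as below. This is a generic textbook-style version in plain Python, not the paper's implementation: the measurement matrix `A`, the sparsity level, and the helper names are assumptions, and the sparsifying transform and the quantizer are left out.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(G: List[List[float]], b: List[float]) -> List[float]:
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def omp(A: List[List[float]], y: List[float], sparsity: int, tol: float = 1e-9) -> List[float]:
    """Greedy sparse recovery of x from y = A x (A is m x n with m < n).

    Assumes sparsity <= n and that the selected columns are linearly independent.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    residual = y[:]
    support: List[int] = []
    coef: List[float] = []
    for _ in range(sparsity):
        if dot(residual, residual) <= tol:
            break
        # Pick the unused column most correlated with the current residual.
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: abs(dot(cols[j], residual)))
        support.append(best)
        # Least-squares fit on the selected columns via the normal equations.
        G = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], y) for a in support]
        coef = solve(G, rhs)
        residual = [y[i] - sum(c * cols[j][i] for c, j in zip(coef, support))
                    for i in range(m)]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x
```

In a scheme of the kind described, `y` would be the dequantized low-dimensional measurement vector and the recovered `x` would be mapped back to the LSP domain; both of those steps depend on details the abstract does not give.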

9.
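Based on adaptive forward-backward quantization of the LPC, this paper proposes a variable-rate mixed excitation linear prediction (MELP) speech coding algorithm. Linear prediction is performed on either the current speech frame (forward LPC) or a previously synthesized speech frame (backward LPC). When backward LPC is used, only the time-sequence code needs to be transmitted, which reduces the average number of bits spent on the LPC coefficients. Computer simulations show that the algorithm gives synthesized speech quality comparable to the standard MELP algorithm while markedly reducing the bandwidth needed to transmit the LPC, and thus clearly lowering the average MELP coding rate.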

10.
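This paper analyzes the state of the transcoding buffer, derives the conditions the transcoding buffer must satisfy to prevent decoder buffer underflow and overflow, and builds a transcoding model for image sequences. Based on the transcoding buffer state and the channel rate, bits are pre-allocated at the picture layer to each frame to be coded in the target sequence, with the distribution of DCT coefficients used to characterize the picture. An optimal quantization factor is then selected for each macroblock in the frame, giving a rate control strategy based on optimal quantization. Simulation experiments show that the rate control strategy effectively reduces or avoids buffer overflow and underflow, stabilizes the output bit rate, and improves the signal-to-noise ratio of the reconstructed image sequence.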

11.
This paper describes a new audio coding scheme based on adaptive wavelet analysis that provides transparent audio coding for CD-audio signals at low bit rates (≈1.4 bits/sample per channel). A new perceptual cost function is defined to obtain the best wavelet-packet base for each audio frame. The sharp variations in quantization noise that appear at the border of the frames are minimized by a novel approach that avoids overlapping. The proposed coder guarantees high perceptual quality using filters that generate wavelets of any compact support, because a bit-allocation algorithm that takes into account the equivalent filter frequency responses of the synthesis filter bank branches is used.

12.
Adaptive image coding with perceptual distortion control   Total citations: 6, self-citations: 0, citations by others: 6
This paper presents a discrete cosine transform (DCT)-based locally adaptive perceptual image coder, which discriminates between image components based on their perceptual relevance for achieving increased performance in terms of quality and bit rate. The new coder uses a locally adaptive perceptual quantization scheme based on a tractable perceptual distortion metric. Our strategy is to exploit human visual masking properties by deriving visual masking thresholds in a locally adaptive fashion. The derived masking thresholds are used in controlling the quantization stage by adapting the quantizer reconstruction levels in order to meet the desired target perceptual distortion. The proposed coding scheme is flexible in that it can be easily extended to work with any subband-based decomposition in addition to block-based transform methods. Compared to existing perceptual coding methods, the proposed perceptual coding method exhibits superior performance in terms of bit rate and distortion control. Coding results are presented to illustrate the performance of the presented coding scheme.

13.
This paper presents a rate-distortion derived transform trellis coding (TTC) scheme with applications to Gaussian AR sources and speech data. The optimal encoder consists of a Karhunen-Loeve transform (KLT) on the source output, followed by a search on a trellis structured random code, where the decoder is a time-variant nonlinear filter. The scheme is implementable and applicable to stationary Gaussian sources with a bounded and continuous power spectrum and the squared error distortion measure. The code construction is based on the power or eigenvalue spectrum of the source with no restriction on the coding rate. The TTC scheme is first applied to encode a Gaussian AR source often used to model speech. Simulations were conducted at several rates, using an optimal KLT and the suboptimal discrete cosine transform (DCT). Results demonstrate that the DCT performs as well as the KLT, and both yield average distortions very close to the distortion-rate function. For speech data, an adaptive version of the DCT TTC scheme is applied to encode two speech sentences at several coding rates. The adaptation is controlled by an estimate of the short-term eigenvalue spectrum which is transmitted as side information to the receiver. The proposed scheme is a very efficient speech waveform coder that provides reconstructed speech with very high signal-to-noise ratio values and very good perceptual quality at low bit rates.

14.
A phase predictive vector quantization method for WI speech coding   Total citations: 1, self-citations: 0, citations by others: 1
陈悦  鲍长春 《电子与信息学报》2007,29(11):2672-2675
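Conventional low-bit-rate speech coding often discards phase information on the grounds that the human ear is insensitive to it, which makes the speech rough and harsh and can even change its pitch. To obtain a high-quality vocoder, the phase information of speech cannot be ignored. Building on the dispersion-phase vector quantization method, this paper removes further phase redundancy: within the waveform interpolation (WI) coding model, the phase spectrum difference of the slowly evolving waveform (SEW) between adjacent frames is quantized with predictive vector quantization. Experiments show that the method greatly improves the reconstructed speech and clearly raises its naturalness and clarity. Subjective A/B test results show that, compared with the fixed-phase method, 4 to 6 bits of phase quantization significantly improve the synthesized speech quality, and that, compared with the dispersion-phase vector quantization method, the quality of synthesized female speech improves.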

15.
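Compressed-sensing quantization of line spectral pair parameters in the quasi-KLT domain   Total citations: 1, self-citations: 1, citations by others: 0
Achieving transparent quantization of line spectral pair (LSP) parameters with as few bits as possible has long been an active research topic in speech coding. Based on compressed sensing theory, this paper studies the sparsity of LSP parameters in the quasi-KLT domain and designs a scheme that applies compressed sensing to the LSP parameters before vector quantization. At the encoder, compressed sensing is used to project the original LSP parameters in the quasi-KLT domain into a low-dimensional space to obtain low-dimensional measurements, which are then quantized with a split vector quantization algorithm. At the decoder, the quantized measurements are taken as known, the original high-dimensional LSP vector is reconstructed with the orthogonal matching pursuit algorithm, and the reconstruction serves as the final quantized value. Experimental results show that, with moderate codebook storage and search complexity, the algorithm needs as few as 5 bits per frame in the best case to reach transparent quantization.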

16.
The logarithmic companding technique has been shown to be extremely useful in speech quantization at a rate of 8 bits/sample. However, for lower bit rates it is not the ideal solution for high-quality speech coding. In this paper we therefore establish a source coding scheme that enables better spectrum efficiency for input with a large dynamic range. Since we also wish to improve signal quality relative to the quality defined by standards G.711 and G.712, we opt for applying an adaptive technique to speech coding. Our research shows that a proper design of forward gain-adaptive polar quantization can enable compression of about 1 bit/sample as well as significantly better quality than a coder designed according to standard G.711. Furthermore, performance can be sustained over the whole speech dynamic range. Also, if the required speech quality must not fall below G.712 standard quality, the achieved compression can be almost 1.5 bits/sample. In addition, we propose a new simple encoding rule that can further reduce the bit rate by 0.1 bit/sample.
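For orientation, the 8 bits/sample logarithmic companding baseline that this abstract compares against can be sketched as follows. This is an idealized continuous μ-law compander with a uniform quantizer in the companded domain, assuming μ = 255 and input normalized to [-1, 1]; it is not the bit-exact segmented encoding of G.711, and it does not implement the paper's gain-adaptive polar quantizer. The function names are illustrative.

```python
import math

MU = 255.0  # mu-law parameter; 255 is the usual choice for 8-bit speech coding


def compress(x: float) -> float:
    """Map x in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y: float) -> float:
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(x: float, bits: int = 8) -> int:
    """Compand, then quantize uniformly to an index in [0, 2**bits - 1]."""
    levels = 2 ** bits
    y = compress(max(-1.0, min(1.0, x)))
    return min(levels - 1, int((y + 1.0) / 2.0 * levels))


def dequantize(index: int, bits: int = 8) -> float:
    """Reconstruct at the midpoint of the companded cell, then expand."""
    levels = 2 ** bits
    y = (index + 0.5) / levels * 2.0 - 1.0
    return expand(y)
```

Because the cells are uniform only after companding, small amplitudes are reconstructed with much finer steps than a plain uniform 8-bit quantizer would give, which is why this approach holds up over a large dynamic range at 8 bits/sample.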

17.
Bit rate control has proved to be an important technique for transmitting a variable-rate bit stream over a fixed-rate channel in MPEG coding. In this paper, a hybrid algorithm is proposed to achieve rate control and obtain good picture quality. In the picture layer, the algorithm estimates the number of bits required for different types of frames. In the macroblock layer, it uses predictive and adaptive perceptual techniques to regulate the quantization scale and allocate the target bits. Along the upper and left boundaries of the picture, the macroblocks are quantized on a perceptual basis to adjust the quantization scale, and in the rest of the picture a predictive method is used to update the quantization scale. With the proposed scheme, the coding distortion is uniformly distributed in the picture layer as well as in the macroblock layer, and better bit rate control is achieved.

18.
A real-time full search vector quantization system for speech waveform coding is implemented using LSTTL and CMOS devices. The system consists of low-pass filters, A/D and D/A converters, an algorithm for discriminating voiced and unvoiced speech, a full search vector quantizer encoder and decoder, and a microprocessor-based controller. The system is designed to operate at two possible rates: one bit/sample using a dimension 8 vector quantizer (6500 bits/s) or 2 bits/sample using a dimension 4 vector quantizer (13 000 bits/s). In both cases the codebooks have rate 8 bits/vector. Separate codebooks were designed for voiced and unvoiced speech based on a training sequence of 640 000 samples containing five different speakers. The subjective and quantitative results are compared with both simulations and a real-time array-processor-based implementation.
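The rate figures in this abstract follow from simple arithmetic, and the full search itself is a nearest-codeword scan. The sketch below shows both under stated assumptions: the 6500 samples/s sampling rate is inferred from "1 bit/sample at 6500 bits/s" rather than stated in the abstract, and the function names and squared-error distortion are illustrative choices.

```python
from typing import List, Sequence


def full_search_encode(vector: Sequence[float],
                       codebook: Sequence[Sequence[float]]) -> int:
    """Scan every codeword and return the index with the least squared error."""
    def dist(cw: Sequence[float]) -> float:
        return sum((c - v) ** 2 for c, v in zip(cw, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode_signal(samples: Sequence[float], dim: int,
                  codebook: Sequence[Sequence[float]]) -> List[int]:
    """Cut the signal into blocks of dim samples and encode each block."""
    usable = len(samples) - len(samples) % dim  # drop an incomplete final block
    return [full_search_encode(samples[i:i + dim], codebook)
            for i in range(0, usable, dim)]


def bits_per_sample(bits_per_vector: int, dim: int) -> float:
    return bits_per_vector / dim


def bit_rate(bits_per_vector: int, dim: int, sample_rate_hz: float) -> float:
    """Bits per second for a given vector dimension and sampling rate."""
    return bits_per_sample(bits_per_vector, dim) * sample_rate_hz


SAMPLE_RATE_HZ = 6500.0  # assumption inferred from the abstract's rate figures
```

With 8 bits/vector each codebook holds 2**8 = 256 codewords, so a full search compares every input block against 256 candidates; halving the dimension from 8 to 4 doubles the rate to 2 bits/sample.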
