Similar Documents (20 results)
1.
In the G.729 speech coding algorithm, line spectral frequencies are quantized with predictive vector quantization. When a frame is lost during transmission, this approach causes error accumulation at the decoder, degrading speech quality. To reduce the impact of error accumulation, this paper proposes a new vector quantization method. Experimental results show that, compared with G.729, the method clearly improves robustness to error accumulation.
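The failure mode described above can be seen in a toy predictive quantizer (an illustrative 1-D sketch, not the G.729 scheme, whose MA predictor and codebooks are standardized): the decoder mirrors the encoder's predictor state, so one erased frame desynchronizes the two and the error propagates into subsequent frames.

```python
import numpy as np

# Toy predictive quantization sketch (hypothetical parameters).
# alpha: AR predictor coefficient; quantize(): uniform scalar quantizer.

def quantize(x, step=0.05):
    return step * np.round(x / step)

def encode(frames, alpha=0.7):
    pred = 0.0
    quantized_residuals = []
    for x in frames:
        resid = x - alpha * pred          # predict from past reconstruction
        q = quantize(resid)
        quantized_residuals.append(q)
        pred = alpha * pred + q           # encoder tracks decoder state
    return quantized_residuals

def decode(residuals, alpha=0.7, lost=None):
    pred = 0.0
    out = []
    for i, q in enumerate(residuals):
        if lost is not None and i == lost:
            q = 0.0                       # frame erased: residual unknown
        x_hat = alpha * pred + q
        out.append(x_hat)
        pred = x_hat                      # desynchronized state propagates
    return out
```

With no loss the reconstruction stays within half a quantizer step; erasing one frame leaves a visible error in later frames, which is exactly the accumulation the abstract targets.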

2.
This paper presents a multistage tree-structured vector quantization (MTVQ) scheme for line spectral frequencies (LSF) with two advantages: it supports embedded quantization, which is required for scalable coder designs, and the tree structure at each stage can be exploited to accelerate encoding. Codebook design strategies suitable for MTVQ are analyzed. Two speech coding standards are modified by replacing their original LSF quantizers with an MTVQ; it is shown that the synthetic speech degrades gracefully as the number of bits available for LSF decoding is decremented one by one. Moreover, the search complexity is substantially reduced with only slight performance degradation.
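A minimal sketch of the multistage part of such a scheme (toy codebooks, no tree acceleration): each stage quantizes the residual left by the previous stages, and decoding from a prefix of the stage indices gives the embedded, gracefully degrading reconstruction.

```python
import numpy as np

def msvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage codes the residual of the
    previous ones. Codebooks here are made up for illustration."""
    resid = np.asarray(x, float)
    idxs = []
    for cb in codebooks:
        j = int(np.argmin(np.linalg.norm(cb - resid, axis=1)))
        idxs.append(j)
        resid = resid - cb[j]
    return idxs

def msvq_decode(idxs, codebooks, n_stages=None):
    """Embedded decoding: using only the first n_stages indices yields a
    coarser but valid reconstruction."""
    if n_stages is None:
        n_stages = len(idxs)
    return sum(codebooks[s][idxs[s]] for s in range(n_stages))
```

Dropping trailing stage indices raises distortion smoothly rather than breaking the decoder, which is the scalability property the abstract describes.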

3.
In this paper, memory-based quantization is studied in detail. We propose a new framework, power series quantization (PSQ), for memory-based quantization. With line spectral frequency (LSF) quantization as the application, several common memory-based quantization methods (FSVQ, predictive VQ, VPQ, safety-net, etc.) are analyzed and compared with the proposed method, which is shown to outperform all other tested methods. The PSQ method is fully general, in that it can simulate any other memory-based quantizer if allowed unlimited complexity.

4.
We propose a new method for implementing Karhunen–Loeve transform (KLT)-based speech enhancement that exploits vector quantization (VQ) and is suitable for real-time processing. The proposed method consists of a VQ learning stage and a filtering stage. In the learning stage, autocorrelation vectors comprising the first K elements of the autocorrelation function are extracted from learning data and used as codewords in the VQ codebook. Next, the KLT bases corresponding to all codeword vectors are estimated through eigendecomposition (ED) of the empirical Toeplitz covariance matrices constructed from the codewords. In the filtering stage, the autocorrelation vector estimated from the input signal is compared to the codewords and the nearest one is chosen in each frame. The precomputed KLT bases of the chosen codeword are then used for filtering instead of performing the computationally intensive ED. Speech quality evaluation using objective measures shows that the proposed method is comparable to a conventional KLT-based method that performs ED in the filtering process, and subjective tests support this result. In addition, processing time is reduced to about 1/66 that of the conventional method for a frame length of 120 points. This complexity reduction comes at the cost of computation in the learning stage and a corresponding increase in memory requirements. Nevertheless, these results demonstrate that the proposed method reduces computational complexity while maintaining the speech quality of KLT-based speech enhancement.

5.
This paper presents a multistage method for quantizing LSF parameters, together with an optimized quantization procedure and an analysis of its performance. The optimization reduces the computation and run time of vector quantization and speeds up encoding, with no effect on quantization distortion.

6.
A codebook design method is proposed for switched double-predictive multistage vector quantization (DPMSVQ) of speech spectral parameters. The improved scheme fully exploits both the short-term and the long-term correlation of the spectral parameters through a memory-based multistage vector quantization (MSVQ); by distinguishing strongly correlated from weakly correlated adjacent frames, it switches between two predictors, one for each case, to further reduce the bit rate of spectral parameter coding. The switched double-predictive scheme achieves nearly "transparent" quantization of the spectral parameters at 21 bits per frame, with slightly lower computational complexity and a much smaller storage requirement.

7.
Predictive classified split vector quantization for low-rate speech coding
To lower the coding rate while still achieving good spectral distortion performance, a predictive classified split vector quantization algorithm is proposed. Exploiting the properties of line spectral pairs, it combines prediction, classification, and splitting to quantize the LSPs, thereby adding memory to the quantizer. Experiments show that, compared with several other methods, the algorithm strikes a good balance between rate and distortion and greatly reduces computation, at the cost of a modest increase in memory.

8.

Vector quantization (VQ) is a very effective way to save bandwidth and storage in speech and image coding. Traditional vector quantization methods can be divided into seven main types according to their codebook generation procedures: tree-structured VQ, direct sum VQ, Cartesian product VQ, lattice VQ, classified VQ, feedback VQ, and fuzzy VQ. Over the past decade, quantization-based approximate nearest neighbor (ANN) search has developed rapidly, and many methods have emerged for searching images with binary codes in memory over large-scale datasets. Their most striking characteristic is the use of multiple codebooks, which has led to two kinds of codebook: the linear-combination codebook and the joint codebook. This may be a trend for the future. However, these methods merely seek a balance among speed, accuracy, and memory consumption for ANN search, and sometimes one of the three suffers. Finding a vector quantization method that balances speed and accuracy while consuming a moderate amount of memory therefore remains an open problem.
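The multiple-codebook idea mentioned above is commonly instantiated as product quantization, sketched here with toy codebooks (real ANN systems train the sub-codebooks with k-means and use precomputed distance lookup tables):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product quantization: split x into sub-vectors and quantize each
    with its own codebook. Codebooks here are illustrative."""
    m = len(codebooks)
    subs = np.split(np.asarray(x, float), m)
    return [int(np.argmin(np.linalg.norm(cb - s, axis=1)))
            for cb, s in zip(codebooks, subs)]

def pq_decode(code, codebooks):
    """Reconstruct by concatenating the chosen sub-codewords."""
    return np.concatenate([cb[j] for cb, j in zip(codebooks, code)])
```

With m codebooks of size k, the code enumerates k^m cells while storing only m·k sub-codewords, which is the memory/accuracy trade the abstract refers to.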


9.
Clustering is needed in applications such as biometric person authentication, speech coding and recognition, image compression, and information retrieval. Hundreds of clustering methods have been proposed, but surprisingly few extensive studies actually compare them. An important question is how much the choice of clustering method matters for the final pattern recognition application. Our goal is to provide a thorough experimental comparison of clustering methods for text-independent speaker verification. We consider the parametric Gaussian mixture model (GMM) and the non-parametric vector quantization (VQ) model using the best-known clustering algorithms, including iterative (K-means, random swap, expectation-maximization), hierarchical (pairwise nearest neighbor, split, split-and-merge), evolutionary (genetic algorithm), neural (self-organizing map), and fuzzy (fuzzy C-means) approaches. We study recognition accuracy, processing time, clustering validity, and the correlation between clustering quality and recognition accuracy. These complementary observations indicate that clustering is not a critical task in speaker recognition and that the algorithm should be chosen for its computational complexity and simplicity of implementation, mainly for three reasons: the data is not clustered, large models are used, and only the best algorithms are considered. For low-order models, however, the choice of algorithm can have a significant effect.
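The VQ speaker model referred to above scores a test utterance by its average quantization distortion against each speaker's codebook; a minimal sketch with made-up features and codebooks:

```python
import numpy as np

def vq_score(features, speaker_codebook):
    """Average nearest-codeword distance of test feature vectors against
    a speaker's codebook; lower means a better match."""
    F = np.asarray(features, float)
    C = np.asarray(speaker_codebook, float)
    d = np.linalg.norm(F[:, None] - C[None], axis=2)  # all pairwise
    return float(d.min(axis=1).mean())
```

Verification then reduces to comparing this score against the claimed speaker's codebook with a threshold; the clustering algorithm only affects how the codebook itself was built.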

10.
A lattice-based scheme for single-frame and double-frame quantization of the speech line spectral frequency parameters is proposed. The lattice structure provides a low-complexity vector quantization framework, implemented using a trellis structure. In the single-frame scheme, intraframe dependencies are exploited with a linear predictor. In the double-frame scheme, the parameters of two consecutive frames are jointly quantized, so interframe dependencies are exploited as well. A switched scheme is also considered, in which both double-frame and single-frame lattice-based quantization are performed for every pair of frames and the one yielding lower distortion is chosen. Comparisons are provided against Split-VQ, Multi-Stage VQ, Trellis Coded Quantization, the interframe Block-Based Trellis Quantizer, and the interframe schemes used in the IS-641 EFRC and the GSM AMR codec. The results demonstrate the effectiveness of the proposed lattice-based quantization schemes at very low complexity. Finally, robustness to channel errors is investigated.

11.
This paper proposes a new speaker-dependent coding algorithm that efficiently compresses a large speech database for corpus-based concatenative text-to-speech (TTS) engines while maintaining high fidelity. To achieve a high compression ratio and meet the fundamental requirements of concatenative TTS synthesizers, such as partial segment decoding and random access, we adopt a nonpredictive analysis-by-synthesis scheme for speaker-dependent parameter estimation and quantization. The spectral coefficients are quantized using a memoryless split vector quantization (VQ) approach that does not exploit frame correlation. Since the excitation signals of a specific speaker show low intra-variation, especially in voiced regions, the conventional adaptive codebook for pitch prediction is replaced by a speaker-dependent pitch-pulse codebook trained on a corpus of single-speaker speech. To further improve coding efficiency, the proposed coder flexibly combines nonpredictive and predictive methods according to the structure of the TTS system. Applying the algorithm to a Korean TTS system, we obtained quality comparable to the G.729 speech coder while satisfying all the requirements of the TTS system, as verified by both objective and subjective quality measurements. In addition, the decoding complexity of the proposed coder is around 55% lower than that of G.729 Annex A.

12.
This paper discusses a video compression and decompression method based on vector quantization (VQ) for use on general-purpose computer systems without specialized hardware. After describing basic VQ coding, we survey common VQ variations and discuss their impediments in light of the target application. We describe how the proposed video codec was designed to reduce computational complexity in every principal task of the codec process, and propose a classified VQ scheme that satisfies the data rate, image quality, decoding speed, and encoding speed objectives of software-only video playback. The functional components of the proposed VQ method are covered in detail. The method employs a pseudo-YUV color space and criteria to detect temporal redundancy and low-spatial-frequency regions. A tree-structured codebook generation algorithm is proposed to reduce encoding time while preserving image quality. Two separate vector codebooks, each generated with the tree-structured search, are employed for detail and low-spatial-frequency blocks. Codebook updating and sharing are proposed to further improve encoder speed and compression.

13.
Vector quantization (VQ) can perform efficient feature extraction from the electrocardiogram (ECG), reducing dimensionality while increasing accuracy. However, existing dictionary learning algorithms for vector quantization are sensitive to dirty data, which compromises classification accuracy. To tackle this problem, we propose a novel dictionary learning algorithm that employs k-medoids clustering, initialized with k-means++, and builds dictionaries from representative samples. This avoids interference from dirty data and thus boosts the classification performance of ECG systems based on vector quantization features. We apply the algorithm to vector quantization feature extraction for ECG beat classification and compare it with popular features such as sampling-point, fast Fourier transform, and discrete wavelet transform features, as well as with our previous beat vector quantization feature. The results show that the proposed method yields the highest accuracy and reduces the computational complexity of the ECG beat classification system. The proposed dictionary learning algorithm provides more efficient encoding of ECG beats and can improve ECG classification systems based on encoded features.
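A toy version of the medoid-based dictionary idea (farthest-first seeding is used here as a deterministic stand-in for k-means++, an assumption for illustration): because medoids are actual data points, a few dirty samples cannot drag a code vector the way they drag a k-means centroid.

```python
import numpy as np

def kmedoids(X, k, iters=10):
    """Toy k-medoids: farthest-first seeding, then alternate assignment
    and medoid update (the member minimizing intra-cluster distance)."""
    X = np.asarray(X, float)
    medoids = [0]                               # deterministic seed
    for _ in range(k - 1):
        dmin = np.min([np.linalg.norm(X - X[m], axis=1)
                       for m in medoids], axis=0)
        medoids.append(int(np.argmax(dmin)))    # farthest remaining point
    medoids = np.array(medoids)
    for _ in range(iters):
        D = np.linalg.norm(X[:, None] - X[medoids][None], axis=2)
        labels = np.argmin(D, axis=1)
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                intra = np.linalg.norm(
                    X[members][:, None] - X[members][None], axis=2).sum(axis=1)
                medoids[c] = members[int(np.argmin(intra))]
    return X[medoids], labels
```

Each returned code vector is guaranteed to be a real sample from the data, which is the robustness property the abstract exploits against dirty ECG beats.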

14.
We address the problem of speech compression at very low rates, with the short-term spectrum compressed to fewer than 20 bits per frame. Current techniques apply structured vector quantization (VQ) to the short-term synthesis filter coefficients to achieve rates on the order of 24 to 26 bits per frame. In this paper we show that temporal correlations can be introduced into the VQ index stream by dynamic codebook ordering, and that these correlations can be exploited by lossless coding to reduce the number of bits per frame of the VQ scheme. The use of lossless coding ensures that no additional distortion is introduced, unlike other interframe techniques. We then detail two constructive algorithms that exploit this redundancy. The first is a delayed-decision approach that dynamically adapts the VQ codebook to allow efficient entropy coding of the index stream. The second is based on a vector subcodebook approach and incurs no additional delay. Experimental results are presented for both methods to validate the approach.
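One simple lossless way to exploit temporal correlation in an index stream is shown below with move-to-front recoding; this is an illustrative stand-in, not the paper's delayed-decision codebook adaptation. Correlated indices map to small values whose skewed distribution an entropy coder can then compress, with no added distortion.

```python
from collections import Counter
import math

def mtf(indices, n):
    """Move-to-front recoding of a VQ index stream (n = codebook size):
    recently used indices get small output values."""
    table = list(range(n))
    out = []
    for i in indices:
        r = table.index(i)
        out.append(r)
        table.pop(r)
        table.insert(0, i)   # dynamic reordering: recency to the front
    return out

def entropy(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

On a stream with long runs of repeated indices, the recoded stream's entropy drops well below that of the raw index stream, which is the bit saving the lossless back end harvests.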

15.
Based on statistics of the intra-frame correlation of line spectral frequency (LSF) parameters and differential LSF parameters for English and Chinese male and female speech, split vector quantization (SVQ) grouping schemes suited to LSF and differential LSF parameters are proposed. Experiments show that, when quantizing 10th-order LSF parameters with SVQ and ignoring codebook size, the (4,6) grouping performs best; otherwise the (4,2,4) or (4,4,2) groupings perform better. Correlation-distribution tables clearly show that at least 68% of the differential LSF parameters are only weakly correlated within a frame, effectively reducing intra-frame redundancy in the LSF parameters. Quantizing the differential LSF parameters with DSQ and with EEDSVQ under several groupings then shows that differential LSF parameters quantize better than LSF parameters. In speech coding, using differential LSF parameters instead of LSF parameters as model parameters can further reduce the coding rate at the same speech quality.
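The differential LSF representation can be sketched as follows (one plausible convention, keeping the first coefficient and taking successive intra-frame differences thereafter; the paper's exact definition may differ):

```python
import numpy as np

def to_dlsf(lsf):
    """Differential LSF: first coefficient kept, the rest replaced by
    successive intra-frame differences (illustrative convention)."""
    lsf = np.asarray(lsf, float)
    return np.concatenate(([lsf[0]], np.diff(lsf)))

def from_dlsf(d):
    """Invert the differencing by cumulative summation."""
    return np.cumsum(d)
```

Because LSFs are strictly increasing within a frame, all differences are positive and typically small, which is the reduced intra-frame redundancy the abstract measures.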

16.
In this paper, we use the Gaussian mixture model (GMM)-based multidimensional companding quantization framework to develop two quantization schemes. In the first, the scalar quantization in the companding framework is replaced by more efficient lattice vector quantization, with low-complexity lattice pruning and quantization schemes provided for the E8 Gossett lattice. At moderate to high bit rates, the proposed scheme recovers much of the space-filling loss of the product vector quantizers (PVQ) employed in earlier work and thereby improves performance with a marginal increase in complexity. In the second scheme, we generalize the compression framework to accommodate recursive coding: the joint probability density function (PDF) of the parameter vectors of successive source frames is modeled by a GMM, and the conditional density of the current frame's parameter vector given the quantized parameter vectors of previous frames is used to generate a new codebook for each frame. We demonstrate the efficacy of the proposed schemes for speech spectrum quantization; they provide superior performance with a moderate increase in complexity compared with conventional one-step linear-prediction-based compression schemes for both narrowband and wideband speech.

17.
In this paper, we examine the computational requirements of the split-vector class of vector quantizers when applied to low-rate speech spectrum quantization. Split-vector quantization reduces the complexity and storage requirements of a 24-bit-per-frame spectral quantizer to manageable proportions, but further dramatic reductions in computational complexity are possible, as we demonstrate. Because the fast-search algorithms reported in the literature are somewhat data dependent, several methods had to be carefully evaluated specifically for the speech coding problem. A total of six methods were evaluated for their effectiveness in this task, and we show that a so-called "geometric" fast-search method reduces the average search time by an order of magnitude.
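Partial-distance elimination is one standard fast-search method of the kind evaluated here (not necessarily the "geometric" method the paper favors): a codeword is abandoned as soon as its running squared distance exceeds the best found so far, and the result is identical to a full search.

```python
def pde_search(x, codebook):
    """Exact nearest-codeword search with partial-distance elimination:
    accumulate the squared distance dimension by dimension and bail out
    once it can no longer beat the current best."""
    best_j, best_d = 0, float("inf")
    for j, c in enumerate(codebook):
        d = 0.0
        for xi, ci in zip(x, c):
            d += (xi - ci) ** 2
            if d >= best_d:
                break                 # eliminated early
        else:
            best_j, best_d = j, d     # completed, strictly better
    return best_j, best_d
```

The saving is data dependent, as the abstract notes: a good early candidate lets most later codewords be rejected after only a few dimensions.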

18.
《Real》2004,10(2):95-102
Finding the closest codeword in the encoding phase of vector quantization (VQ) is computationally complex and time-consuming, especially for high-dimensional vectors. In this paper, we propose three new schemes for speeding up the encoding phase of VQ. The proposed schemes easily filter out many impossible codewords, reducing the search domain. Experimental results show that our schemes save 41–52% of the computational time of a full search, and require less than 84% of the computational time of a recently proposed alternative.

19.
A text-dependent speaker recognition method based on MFCC and LPCC
于明  袁玉倩  董浩  王哲 《计算机应用》2006,26(4):883-885
In modeling for speaker recognition, a variance component is added to the codewords of the traditional vector quantization model, yielding a new VQ model with continuously distributed codewords. Mel-frequency cepstral coefficients and their deltas, combined with linear prediction cepstral coefficients and their deltas, are used as the recognition features for text-dependent speaker recognition. Comparison with dynamic time warping and traditional vector quantization shows that the model improves the recognition rate without noticeably increasing system response time.

20.
《Applied Soft Computing》2008,8(1):634-645
We propose a vector quantization (VQ) scheme with variable block size based on the local fractal dimensions (LFDs) of an image. Variable-block-size VQ has so far been implemented with quad-tree (QT) decomposition, which partitions an image according to the homogeneity of its local regions. We argue that the complexity of local regions matters more than their homogeneity, because viewers pay closer attention to complex regions than to homogeneous ones, making complex regions essential for image compression. Since the complexity of image regions is quantified by LFD values, we implement variable block size using LFDs and construct a codebook (CB) for VQ, using only discriminant analysis and FGLA for CB construction; here, FGLA is the algorithm combining the generalized Lloyd algorithm (GLA) with the fuzzy k-means algorithm. Computational experiments show that the method correctly encodes the regions to which viewers pay close attention, a promising result for obtaining a well-perceived compressed image. The method also outperforms VQ by FGLA in terms of both compression rate and decoded image quality. Furthermore, it achieved 1.0 bpp and more than 30 dB PSNR with a CB of only 252 code vectors.
