Similar Documents
1.
In this paper, we present a novel audio coder using the discrete wavelet transform (DWT) and warped linear prediction (WLP). In contrast to conventional LP, WLP allows the frequency resolution to be controlled so that it closely matches the response of the human auditory system. The structure of the system is similar to the transform coded excitation techniques used in wideband speech coding, except that LP is replaced with WLP and the residual is analyzed by a wavelet filterbank designed to approximate the critical bands. The inherent shaping of the WLP synthesis filter and a controlled bit allocation to the wavelet coefficients help minimise the perceptually significant noise caused by quantization error in the residual. For monophonic signals sampled at 44.1 kHz, the coder achieves near-transparent to transparent quality for a variety of speech and music signals at an average bitrate of about 64 kb/s. Tests also show that the coder (in its initial implementation) delivers better quality than MPEG Layer III and quality comparable to MPEG-2 AAC when operating at the same bitrate.
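Warped LP replaces each unit delay z^-1 of conventional LP with a first-order all-pass section D(z) = (z^-1 - λ)/(1 - λz^-1), which is how the frequency resolution is bent toward the auditory scale. The sketch below covers only the analysis side, assuming the usual autocorrelation method: the warped autocorrelation is obtained by repeatedly all-pass filtering the frame, and a standard Levinson-Durbin recursion then gives the warped-domain predictor. The warping factor `lam`, the function names, and the choice of autocorrelation method are illustrative assumptions; the abstract gives no parameters, and the warped synthesis filter (which needs a special structure to avoid delay-free loops) is not shown.

```python
def allpass(x, lam):
    """First-order all-pass D(z) = (z^-1 - lam) / (1 - lam z^-1)."""
    out = []
    x_prev = y_prev = 0.0
    for xn in x:
        # Difference equation: y[n] = -lam*x[n] + x[n-1] + lam*y[n-1]
        yn = -lam * xn + x_prev + lam * y_prev
        out.append(yn)
        x_prev, y_prev = xn, yn
    return out


def warped_autocorrelation(x, order, lam):
    """r[k] = sum_n x[n] * x_k[n], where x_k is x passed through k all-pass sections."""
    r = []
    xk = list(x)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, xk)))
        xk = allpass(xk, lam)
    return r


def levinson_durbin(r, order):
    """Return A(z) = 1 + a1 z^-1 + ... + ap z^-p (warped domain) and the final error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

With `lam = 0` the all-pass section degenerates to a plain delay, so the warped autocorrelation reduces to the ordinary one; that makes a convenient sanity check before trying warped values of `lam`.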

2.
A New Fast Algorithm for the MDCT   Total citations: 5, self-citations: 0, other citations: 5
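As an effective time-frequency analysis tool, the modified discrete cosine transform (MDCT) is widely used in audio coding. This paper proposes a fast MDCT algorithm based on a fast DCT; compared with the algorithms in other literature, it requires markedly fewer operations.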
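A fast MDCT algorithm is normally validated against the direct definition. The following Python sketch is such a direct O(N²) reference, not the paper's DCT-based fast algorithm, which the abstract does not detail: X[k] = Σ x[n] cos(π/N (n + 1/2 + N/2)(k + 1/2)) over 2N windowed samples, with the inverse scaled by 2/N and a sine window so that 50%-overlap-add cancels the time-domain aliasing (TDAC). Function names and the frame size are hypothetical.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N (already windowed) samples -> N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0  # phase offset N/2 + 1/2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Direct O(N^2) IMDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that a window with w[n]^2 + w[n+N]^2 = 1, applied at
    both analysis and synthesis, gives perfect reconstruction after overlap-add.
    """
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2.0
    return [
        2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                           for k in range(n_half))
        for n in range(2 * n_half)
    ]


def sine_window(two_n):
    """Sine window; symmetric and satisfies the Princen-Bradley condition."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def analysis_synthesis(signal, n_half):
    """Window -> MDCT -> IMDCT -> window -> overlap-add with hop size N (50% overlap)."""
    w = sine_window(2 * n_half)
    # Pad so that every input sample is covered by two overlapping frames.
    padded = [0.0] * n_half + list(signal) + [0.0] * (2 * n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = [padded[start + n] * w[n] for n in range(2 * n_half)]
        frame = imdct(mdct(block))
        for n in range(2 * n_half):
            out[start + n] += frame[n] * w[n]
    return out[n_half:n_half + len(signal)]
```

A fast implementation should reproduce `mdct()` to within rounding error on the same input block, which makes this direct form a useful regression reference.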

3.
A two-stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage consists of a core speech coder that provides a minimum output bitrate and acceptable performance on clean speech inputs. The second stage is a perceptual, transform-based coder that provides a separate, optional bitstream for enhancing the core-stage output. The two-stage structure can be used to enhance the quality of an existing codec without modifying the original coding algorithm; in this regard, it can be considered a value-added option for a standard (existing) system. The structure can also be used in systems in which many users or systems force the coding algorithm to work simultaneously under multiple constraints of bitrate, complexity, delay, and coding quality. Informal testing of the algorithm has been done using ITU-T G.723.1 at 5.3 kb/s as the core coder. The maximum combined bitrate of the core and enhancement stages in the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output for music and for speech with background noise. Compared with the non-embedded, fixed-rate LD-CELP standard G.728 at 16 kb/s, the two-stage structure generally gives lower quality on these inputs; the embedded feature does affect quality. On clean speech, the quality of the two-stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.

4.
Introduction to AVS Audio   Total citations: 1, self-citations: 0, other citations: 1
This paper describes a general audio coding algorithm that has recently been standardized by AVS, China. The algorithm is based on a perceptual coding technique, and the codec delivers near-CD-quality audio at 128 kb/s. The paper describes the coder structure in detail and discusses the reasons for specific design decisions. A summary of the subjective test results for the prototype codec is presented. A Comparison Mean Opinion Score (CMOS) test indicates that the quality of the AVS audio coder is comparable to that of the MPEG Layer-3 audio coder. A real-time decoder based on a 16-bit fixed-point DSP was used for the characterization test, and the performance of the DSP solution is demonstrated, including its computational complexity and storage characteristics.

5.
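To meet the need for simultaneous audio and video processing, a software and hardware design for a G.729 speech encoder based on the TMS320C64x-series DSP is proposed, with emphasis on the key optimization techniques used in the real-time implementation. Tests show that, after optimization, a single-channel speech encoder uses about 5% of the CPU resources, 80% less than before optimization, leaving more resources for video encoding and making it possible to implement both audio and video encoding on the same DSP chip.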

6.
This paper investigates the use of sparse overcomplete decompositions for audio coding. Audio signals are decomposed over a redundant union of modified discrete cosine transform (MDCT) bases with eight different scales. This approach produces a sparser decomposition than the traditional MDCT-based orthogonal transform and allows better coding efficiency at low bitrates. Unlike state-of-the-art low-bitrate coders, which are based on purely parametric or hybrid representations, our approach is able to provide transparency. Moreover, we use a bitplane encoding approach, which provides a fine-grain scalable coder that can seamlessly operate from very low bitrates up to transparency. Objective evaluation, as well as listening tests, show that our coder performs significantly better than a state-of-the-art transform coder at very low bitrates and similarly at high bitrates. We provide a link to test soundfiles and source code to allow better evaluation and reproducibility of the results.
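The abstract does not name the decomposition algorithm; greedy matching pursuit is one common way to obtain sparse expansions over a redundant union of bases, and it is assumed here purely for illustration. To keep the sketch short, the union consists of just two orthonormal bases (DCT-II and Diracs) instead of the paper's eight MDCT scales; all function names are hypothetical.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis


def dirac_basis(n):
    """Canonical (Dirac) basis vectors of length n."""
    return [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]


def matching_pursuit(x, dictionary, n_atoms):
    """Greedy MP over unit-norm atoms; returns [(atom index, coefficient)] and the residual."""
    residual = list(x)
    decomposition = []
    for _ in range(n_atoms):
        best_idx, best_c = 0, 0.0
        for idx, atom in enumerate(dictionary):
            c = sum(r * a for r, a in zip(residual, atom))  # inner product <r, g>
            if abs(c) > abs(best_c):
                best_idx, best_c = idx, c
        decomposition.append((best_idx, best_c))
        # Subtract the projection onto the selected atom.
        residual = [r - best_c * a for r, a in zip(residual, dictionary[best_idx])]
    return decomposition, residual
```

Because every atom has unit norm, each iteration removes exactly c² from the residual energy, so the signal energy equals the sum of the selected c² plus the remaining residual energy. A bitplane coder would then quantize the selected coefficients.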

7.
This paper deals with the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials for very low bitrate coding. After a brief review of existing methods, we recast this problem in the Bayesian framework. We propose a family of probabilistic signal models combining learned object priors and various perceptually motivated distortion measures. We design efficient algorithms to infer object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms the other distortion measures and that our coder yields better sound quality than baseline transform and parametric coders at 8 and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.

8.
This paper proposes an efficient joint implementation algorithm for computing color space conversion, quantization, and the discrete cosine transform (DCT) in an image coder/decoder. By combining the three stages, the proposed algorithm considerably reduces the number of operations needed for color space conversion. For 4:4:4 color sampling, the proposed algorithm reduces the number of multiplications by 40% and the number of additions by 42% for the RGB-to-YCbCr conversion in an image coder, and reduces the number of multiplications by 60% and the number of additions by 42% for the YCbCr-to-RGB conversion in an image decoder. Similar results are obtained for 4:2:2 and 4:1:1 down-sampling. Existing fast methods in the literature can still be applied together with the proposed algorithm when implementing international image coding standards that use transform coding, such as JPEG, MPEG, and H.26x, effectively increasing image coding/decoding speed.
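The abstract does not spell out the joint algorithm, but the kind of saving it reports can be illustrated on the color conversion alone. The sketch below contrasts a direct JFIF-style RGB-to-YCbCr conversion (9 multiplications per pixel) with the well-known factored form that reuses Y to form the color differences (5 multiplications). This factorization is a standard identity, not the paper's method, which additionally merges the conversion with the DCT and quantization; the function and constant names are hypothetical.

```python
# Direct form: 9 multiplications per pixel (JFIF full-range coefficients).
def rgb_to_ycbcr_direct(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr


# Factored form: Cb and Cr are scaled color differences (B - Y) and (R - Y),
# so reusing Y reduces the cost to 5 multiplications per pixel.
CB_SCALE = 1.0 / 1.772  # 1 / (2 * (1 - 0.114))
CR_SCALE = 1.0 / 1.402  # 1 / (2 * (1 - 0.299))


def rgb_to_ycbcr_factored(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) * CB_SCALE + 128.0
    cr = (r - y) * CR_SCALE + 128.0
    return y, cb, cr
```

The two forms agree up to the rounding of the six-digit JFIF constants; merging the scale factors further into the DCT or quantizer tables is the direction the paper takes.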

9.
This paper presents Advanced Audio Zip (AAZ), a fine-grained scalable to lossless (SLS) audio coder that has recently been adopted as the reference model for MPEG-4 audio SLS work. AAZ integrates the functionalities of high-compression perceptual audio coding, fine-granular scalable audio coding, and lossless audio coding in a single framework, while providing backward compatibility with MPEG-4 Advanced Audio Coding (AAC). AAZ provides fine-granular bitrate scalability from lossy to lossless coding, and this scalability is achieved in a perceptually meaningful way, i.e., with better perceptual quality at higher bitrates. Despite its rich functionality, AAZ introduces only negligible overhead in lossless compression performance compared with a nonscalable, lossless-only audio coder. As a result, AAZ provides a universal yet efficient solution for digital audio applications such as audio archiving, network audio streaming, portable audio playback, and music downloading, which were previously served by several different audio coding technologies, and it eliminates the need for transcoding to share digital audio content across these application domains.

10.
A perceptually scalable audio coder generates a bit-stream that contains layers of audio fidelity and is encoded in such a way that adding one of these layers enhances the reconstructed audio by an amount that is just noticeable by the listener. Such algorithms have applications like music on demand at variable levels of fidelity, for instance using 3G and 4G cellular radio systems operating at different bit rates. While the MPEG-4 natural audio coder can create finely scalable bit streams using bit sliced arithmetic coding (BSAC), its perceptual quality at low bit rates is poor. On the other hand, the nonscalable transform-domain weighted interleaved vector quantization (TWIN-VQ) performs well at low bit rates. In this paper, we present a modified version of TWIN-VQ algorithm that generates a perceptually scalable bit-stream with many fine layers of audio fidelity. Using TWIN-VQ as our base ensures the best possible perceptual quality at low bit rates. Specifically, the proposed scalable algorithm performs as well as TWIN-VQ at rates of 8 to 16 kb/s and outperforms scalable BSAC by between 64% and 172% at rates of less than 24 kb/s.

11.
This paper deals with the application of adaptive signal models to parametric audio coding. A fully parametric audio coder, which decomposes the audio signal into sinusoids, transients, and noise, is proposed. Adaptive signal models for sinusoidal, transient, and noise modeling are included in the parametric scheme in order to achieve high-quality, low-bitrate audio coding. A new sinusoidal modeling method based on a perceptual distortion measure is proposed. For transient modeling, a fast and effective method based on matching pursuit with a mixed dictionary is chosen. The residue of the previous models is analyzed as a noise-like signal. The proposed parametric audio coder allows high-quality coding of one-channel audio signals at 16 kbit/s (average bitrate). A bitrate-scalable version of the parametric audio coder is also proposed. Bitrate scalability is intended for audio streaming applications, which are in high demand today. The performance of the proposed parametric audio coders (nonscalable and scalable) is assessed in comparison with widely used audio coders operating at similar bitrates.

12.
This paper describes a coding paradigm that uses coding tools based on the characteristics of the human hearing system to accommodate a wide range of narrow-band audio inputs without annoying artifacts at low rates (down to 8 kb/s). The narrow-band perceptual audio coder (NPAC) employs a variety of algorithms to account for perceptually irrelevant parts of the input signal in addition to statistical redundancies. The new algorithms used in the NPAC coder include a perceptual error measure, which takes into account the audible parts of the quantization noise, for training the codebooks and selecting the best codewords; a perception-based bit-allocation algorithm; and a new predictive scheme for vector quantizing the scale factors. The NPAC coder delivers acceptable quality without annoying artifacts for most narrow-band audio signals at around 1 bit/sample. Informal subjective tests have shown that the NPAC coder outperforms a commercial low-rate music coder operating at 8 kb/s.

13.
Digital monochrome images are encoded using a novel minimum mean square error (MSE) linear predictive transform (LPT) coding formulation. The new formulation is appealing for two reasons. First, it leads to a simple coder implementation with a satisfactory signal-to-noise ratio (SNR). Second, it provides a general theoretical framework in which minimum-MSE predictive coding and minimum-MSE transform coding arise as special cases. Specific results that illustrate these ideas are: a simple and generally suboptimal two-dimensional LPT coder operating at 2 bits/pixel has approximately one third of the complexity of a 4 × 4 Hadamard coder while yielding a better SNR; and an optimal 2-D LPT coder operating at 2 bits/pixel has approximately one sixth of the complexity of a 4 × 4 Karhunen-Loeve transform (KLT) coder while yielding a better SNR.

14.
We propose two quantization techniques for improving the bit-rate scalability of compression systems that optimize a weighted squared error (WSE) distortion metric. We show that quantization of the base-layer reconstruction error using entropy-coded scalar quantizers is suboptimal for the WSE metric. By considering the compandor representation of the quantizer, we demonstrate that asymptotically (high-resolution) optimal scalability in the operational rate-distortion sense is achievable by quantizing the reconstruction error in the compandor's companded domain. We then fundamentally extend this work to the low-rate case by using enhancement-layer quantization that is conditional on the base-layer information. In the practically important case where the source is well modeled as a Laplacian process, we show that such conditional coding can be implemented with only two distinct switchable quantizers. Conditional coding leads to substantial improvement over the companded scalable quantization scheme introduced in the first part, which itself significantly outperforms standard techniques. Simulation results are presented for synthetic memoryless Laplacian sources with μ-law companding, and for real-world audio signals in conjunction with MPEG AAC. Using the objective noise-to-mask ratio (NMR) metric, the proposed approaches were found to yield bit-rate savings of a factor of 2 to 3 when implemented within scalable MPEG AAC. Moreover, a four-layer scalable coder consisting of 16-kb/s layers achieves performance close to that of the 64-kb/s nonscalable coder on the standard test database of 44.1-kHz audio.
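The first idea in the abstract, quantizing the base-layer error in the compandor's companded domain rather than in the signal domain, can be sketched as follows. This is a minimal illustration assuming μ = 255 (the value used in G.711), samples normalized to [-1, 1], and plain uniform quantizers; the paper's entropy coding and its conditional, switchable enhancement quantizers are not reproduced, and all names are hypothetical.

```python
import math

MU = 255.0  # assumed companding parameter


def compress(x, mu=MU):
    """mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(c, mu=MU):
    """Inverse of compress(): |x| = ((1 + mu)^|c| - 1) / mu."""
    return math.copysign(math.expm1(abs(c) * math.log1p(mu)) / mu, c)


def uniform_quantize(v, step):
    """Mid-tread uniform quantizer (reconstruction value only)."""
    return step * round(v / step)


def scalable_encode_decode(x, base_step, enh_step):
    """Return (base-layer, base+enhancement) reconstructions of sample x."""
    c = compress(x)
    c_base = uniform_quantize(c, base_step)
    # Enhancement layer refines the error in the companded domain,
    # not the signal-domain error x - expand(c_base).
    c_enh = c_base + uniform_quantize(c - c_base, enh_step)
    return expand(c_base), expand(c_enh)
```

Because the refinement happens before expansion, the enhancement layer inherits the same perceptual weighting as the base quantizer: small samples receive proportionally finer refinement than large ones.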

15.
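A Multi-Rate Zerotree Wavelet Audio Compression Method Based on a Psychoacoustic Model   Total citations: 3, self-citations: 0, other citations: 3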
He Dongmei (何冬梅), Gao Wen (高文). Chinese Journal of Computers (计算机学报), 2000, 23(3): 278-284
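The MPEG-4 audio coding standard not only places higher demands on bitrate and sound quality but also requires the encoder to provide a variety of functions to meet the needs of different applications. Exploiting the self-similarity of wavelet coefficients across scales and the masking effect of the human ear, this paper proposes a zerotree wavelet audio coding algorithm based on a psychoacoustic model. The algorithm not only achieves transparent quality for CD audio at a low bitrate (56 kb/s) but also produces an embedded bitstream that supports multi-rate scalable coding in an optimal sense, making it a promising coding scheme for fields such as multimedia communications.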
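The zerotree idea rests on the parent-child structure of a dyadic wavelet decomposition. In one dimension, coefficient i of a detail band has children 2i and 2i+1 in the next finer band, and a coefficient is a zerotree root at threshold T when it and all of its descendants are insignificant (magnitude below T). A minimal sketch with an orthonormal Haar transform follows; the Haar filter, the significance test, and the function names are illustrative assumptions, and the paper's psychoacoustic thresholds are not modelled.

```python
import math


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT of a signal whose length is a multiple of 2**levels.

    Returns (approximation, detail bands ordered from coarsest to finest).
    """
    s = 1.0 / math.sqrt(2.0)
    approx = list(x)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        a = [(approx[2 * i] + approx[2 * i + 1]) * s for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) * s for i in range(half)]
        details.insert(0, d)  # keep coarsest band first
        approx = a
    return approx, details


def is_zerotree_root(details, band, idx, threshold):
    """True if details[band][idx] and all of its descendants are below threshold."""
    if abs(details[band][idx]) >= threshold:
        return False
    if band + 1 == len(details):
        return True  # finest band: no children
    return (is_zerotree_root(details, band + 1, 2 * idx, threshold)
            and is_zerotree_root(details, band + 1, 2 * idx + 1, threshold))
```

An embedded coder can then signal an entire insignificant subtree with a single zerotree symbol, which is where the bitrate saving comes from; lowering the threshold pass by pass yields the embedded, multi-rate bitstream.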

16.
Infrared Image Denoising Based on Sparse Coding Shrinkage and the Contourlet Transform   Total citations: 1, self-citations: 0, other citations: 1
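To address the shortcomings of the sparse coding shrinkage method and the contourlet transform, a new image denoising algorithm is proposed. The algorithm effectively handles the denoising of infrared images corrupted by additive noise of unknown variance. Experiments show that, compared with traditional methods, sparse coding shrinkage, and contourlet-domain denoising, the algorithm further increases the SNR, lowers the MSE, and achieves better image restoration quality.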

17.
Gaussian Mixture Kalman Predictive Coding of Line Spectral Frequencies   Total citations: 1, self-citations: 0, other citations: 1
Gaussian mixture model (GMM)-based predictive coding of line spectral frequencies (LSFs) has gained wide acceptance. In such coders, each mixture of a GMM can be interpreted as defining a linear predictive transform coder. In this paper, we use Kalman filtering principles to model each of these linear predictive transform coders to present GMM Kalman predictive coding. In particular, we show how suitable modeling of quantization noise leads to an adaptive a posteriori GMM that defines a signal-adaptive predictive coder that provides improved coding of LSFs in comparison with the baseline recursive GMM predictive coder. Moreover, we show how running the GMM Kalman predictive coders to convergence can be used to design a stationary GMM Kalman predictive coding system which again provides improved coding of LSFs but now with only a modest increase in run-time complexity over the baseline. In packet loss conditions, this stationary GMM Kalman predictive coder provides much better performance than the recursive GMM predictive coder, and in fact has comparable mean performance to a memoryless GMM coder. Finally, we illustrate how one can utilize Kalman filtering principles to design a postfilter which enhances decoded vectors from a recursive GMM predictive coder without any modifications to the encoding process.

18.
19.
This paper presents a perceptually enhanced, prioritized bit-plane audio coding algorithm. The bit-planes are prioritized, with optimized parameters, according to the energy distribution in different frequency regions. Based on statistical modeling of the frequency spectrum, a much simpler implementation of prioritized bit-plane coding is integrated into the recently released MPEG-4 scalable lossless (SLS) audio coding structure, replacing the sequential bit-plane coding in the enhancement layer. The method requires no extra side information and adds only trivial complexity and modification to the original SLS structure. Extensive experimental results show that the perceptual quality of SLS with no core layer or a very low core bitrate is significantly improved across a wide range of bitrate combinations. Fully scalable audio coding up to lossless, with much enhanced perceptual quality, is thus achieved.
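For reference, here is the sequential bit-plane coding that the prioritized scheme replaces, reduced to its core: integer coefficient magnitudes are sent one bit-plane at a time, from the most significant plane down, so truncating the stream after any plane still yields a coarser reconstruction. This is a simplified sketch with hypothetical names; the SLS entropy coding and the energy-based prioritization of frequency regions are not shown.

```python
def to_bitplanes(coeffs, n_planes):
    """Split integer coefficients into signs and magnitude bit-planes, MSB plane first."""
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def from_bitplanes(signs, planes, n_planes):
    """Reconstruct from the first len(planes) bit-planes (possibly a truncated stream)."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = n_planes - 1 - i  # bit position carried by this plane
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [-m if s else m for m, s in zip(mags, signs)]
```

A prioritized variant would change the order in which coefficients (or frequency regions) are visited within each plane so that perceptually important bits arrive earlier; the plane structure itself stays the same.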

20.
Yan Diqun (严迪群), Wang Rangding (王让定). Computer Engineering (计算机工程), 2008, 34(20): 172-174
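This paper proposes a scheme for the covert transmission of confidential speech based on an audio-on-demand system. The confidential speech is compressed at a low bitrate with the ITU G.729A coding algorithm; the resulting bitstream is embedded into public audio using an improved LSB data-hiding algorithm, and the cover audio is published on the network through an audio-on-demand platform. The confidential speech is extracted and played back when a client requests the audio. Test results show that the improved algorithm raises the perceptual quality of the cover audio and that the scheme offers better concealment against malicious attackers.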
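The abstract does not describe the improved LSB algorithm, so the sketch below shows only plain LSB substitution into 16-bit PCM samples, the baseline that an improved LSB scheme starts from. As a capacity check: G.729A produces 8 kb/s (80 bits per 10 ms frame), so, assuming one embedded bit per cover sample, any cover sampled at 8 kHz or above can carry the speech bitstream in real time. Function names are hypothetical.

```python
def bytes_to_bits(data):
    """MSB-first bit list, e.g. for a G.729A frame (10 bytes per 10 ms)."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits):
    """Inverse of bytes_to_bits(); assumes len(bits) is a multiple of 8."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_lsb(samples, payload_bits):
    """Replace the least significant bit of each 16-bit sample with one payload bit."""
    if len(payload_bits) > len(samples):
        raise ValueError("payload is longer than the cover audio")
    stego = list(samples)
    for i, bit in enumerate(payload_bits):
        # Clearing then setting bit 0 changes a sample by at most 1 and keeps it in range.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bits):
    """Read back the first n_bits least significant bits."""
    return [s & 1 for s in samples[:n_bits]]
```

Plain LSB substitution changes each sample by at most one quantization step; improvements of the kind the abstract mentions typically aim to reduce the perceptual or statistical trace of these changes.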

