Found 20 similar documents; search took 46 ms
1.
This paper presents a technique to incorporate psychoacoustic models into an adaptive wavelet packet scheme to achieve perceptually transparent compression of high-quality (44.1 kHz) audio signals at about 45 kb/s. The filter bank structure adapts according to psychoacoustic criteria and according to the computational complexity that is available at the decoder. This permits software implementations that can perform according to the computational power available in order to achieve real-time coding/decoding. The bit allocation scheme is an adapted zero-tree algorithm that also takes input from the psychoacoustic model. The measure of performance is a quantity called subband perceptual rate, which the filter bank structure adapts to approach the perceptual entropy (PE) as closely as possible. In addition, this method is also amenable to progressive transmission, that is, it can achieve the best quality of reconstruction possible considering the size of the bit stream available at the encoder. The result is a variable-rate compression scheme for high-quality audio that takes into account the allowed computational complexity, the available bit budget, and the psychoacoustic criteria for transparent coding. This paper thus provides a novel scheme that combines results from wavelet packets and perceptual coding to construct an algorithm that is well suited to high-quality audio transfer for Internet and storage applications.
2.
A perceptual audio coder, in which each audio segment is
adaptively analyzed using either a sinusoidal or an optimum wavelet basis
according to the time-varying characteristics of the audio signals, has been
constructed. The basis optimization is achieved by a novel switched filter
bank scheme, which switches between a uniform filter bank structure
(discrete cosine transform) and a non-uniform filter bank structure
(discrete wavelet transform). Pre-echo distortion, a major artifact of the
ISO/Moving Pictures Experts Group (MPEG) audio coding standard (MPEG-I
layers 1 and 2), which uses a uniform filter bank structure for
audio signal analysis, is almost eliminated in the proposed coder. A
perceptual masking model implemented using a high-resolution wavelet packet
filter bank with 27 subbands, closely mimicking the critical bands
of the human auditory system, is employed in this audio coder. The resulting
scheme is a variable bit-rate audio coder, which provides compression ratios
comparable to MPEG-I layers 1 and 2 with almost transparent quality.
3.
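A novel transparent-quality coding method for digital audio signals is proposed that combines adaptive optimal selection of the wavelet basis with a psychoacoustic model. At a fixed distortion level, it minimizes the number of bits dynamically allocated to the transform coefficients of each frame, and it uses a dynamic codebook method to remove the statistical redundancy of the audio signal, further reducing the bit rate. Monophonic compact-disc music signals sampled at 44.1 kHz with 16-bit linear coding per sample can be compressed to about 64 kb/s.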
4.
A new audio coding system is proposed, using an M-band multiresolution filter bank technique. The filter bank consists of a cascade of 4-band and 8-band filter banks. Experiments with a complete audio coding system were carried out with the proposed filter bank, masking model, bit allocation algorithm, scalar quantisation and Huffman coding. For the broadband signals tested, the proposed system resulted in near-transparent quality at bit rates of 78-91 kbit/s with low computational load. It also achieved similar performance to the MPEG layer 2 coder at 128 kbit/s.
5.
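Based on the basic principles of perceptual audio coding and decoding, this paper studies and designs a high-quality audio coding algorithm based on multiple description coding. The algorithm is robust to packet loss. Its overall approach is first to decompose the audio, at the analysis and synthesis level, into an auditory masking threshold and a residual signal, and then, at the quantization and coding level, to apply multiple description processing to the masking threshold and to the residual signal separately. The results show that, within the proposed multiple-description, packet-loss-resilient audio codec framework, the multiple description algorithms are markedly more resilient to packet loss than single description coding, and that the scalar quantization multiple description algorithm is more resilient to packet loss than both the odd-even separation dual-description algorithm and the dual-transform dual-description algorithm.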
6.
In this paper, we present a new method for high-quality audio coding at low delay and low bit rate for telecommunications applications such as audioconferencing or videoconferencing. The developed coder is designed to code generic audio signals at a bit rate of 64 kbit/s with a delay close to 5 ms in the 20-15000 Hz bandwidth. The method is based on speech coding as well as audio coding concepts. The coder combines subband decomposition of the input signal with LD-CELP techniques. Into this coding structure we introduce a psychoacoustic model, which allows an optimal bit rate to be allocated to each subband according to the perceptual properties of human hearing. In order to satisfy the bit rate requirement of the psychoacoustic model and to reduce the complexity of such a coding algorithm, we propose a new method of vector quantization based on lattice quantization. This method quantizes the residual signal in the LD-CELP coder and avoids the complexity of a full search. Objective and subjective tests were carried out on a test set of audio signals that is a critical subset of those used by ISO. Formal tests showed that the quality of the proposed coder is comparable to the best implementation of MPEG-1 Layer II, but our solution has the advantage of reaching a very low delay (5 ms).
7.
We introduce new methods for increasing the performance of multiprogram digital audio broadcast systems, e.g., satellite digital audio broadcasting. Joint multiprogram encoding is an attractive possibility for parallel broadcasting of a large number of programs. Joint coding extended over multiple audio frames in time gives further improvements. The benefits of this kind of statistical multiplexing are improved audio quality and/or higher capacity in terms of the number of programs. We describe the new Joint Multiple Program Encoding Technique in the context of the perceptual audio coding (PAC) type of algorithms. We also describe methods for multiprogram transmission, including Equal Error Protection (EEP) as well as Unequal Error Protection (UEP), and improved error concealment for multiple program transmission. Some of the techniques described in this paper are currently being used in satellite digital audio broadcasting in the United States.
8.
Al-Moussawy Raed, Yin Junxun, Song Shaopeng 《电子科学学刊(英文版)》2004,21(3):213-221
This work is concerned with the development and optimization of a signal model for scalable perceptual audio coding at low bit rates. A complementary two-part signal model consisting of Sines plus Noise (SN) is described. The paper essentially presents a fundamental enhancement to the sinusoidal modeling component. The enhancement involves an audio signal scheme based on carrying out overlap-add sinusoidal modeling at three successive time scales: large, medium, and small. The sinusoidal modeling is done in an analysis-by-synthesis overlap-add manner across the three scales by using psychoacoustically weighted matching pursuits. The sinusoidal modeling residual at the first scale is passed to the smaller scales to allow for the modeling of various signal features at appropriate resolutions. This approach greatly helps to correct the pre-echo inherent in the sinusoidal model, and it improves on the perceptual audio quality of our previous sinusoidal modeling work while using the same number of sinusoids. The most obvious application for the SN model is in scalable, high-fidelity audio coding and signal modification.
9.
10.
11.
12.
13.
Advances in speech and audio compression (cited 4 times in total: 0 self-citations, 4 citations by others)
Gersho A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1994,82(6):900-918
Speech and audio compression has advanced rapidly in recent years spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques for rates in the range of 4 to 16 kb/s. Techniques have also been emerging in recent years that offer enhanced quality in the neighborhood of 2.4 kb/s over traditional vocoder methods. Wideband audio compression is generally aimed at a quality that is nearly indistinguishable from consumer compact-disc audio. Subband and transform coding methods combined with sophisticated perceptual coding techniques dominate in this arena with nearly transparent quality achieved at bit rates in the neighborhood of 128 kb/s per channel.
14.
Vladimir Britanak 《Signal processing》2011,91(4):624-672
This tutorial paper describes various efficient implementations (published and new, previously unpublished) of the forward and backward modified discrete cosine transform (MDCT) in the MPEG layer III (MP3) audio coding standard developed in the period 1990-2010, including, for completeness, the efficient implementation of polyphase filter banks. The efficient MDCT implementations are discussed in the context of (fast) complete analysis/synthesis MDCT filter banks in the MP3 encoder and decoder. In general, for each efficient forward/backward MDCT block transform implementation, the following are presented: complete formulas or sparse matrix factorizations of the algorithm, the corresponding signal flow graph for the short audio block, the total arithmetic complexity, and useful comments on improving the arithmetic complexity and on possible structural simplification of the algorithm. Finally, all efficient forward/backward MDCT implementations are compared in terms of both arithmetic complexity and structural simplicity. It is important to note that almost all of the presented algorithms can also be used for 2^n-length data blocks in other MPEG audio coding standards and proprietary audio compression algorithms.
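For orientation, the transform that these fast algorithms accelerate can be sketched directly from its defining formula. The following is a minimal O(N^2) reference sketch, not one of the efficient implementations surveyed in the paper. The function names, the sine window, and the 2/N scaling on the inverse are illustrative assumptions; the default of 18 coefficients per 36-sample block matches the MP3 long block.

```python
import math


def sine_window(block_len):
    # Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block):
    # Direct-form forward MDCT: 2N input samples -> N coefficients.
    n_half = len(block) // 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half)
        )
        for k in range(n_half)
    ]


def imdct(coeffs):
    # Direct-form inverse MDCT: N coefficients -> 2N time-aliased samples.
    # The 2/N scaling is the usual choice when a window is applied twice.
    n_half = len(coeffs)
    return [
        (2.0 / n_half)
        * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(2 * n_half)
    ]


def analysis_synthesis(signal, n_half=18):
    # Window, transform, inverse-transform, window again, then overlap-add
    # with a hop of N samples. Aliasing cancels wherever two blocks overlap.
    window = sine_window(2 * n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + n] * window[n] for n in range(2 * n_half)]
        recon = imdct(mdct(block))
        for n in range(2 * n_half):
            out[start + n] += recon[n] * window[n]
    return out
```

Only samples covered by two overlapping blocks are reconstructed exactly; the first and last N samples of the signal carry uncancelled aliasing in this sketch.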
15.
The audio quality, robustness and implementational complexity of a novel mobile digital audio broadcast scheme are addressed. The audio codec proposed is based on an efficient combination of subband coding (SBC) and multipulse excited linear prediction coding (MPLPC). The bit allocation is dynamically adapted according to both the signal power in different subbands and a perceptual hearing model. Typically, a segmental signal-to-noise ratio (SEGSNR) in excess of 30 dB, associated with high-fidelity subjective quality, was achieved for 2.67-b/sample transmissions at a bit rate of 86 kb/s. Perceptually unimpaired audio quality was achieved for a bit error rate (BER) of about 10^-4 when injecting random errors, and quality degraded for higher BERs. In order to provide robust error protection, the audio codec was also subjected to a rigorous bit sensitivity analysis. Four different forward error correction schemes were investigated in order to explore the complexity, bit rate, and robustness tradeoffs.
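The segmental SNR figure quoted above is commonly computed as the average of per-frame SNR values in decibels. The sketch below shows that common definition only; the frame length of 256 samples and the choice to skip silent frames are assumptions, since the abstract does not state how the authors computed SEGSNR.

```python
import math


def segmental_snr_db(reference, decoded, frame_len=256, eps=1e-12):
    # Average of per-frame SNRs in dB over non-overlapping frames.
    # frame_len and the silent-frame handling are illustrative assumptions.
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy <= eps:
            continue  # skip silent frames so they do not dominate the mean
        frame_snrs.append(10.0 * math.log10(signal_energy / max(noise_energy, eps)))
    if not frame_snrs:
        raise ValueError("no non-silent frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)
```

For example, a decoded signal whose error amplitude is one tenth of the reference amplitude in every frame yields a segmental SNR of about 20 dB.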
16.
17.
18.
19.
《Electronics & Communication Engineering Journal》1997,9(4):165-175
In high-quality digital audio coding, a great deal of attention is focused on the auditory perception process, as the goal of audio compression is to attain perceptually-transparent compression and reproduction. Consequently models for perceptual masking are used extensively in audio coders, allowing quantisation noise to be allocated in the various frequency subbands according to a masking function. In this way, quantisation noise can be made almost inaudible at the receiver. In this paper, the psychoacoustic phenomenon of auditory masking is described. This is followed by a review of the MPEG-1 (Moving Pictures Experts Group) international standard for audio compression, including an outline of the psychoacoustic models used.
20.
The class of perceptual audio coding (PAC) algorithms yields efficient and high-quality stereo digital audio bitstreams at bit rates from 16 kb/sec to 128 kb/sec (and higher). To avoid "pops and clicks" in the decoded audio signals, channel error detection combined with source error concealment, or source error mitigation, techniques are preferred to pure channel error correction. One method of channel error detection is to use a high-rate block code, for example, a cyclic redundancy check (CRC) code. Several joint source-channel coding issues arise in this framework because PAC contains a fixed-to-variable source coding component in the form of Huffman codes, so that the output audio packets are of varying length. We explore two such issues. First, we develop methods for screening for undetected channel errors in the audio decoder by looking for inconsistencies between the number of bits decoded by the Huffman decoder and the number of bits in the packet as specified by control information in the bitstream. We evaluate this scheme by means of simulations of Bernoulli sources and real audio data encoded by PAC. Considerable reduction in undetected errors is obtained. Second, we consider several configurations for the channel error detection codes, in particular CRC codes. The preferred set of formats employs variable-block length, variable-rate outer codes matched to the individual audio packets, with one or more codewords used per audio packet. To maintain a constant bit rate into the channel, PAC and CRC encoding must be performed jointly, e.g., by incorporating the CRC into the bit allocation loop in the audio coder.
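The bit-count screening idea in this abstract can be illustrated with a toy prefix code. This is a minimal sketch under stated assumptions: the four-entry codebook, the function names, and the packet layout (a declared bit length plus a known symbol count) are hypothetical and are not taken from PAC, whose Huffman tables and bitstream syntax are not given here.

```python
# Hypothetical prefix code, for illustration only (not a PAC Huffman table).
CODEBOOK = {"0": "a", "10": "b", "110": "c", "111": "d"}


def decode_prefix_code(bits, n_symbols, codebook=CODEBOOK):
    # Decode exactly n_symbols symbols from a string of "0"/"1" characters.
    # Returns the symbols and the number of bits consumed.
    symbols = []
    pos = 0
    current = ""
    while len(symbols) < n_symbols:
        if pos >= len(bits):
            raise ValueError("ran out of bits before decoding all symbols")
        current += bits[pos]
        pos += 1
        if current in codebook:
            symbols.append(codebook[current])
            current = ""
    return symbols, pos


def packet_is_consistent(payload_bits, declared_length, n_symbols):
    # Flag a packet as suspect when the bits consumed by the decoder
    # disagree with the length declared in the (assumed) control information.
    if len(payload_bits) != declared_length:
        return False
    try:
        _, consumed = decode_prefix_code(payload_bits, n_symbols)
    except ValueError:
        return False
    return consumed == declared_length
```

With the five symbols a, b, c, d, a encoded as "0101101110", flipping the second bit makes the decoder consume 9 bits instead of 10, so the packet is flagged. Flipping the last bit of the codeword for c turns it into d without changing any lengths, so that error passes the screen. The check therefore reduces undetected errors but does not eliminate them.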