1.

Scalable Audio Compression at Low Bitrates

Kandadai S. Creusere C.D. 《IEEE transactions on audio, speech, and language processing》2008,16(5):969-979

A perceptually scalable audio coder generates a bit-stream that contains layers of audio fidelity and is encoded in such a way that adding one of these layers enhances the reconstructed audio by an amount that is just noticeable by the listener. Such algorithms have applications like music on demand at variable levels of fidelity, for instance using 3G and 4G cellular radio systems operating at different bit rates. While the MPEG-4 natural audio coder can create finely scalable bit streams using bit sliced arithmetic coding (BSAC), its perceptual quality at low bit rates is poor. On the other hand, the nonscalable transform-domain weighted interleaved vector quantization (TWIN-VQ) performs well at low bit rates. In this paper, we present a modified version of the TWIN-VQ algorithm that generates a perceptually scalable bit-stream with many fine layers of audio fidelity. Using TWIN-VQ as our base ensures the best possible perceptual quality at low bit rates. Specifically, the proposed scalable algorithm performs as well as TWIN-VQ at rates of 8 to 16 kb/s and outperforms scalable BSAC by between 64% and 172% at rates of less than 24 kb/s.

2.

Introduction to AVS Audio (total citations: 1; self-citations: 0; citations by others: 1)

Hao-Jun Ai, Shui-Xian Chen and Rui-Min Hu 《Journal of Computer Science and Technology (计算机科学技术学报)》2006,21(3):360-365

This paper describes a general audio coding algorithm which has been recently standardized by AVS, China. The algorithm is based on a perceptual coding technique. The codec delivers near CD-quality audio at 128 kb/s. This paper describes the coder structure in detail and discusses the reasons for specific design methods. A summary of the subjective test results is presented for the prototype codec. A Comparison Mean Opinion Score (CMOS) test indicates that the quality of the AVS audio coder is comparable with that of the MPEG Layer-3 audio coder. A real-time decoder based on a 16-bit fixed-point DSP was used for the characterization test. The performance of the DSP solution was demonstrated, including computational complexity and storage characteristics.

3.

Embedded coding using a mixed speech and audio coding paradigm

Sean A. Ramprashad 《International Journal of Speech Technology》1999,2(4):359-372

A two stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage of the structure consists of a core speech coder which provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform based coder which provides a separate optional bitstream for the enhancement of the core stage output. The two stage structure can be used to enhance the quality of an existing codec without modification of the original coding algorithm. In this regard it can be considered a value added option that can be used with a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bitrate, complexity, delay, and coding quality. Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as a core coder. The maximum combined bitrate from the core and enhancement stages for the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output in the cases of music and speech with background noise. Compared to the non-embedded fixed rate standard LD-CELP G.728 at 16 kb/s, the quality of the two stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech the quality of the two stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.

4.

Adaptive Signal Modeling Based on Sparse Approximations for Scalable Parametric Audio Coding

Ruiz Reyes N. Vera Candeas P. 《IEEE transactions on audio, speech, and language processing》2010,18(3):447-460

This paper deals with the application of adaptive signal models for parametric audio coding. A fully parametric audio coder, which decomposes the audio signal into sinusoids, transients and noise, is here proposed. Adaptive signal models for sinusoidal, transient, and noise modeling are therefore included in the parametric scheme in order to achieve high-quality and low bit-rate audio coding. In this paper, a new sinusoidal modeling method based on a perceptual distortion measure is proposed. For transient modeling, a fast and effective method based on matching pursuit with a mixed dictionary is chosen. The residue of the previous models is analyzed as a noise-like signal. The proposed parametric audio coder allows high quality audio coding for one-channel audio signals at 16 kbits/s (average bit rate). A bit-rate scalable version of the parametric audio coder is also proposed in this work. Bit-rate scalability is intended for audio streaming applications, which are highly demanded nowadays. The performance of the proposed parametric audio coders (nonscalable and scalable coders) is assessed in comparison to widely used audio coders operating at similar bit rates.

5.

Compression Artifacts in Perceptual Audio Coding

Chi-Min Liu Han-Wen Hsu Wen-Chieh Lee 《IEEE transactions on audio, speech, and language processing》2008,16(4):681-695

Perceptual audio coding achieves a high compression ratio by exploiting perceptual irrelevance and data redundancies. By using advanced and sophisticated signal processing methods, perceptual coding has generated artifacts that are quite different from the traditional distortions. A new audio technology becomes mature through the successful modeling, measurement, and control of the artifacts incurred by the technology. With the advance of new coding modules in advanced audio coding (AAC), spectral band replication (SBR), and parametric coding, the incurred artifacts are far more difficult to model, measure, and control than those caused by previous encoding systems like pulse code modulation. This paper models the audible artifacts through time-frequency diagrams, considers the artifact-susceptible music types, and analyzes the critical encoding technologies incurring these artifacts.

6.

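A Multi-Rate Zerotree Wavelet Audio Compression Method Based on a Psychoacoustic Model (total citations: 3; self-citations: 0; citations by others: 3)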

He Dongmei, Gao Wen 《Chinese Journal of Computers (计算机学报)》2000,23(3):278-284

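The MPEG-4 audio coding standard not only places higher demands on bit rate and sound quality, but also requires the coder to offer multiple functionalities to meet the needs of different applications. Exploiting the self-similarity of wavelet coefficients across scales and the masking effect of the human ear, this paper proposes a zerotree wavelet audio coding algorithm based on a psychoacoustic model. The algorithm not only delivers transparent-quality CD audio at a low bit rate (56 kb/s), but also produces an embedded bitstream that supports multi-rate scalable coding in an optimal sense, making it a promising coding scheme for multimedia communications and related fields.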

7.

A fine granular scalable to lossless audio coder

Rongshan Yu Rahardja S. Lin Xiao Chi Chung Ko 《IEEE transactions on audio, speech, and language processing》2006,14(4):1352-1363

This paper presents Advanced Audio Zip (AAZ), a fine grained scalable to lossless (SLS) audio coder that has recently been adopted as the reference model for MPEG-4 audio SLS work. AAZ integrates the functionalities of high-compression perceptual audio coding, fine granular scalable audio coding, and lossless audio coding in a single framework, and simultaneously provides backward compatibility to MPEG-4 Advanced Audio Coding (AAC). AAZ provides fine granular bit-rate scalability from lossy to lossless coding, and this scalability is achieved in a perceptually meaningful way, i.e., better perceptual quality at higher bit-rates. Despite its abundant functionalities, AAZ introduces only negligible overhead in terms of lossless compression performance compared with a nonscalable, lossless only audio coder. As a result, AAZ provides a universal yet efficient solution for digital audio applications such as audio archiving, network audio streaming, portable audio playing, and music downloading, which were previously catered for by several different audio coding technologies, and eliminates the need for any transcoding system to facilitate sharing of digital audio content across these application domains.

8.

Low Bit-Rate Object Coding of Musical Audio Using Bayesian Harmonic Models

Vincent E. Plumbley M. D. 《IEEE transactions on audio, speech, and language processing》2007,15(4):1273-1282

This paper deals with the decomposition of music signals into pitched sound objects made of harmonic sinusoidal partials for very low bit-rate coding purposes. After a brief review of existing methods, we recast this problem in the Bayesian framework. We propose a family of probabilistic signal models combining learned object priors and various perceptually motivated distortion measures. We design efficient algorithms to infer object parameters and build a coder based on the interpolation of frequency and amplitude parameters. Listening tests suggest that the loudness-based distortion measure outperforms other distortion measures and that our coder results in a better sound quality than baseline transform and parametric coders at 8 and 2 kbit/s. This work constitutes a new step towards a fully object-based coding system, which would represent audio signals as collections of meaningful note-like sound objects.

9.

Application of the BS.1387 Acoustic Model in Audio Coding Systems

Hu Xiaopeng, Li Xun, He Guiming, Zhou Xiaoping 《Computer Engineering and Applications (计算机工程与应用)》2006,42(11):6-9

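The basic version of the acoustic model used in ITU-R BS.1387 to assess audio quality is combined with a practical audio coding system. The characteristics of the model are analyzed theoretically, and corresponding improvements are proposed so that it can be applied in practical audio coding systems. Both this acoustic model and Psychoacoustic Model 2 of the MPEG-2 AAC audio standard were implemented on the reference encoder of China's recently established AVS audio coding standard, and the masking parameters output by the models and the results of subjective listening tests were compared for verification. The test results show that the new acoustic model designed in this paper for use in audio encoders is reasonable and feasible.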

10.

A Luminance Coefficient Compression Method Based on the Perceptual Characteristics of the Human Eye

Yu Li, Guo Shan, Xu Shilin, Zhou Gang, Li Rong 《Journal of Image and Graphics (中国图象图形学报)》2009,14(3):452-457

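In traditional video coding systems, distortion is mostly measured by mean squared error (MSE), yet MSE-based distortion often fails to capture the subjective differences between video streams; the encoder therefore needs to exploit the perceptual characteristics of the human visual system. To further improve coding efficiency, and building on the fact that the human eye is not equally sensitive to signals of different luminance, a luminance coefficient compression algorithm based on the perceptual characteristics of the human eye is proposed. The algorithm uses forward quantization to discard redundant information that the eye cannot perceive, which improves the compression efficiency of the encoder while keeping the lost information invisible to the eye. Experimental results show that the AVS reference encoder using this algorithm reduces its output bit rate by 8% to 40%, while the subjective quality of the decoded images is comparable to that of the encoder without the algorithm.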

11.

A Novel Audio Coding Scheme Using Warped Linear Prediction Model and the Discrete Wavelet Transform

《IEEE transactions on audio, speech, and language processing》2006,14(6):2039-2048

In this paper, we present a novel audio coder using the discrete wavelet transform (DWT) and warped linear prediction (WLP). In contrast to conventional LP, WLP allows for the control of frequency resolution to closely match the response of the human auditory system. The structure of the system is similar to the transform coded excitation techniques used in wideband speech coding, where LP has been replaced with WLP, and the residual is analyzed by a wavelet filterbank designed to approximate the critical bands. The inherent shaping of the WLP synthesis filter and a controlled bit allocation to the wavelet coefficients help minimise the perceptually significant noise due to the quantization error in the residual. For monophonic signals sampled at 44.1 kHz, the coder achieves near transparent to transparent quality for a variety of speech and music signals at an average bitrate of about 64 kb/s. Tests also show that the coder (in its initial implementation) delivers superior quality to the MPEG layer III codec and comparable quality to the MPEG2-AAC codec when operating at the same bitrate.

12.

Efficient parametric coding of transients

Christensen M.G. van de Par S. 《IEEE transactions on audio, speech, and language processing》2006,14(4):1340-1351

In this paper, methods for improved parametric coding of transients are presented. We propose a signal model for coding of transients consisting of a sum of sinusoids each being amplitude-modulated by a different gamma envelope. These envelopes are characterized by an onset time, an attack and a decay parameter. An efficient method for estimating these parameters is presented. Further, methods are proposed that combine this transient model with a constant-amplitude sinusoidal model in order to achieve efficient coding of both stationary and transient signal parts. By rate-distortion optimization using a perceptual distortion measure, we combine variable rate bit allocation and segmentation in an optimal way. Formal, as well as informal, listening tests show that significant improvements can be achieved with the proposed model as compared to a state-of-the-art sinusoidal coder by the combination of optimal segmentation and amplitude modulated sinusoidal audio coding.
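The signal model in this abstract (a sum of sinusoids, each shaped by its own gamma envelope with an onset, an attack and a decay parameter) can be illustrated with a small synthesis sketch. The exact envelope form used here, `t**attack * exp(-decay * t)` for `t = n - onset >= 0`, is an assumption about what "gamma envelope" means and is not taken from the paper; the function names and the dictionary keys are hypothetical, and the paper's parameter estimation and rate-distortion steps are not shown.

```python
import math


def gamma_envelope(n, onset, attack, decay):
    """Assumed gamma-shaped envelope: zero before the onset, then
    t**attack * exp(-decay * t) with t = n - onset."""
    t = n - onset
    if t < 0:
        return 0.0
    return (t ** attack) * math.exp(-decay * t)


def synthesize_transient(length, partials):
    """Sum of sinusoids, each amplitude-modulated by its own envelope.

    Each partial is a dict with hypothetical keys:
    amp, freq (radians per sample), phase, onset, attack, decay.
    """
    signal = []
    for n in range(length):
        sample = 0.0
        for p in partials:
            env = gamma_envelope(n, p["onset"], p["attack"], p["decay"])
            sample += p["amp"] * env * math.cos(p["freq"] * n + p["phase"])
        signal.append(sample)
    return signal
```

Under this assumed form the envelope peaks `attack / decay` samples after the onset, so the attack and decay parameters jointly set how quickly the transient rises and how long it rings.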

13.

Bark scale-based perceptual matching pursuit for improving sinusoidal audio modeling

P. Vera-Candeas N. Ruiz-Reyes F. López-Ferreras 《Digital Signal Processing》2009,19(2):229-240

In this paper we propose an improved sinusoidal modeling method based on perceptual matching pursuits computed in the Bark scale for parametric audio coding applications. Complex exponentials compose the overcomplete dictionary for matching pursuits. The main contribution is the minimization of a perceptual distortion measure defined in the Bark scale to select the optimum atom at each iteration of the pursuits. Furthermore, a psychoacoustic stopping criterion for the pursuits is presented. The proposed sinusoidal modeling method is suitable for integration into a parametric audio coder based on the three-part model of sines, transients and noise (STN model), as the experimental results show. Our method provides significant advantages over previous works, mainly because it operates in the Bark scale rather than in the frequency domain.
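For readers unfamiliar with matching pursuit over a dictionary of complex exponentials, the following is a minimal sketch of the plain (non-perceptual) algorithm. It selects atoms by largest inner-product magnitude and stops on a simple magnitude tolerance; it does not implement the Bark-scale perceptual distortion measure or the psychoacoustic stopping criterion described in the abstract. All function and parameter names are hypothetical.

```python
import cmath
import math


def make_dictionary(n, num_freqs):
    """Unit-norm complex exponential atoms at num_freqs evenly spaced
    frequencies; the dictionary is overcomplete when num_freqs > n."""
    scale = 1.0 / math.sqrt(n)
    return [
        [scale * cmath.exp(2j * math.pi * k * t / num_freqs) for t in range(n)]
        for k in range(num_freqs)
    ]


def inner(atom, signal):
    """Inner product <atom, signal> = sum(conj(atom) * signal)."""
    return sum(a.conjugate() * s for a, s in zip(atom, signal))


def matching_pursuit(signal, dictionary, max_atoms, tol=1e-9):
    """Greedy decomposition: at each step pick the atom most correlated
    with the residual, record its coefficient, and subtract it."""
    residual = [complex(s) for s in signal]
    chosen = []
    for _ in range(max_atoms):
        coeffs = [inner(atom, residual) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(coeffs[k]))
        if abs(coeffs[best]) < tol:
            break  # simple energy-based stop, not a psychoacoustic one
        chosen.append((best, coeffs[best]))
        atom = dictionary[best]
        residual = [r - coeffs[best] * a for r, a in zip(residual, atom)]
    return chosen, residual
```

A perceptual variant would replace the `abs(coeffs[k])` selection rule with a criterion that weights each candidate by its audibility, which is the part the paper contributes.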

14.

The design and simulated performance of a mobile video telephony application for satellite third-generation wireless systems

Dubuc C. Boudreau D. Patenaude F. 《Multimedia, IEEE Transactions on》2001,3(4):424-431

The design and performance of a low bit-rate video telephony service for mobile third-generation (3G) systems is presented. The ITU-T G.723.1 speech coding and the ITU-T H.263 video coding recommendations are used, as proposed by the ITU-T H.324 low bit-rate multimedia communications recommendation. The target bit-rate for the H.324 service is 64 kb/s. The design is performed in conjunction with that of a wideband-code division multiple access (W-CDMA) radio transmission technology (RTT) system, proposed by the European Space Agency (ESA) for the satellite component of the ITU IMT-2000 standard. Most of the results could also be applied to the 3G terrestrial systems. The use of concatenated channel coding with convolutional inner coding and Reed-Solomon (RS) outer coding is investigated. Service designs based on equal error protection (EEP) and unequal error protection (UEP) schemes for the audio and video sources are compared. The simulation of the proposed video telephony services shows that significantly more graceful video and audio degradation is obtained with the proposed UEP scheme than with a more straightforward EEP method. The UEP scheme significantly reduces the occurrence of highly annoying audio and video artefacts, allowing satellite-based video telephony services that are compatible with the current Internet-based applications.

15.

Colour image compression based on the measure of just noticeable colour difference

Chou C.-H. Liu K.-C. 《Image Processing, IET》2008,2(6):304-322

To human vision, colour images contain a certain amount of perceptual redundancy, since the human visual system (HVS) has limited sensitivity in discriminating colour signals of small differences. By measuring the perceptual redundancy inherent in colour images and shaping the coding distortion into the perceptual redundancy, colour images are expected to be represented more efficiently. Approaches to perceptually optimise the efficiency of image coders in compressing colour images with the perceptual redundancy estimated by a colour visual model are presented. The model estimates the perceptual redundancy for each colour pixel as a visibility threshold of colour difference in any colour space and in a spatial or frequency domain. Two existing image coders are modified to take advantage of the perceptual redundancy and simulated to inspect if their coding efficiency is improved. In the spatial domain, the JPEG-LS coder in the near-lossless compression mode is modified to make coding errors part of the perceptual redundancy in compressing colour images in the RGB space. In the wavelet domain, the JPEG2000 coder is refined by minimising the perceptible distortion involved in the rate control of the compressed image in the YCbCr space. Simulation results show that, in both cases, the performance of the perceptually tuned coder is superior to that of the un-tuned coder in terms of the bit rate required for achieving the same visual quality.

16.

Structures for SNR scalable speech coding

Hui Dong Gibson J.D. 《IEEE transactions on audio, speech, and language processing》2006,14(2):545-557

SNR scalable speech coding is desirable for a number of network multimedia applications, but relatively few SNR-scalable speech coders exist for operation at rates below 16 kb/s. We investigate several SNR scalable source coding structures and define the new concepts of dependent and independent SNR scalability, where independent SNR scalable coders depend on the core layer coder only through the core layer output. Independent SNR scalable structures offer the possibility of providing bit rate scalable functionality to existing nonscalable coders and standards. We show that the MPEG-4 scalable coders are examples of dependent SNR scalable coders, and we introduce a new independent SNR scalable coder called CELPTree, which has the additional advantage of being low delay. We compare the performance of the MPEG-4 coders and CELPTree for both clean and noisy speech, and we examine the effects of frequency-weighted distortion measures in the enhancement layers of SNR scalable speech coders.

17.

An Improved Search Strategy for the Best Time-Frequency Atom (total citations: 7; self-citations: 0; citations by others: 7)

Liu Lixiong, Jia Yunde, Liao Bin, Zhang Min 《Journal of Image and Graphics (中国图象图形学报)》2004,9(7):873-877

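At very low bit rates, the video coder based on matching pursuit signal decomposition proposed by Neff and Zakhor not only achieves higher coding performance than the H.263 coder, but also avoids the blocking artifacts to which the human eye is sensitive. However, because the algorithm has to search a redundant dictionary for the atom function that best matches the error structure, it requires far more computation than conventional coders, which limits the efficiency of the coder. To improve coding efficiency, an improved full search strategy and a weighted energy-first search strategy are proposed on the basis of an analysis of the energy-first atom search strategy, thereby improving the search for the best time-frequency atom. Finally, the coding performance and computational efficiency of the search strategies are evaluated experimentally.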

18.

Audio watermarking based on quantization index modulation using combined perceptual masking

Jyotsna Singh Parul Garg Alok Nath De 《Multimedia Tools and Applications》2012,59(3):921-939

In this paper, a robust audio watermarking scheme for the MPEG-1 Audio Layer II compressed domain is proposed. The scheme is implemented by modifying the subband coefficients using adaptive quantization index modulation. The watermarking procedure exploits the perceptual frequency and temporal masking of the human auditory system (HAS) already modeled in the MPEG coder to satisfy the requirements of robustness, security and transparency. This reduces the computational complexity of the proposed scheme. The paper investigates the use of an elevated masking threshold to improve detection and achieve higher robustness against re-encoding and AWGN attacks. Experimental results show that a high capacity of 6,840 bps is achieved with an ODG of -0.5 without altering the MPEG audio bitrate.
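As background for the quantization index modulation (QIM) step mentioned in this abstract, here is a minimal sketch of basic binary QIM on a single coefficient. It assumes a fixed quantization step and two lattices offset by half a step; the paper's adaptive step selection from the masking threshold and its MPEG subband handling are not reproduced, and the function names are hypothetical.

```python
import math


def qim_embed(coeff, bit, step):
    """Quantize coeff onto the lattice for this bit.

    Bit 0 uses multiples of step; bit 1 uses the same lattice shifted
    by step / 2 (an assumed, textbook choice of offsets).
    """
    offset = bit * step / 2.0
    return step * math.floor((coeff - offset) / step + 0.5) + offset


def qim_detect(coeff, step):
    """Return the bit whose lattice lies closest to the received value."""
    dist0 = abs(coeff - qim_embed(coeff, 0, step))
    dist1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if dist0 <= dist1 else 1
```

With this construction the embedding changes a coefficient by at most `step / 2`, and detection survives any perturbation smaller than `step / 4`; a larger step buys robustness at the cost of audibility, which is why a masking-driven step size is attractive.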

19.

A perceptible watermarking algorithm for audio signals

Malay Kishore Dutta Phalguni Gupta Vinay K. Pathak 《Multimedia Tools and Applications》2014,73(2):691-713

This paper proposes an unconventional method for a removable audible watermarking system based on the requirements of a promising application. Given an audio file, the system makes some part of the file available for preview and applies perceptible watermarking to the remaining portion. The watermark is embedded into selected DCT coefficients of the host audio signal so that the signal to noise ratio is maintained at a level which is audibly annoying to the human auditory system. An issue that arises here is generating a huge number of copies of the audio file which are audibly similar and numerically different. Once the audio file is decoded using the secret key, a new watermark is embedded in the audio that is perceptually transparent to the human auditory system. Hence this double watermarking, i.e. imperceptible and perceptible watermarking, provides a novel prototype for digital rights management control. The subjective quality tests and robustness tests indicate that the audio quality is excellent and that the watermark is robust to signal processing attacks.

20.

Design of integrated multimedia compression and encryption systems (total citations: 1; self-citations: 0; citations by others: 1)

Chung-Ping Wu Kuo C.-C.J. 《Multimedia, IEEE Transactions on》2005,7(5):828-839

Two approaches for integrating encryption with multimedia compression systems are studied in this research, i.e., selective encryption and modified entropy coders with multiple statistical models. First, we examine the limitations of selective encryption using cryptanalysis, and provide examples that use selective encryption successfully. Two rules for determining whether selective encryption is suitable for a compression system are derived. Next, we propose another approach that turns entropy coders into encryption ciphers using multiple statistical models. Two specific encryption schemes are obtained by applying this approach to the Huffman coder and the QM coder. It is shown that security is achieved without sacrificing the compression performance and the computational speed. This modified entropy coding methodology can be applied to most modern compressed audio/video such as MPEG audio, MPEG video, and JPEG/JPEG2000 images.