1.

On Integer MDCT for Perceptual Audio Coding

Te Li, Rahardja S., Rongshan Yu, Soo Ngee Koh 《IEEE transactions on audio, speech, and language processing》2007,15(8):2236-2248

In MPEG-4 scalable lossless coding (SLS), which was recently published as an ISO standard in June 2006, the integer modified discrete cosine transform (IntMDCT) was adopted to enable efficient lossless reconstruction. In addition, there is an MDCT filterbank inherent to the advanced audio coding (AAC) core present in the SLS codec. The presence of two filterbanks undoubtedly increases implementation complexity, and for this reason the MDCT is disabled and the IntMDCT is the only filterbank employed in SLS for both lossy and lossless operation. Because of the rounding operations in the IntMDCT, there is a concern that using the IntMDCT for perceptual audio coding may degrade the fidelity of the audio codec. This paper addresses this concern by analyzing the performance of the IntMDCT in a lossy coding scenario. It is found that the noise introduced by the IntMDCT does not affect the perceptual quality of the coded audio under standard playback conditions. The paper therefore concludes that the MDCT and IntMDCT filterbanks are interchangeable at lossy bitrates, which also justifies using only the IntMDCT filterbank in scalable audio coding.
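The rounding the abstract refers to comes from realizing rotations with lifting steps. The sketch below is a generic illustration, not the paper's code: it shows the standard three-step lifting factorization of a Givens rotation with rounding after each step, the usual building block of IntMDCT-style transforms. The helper names are hypothetical. The result stays exactly invertible on integers even though each output only approximates the rotated value.

```python
import math


def _rnd(v: float) -> int:
    """Round to the nearest integer (ties round up); any fixed rule keeps the transform invertible."""
    return math.floor(v + 0.5)


def int_rotate(x: int, y: int, alpha: float) -> tuple[int, int]:
    """Integer approximation of the rotation [[cos a, -sin a], [sin a, cos a]] via three lifting steps."""
    t = -math.tan(alpha / 2.0)  # coefficient of the first and third lifting steps
    s = math.sin(alpha)         # coefficient of the middle lifting step
    x = x + _rnd(t * y)         # each step updates one variable from the other, unchanged one
    y = y + _rnd(s * x)
    x = x + _rnd(t * y)
    return x, y


def int_rotate_inverse(x: int, y: int, alpha: float) -> tuple[int, int]:
    """Exact inverse: undo the lifting steps in reverse order with the same rounding."""
    t = -math.tan(alpha / 2.0)
    s = math.sin(alpha)
    x = x - _rnd(t * y)
    y = y - _rnd(s * x)
    x = x - _rnd(t * y)
    return x, y
```

For example, `int_rotate(1000, 250, math.pi / 4)` returns integers within roughly one unit of the floating-point rotation, and passing them to `int_rotate_inverse` recovers `(1000, 250)` exactly. This small per-step rounding error is the kind of noise whose perceptual impact the paper evaluates.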

2.

Frequency Region-Based Prioritized Bit-Plane Coding for Scalable Audio

Te Li, Rahardja S., Soo Ngee Koh 《IEEE transactions on audio, speech, and language processing》2008,16(1):94-105

A perceptually enhanced prioritized bit-plane audio coding algorithm is presented in this paper. The bit-planes are prioritized, with optimized parameters, according to the energy distribution in different frequency regions. Based on statistical modeling of the frequency spectrum, a much simpler implementation of prioritized bit-plane coding is integrated into the recently released MPEG-4 scalable lossless (SLS) audio coding structure, replacing the sequential bit-plane coding in the enhancement layer. The change requires zero extra side information and adds only trivial complexity and modification to the original SLS structure. Extensive experimental results show that the perceptual quality of SLS with no core or a very low core bit-rate is improved significantly over a wide range of bit-rate combinations. Fully scalable audio coding up to lossless, with much enhanced perceptual quality, is thus achieved.
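As a point of reference for the prioritized scheme, here is a minimal sketch of plain sequential bit-plane decomposition of integer spectral coefficients (sign plus magnitude, most significant plane first), i.e. the baseline the paper replaces. The function names are hypothetical, and the frequency-region prioritization itself is not reproduced. Keeping fewer planes gives a coarser reconstruction, which is what makes the bitstream scalable.

```python
def bit_planes(coeffs: list[int]) -> tuple[list[int], list[list[int]]]:
    """Split integers into sign flags and magnitude bit-planes, most significant plane first."""
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    n_planes = max(mags, default=0).bit_length()
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def reconstruct(signs: list[int], planes: list[list[int]], n_keep: int) -> list[int]:
    """Rebuild coefficients from only the first n_keep (most significant) bit-planes."""
    n_planes = len(planes)
    mags = [0] * len(signs)
    for i, plane in enumerate(planes[:n_keep]):
        shift = n_planes - 1 - i
        for k, bit in enumerate(plane):
            mags[k] |= bit << shift
    return [-m if s else m for s, m in zip(signs, mags)]
```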

3.

A fine granular scalable to lossless audio coder

Rongshan Yu, Rahardja S., Lin Xiao, Chi Chung Ko 《IEEE transactions on audio, speech, and language processing》2006,14(4):1352-1363

This paper presents Advanced Audio Zip (AAZ), a fine-grained scalable to lossless (SLS) audio coder that has recently been adopted as the reference model for the MPEG-4 audio SLS work. AAZ integrates high-compression perceptual audio coding, fine granular scalable audio coding, and lossless audio coding in a single framework, while providing backward compatibility with MPEG-4 Advanced Audio Coding (AAC). AAZ provides fine granular bit-rate scalability from lossy to lossless coding, and this scalability is achieved in a perceptually meaningful way, i.e., perceptual quality improves at higher bit-rates. Despite its abundant functionality, AAZ introduces only negligible overhead in lossless compression performance compared with a nonscalable, lossless-only audio coder. As a result, AAZ provides a universal yet efficient solution for digital audio applications such as audio archiving, network audio streaming, portable audio playback, and music downloading, which were previously served by several different audio coding technologies, and it eliminates the need for transcoding systems to share digital audio content across these application domains.
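The lossless layer in such a design can be pictured as coding the integer difference between the spectrum and the core's reconstruction. The sketch below assumes a uniform scalar quantizer as a stand-in for the AAC core (a simplification, not AAZ's actual core) and uses hypothetical helper names. Because the prediction is rounded identically at encoder and decoder, adding the residual back restores the integer spectrum exactly.

```python
def core_encode(spectrum: list[int], step: float) -> list[int]:
    """Stand-in for the lossy core: uniform scalar quantization of integer spectral values."""
    return [round(v / step) for v in spectrum]


def core_decode(indices: list[int], step: float) -> list[int]:
    """Integer-valued core reconstruction, computed identically at encoder and decoder."""
    return [round(i * step) for i in indices]


def lossless_residual(spectrum: list[int], step: float) -> tuple[list[int], list[int]]:
    """Return the core indices and the integer residual that restores the spectrum exactly."""
    indices = core_encode(spectrum, step)
    prediction = core_decode(indices, step)
    return indices, [v - p for v, p in zip(spectrum, prediction)]
```

In a scalable coder the residual would itself be bit-plane coded, so truncating it yields intermediate qualities between the core and lossless.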

4.

Simulation of Two-Dimensional Enhancement for Low-Quality Video Images Based on Variational Regularization

Wang Xuehong 《Computer Simulation》2020,(4):402-405,445

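Constrained by factors such as the environment, hardware, and cost, the video produced is often of low quality and contains a large amount of redundant noise, which blurs local image regions and prevents individual targets from being identified or recognized. To solve this problem, a two-dimensional enhancement method for low-quality video images based on variational regularization is constructed. Prior information is collected to determine the regions for noise removal and quality enhancement; the classical ROF model and partial differential equations are then used to smooth and denoise the original image, and a forward-backward diffusion equation is proposed so that the method effectively suppresses background noise while sharpening the edges of individual targets. Variational regularization is used to improve the overall resolution of the video image, completing the two-dimensional enhancement. Simulation results show that the proposed method offers the dual advantages of noise control and image-quality enhancement: it effectively removes and suppresses noise, strengthens the strong scattering points in target regions, has low complexity, and achieves balanced optimization of low-quality video images.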
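The ROF model mentioned above minimizes a data-fidelity term plus a total-variation penalty. The sketch below is a one-dimensional toy version, assuming a smoothed TV term (parameter `eps`) and plain gradient descent with illustrative parameter values; it is not the paper's forward-backward diffusion scheme, and the function names are hypothetical. It shows how the penalty flattens small oscillations while largely preserving a genuine edge.

```python
import math


def total_variation(u: list[float]) -> float:
    """Sum of absolute differences between neighboring samples."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))


def rof_denoise_1d(f: list[float], lam: float = 0.5, eps: float = 0.1,
                   step: float = 0.04, iters: int = 500) -> list[float]:
    """Gradient descent on 0.5*||u - f||^2 + lam * sum_i sqrt((u[i+1] - u[i])^2 + eps^2).

    Keeping step below 1 / (1 + 4 * lam / eps) guarantees the objective decreases monotonically.
    """
    u = list(f)
    n = len(u)
    for _ in range(iters):
        grad = [u[i] - f[i] for i in range(n)]          # gradient of the fidelity term
        for i in range(n - 1):
            d = u[i + 1] - u[i]
            w = lam * d / math.sqrt(d * d + eps * eps)  # derivative of the smoothed |d|
            grad[i + 1] += w
            grad[i] -= w
        u = [u[i] - step * grad[i] for i in range(n)]
    return u
```

Larger `lam` removes more oscillation but also shrinks edges more; the paper's region-dependent diffusion is one way to avoid applying the same trade-off everywhere.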

5.

Audio object coding based on optimal parameter frequency resolution

Wu Tingzhao, Hu Ruimin, Wang Xiaochen, Ke Shanfa 《Multimedia Tools and Applications》2019,78(15):20723-20738

Object-based audio content is becoming the main form of audio content because it is more interactive and flexible than traditional channel-based audio content. The Spatial Audio Object Coding (SAOC) method was proposed to encode multiple audio objects at a low bitrate. However, SAOC extracts only a few parameters for each frame, which leads to low parameter frequency resolution. As a result, the decoded signals suffer from serious aliasing distortion that degrades sound quality. In this paper, we present a novel audio object coding method. We are the first to analyze how signal distortion varies with parameter frequency resolution, and we determine the optimal resolution for reducing aliasing distortion. We also achieve a low coding bitrate through a dimensionality reduction algorithm. Both objective and subjective experiments confirm that the proposed method provides higher output sound quality than state-of-the-art methods at an equivalent bitrate.
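To make "parameter frequency resolution" concrete, the sketch below computes SAOC-style object level differences (each object's band power relative to the strongest object in that band) for a configurable number of parameter bands. It assumes uniform band grouping and power spectra given as plain lists; the function names are hypothetical and this is not the paper's method. With fewer bands, differences between objects inside a band are averaged away, which is the resolution loss the abstract describes.

```python
def band_edges(n_bins: int, n_bands: int) -> list[int]:
    """Uniform band edges (an assumption; SAOC itself groups bins non-uniformly)."""
    return [round(b * n_bins / n_bands) for b in range(n_bands + 1)]


def object_level_differences(obj_powers: list[list[float]], n_bands: int) -> list[list[float]]:
    """olds[b][i] = power of object i in band b relative to the strongest object in that band."""
    edges = band_edges(len(obj_powers[0]), n_bands)
    band_pow = [[sum(p[edges[b]:edges[b + 1]]) for b in range(n_bands)] for p in obj_powers]
    olds = []
    for b in range(n_bands):
        ref = max(bp[b] for bp in band_pow)
        olds.append([bp[b] / ref if ref > 0 else 0.0 for bp in band_pow])
    return olds
```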

6.

Scalable Audio Compression at Low Bitrates

Kandadai S., Creusere C.D. 《IEEE transactions on audio, speech, and language processing》2008,16(5):969-979

A perceptually scalable audio coder generates a bit-stream that contains layers of audio fidelity and is encoded in such a way that adding one of these layers enhances the reconstructed audio by an amount that is just noticeable by the listener. Such algorithms have applications like music on demand at variable levels of fidelity, for instance using 3G and 4G cellular radio systems operating at different bit rates. While the MPEG-4 natural audio coder can create finely scalable bit streams using bit sliced arithmetic coding (BSAC), its perceptual quality at low bit rates is poor. On the other hand, the nonscalable transform-domain weighted interleaved vector quantization (TWIN-VQ) performs well at low bit rates. In this paper, we present a modified version of TWIN-VQ algorithm that generates a perceptually scalable bit-stream with many fine layers of audio fidelity. Using TWIN-VQ as our base ensures the best possible perceptual quality at low bit rates. Specifically, the proposed scalable algorithm performs as well as TWIN-VQ at rates of 8 to 16 kb/s and outperforms scalable BSAC by between 64% and 172% at rates of less than 24 kb/s.
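TWIN-VQ's interleaving step spreads neighboring spectral coefficients across several subvectors before they are vector-quantized. A minimal sketch of that interleaving plus a weighted nearest-codeword search, assuming a toy codebook and caller-supplied perceptual weights (both hypothetical, not TWIN-VQ's trained tables); the function names are also illustrative.

```python
def interleave(coeffs: list[float], n_sub: int) -> list[list[float]]:
    """Subvector m takes every n_sub-th coefficient starting at index m."""
    return [coeffs[m::n_sub] for m in range(n_sub)]


def deinterleave(subvecs: list[list[float]]) -> list[float]:
    """Inverse of interleave."""
    n_sub = len(subvecs)
    out = [0.0] * sum(len(v) for v in subvecs)
    for m, v in enumerate(subvecs):
        out[m::n_sub] = v
    return out


def weighted_nearest(vec: list[float], weights: list[float], codebook: list[list[float]]) -> int:
    """Index of the codeword minimizing the weighted squared error sum_k w_k (x_k - c_k)^2."""
    return min(range(len(codebook)),
               key=lambda j: sum(w * (x - c) ** 2 for w, x, c in zip(weights, vec, codebook[j])))
```

Interleaving gives every subvector a similar mix of low- and high-frequency coefficients, so a single shared codebook can serve all of them.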

7.

Embedded coding using a mixed speech and audio coding paradigm

Sean A. Ramprashad 《International Journal of Speech Technology》1999,2(4):359-372

A two-stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage consists of a core speech coder that provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform-based coder that provides a separate, optional bitstream for enhancing the core stage output. The two-stage structure can be used to enhance the quality of an existing codec without modifying the original coding algorithm; in this regard it can be considered a value-added option for a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bit rate, complexity, delay, and coding quality. Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as the core coder. The maximum combined bit rate from the core and enhancement stages in the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output for music and for speech with background noise. Compared with the non-embedded fixed-rate standard LD-CELP G.728 at 16 kb/s, the quality of the two-stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech, the quality of the two-stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.
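The two-stage idea (a core layer that is always decodable plus an optional enhancement that codes what the core missed) can be sketched with two uniform quantizers. This is only a toy stand-in, assuming scalar quantization in place of G.723.1 and the transform coder; the function names are hypothetical. Decoding without the enhancement layer still works, just with a larger error.

```python
def quantize(x: list[float], step: float) -> list[int]:
    return [round(v / step) for v in x]


def dequantize(q: list[int], step: float) -> list[float]:
    return [i * step for i in q]


def two_stage_encode(x, core_step, enh_step):
    """Core layer: coarse quantization. Enhancement layer: finer quantization of the core's error."""
    core = quantize(x, core_step)
    residual = [v - r for v, r in zip(x, dequantize(core, core_step))]
    return core, quantize(residual, enh_step)


def two_stage_decode(core, enh, core_step, enh_step, use_enhancement=True):
    """The core layer alone is decodable; the enhancement layer only refines it."""
    out = dequantize(core, core_step)
    if use_enhancement:
        out = [a + b for a, b in zip(out, dequantize(enh, enh_step))]
    return out
```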

8.

A pixel-based outlier-free motion estimation algorithm for scalable video quality enhancement

Xuan DONG, Jiangtao WEN 《Frontiers of Computer Science》2015,9(5):729

Scalable video quality enhancement refers to the process of enhancing low-quality frames using high-quality ones in scalable video bitstreams with time-varying quality. A key problem in this enhancement is how to find correspondences between high-quality and low-quality frames. Previous algorithms usually use block-based motion estimation to search for correspondences. Such an approach can hardly estimate scale and rotation transforms and always introduces outliers into the motion estimation results. In this paper, we propose a pixel-based outlier-free motion estimation algorithm to solve this problem. In our algorithm, the motion vector of each pixel is calculated so as to estimate translation, scale, and rotation transforms. The motion relationships between neighboring pixels are modeled with a Markov random field to improve motion estimation accuracy. Outliers are detected and avoided by taking into account both blocking effects and the matching percentage in the scale-invariant feature transform (SIFT) field. Experiments are conducted in two scenarios that exhibit spatial scalability and quality scalability, respectively. Experimental results demonstrate that, compared with previous algorithms, the proposed algorithm achieves better correspondence while avoiding the introduction of outliers, especially for videos with scale and rotation transforms.
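The advantage of per-pixel motion over block translation is that a single similarity transform (scale, rotation, translation) assigns a different motion vector to each pixel. The sketch below only shows that mapping, with hypothetical function names; it contains no estimation, MRF, or outlier step and is not the paper's estimator.

```python
import math


def similarity_map(x, y, scale, theta, tx, ty):
    """Map pixel (x, y) by scaling, rotating by theta (radians), then translating by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty


def motion_vector(x, y, scale, theta, tx, ty):
    """Per-pixel motion vector implied by the similarity transform."""
    xp, yp = similarity_map(x, y, scale, theta, tx, ty)
    return xp - x, yp - y
```

With `scale=1` and `theta=0` every pixel gets the same vector `(tx, ty)`, which is all block matching can represent; any scale or rotation makes the vector depend on pixel position.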

9.

Adaptive Signal Modeling Based on Sparse Approximations for Scalable Parametric Audio Coding

Ruiz Reyes N., Vera Candeas P. 《IEEE transactions on audio, speech, and language processing》2010,18(3):447-460

This paper deals with the application of adaptive signal models to parametric audio coding. A fully parametric audio coder that decomposes the audio signal into sinusoids, transients, and noise is proposed. Adaptive signal models for sinusoidal, transient, and noise modeling are included in the parametric scheme in order to achieve high-quality, low-bit-rate audio coding. A new sinusoidal modeling method based on a perceptual distortion measure is proposed. For transient modeling, a fast and effective method based on matching pursuit with a mixed dictionary is chosen. The residue of the previous models is analyzed as a noise-like signal. The proposed parametric audio coder allows high-quality coding of one-channel audio signals at 16 kbit/s (average bit rate). A bit-rate scalable version of the coder is also proposed; bit-rate scalability is intended for audio streaming applications, which are in high demand today. The performance of the proposed parametric audio coders (nonscalable and scalable) is assessed in comparison with widely used audio coders operating at similar bit rates.
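Matching pursuit, used above for the transient model, greedily picks the dictionary atom most correlated with the current residual and subtracts its contribution. The minimal sketch below assumes a small mixed dictionary of normalized cosine atoms (tonal content) and unit impulses (transients); the dictionary construction and the function names are illustrative, not the paper's.

```python
import math


def mixed_dictionary(n: int) -> list[list[float]]:
    """Unit-norm cosine atoms (tonal content) followed by unit impulses (transients)."""
    atoms = []
    for k in range(n // 2):
        v = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        atoms.append([x / norm for x in v])
    for i in range(n):
        atoms.append([1.0 if j == i else 0.0 for j in range(n)])
    return atoms


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition; returns (atom index, coefficient) picks and the final residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
        picks.append((k, scores[k]))
    return picks, residual
```

Because every atom has unit norm, each iteration lowers the residual energy by the square of the chosen coefficient.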

10.

Inter-Layer Rate Control Algorithm for H.264 Scalable Video Coding

Yang Jin, Sun Yu, Sun Shixin 《Journal of Computer Applications》2011,31(9):2457-2460

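An adaptive inter-layer rate control algorithm is proposed for the scalable video coding (SVC) extension of the H.264/AVC standard. The algorithm introduces a selection model that predicts the number of bits required by an inter frame from either the previous frame of the current layer or the current frame of the previous layer. First, a rate-complexity-quantization (R-C-Q) model is introduced into scalable video coding. Next, an existing proportional-integral-derivative (PID) buffer controller provides an estimate of the bits for the current inter frame based on the buffer status. Then, to obtain a more accurate estimate when the video content changes abruptly, the actual number of bits used by the current frame in the previous layer is used to further refine the estimate for the current inter frame. Finally, the selection model determines the final predicted number of bits, and the quantization parameter (QP) is computed from the R-C-Q model. Experimental results show that, compared with the recommended JVT-043 rate control algorithm, the proposed algorithm achieves a more accurate actual output bit rate at each SVC layer, keeps buffer fullness stable, reduces frame skipping and quality fluctuation, and improves overall coding quality.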
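Two ingredients from the abstract can be sketched in isolation: a PID controller that trims the per-frame bit target according to buffer fullness, and a rate model inverted to obtain a quantizer step. The gains are hypothetical, and the rate model here is the quadratic form R = C (a1/Q + a2/Q^2) familiar from MPEG-4 and H.264 reference rate control, which is an assumption and not necessarily the paper's R-C-Q model. Mapping the quantizer step to an H.264 QP is omitted.

```python
import math


class PIDBufferController:
    """Per-frame bit target from buffer-fullness error; gains are hypothetical, not from the paper."""

    def __init__(self, bits_per_frame: float, target_fullness: float,
                 kp: float = 0.5, ki: float = 0.05, kd: float = 0.1):
        self.bits_per_frame = bits_per_frame
        self.target = target_fullness
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def frame_budget(self, fullness: float) -> float:
        err = fullness - self.target          # positive when the buffer is too full
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        adjust = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(0.0, self.bits_per_frame - adjust)


def qstep_for_bits(target_bits: float, complexity: float, a1: float, a2: float) -> float:
    """Positive root of target = complexity * (a1 / Q + a2 / Q**2) (assumed quadratic model)."""
    b = complexity * a1
    return (b + math.sqrt(b * b + 4.0 * target_bits * complexity * a2)) / (2.0 * target_bits)
```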

11.

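A Multi-Rate Zerotree Wavelet Audio Compression Method Based on a Psychoacoustic Model (total citations: 3; self-citations: 0; citations by others: 3)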

He Dongmei, Gao Wen 《Chinese Journal of Computers》2000,23(3):278-284

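The MPEG-4 audio coding standard not only places higher demands on bit rate and audio quality but also requires encoders to offer multiple functionalities to meet the needs of different applications. Exploiting the self-similarity of wavelet coefficients across scales and the masking effect of the human ear, this paper proposes a zerotree wavelet audio coding algorithm based on a psychoacoustic model. The algorithm not only achieves transparent-quality CD audio at a low bit rate (56 kb/s) but also produces an embedded bitstream that supports multi-rate scalable coding in an optimal sense, making it a promising coding scheme for applications such as multimedia communications.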
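Zerotree coding exploits the cross-scale similarity mentioned above: if a coefficient and all of its descendants are insignificant at the current threshold, one symbol codes the whole tree. The sketch below assumes a 1D dyadic layout (index 0 is the approximation band, and the children of node i >= 1 are 2i and 2i+1) and omits the psychoacoustic weighting; the function names are hypothetical.

```python
def descendants(i: int, n: int) -> list[int]:
    """All indices below node i (i >= 1) in a 1D dyadic tree of n coefficients."""
    if i < 1:
        raise ValueError("node 0 is the approximation band; tree roots start at 1")
    out, frontier = [], [i]
    while frontier:
        frontier = [c for j in frontier for c in (2 * j, 2 * j + 1) if c < n]
        out.extend(frontier)
    return out


def is_zerotree_root(coeffs: list[float], i: int, threshold: float) -> bool:
    """True if coeffs[i] and every descendant are insignificant at this threshold."""
    return all(abs(coeffs[j]) < threshold for j in [i] + descendants(i, len(coeffs)))
```

Halving the threshold on each pass and emitting significance symbols in that order is what makes the resulting bitstream embedded.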

12.

Audio enhancement using local SNR-based sparse binary mask estimation and spectral imputation

《Digital Signal Processing》2017

This paper proposes a method for enhancing speech and/or audio quality under noisy conditions. The method first estimates the local signal-to-noise ratio (SNR) of the noisy input signal via sparse non-negative matrix factorization (SNMF). Next, a sparse binary mask (SBM) is proposed that separates the audio signal from the noise by measuring the sparsity of the pool of local SNRs from adjacent frequency bands of the current and several previous frames. However, after the binary mask is applied, spectral gaps remain across frequency bands, and the resulting spectral discontinuity distorts the separated audio signal. A spectral imputation technique is therefore used to fill the spectrum of the frequency bands removed by the SBM. Spectral imputation is performed by online NMF learning using the spectra of the neighboring non-overlapped frequency bands and their local sparsity. The effectiveness of the proposed method is demonstrated on two tasks that use speech and musical content, respectively. Objective measurements and subjective listening tests show that the proposed method outperforms conventional speech and audio enhancement methods, such as SNMF-based alternatives and deep recurrent neural networks for speech enhancement, block thresholding, and a commercially available software tool for audio enhancement.
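The first two ingredients, a per-band local SNR and a binary keep/discard mask, can be sketched as follows, together with Hoyer's sparsity measure as one possible way to quantify how sparse a pool of SNR values is. The noise estimate is assumed to be given (the paper obtains it via SNMF), the choice of sparsity measure is an assumption, and the function names are hypothetical; the SNMF and spectral imputation stages are not shown.

```python
import math


def local_snr_db(power: list[float], noise_power: list[float], floor: float = 1e-12) -> list[float]:
    """Per-band SNR in dB from signal and (given) noise power estimates."""
    return [10.0 * math.log10(max(p, floor) / max(n, floor)) for p, n in zip(power, noise_power)]


def binary_mask(snr_db: list[float], threshold_db: float) -> list[int]:
    """1 keeps a band, 0 discards it; discarded bands are what spectral imputation refills."""
    return [1 if s > threshold_db else 0 for s in snr_db]


def hoyer_sparsity(values: list[float]) -> float:
    """Hoyer's sparsity: 0 for a flat vector, 1 for a single nonzero entry."""
    n = len(values)
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if n < 2 or l2 == 0.0:
        return 0.0
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)
```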

13.

RTP Packetization Algorithm for H.264/SVC and Its Application

Liu Wei, Chen Xu, Liang Yongsheng 《Computer Engineering and Applications》2010,46(19):27-32

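Scalable Video Coding (SVC) generally relies on the Real-time Transport Protocol (RTP) to ensure real-time transmission and quality monitoring of video streams. Based on an analysis of the SVC bitstream structure and the RTP protocol, an RTP packetization algorithm for H.264/SVC video data is implemented. A method that separates the base layer from the enhancement layers is proposed to simulate the transmission of scalable video streams in a simulated test environment, and an error concealment method based on RTP packetization is proposed to address the loss of quality enhancement layer data. Experimental results demonstrate the effectiveness, standard compatibility, and extensibility of the packetization algorithm.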
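RTP packetization starts from the 12-byte fixed header defined in RFC 3550. The sketch below builds that header with Python's `struct` module and wraps one NAL unit in a single-NAL-unit packet, setting the marker bit on the last packet of an access unit as in the H.264 RTP payload format. It assumes dynamic payload type 96 and no CSRCs or header extensions, and it omits aggregation and fragmentation packets; the helper names are hypothetical. NAL unit type 20 (coded slice in scalable extension) distinguishes SVC enhancement-layer slices from AVC-compatible base-layer slices, which is one way to separate the layers.

```python
import struct


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96, marker: bool = False) -> bytes:
    """12-byte RTP fixed header (RFC 3550): V=2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6                                      # version 2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # marker bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packetize_single_nal(nal_unit: bytes, seq: int, timestamp: int, ssrc: int,
                         last_in_access_unit: bool) -> bytes:
    """Single NAL unit packet: the NAL unit (header included) follows the RTP header unchanged."""
    return rtp_header(seq, timestamp, ssrc, marker=last_in_access_unit) + nal_unit


def is_enhancement_nal(nal_unit: bytes) -> bool:
    """True for NAL unit type 20, the SVC coded slice extension."""
    return (nal_unit[0] & 0x1F) == 20
```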

14.

An Improved BQ-Mode Coding Method for MPEG-4 Still Texture

Zhang Haixiang, Chen Chun, Zhuang Yueting 《Journal of Computer-Aided Design & Computer Graphics》2003,15(4):488-494,499

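Starting from the information conveyed by zerotree symbols and their consistency with the target image of each coding layer, this paper studies how to improve BQ-mode coding (i.e., the PEZW method) in the MPEG-4 VTC tool. An improved PEZW method based on symbol decomposition, SP-PEZW, is proposed. A decomposed representation of zerotree symbols is introduced in which the two pieces of information contained in a zerotree symbol are represented by two separate decomposed symbols, and the compression ratio of each coding layer is increased by deleting decomposed symbols that are "inconsistent" with that layer's target image. Experiments show that, compared with PEZW, SP-PEZW achieves a higher compression ratio in the low-resolution coding layers and noticeably better image quality when the low-resolution spatial layers are decoded at a specified bit rate; the compression ratio of the high-resolution coding layers does not drop noticeably, and the image quality of the high-resolution spatial layers decoded at a specified bit rate improves slightly. More importantly, SP-PEZW only adds a zerotree symbol decomposition coding step to PEZW, and it inherits PEZW's important properties, including random access, error resilience, support for local coding, and low memory requirements.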

15.

A frame-level encoder rate control scheme for transform domain Wyner-Ziv video coding

Jian Chen, Shuai Zheng, Qing Hu, Yonghong Kuo 《Multimedia Tools and Applications》2017,76(20):20567-20585

Most available distributed video coding (DVC) codecs are based on a decoder rate control scheme in which the parity bits needed for decoding are obtained over a feedback channel. Such frequent requests over the feedback channel, however, increase transmission delay. Feedback-free DVC, which relies on encoder rate control as described in the literature, overcomes this shortcoming. Nevertheless, when performing parity bitrate estimation and other operations, bit-plane-based feedback-free DVC systems usually require highly precise bitrate estimation and high-quality side information at the encoder. In this paper, we propose a frame-level DVC system based on encoder rate control. The innovations include three parts: 1) an adaptive coding mode selection algorithm that exploits both temporal and spatial correlation and reduces encoder complexity; 2) a bit-plane rearrangement method that makes the coding rate of each bit-plane homogeneous, which effectively relaxes the accuracy required of parity bitrate prediction and improves the efficiency of rate estimation; and 3) a frame-level parity bitrate estimation scheme based on a look-up table that further improves the efficiency of rate estimation. Numerical results verify that the proposed scheme remarkably improves the rate-distortion performance of DVC at low bitrates.
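If a Wyner-Ziv bit-plane is modeled as passing through a binary symmetric channel with crossover probability p relative to the side information, the Slepian-Wolf bound puts the parity rate at about h(p) bits per source bit, where h is the binary entropy. The sketch below estimates p by comparing the Wyner-Ziv bit-plane with a side-information bit-plane assumed to be available at the encoder, and it adds a hypothetical safety margin; the paper uses a look-up table instead, and the names here are illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) in bits; zero at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def crossover_probability(wz_bits: list[int], si_bits: list[int]) -> float:
    """Fraction of positions where the bit-plane and its side-information estimate differ."""
    return sum(a != b for a, b in zip(wz_bits, si_bits)) / len(wz_bits)


def estimate_parity_bits(wz_bits: list[int], si_bits: list[int], margin: float = 0.05) -> int:
    """Slepian-Wolf estimate n * h(p) under a BSC model, plus a hypothetical safety margin."""
    p = crossover_probability(wz_bits, si_bits)
    return math.ceil(len(wz_bits) * min(1.0, binary_entropy(p) + margin))
```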

16.

Semantic Annotation and Retrieval of Music and Sound Effects

Turnbull D., Barrington L., Torres D., Lanckriet G. 《IEEE transactions on audio, speech, and language processing》2008,16(2):467-476

We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
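Annotation with per-word GMMs amounts to scoring a track's feature frames under each word model and ranking the words. The sketch below assumes diagonal-covariance components given as plain lists and averages the per-frame log-likelihoods; training (the paper uses weighted mixture hierarchies EM) is not shown, and the function names and toy models are hypothetical.

```python
import math


def diag_gaussian_logpdf(x: list[float], mean: list[float], var: list[float]) -> float:
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def gmm_logpdf(x: list[float], gmm: list[tuple[float, list[float], list[float]]]) -> float:
    """gmm is a list of (weight, mean, var) components; combined with log-sum-exp for stability."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def annotate(frames: list[list[float]], word_models: dict, k: int) -> list[str]:
    """Rank words by the average per-frame log-likelihood of the track's feature vectors."""
    scores = {w: sum(gmm_logpdf(f, g) for f in frames) / len(frames) for w, g in word_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Retrieval reverses the roles: for a query word, rank tracks by the same average log-likelihood under that word's model.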

17.

Optimization of Scalable Video Transmission in SDN-Based Dense Small-Cell Networks

Yang Enzhong, Yang Bowen, Gu Liang, Liu Jia 《Journal of Integration Technology》2019,8(4):14-23

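To optimize the transmission of multimedia data over wireless networks, this paper combines dense small-cell networks, software-defined networking (SDN), and scalable video coding to design a complete multimedia video transmission system. Cooperation among base stations in the dense small-cell network improves wireless spectrum utilization, and adaptive bit-rate adjustment provides differentiated service to different users. With the goal of maximizing users' quality of experience (QoE), the paper jointly decides user video quality and wireless resource allocation. Using Lyapunov optimization theory, the original problem is transformed into two independent subproblems that are solved separately, and a low-complexity algorithm that relies only on currently observed information is given. Experimental results show that the proposed algorithm responds well in dynamic environments and achieves a higher quality of experience.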
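Lyapunov optimization typically turns a long-term constraint into a virtual queue and then makes a greedy per-slot "drift-plus-penalty" decision using only the current state. The toy sketch below picks a video quality level in each slot under an average resource budget; the levels, the single constraint, and the function names are assumptions and do not reproduce the paper's two subproblems.

```python
def choose_level(levels: list[tuple[float, float]], z: float, v: float) -> int:
    """Drift-plus-penalty rule: maximize V * utility - Z * cost over (utility, cost) levels."""
    return max(range(len(levels)), key=lambda i: v * levels[i][0] - z * levels[i][1])


def simulate(levels, budget, v, slots):
    """Run the per-slot rule; z is a virtual queue enforcing average cost <= budget."""
    z, choices = 0.0, []
    for _ in range(slots):
        i = choose_level(levels, z, v)
        choices.append(i)
        z = max(z + levels[i][1] - budget, 0.0)
    return choices, z
```

A larger `v` favors utility (QoE) at the price of a larger virtual queue, i.e. a slower convergence toward the budget; that is the usual trade-off in drift-plus-penalty designs.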

18.

Multiple Description Coding Based on a Layered Structure (total citations: 1; self-citations: 0; citations by others: 1)

Chen Jing, Cai Canhui 《Journal of Image and Graphics》2008,13(1):47-52

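To meet the robustness requirements that unreliable network transmission places on coding, a multiple description coding scheme based on a layered structure is proposed. Following the idea of quality-scalable coding, the scheme splits the wavelet coefficients into two descriptions that are transmitted independently. Each description contains two parts: a base layer and an enhancement layer. The base layer consists of the key information of the coefficients and is duplicated in both descriptions, while the enhancement layer carries the remaining information of the coefficients and is split into two parts sent in different descriptions. Because each description contains the key information needed to recover the basic image quality, relatively high reconstruction quality is maintained even when one description is lost. Experimental results show that the scheme is correct and effective: it maintains high coding efficiency while improving the robustness of the coding.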
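The layered split can be sketched as follows: the coarse (key) part of every coefficient goes into both descriptions, while the refinement residual is divided between them. Splitting the residual by even and odd coefficient index, and using a uniform quantizer for the key information, are assumptions; the function names are hypothetical.

```python
def make_descriptions(coeffs: list[float], base_step: float):
    """Two descriptions sharing the coarse base layer and splitting the refinement residual."""
    base = [round(c / base_step) * base_step for c in coeffs]  # key information, duplicated
    enh = [c - b for c, b in zip(coeffs, base)]                 # remaining information, split
    d1 = {"base": base, "enh": {i: e for i, e in enumerate(enh) if i % 2 == 0}}
    d2 = {"base": base, "enh": {i: e for i, e in enumerate(enh) if i % 2 == 1}}
    return d1, d2


def reconstruct_mdc(*descriptions):
    """Decode from whichever descriptions arrived (at least one)."""
    rec = list(descriptions[0]["base"])
    for d in descriptions:
        for i, e in d["enh"].items():
            rec[i] += e
    return rec
```

With one description the error on the missing coefficients is bounded by half the base step; with both, the coefficients are restored.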

19.

Rate-Distortion Analysis and Quality Control in Scalable Internet Streaming

Dai M., Loguinov D., Radha H. M. 《IEEE Transactions on Multimedia》2006,8(6):1135-1146

Rate-distortion (R-D) modeling of video coders has always been an important issue in video streaming; however, few of the traditional R-D models and their performance have been closely examined in the context of scalable (FGS-like) video. To overcome this shortcoming, the first half of the paper models the rate-distortion behavior of DCT-based fine-granular scalable coders and derives a simple operational R-D model for Internet streaming applications. Experimental results demonstrate that this R-D result, an extension of the classical R-D formula, is very accurate within the domain of scalable coding methods exemplified by MPEG-4 FGS and H.264 progressive FGS. In the second half of the paper, we examine congestion control and dynamic rate-scaling algorithms that achieve smooth visual quality during streaming using the proposed R-D model. In constant bitrate (CBR) channels, our R-D based quality-control algorithm dramatically reduces PSNR variation between adjacent frames (to less than 0.1 dB in sample sequences). Since the Internet is a changing environment shared by many sources, even R-D based quality control often cannot guarantee nonfluctuating PSNR in variable-bitrate (VBR) channels without help from an appropriate congestion controller. Thus, we apply recent utility-based congestion control methods to our problem and show how a combination of this approach and our R-D model can benefit future streaming applications.
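The classical formula that the paper extends is D(R) = σ² 2^(-2R). Under that classical model (not the paper's FGS-specific model), giving every frame the same distortion for a fixed total rate R over N frames yields R_i = R/N + 0.5 log2(σ_i² / G), where G is the geometric mean of the variances. A small sketch with hypothetical function names that computes those rates, the resulting distortions, and PSNR:

```python
import math


def constant_quality_rates(variances: list[float], total_rate: float) -> list[float]:
    """Equal-distortion allocation under D_i = var_i * 2**(-2 * R_i) with sum(R_i) = total_rate."""
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n   # log2 of the geometric mean
    rates = [total_rate / n + 0.5 * (math.log2(v) - log_gm) for v in variances]
    if min(rates) < 0:
        raise ValueError("budget too small for this closed form; clip and redistribute instead")
    return rates


def distortion(variance: float, rate: float) -> float:
    return variance * 2.0 ** (-2.0 * rate)


def psnr(mse: float, peak: float = 255.0) -> float:
    return 10.0 * math.log10(peak * peak / mse)
```

Equal distortion per frame is what "smooth visual quality" means in PSNR terms; the paper's contribution is a more accurate R-D model to plug into this kind of allocation.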

20.

Sparse Enhancement of Fingerprint Images with Multi-Scale Classified Dictionaries

Xu Deqin, Bian Weixin, Ding Xintao, Ding Yuxiang 《Journal of Image and Graphics》2018,23(7):1014-1023

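Objective: Most automatic fingerprint identification systems are based on minutiae matching, so their performance depends on the quality of the input fingerprint. Poor input quality is currently the main problem facing these systems. To improve system performance by enhancing low-quality fingerprints, a fingerprint enhancement method based on sparse representation with multi-scale classified dictionaries is proposed. Method: First, a training set of high-quality fingerprints is built, and multi-scale classified dictionaries are learned from it. Second, the fingerprint image is pre-enhanced by linear contrast stretching. Third, the pre-enhanced fingerprint is divided into blocks in the spatial domain, and block quality is evaluated and graded according to the orientation consistency of the points within each block. Finally, a spectral enhancement model for fingerprint blocks based on sparse representation over the classified dictionaries is built in the frequency domain; guided by the block-quality grading mechanism and a composite-window strategy, and combined with spectral diffusion, the block spectra are enhanced using the multi-scale classified dictionaries. Result: The proposed algorithm was compared with two traditional fingerprint enhancement algorithms on the FVC2004 fingerprint database. Both visual and quantitative results show that, compared with the traditional algorithms, the proposed method is more robust and effectively improves the quality of low-quality input fingerprints. Conclusion: By introducing a ridge-pattern prior into classified dictionary learning, a more reliable dictionary is learned for each orientation class of fingerprint blocks, so the learned dictionaries carry more reliable ridge-pattern information. The block-quality grading mechanism and composite-window strategy not only aid spectral diffusion and improve the spectra of low-quality blocks but also allow the multi-scale classified dictionaries to be applied successfully, resolving the conflict between enhancement accuracy and noise robustness. This makes block enhancement more stable and reliable and markedly improves the enhancement of low-quality fingerprint images.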
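Two of the steps above are simple enough to sketch: linear contrast stretching, and a block-quality cue based on orientation consistency. The sketch uses the standard gradient-coherence measure sqrt((Gxx - Gyy)^2 + 4 Gxy^2) / (Gxx + Gyy) with forward differences as an assumed stand-in for the paper's consistency measure; the function names are hypothetical and the dictionary learning is not shown.

```python
import math


def linear_stretch(img, out_min=0.0, out_max=255.0):
    """Map the image's intensity range linearly onto [out_min, out_max]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[out_min for _ in row] for row in img]
    scale = (out_max - out_min) / (hi - lo)
    return [[out_min + (v - lo) * scale for v in row] for row in img]


def block_coherence(block):
    """Gradient coherence in [0, 1]: 1 means one dominant orientation, near 0 means none."""
    gxx = gyy = gxy = 0.0
    for y in range(len(block) - 1):
        for x in range(len(block[0]) - 1):
            gx = block[y][x + 1] - block[y][x]  # forward differences
            gy = block[y + 1][x] - block[y][x]
            gxx += gx * gx
            gyy += gy * gy
            gxy += gx * gy
    if gxx + gyy == 0.0:
        return 0.0
    return math.sqrt((gxx - gyy) ** 2 + 4.0 * gxy * gxy) / (gxx + gyy)
```

Thresholding the coherence of each block gives a simple quality grade: clean ridge areas score close to 1, while smudged or noisy areas score lower.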