20 similar articles found (search time: 0 ms)
1.
Lu Gan Chengjie Tu Jie Liang Trac D Tran Kai-Kuang Ma 《IEEE transactions on image processing》2007,16(2):428-441
It has been well established that critically sampled boundary pre-/postfiltering operators can improve the coding efficiency and mitigate blocking artifacts in traditional discrete cosine transform-based block coders at low bit rates. In these systems, both the prefilter and the postfilter are square matrices. This paper proposes to use undersampled boundary pre- and postfiltering modules, where the pre-/postfilters are rectangular matrices. Specifically, the prefilter is a "fat" matrix, while the postfilter is a "tall" one. In this way, the size of the prefiltered image is smaller than that of the original input image, which leads to improved compression performance and reduced computational complexities at low bit rates. The design and VLSI-friendly implementation of the undersampled pre-/postfilters are derived. Their relations to lapped transforms and filter banks are also presented. Two design examples are also included to demonstrate the validity of the theory. Furthermore, image coding results indicate that the proposed undersampled pre-/postfiltering systems yield excellent and stable performance in low bit-rate image coding.
2.
A digital cellular mobile radio system has been under development in Europe since 1982 under the coordination of the CEPT working group GSM (Groupe Spécial Mobile). In a recent coordinated experiment, listening opinion tests were performed on the speech output of six candidate 16 kb/s speech coding schemes for this system: one regular-pulse-excited coder, one multipulse-excited coder, and four subband coders. For comparison purposes, test conditions from a companded cellular FM system currently in operation were included in the experiment. The six codecs were compared in terms of subjective quality, transmission delay, and ease of implementation. In this overall comparison, no single codec was superior in all respects. However, the regular-pulse-excited linear predictive coder, which provided the best speech quality, had acceptable complexity and delay and was singled out for further improvement. Ultimately, an improved version of this codec, a regular-pulse-excited/long-term-prediction LPC coder, was selected.
3.
The authors summarise the findings of a feasibility study conducted to evaluate parallel implementations of a VSELP speech coder in digital radio.
4.
Linear prediction parameters within CELP coders are commonly represented by line spectral pairs (LSP), giving stable filters and efficient coding. However, LSP manipulation can also alter the frequencies of the represented signals. The authors use computationally efficient LSP manipulation to enhance the intelligibility of speech degraded by acoustic interference.
5.
6.
A new phase coding algorithm working in the pitch-cycle waveform domain is introduced. It provides accurate phase coding at low bit cost, thus being suitable for low bit rate sinusoidal coders. Its performance is analysed inside a multiband excitation (MBE) coder with improved onset representation. In this context, the introduction of original phase information by means of the proposed coding algorithm provides noticeable quality improvement without significantly increasing the complexity and total bit rate of the coder.
7.
Yoo-Sok Saw Peter M. Grant John M. Hannah Bernard Mulgrew 《Signal Processing: Image Communication》1998,13(3):93
Compressed video is a source of bursty traffic in communication networks whose data rate needs to be controlled within the available channel capacity, particularly, when it is transmitted via a fixed rate channel. Since the video rate is nonstationary and bursty at large-scene variations in a statistical sense, we propose a feed-forward, estimator-based rate control scheme associated with spatio-temporal activity features (STAF) for MPEG video encoders. This information is used to estimate the video rate of input picture frames. The estimated video rate enables the future buffer occupancy to be calculated and permits the encoder to adapt the quantisation step size to limit the increase or decrease in video rate due to dramatic scene variation. The current and future occupancies are used in a nonlinear quantiser control scheme to determine an appropriate quantisation step size depending on them. The novelty of this technique is that the nonlinear prediction and the nonlinear quantiser control are combined to achieve effective feed-forward video rate control, particularly, for realistic video containing various scene variations. In this paper, we highlight the innovative structure of the scheme and evaluate the performance of rate control algorithms with heuristic, linear and nonlinear rate estimators in the framework of the MPEG2 test model 5 video encoder. The performance measures are the occupancy of a two-frame delay buffer and peak signal-to-noise ratio (PSNR) for video quality.
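The feed-forward idea in the abstract above (estimate the next frame's bits, predict the buffer occupancy, then pick a quantisation step from the current and predicted occupancies) can be sketched roughly as follows. This is only an illustration: the function names, the quadratic mapping, and the 1 to 31 step range are assumptions made here, not the estimator or the nonlinear quantiser control used in the paper.

```python
def predict_occupancy(occupancy, estimated_frame_bits,
                      channel_bits_per_frame, buffer_size):
    """Predict buffer occupancy after the next frame (hypothetical model).

    The buffer fills with the estimated bits of the next frame and drains
    at the fixed channel rate; the result is clamped to [0, buffer_size].
    """
    nxt = occupancy + estimated_frame_bits - channel_bits_per_frame
    return min(max(nxt, 0.0), float(buffer_size))


def choose_qstep(current_occ, future_occ, buffer_size, q_min=1, q_max=31):
    """Map buffer fullness to a quantisation step (assumed quadratic rule).

    The worse of the current and predicted occupancies drives the step,
    so a predicted burst coarsens quantisation before the buffer fills.
    """
    fullness = max(current_occ, future_occ) / buffer_size
    q = q_min + (q_max - q_min) * fullness ** 2
    return int(round(q))


if __name__ == "__main__":
    occ = 100_000.0
    future = predict_occupancy(occ, estimated_frame_bits=250_000,
                               channel_bits_per_frame=150_000,
                               buffer_size=400_000)
    print(future, choose_qstep(occ, future, 400_000))
```

A real encoder would replace the quadratic rule with a mapping tuned to its buffer size and to the accuracy of its rate estimator.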
8.
The combination of speech coders and entropy coders is investigated for bit rate reduction. Three speech coders of the CELP (code excited linear prediction) type are considered, and the residual correlation in LSP (line spectrum pairs) coefficients and gains in a speech frame is exploited. The lossless entropy coders use Huffman, LZW (Lempel-Ziv-Welch) and gzip (LZ-Huffman) techniques. The greatest efficiency is provided by the adaptive Huffman approach, with a 15 % gain in each type of compressed parameter and an overall average bit rate reduction of 7 % for the FS1016 coder and 5 % for the TETRA and LBC coders.
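As a rough illustration of how entropy coding can reduce the bit rate of quantised coder parameters, the sketch below builds a static Huffman code over a sequence of parameter indices and compares its size with fixed-length coding. It is a simplification: the abstract reports the adaptive Huffman variant as the most efficient, whereas this example uses a static code, and the function names and sample data are invented for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a static Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level down.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def bit_rate_reduction(symbols, fixed_bits):
    """Fraction of bits saved relative to fixed-length coding."""
    lengths = huffman_code_lengths(symbols)
    coded = sum(lengths[s] for s in symbols)
    raw = fixed_bits * len(symbols)
    return 1.0 - coded / raw


if __name__ == "__main__":
    # Hypothetical 2-bit gain indices with a skewed distribution.
    indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
    print(huffman_code_lengths(indices))
    print(bit_rate_reduction(indices, fixed_bits=2))
```

The saving depends entirely on how skewed the index distribution is; uniformly distributed indices would show no gain.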
9.
The introduction of new variable bit rate (VBR) speech coders has opened up new perspectives for the implementation of adaptive voice over IP (AVoIP) systems. The paper compares different VBR speech coding techniques in a scenario in which the rate of the single sources is dynamically adapted to the workload conditions. The coders compared are the AMR, the M3R and the G.729. Using source and rate control mechanism models, performance is evaluated in terms of loss probability, offered throughput and mean CMOS with varying numbers of sources. The use of header compression mechanisms is also evaluated.
10.
11.
Techniques for improving the performance of CELP (code excited linear prediction)-type speech coders while maintaining reasonable computational complexity are explored. A harmonic noise weighting function, which enhances the perceptual quality of the processed speech, is introduced. The combination of harmonic noise weighting and subsample pitch lag resolution significantly improves the coder performance for voiced speech. Strategies for reducing the speech coder's data rate, while maintaining speech quality, are presented. These include a method for efficient encoding of the long-term predictor lags, utilization of multiple gain vector quantizers, and a multimode definition of the speech coder frame. A 5.9-kb/s VSELP speech coder that incorporates these features is described. Complexity reduction techniques which allow the coder to be implemented using a single fixed-point DSP (digital signal processor) are discussed.
12.
Low bit-rate efficient compression for seismic data (total citations: 3; self-citations: 0; citations by others: 3)
Averbuch A.Z. Meyer R. Stromberg J.-O. Coifman R. Vassiliou A. 《IEEE transactions on image processing》2001,10(12):1801-1814
Some marine seismic data sets exceed 10 Tbytes, and there are seismic surveys planned with a volume of around 120 Tbytes. The need to compress these very large seismic data files is imperative. Nevertheless, seismic data are quite different from the typical images used in image processing and multimedia applications. The major differences are that the dynamic range of the data exceeds 100 dB in theory, the data are very often extensively oscillatory, the x and y directions have different physical meanings, and a significant amount of coherent noise is often present. Up to now, some of the algorithms used for seismic data compression have been based on some form of wavelet or local cosine transform, followed by a uniform or quasiuniform quantization scheme and, finally, a Huffman coding scheme. Using this family of compression algorithms, we achieve compression results which are acceptable to geophysicists only at low to moderate compression ratios. For higher compression ratios or higher decibel quality, significant compression artifacts are introduced in the reconstructed images, even with high-dimensional transforms. The objective of this paper is to achieve higher compression ratios than those achieved with the wavelet/uniform quantization/Huffman coding family of compression schemes, with a comparable level of residual noise. The goal is to achieve above 40 dB in the decompressed seismic data sets. Several established compression algorithms are reviewed, and some new compression algorithms are introduced. All of these compression techniques are applied to a good representation of seismic data sets, and their results are documented in this paper. One of the conclusions is that the adaptive multiscale local cosine transform with different window sizes performs well on all the seismic data sets and outperforms the other methods from the SNR point of view. The described methods cover a wide range of different data sets. For each data set, the best-performing method can be chosen from this collection. The experiments were performed on four different seismic data sets. Special emphasis was given to achieving faster processing speed, another critical issue examined in the paper. Some of these algorithms are also suitable for multimedia type compression.
13.
The paper presents a speech coding algorithm for operation at 11025 samples/s. The coder provides improved speech quality and compatibility with the MS‐Windows multimedia environment. The coding algorithm has been developed by adapting the ITU G.729 and enhancing it with some recent developments in medium band coding. The coder operates over a band of frequencies ranging from 20 to 5400 Hz at a bit rate of 8.9 kbit/s. Applications of this coder include intranet VoIP, voice chatting, multimedia communications, and voice archiving. Copyright © 2001 John Wiley & Sons, Ltd.
14.
Ong L.K. Kondoz A.M. Evans B.G. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(3):191-196
Spectral efficiency in digital voice communications in personal communications networks is primarily provided for by advanced speech coding techniques. A threat to the quality and low bit rate requirements of these speech coders is imposed by the transmission channel. Channel coding is thus considered mandatory, but its performance is limited to specific channel conditions and system constraints like bandwidth and transmission power. The authors describe a technique for exploiting source redundancy in speech coders for further improving the performance of the channel coding scheme without the obligatory increased redundancy.
15.
Multicast routing for multimedia communication (total citations: 3; self-citations: 0; citations by others: 3)
The authors present heuristics for multicast tree construction for communication that depends on: bounded end-to-end delay along the paths from source to each destination and minimum cost of the multicast tree, where edge cost and edge delay can be independent metrics. The problem of computing such a constrained multicast tree is NP-complete. It is shown that the heuristics demonstrate good average case behavior in terms of cost, as determined by simulations on a large number of graphs.
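To make the two competing metrics concrete, the sketch below routes each destination over its cheapest path and falls back to its least-delay path when the cheapest one violates the delay bound. This is a deliberately naive baseline written for illustration, not the heuristics proposed by the authors: the union of the selected paths is not guaranteed to form a tree or to be of minimum cost, and the graph layout and function names are assumptions.

```python
import heapq


def shortest_path(graph, source, target, key):
    """Dijkstra over graph[u][v] = {"cost": c, "delay": d}, minimising `key`."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u == target:
            break
        if dist > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, attrs in graph.get(u, {}).items():
            nd = dist + attrs[key]
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_delay(graph, path):
    """Sum the edge delays along a path given as a list of nodes."""
    return sum(graph[u][v]["delay"] for u, v in zip(path, path[1:]))


def bounded_delay_paths(graph, source, destinations, delay_bound):
    """Pick, per destination, the cheapest path if it meets the delay bound,
    otherwise the least-delay path; return the set of edges used."""
    edges = set()
    for dest in destinations:
        path = shortest_path(graph, source, dest, "cost")
        if path is None or path_delay(graph, path) > delay_bound:
            path = shortest_path(graph, source, dest, "delay")
        if path is None or path_delay(graph, path) > delay_bound:
            raise ValueError(f"no path to {dest!r} within the delay bound")
        edges.update(zip(path, path[1:]))
    return edges


if __name__ == "__main__":
    # Hypothetical directed graph: a cheap but slow route and a fast but costly one.
    g = {
        "s": {"a": {"cost": 1, "delay": 5}, "d": {"cost": 10, "delay": 2}},
        "a": {"d": {"cost": 1, "delay": 5}},
        "d": {},
    }
    print(bounded_delay_paths(g, "s", ["d"], delay_bound=10))
    print(bounded_delay_paths(g, "s", ["d"], delay_bound=5))
```

Because cost and delay are independent edge metrics, tightening the bound in the example switches the route from the cheap two-hop path to the expensive direct edge.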
16.
Although humans rely primarily on hearing to process speech, they can also extract a great deal of information with their eyes through lipreading. This skill becomes extremely important when the acoustic signal is degraded by noise. It would, therefore, be beneficial to find methods to reinforce acoustic speech with a synthesized visual signal for high noise environments. This paper addresses the interaction between acoustic speech and visible speech. Algorithms for converting audible speech into visible speech are examined, and applications which can utilize this conversion process are presented. Our results demonstrate that it is possible to animate a natural-looking talking head using acoustic speech as an input.
17.
Michael Elad Roman Goldenberg Ron Kimmel 《IEEE transactions on image processing》2007,16(9):2379-2383
An efficient approach for face compression is introduced. Restricting a family of images to frontal facial mug shots enables us to first geometrically deform a given face into a canonical form in which the same facial features are mapped to the same spatial locations. Next, we break the image into tiles and model each image tile in a compact manner. Modeling the tile content relies on clustering the same tile location at many training images. A tree of vector-quantization dictionaries is constructed per location, and lossy compression is achieved using bit-allocation according to the significance of a tile. Repeating this modeling/coding scheme over several scales, the resulting multiscale algorithm is demonstrated to compress facial images at very low bit rates while keeping high visual qualities, outperforming JPEG-2000 performance significantly.
18.
The problem of predictor mistracking for narrowband signals in backward adaptive ADPCM (adaptive differential pulse code modulation) speech coders is shown to arise as a result of feedback from the signal reconstruction filter to the predictor adaptation process. A class of residual-signal-driven lattice predictors (LR) is defined that guarantees tracking for all signals without regard to the order of prediction. The LR predictor enhances speech and DTMF (dual-tone multifrequency) signal transmission performance in the presence of transmission errors. Under error-free transmission conditions, a segmental SNR (signal-to-noise ratio) drop for speech of nearly 2 dB may be encountered for the LR predictor relative to the classical signal-driven lattice predictor. In most practical telecommunication applications, however, this degradation is outweighed by the improved robustness of the predictor.
19.
The evolving multimedia applications generate requirements for complex transport capabilities, i.e., functional features, in the end-to-end communication system such as handling of heterogeneity among communicating terminals, supporting finer levels of user-specifiable quality of data transport service, and synchronization of various data streams for delivery at users in real time. Accordingly, the communication system may be viewed as extending the basic capabilities provided by the backbone network (e.g., bandwidth allocation) into a set of transport capabilities suitable for complex applications. This paper presents: (1) an object-oriented view of the user interface to the communication system with an elegant separation of data transport functionalities, and (2) an approach to the design of underlying transport protocols. The object-orientation decomposes an application-level data transport into a set of network channel objects, with each channel object handling a separate data stream. The object interactions are modeled using a “data-flow programming” style, which allows a richer set of protocols to implement the communication system and offers flexibility to accommodate complex and heterogeneous subscriber services/terminals. The “data-flow programming” method also allows a high degree of communication level parallelism among data transport through channels. The view of a multimedia communication system as a “parameterizable black-box”, as underscored in the object-oriented structuring, allows easier interworking of the communication system with existing networks and easier integration of multimedia transport into programming environments.