Similar literature
Found 20 similar documents; search took 15 ms
1.
Reliable Transmission of Adaptive Multi-Rate Speech Coding Streams   Total citations: 4, self-citations: 3, citations by others: 4
赵训威  张平  王檀 《通信学报》2004,25(5):175-181
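Adaptive multi-rate speech coding has been selected as the speech compression scheme for third-generation mobile communication systems. This paper proposes a joint source-channel coding method suited to the transmission of compressed speech and compares its performance statistically. The focus of the work is a channel decoding algorithm that exploits the inherent redundancy in the compressed speech bitstream. Simulation results show that a channel decoder using source redundancy information can obtain a substantial coding gain. The channel coding scheme used in this paper is a convolutional code suited to speech transmission.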

2.
A new type of scene adaptive coder has been developed. It involves a quadtree mean decomposition of the motion-compensated frame-to-frame difference signal followed by a scalar quantization of the local means. As a fundamental property, the new coding algorithm treats the displacement estimation problem and the quadtree construction problem as a unit. The displacement vector and the related quadtree are jointly optimized in order to minimize the direct frame-to-frame update information rate (in bits), which turns up as a new and more adequate cost function in displacement estimation. This guarantees the highest possible data compression ratio at a given quality threshold. Excellent results have been obtained for coding of color image sequences at a rate of 64 kb/s. The quadtree concept entails a much lower computational complexity compared to the conventional motion-compensated transform coder while achieving a subjective image quality that is as good as or better than that of the traditional transform-based counterpart.
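
The quadtree mean decomposition and scalar quantization of local means mentioned in this abstract could be sketched as follows. This is only an illustration under stated assumptions: the split criterion (maximum deviation from the region mean), the `threshold` and `step` parameters, and all function names are hypothetical, and the joint optimization of displacement vector and quadtree against the update bit rate described in the abstract is not reproduced.

```python
def quadtree_means(block, x, y, size, threshold, leaves):
    """Recursively split a square region until its mean describes it well.

    `block` is a list of rows; `size` is assumed to be a power of two.
    The max-deviation split test is an assumption, not the paper's criterion.
    """
    values = [block[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(values) / len(values)
    max_dev = max(abs(v - mean) for v in values)
    if size == 1 or max_dev <= threshold:
        leaves.append((x, y, size, mean))  # leaf: position, size, local mean
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            quadtree_means(block, x + dx, y + dy, half, threshold, leaves)


def quantize(value, step):
    """Uniform scalar quantizer applied to a local mean (hypothetical step size)."""
    return step * round(value / step)


# A 4x4 difference block whose top-left quadrant differs from the rest.
block = [
    [8, 8, 0, 0],
    [8, 8, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
leaves = []
quadtree_means(block, 0, 0, 4, threshold=1, leaves=leaves)
quantized = [(x, y, size, quantize(mean, 4)) for x, y, size, mean in leaves]
```

Here the root region is split once because its values deviate from the overall mean by more than the threshold, and each of the four resulting quadrants is uniform enough to be kept as a leaf.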

3.
Yoma  N.B. McInnes  F. Jack  M. 《Electronics letters》1996,32(15):1350-1352
The problem of speech pulse detection with additive noise at a signal-to-noise ratio (SNR) as low as 0 and -6 dB is addressed. The noise is assumed to be reasonably stationary and correlated. Three techniques have been examined: the autoregressive analysis of noise; spectral density comparison; and the non-stationarity measure.

4.
In this paper, we present a medium-rate speech coder, the controlled adaptive prediction delta modulation coder (CAPDM), which operates at 16 kb/s with good speech quality and low algorithm complexity. The coder is dedicated to personal communication network (PCN) applications and transmits speech samples on the basis of packets. It combines the features of a one-step looking forward decision, syllabic companding, instantaneous companding, and adaptive prediction. In addition to the use of a short-term prediction filter, CAPDM also exploits the pitch property to predict speech waveform explicitly. With the aid of a pitch prediction filter, the performance of a CAPDM codec improves about 3 dB in segmental signal-to-noise ratio (SEGSNR). The average SEGSNR of CAPDM.FF is about 21 dB, which is 7 dB over traditional CVSD at 16 kb/s. We also utilize an adaptive postfilter (APF) to enhance the perceptual quality of the decoded speech. The mean opinion score (MOS) listening test of CAPDM.FF with APF shows that its average score achieves 4.19, which is as good as G.728 16-kb/s LD-CELP and is comparable with CCITT G.721 32-kb/s ADPCM. The complexity of CAPDM.FF is evaluated to be 8 MIPS, which is much lower than that of LD-CELP and could be further reduced by adopting a smaller correlation window for pitch detection. To solve the problem of packet loss, we developed a packet-based waveform substitution method by reinitializing the codec parameters at the beginning of each packet. The simulation results show that CAPDM.FF could tolerate 5% of packet loss and still keep an SEGSNR at 10 dB and an MOS at about 3.0.
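
Since this abstract reports its results in segmental signal-to-noise ratio (SEGSNR), a minimal sketch of how such a figure is commonly computed may help. The frame length, the clamping limits, and the function name below are assumptions chosen for illustration; the abstract does not state which settings the authors used.

```python
import math


def segmental_snr(reference, decoded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs in dB between a reference and a decoded signal.

    frame_len, floor_db and ceil_db are hypothetical defaults, not values
    taken from the paper.
    """
    n = min(len(reference), len(decoded))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy == 0.0:
            continue  # skip silent frames, which have no defined SNR
        if noise_energy == 0.0:
            snr_db = ceil_db  # perfect reconstruction: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        frame_snrs.append(max(floor_db, min(ceil_db, snr_db)))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Two frames of a constant signal decoded with a constant error of 0.1.
reference = [1.0] * 320
decoded = [0.9] * 320
result = segmental_snr(reference, decoded)
```

Averaging in the dB domain frame by frame gives quiet frames the same weight as loud ones, which is why SEGSNR is often preferred to a single global SNR for speech.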

5.
A new phase coding algorithm working in the pitch-cycle waveform domain is introduced. It provides accurate phase coding at low bit cost, thus being suitable for low bit rate sinusoidal coders. Its performance is analysed inside a multiband excitation (MBE) coder with improved onset representation. In this context, the introduction of original phase information by means of the proposed coding algorithm provides noticeable quality improvement without significantly increasing the complexity and total bit rate of the coder.

6.
Lee  J.I. Un  C.K. 《Electronics letters》1989,25(19):1275-1277
Although the speech quality of the code excited linear prediction (CELP) coder at 4800 bit/s is relatively good, it is still perceived as rough or noisy. The authors propose a residual shaping method that produces, without signal distortion, quality comparable to that obtained by adaptive postfiltering. The proposed method is particularly effective in the multiple tandeming environment.

7.
Results of experimental comparisons of forward- and backward-adaptive prediction in differential pulse code modulation (DPCM) of speech are presented. Two different types of comparisons are conducted. In one comparison, both predictors are used with the same three/five-level pitch compensating quantizer (PCQ). For this comparison, forward prediction clearly outperforms backward prediction, but with the penalty of a 10% increase in data rate due to the need to transmit coefficients. In the second comparison, the forward-prediction DPCM system and the backward-prediction DPCM system are constrained to have the same data rate of 16 kbits/sec. The backward-adaptive predictor outperforms forward prediction for this latter comparison. The speech data base for the simulations is one sentence spoken by a male speaker in four different languages: English, French, German, and Arabic. The performance comparisons are based on signal-to-quantization noise ratio, signal-to-prediction error ratio, sound spectrograms, and formal subjective listening tests.

8.
Hwai-Tsu Hu 《Electronics letters》1998,34(14):1385-1386
A linear prediction is formulated via the orthogonal principle to facilitate the incorporation of various error minimisation criteria. A weighting function, which downplays extreme errors, is used to provide a robust estimate lying between the L1 and L2 criteria. Experiments based on synthetic vowels reveal that the proposed method outperforms the L1 and L2 estimates.

9.
An adaptive predictive coder providing almost toll quality at 16 kb/s and minimal degradation when the bit rate is lowered to 9.6 kb/s is described. The coder can operate at intermediate bit rates and can also change bit rate on a packet-by-packet basis. Variable bit rate operation is achieved through the use of switched quantization, thus eliminating the need for buffering of the output. A noise shaping filter provides flexible control of the output noise spectrum. The filter, in conjunction with an enhanced way to adapt the quantizer step size, which tries to accommodate the quantization noise feedback, accounts for the toll quality. By quantizing the residue with more than one quantizer, the effective number of bits per sample can be controlled in a deterministic way regardless of the entropy residue. The lower limit of operation is at 9.6 kb/s. Performance of the coder under random bit errors is also presented. It has been found that only at error rates of 10^-2 and higher does the degradation become objectionable.

10.
In this paper, we establish a probabilistic framework for adaptive transform coding that leads to a generalized Lloyd type algorithm for transform coder design. Transform coders are often constructed by concatenating an ad hoc choice of transform with suboptimal bit allocation and quantizer design. Instead, we start from a probabilistic latent variable model in the form of a mixture of constrained Gaussian mixtures. From this model, we derive a transform coder design algorithm, which integrates optimization of all transform coder parameters. An essential part of this algorithm is our introduction of a new transform basis, the coding optimal transform, which, unlike commonly used transforms, minimizes compression distortion. Adaptive transform coders can be effective for compressing databases of related imagery since the high overhead associated with these coders can be amortized over the entire database. For this work, we performed compression experiments on a database of synthetic aperture radar images. Our results show that adaptive coders improve compressed signal-to-noise ratio (SNR) by approximately 0.5 dB compared with global coders. Coders that incorporated the coding optimal transform had the best SNRs on the images used to develop the coder. However, coders that incorporated the discrete cosine transform generalized better to new images.

11.
Variable Rate (VR) speech coders are classified into: source-controlled VR coders where the rate is selected depending on the local character of the speech, and network-controlled VR coders where an external control signal selects the coding rate. The first category benefits from the variable rate channels used by Code Division Multiple Access (CDMA) mobile communications. The second category is indispensable for the right behaviour of the CDMA systems under conditions such as high traffic levels. The VR speech coder presented in this communication exhibits both types of control. The source control is achieved by means of a Voice Activity Detector (VAD) and a phonetic classifier. The network control acts on the selection procedure of the multipulse excitation sequence to the synthesis filter. This is the main advantage of our VR MultiPulse speech coder because by means of an external signal the bit rate can be changed only every 4 msec, without transitions or distortions. Considering one-way communication, six different operating rates can be externally selected ranging from 4.8 to 9.1 kbps for the active frames; an average bit rate of 380 bps is required for the noise frames. This work has been partly funded by the Spanish Research National Plan under grant no. TIC92-0800-C05-02 and by Northern Telecom.

12.
A robust quantiser design for image coding is presented. The proposed quantiser can be viewed as a compound of a quantiser, a variable length code (VLC) coder, and a channel coder. Simulation results show that our proposed scheme has a graceful distortion behaviour within the designed noise range.

13.
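Whispered speech is a mode of speaking in which the vocal folds vibrate only slightly or not at all. Building on a previously collected speech corpus, this paper computes and compares the formant positions and bandwidths of normal and whispered speech, derives the corresponding ratios of change, and thereby obtains the basic formant characteristics of whispered speech.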

14.
New learning algorithms for an adaptive nonlinear forward predictor that is based on a pipelined recurrent neural network (PRNN) are presented. A computationally efficient gradient descent (GD) learning algorithm, together with a novel extended recursive least squares (ERLS) learning algorithm, are proposed. Simulation studies based on three speech signals that have been made public and are available on the World Wide Web (WWW) are used to test the nonlinear predictor. The gradient descent algorithm is shown to yield poor performance in terms of prediction error gain, whereas consistently improved results are achieved with the ERLS algorithm. The merit of the nonlinear predictor structure is confirmed by yielding approximately 2 dB higher prediction gain than a linear structure predictor that employs the conventional recursive least squares (RLS) algorithm.

15.
16.
This paper describes the implementation of a Speech Understanding System component which tracks the formants of pseudo-syllabic nuclei containing voiced consonants. The nuclei are isolated from continuous speech after a precategorical classification in which feature extraction is carried out by modules organized in a hierarchy of levels. FFT and LPC spectra are the input to the formant tracking system. It works under the control of rules specifying the possible formant evolutions given previously hypothesized phonetic features and produces fuzzy graphs rather than usual formant patterns because formants are not always evident in the spectrogram pattern.

17.
The authors propose an efficient perceptual adaptive quantisation scheme that uses Q-N (quantisation step-number of output bits) mapping tables and fuzzy perceptual classifiers. Using this adaptive quantisation scheme, stable perceptual quality and good visual effects can be obtained at a constant output bit rate.

18.
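A Clustering Optimization Strategy for the G.723.1 Speech Coder Algorithm and Its Application   Total citations: 2, self-citations: 0, citations by others: 2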
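Building on an implementation of the G.723.1 coder algorithm, a clustering optimization strategy for codebook search is proposed. Using the results of cluster analysis, the original codebook is grouped and restructured, which enables fast search. Tests and the actual operation of the system show that, with algorithm complexity reduced by about 1.46 MIPS, the optimized result still ensures no noticeable degradation of speech quality at the decoder.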
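
The idea of grouping a codebook by cluster and then searching only one group could be sketched as below. This is a generic vector-quantization illustration, not the G.723.1 codebook structure or the authors' procedure: the centroids are assumed to come from some prior cluster analysis (for example k-means), and every name here is hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_groups(codebook, centroids):
    """Assign each codebook index to the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for index, vector in enumerate(codebook):
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(vector, centroids[k]))
        groups[nearest].append(index)
    return groups


def grouped_search(target, codebook, centroids, groups):
    """Search only the group whose centroid is closest to the target.

    Falls back to the full codebook if the chosen group is empty.
    """
    best_group = min(range(len(centroids)),
                     key=lambda k: squared_distance(target, centroids[k]))
    candidates = groups[best_group] or range(len(codebook))
    return min(candidates, key=lambda i: squared_distance(target, codebook[i]))


# Toy codebook with two well-separated clusters and assumed centroids.
codebook = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centroids = [(0.5, 0.0), (10.5, 10.0)]
groups = build_groups(codebook, centroids)
best_index = grouped_search((10.9, 10.2), codebook, centroids, groups)
```

The search cost drops from one distance per codebook entry to one per centroid plus one per entry in the selected group. The price is that the true nearest entry can be missed when it lies in a neighbouring group, which is the kind of quality trade-off the abstract reports testing.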

19.
A code tree generated by a stochastically populated innovations tree with a backward adaptive gain and backward adaptive synthesis filters is considered. The synthesis configuration uses a cascade of two all-pole filters: a pitch (long time delay) filter followed by a formant (short time delay) filter. Both filters are updated using backward adaptation. The formant predictor is updated using an adaptive lattice algorithm. The multipath (M, L) search algorithm is used to encode the speech. A frequency-weighted error measure is used to reduce the perceptual loudness of the quantization noise. The addition of the pitch filter gives a 2-10 dB increase in segSNR (segmental signal-to-noise ratio) in the voiced segments. Subjective testing has shown that the coder attains a subjective quality equivalent to 7 b/sample log-PCM (pulse code modulation) with an encoding delay of eight samples (1 ms with an 8-kHz sampling rate).

20.
A new three-dimensional (3-D) discrete cosine transform (DCT) coder for medical images is presented. In the proposed method, a segmentation technique based on the local energy magnitude is used to segment subblocks of the image into different energy levels. Then, those subblocks with the same energy level are gathered to form a 3-D cuboid. Finally, 3-D DCT is employed to compress the 3-D cuboid individually. Simulation results show that the reconstructed images achieve a bit rate lower than 0.25 bit per pixel even when the compression ratios are higher than 35. As compared with the results by JPEG and other strategies, it is found that the proposed method achieves better qualities of decoded images.
