Similar documents
20 similar documents found (search time: 39 ms)
1.
Wideband speech and audio coding   Cited by: 5 (self-citations: 0; citations by others: 5)
Typical parameters of wideband speech and audio signals, including digitized versions of each, potential applications, and available transmission media, are described. Facts about human auditory perception that are exploited in audio coding and quality measures that play an important role in coder evaluations and designs are reviewed. Techniques for efficient coding of wideband speech and audio signals, with an emphasis on existing standards, are discussed. The audio coding standard developed by the Moving Picture Experts Group within the International Organization for Standardization (ISO/MPEG) is covered in some detail, since it will be used in many application areas, including digital storage, transmission, and broadcasting of audio-only signals and audiovisual applications such as videotelephony, videoconferencing, and TV broadcasting. Ongoing research and standardization work is outlined.

2.
3.
The objective evaluation of the audio performance of nonlinear processes such as data reduction has required the development of a new generation of perception-based assessment algorithms. In the future it is expected that a large proportion of communications traffic will be wideband and multimedia. Objective assessment of perceived performance is required for optimisation during design, and for commissioning. In developing objective techniques to assess multimedia systems it is necessary to investigate, and model, the dependencies which exist between audio and visual perception (other senses may follow). This paper summarises the background of current perceptual modelling work and reports early progress towards a multisensory perceptual model. An experimental investigation into combined audio/video perception is described and an algorithmic basis for a multisensory perceptual model proposed. In the future it is predicted that algorithmic models of perception will be exploited for objective assessment of multimedia systems and by the delivery technology for advanced telepresence applications.

4.
Advances in speech and audio compression   Cited by: 4 (self-citations: 0; citations by others: 4)
Speech and audio compression has advanced rapidly in recent years, spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques for rates in the range of 4 to 16 kb/s. Techniques have also been emerging in recent years that offer enhanced quality in the neighborhood of 2.4 kb/s over traditional vocoder methods. Wideband audio compression is generally aimed at a quality that is nearly indistinguishable from consumer compact-disc audio. Subband and transform coding methods combined with sophisticated perceptual coding techniques dominate in this arena, with nearly transparent quality achieved at bit rates in the neighborhood of 128 kb/s per channel.

5.
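This paper briefly reports on the recent progress of the Moving Picture Experts Group in drafting MPEG-4, a new standard for multimedia communications. Audio coding covers speech and music (natural and synthetic) at bit rates from 2 to 64 kb/s. Video coding covers very low bit rates of 5 to 64 kb/s and higher bit rates of 64 kb/s to 2 Mb/s. The video coder can encode each object in an image separately into its own bitstream layer and can manipulate the scale, position, and other properties of objects, which provides content-based interactive functionality. Beyond the core coder, each frame of the input video sequence is divided into several arbitrarily shaped "video object planes", which are coded into separate "video object layers". In addition, "sprite" coding is used to code the image background and each foreground object separately as video sequences before transmission, which can improve picture quality.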

6.
Wideband speech is the major differentiation and attraction of third-generation network services in both the circuit-switched and packet-switched domains. Increased audio bandwidth introduces a significant leap in perceived quality of service compared to the narrowband telephony currently used in second-generation mobile communications and the PSTN. The adaptive multirate wideband (AMR-WB) speech codec is the service enabler for improved user experience. It is an established 3GPP and ITU-T wideband speech codec standard and represents the state of the art in speech quality as well as robustness in error-prone radio channels. It is also the first codec algorithm standardized for wideband speech for mobile communications.

7.
Entropy coding principles are applied to the 16 kbit/s ITU G.728 speech codec. It is shown that the average bit rate can be reduced to 14.5 kbit/s without a significant increase in the codec complexity. In very low bit rate audiovisual communication applications such as the videophone, the saved bits can be used to improve the output video quality.
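The arithmetic behind such a saving can be sketched as follows. The sketch assumes the usual G.728 framing of one 10-bit codebook index per 5-sample vector at 8 kHz sampling (1600 indices per second), which the abstract does not state. It computes the Shannon entropy of an index distribution and the average bit rate an ideal entropy coder would reach. The skewed distribution and all function names are hypothetical and chosen only for illustration; they are not the statistics or the method reported in the paper.

```python
import math

VECTORS_PER_SECOND = 8000 // 5   # assumed: 8 kHz sampling, 5 samples per vector
FIXED_BITS_PER_INDEX = 10        # assumed: one 10-bit codebook index per vector


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def bit_rate(bits_per_index):
    """Bit rate in bit/s for a given average number of bits per index."""
    return bits_per_index * VECTORS_PER_SECOND


# Fixed-length coding: every index costs 10 bits.
fixed_rate = bit_rate(FIXED_BITS_PER_INDEX)

# Uniformly used indices leave nothing for an entropy coder to gain.
uniform = [1 / 1024] * 1024
uniform_rate = bit_rate(entropy_bits(uniform))

# Hypothetical skew: half of the indices are twice as likely as the rest.
q = 1 / 1536
skewed = [2 * q] * 512 + [q] * 512
skewed_rate = bit_rate(entropy_bits(skewed))

# Average bits per index implied by the 14.5 kbit/s quoted in the abstract.
target_bits_per_index = 14500 / VECTORS_PER_SECOND

print(fixed_rate, uniform_rate, round(skewed_rate), target_bits_per_index)
```

Under these assumptions, reaching 14.5 kbit/s means spending a little over 9 bits per index on average instead of 10.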

8.
Digital speech technology is reviewed, with the emphasis on applications demanding high-quality reproduction of the speech signal. Examples of such applications are network telephony, ISDN terminals for audio teleconferencing, and systems for the storage of audio signals, which include the important subclass of wideband speech. Depending on the application, the bandwidth of input speech can vary from about 3 kHz to nearly 20 kHz. Coding for digital telephony at 4 and 8 kb/s, network quality coding at 16 kb/s, and coding for audio at 7 and 20 kHz are examined. Future directions in the field are discussed with respect to anticipated technology applications and the algorithms needed to support these technologies.

9.
High-efficiency audio compression is a basic technology in multimedia communications that involve audio. Downmixing combined with parametric coding is an efficient coding scheme with wide application in recent audio codecs, such as Parametric Stereo (PS) in EAAC+ and MPEG-Surround. Principal Component Analysis (PCA) stereo coding follows this idea: it maps two channels to one channel with maximum energy and parameterizes the secondary channel. This paper investigates the performance of the conventional PCA method under a general stereo model with multiple sound sources in different directions, and then proposes a Polar Coordinate based PCA (PC-PCA) stereo coding method. It is proved that, when multiple sound sources exist in different directions, PC-PCA is better than the conventional PCA method when the Mean to Standard deviation Ratio (MSR) is large. A stereo codec based on PC-PCA is proposed to validate the performance improvement of the proposed method. Objective and subjective tests show that the proposed method achieves comparable quality while saving 50% of the parameter bit rate compared with the conventional PCA method, and obtains a MUSHRA score improvement of 4 to 8 points compared with a state-of-the-art stereo codec at the same parameter bit rate.
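The conventional PCA downmix mentioned in this abstract can be illustrated with a minimal sketch. It shows the textbook two-channel rotation onto the principal axis, not the PC-PCA method proposed in the paper, whose details the abstract does not give. The function names and the test signal are hypothetical, and the input is assumed to be zero-mean.

```python
import math


def pca_downmix(left, right):
    """Rotate a stereo pair onto its principal axis.

    Returns (angle, primary, secondary). The primary channel carries the
    maximum energy; the secondary channel carries the remainder.
    """
    # Second-order statistics of the two channels (signals assumed zero-mean).
    c_ll = sum(l * l for l in left)
    c_rr = sum(r * r for r in right)
    c_lr = sum(l * r for l, r in zip(left, right))

    # Rotation angle that diagonalises the 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    primary = [cos_a * l + sin_a * r for l, r in zip(left, right)]
    secondary = [-sin_a * l + cos_a * r for l, r in zip(left, right)]
    return angle, primary, secondary


def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)


# Hypothetical test signal: one source panned with gains 0.8 (left), 0.6 (right).
source = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
left = [0.8 * s for s in source]
right = [0.6 * s for s in source]

angle, primary, secondary = pca_downmix(left, right)
print(angle, energy(primary), energy(secondary))
```

With a single panned source the rotation recovers the source in the primary channel and leaves the secondary channel essentially empty, so only the angle needs to be sent as a parameter. With several sources in different directions the secondary channel no longer vanishes, which is the case the paper examines.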

10.
Low bit-rate speech coders for multimedia communication   Cited by: 10 (self-citations: 0; citations by others: 10)
The International Telecommunication Union (ITU) has standardized three speech coders which are applicable to low-bit-rate multimedia communications. ITU Rec. G.729 8 kb/s CS-ACELP has a 15 ms algorithmic codec delay and provides network-quality speech. It was originally designed for wireless applications, but is applicable to multimedia communications as well. Annex A of Rec. G.729 is a reduced-complexity version of the CS-ACELP coder. It was designed explicitly for simultaneous voice and data applications that are prevalent in low-bit-rate multimedia communications. These two coders use the same bitstream format and can interoperate. The ITU Rec. G.723.1 6.3 and 5.3 kb/s speech coder for multimedia communications was designed originally for low-bit-rate videophones. Its frame size of 30 ms and one-way algorithmic codec delay of 37.5 ms allow for a further reduction in bit rate compared to the G.729 coder. In applications where low delay is important, the delay of G.723.1 may be too large. However, if the delay is acceptable, G.723.1 provides a lower-complexity alternative to G.729 at the expense of a slight degradation in quality. This article describes the attributes of speech coders such as bit rate, complexity, delay, and quality. Then it discusses the basic concepts of the three new ITU coders by comparing their specific attributes. The second part of this article describes the standardization process for each of these coders.
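The delay and bit rate figures quoted in this abstract can be related with a small sketch. It treats algorithmic delay as frame length plus look-ahead, so the look-ahead follows from the two quoted numbers. The bit rates, the G.723.1 frame size, and both delays come from the abstract; the 10 ms frame length for G.729 is an assumption taken from the standard and is not stated here. The class and field names are hypothetical, and the bits per frame are nominal values (rate times frame length), not the exact packed frame sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coder:
    name: str
    bit_rate_kbps: float         # from the abstract
    frame_ms: float              # G.729 value is an assumption (see text)
    algorithmic_delay_ms: float  # from the abstract

    @property
    def lookahead_ms(self):
        # Algorithmic delay is modelled as frame length plus look-ahead.
        return self.algorithmic_delay_ms - self.frame_ms

    @property
    def nominal_bits_per_frame(self):
        # kbit/s multiplied by ms gives bits.
        return round(self.bit_rate_kbps * self.frame_ms)


g729 = Coder("G.729", 8.0, 10.0, 15.0)
g7231_high = Coder("G.723.1 (6.3 kb/s)", 6.3, 30.0, 37.5)
g7231_low = Coder("G.723.1 (5.3 kb/s)", 5.3, 30.0, 37.5)

for coder in (g729, g7231_high, g7231_low):
    print(coder.name, coder.lookahead_ms, coder.nominal_bits_per_frame)
```

The longer frame lets G.723.1 spread its side information over more samples, which is how it reaches a lower bit rate at the cost of roughly 2.5 times the algorithmic delay.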

11.
12.
In this paper, we present a new method for high-quality audio coding at low delay and low bit rate for telecommunications applications such as audioconferencing or videoconferencing. The developed coder is designed to code generic audio signals at a bit rate of 64 kbit/s with a delay close to 5 ms in the 20 to 15000 Hz bandwidth. The method is based on speech coding as well as audio coding concepts. The coder combines subband decomposition of the input signal with LD-CELP techniques. Into this coding structure we introduce a psychoacoustic model that allocates an optimal bit rate to each subband according to the perceptual properties of human hearing. To satisfy the bit rate requirement of the psychoacoustic model and to reduce the complexity of such a coding algorithm, we propose a new vector quantization method based on lattice quantization. This method quantizes the residual signal in the LD-CELP coder and avoids the complexity of a full search. Objective and subjective tests have been carried out on a test set of audio signals that is a critical subset of those used by ISO. Formal tests showed that the quality of the proposed coder is comparable to the best implementation of MPEG-1 Layer II, but our solution has the advantage of reaching a very low delay (5 ms).

13.
An overview of low bit rate coding and the interaction between source coding and channel coding is presented. The interaction of coding with networking in a multiuser environment, including algorithms for robust coding which anticipate imperfect network performance, and techniques of decoding a signal that has traversed an imperfect network are described. The performances of such algorithms are illustrated with examples from speech, audio, and video transmission in the presence of packet losses. The challenges in measuring the quality of service (QOS) in the context of new algorithms for coding and networking and the difficulty of measuring QOS in the networking of multimedia information are discussed.

14.
Voice is one of the oldest telecommunications services and remains the most ubiquitous. Despite the longevity of voice services and huge investment in development, the baseline characteristics of telecommunications speech have remained unchanged for many decades, namely narrowband audio with a bandwidth of 300 Hz to 3.4 kHz. With an arguably mundane legacy, one might question whether voice will ultimately lose out to other forms of so-called rich media. This paper shows that voice, enhanced by wideband coding technology, is far from passé. Rather, voice continues to complement the other media types that augment it to offer the user a more natural communications experience.

15.
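Zhang Hui (张晖), 《电子质量》 (Electronics Quality), 2001, (7): 82-87
An important challenge facing third-generation mobile communications (3G) is the seamless integration of multimedia services across fixed and mobile networks. For mobile users, the services supported by the network include image, multimedia, and data services, as well as voice services at different service levels. To meet these requirements, a 3G system must offer a rich set of capabilities. Homogeneous networks based on ATM technology are widely deployed today and support many users, but this network architecture cannot be the final solution, at least from the viewpoint of academia and network equipment manufacturers. The rapid development of high-speed equipment in the Ethernet family has already partly replaced ATM. At the same time, the explosive growth of Internet-based services has ensured that IP will remain the network-layer protocol of next-generation systems. This paper discusses the problems that arise when IP networks support mobile services and analyzes handover techniques.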

16.
This section of the magazine presents recent algorithms developed by the ITU to provide high-quality coding beyond traditional narrowband telephony. Speech coders can be characterized by their bit rate, quality, complexity, and delay. Typical applications fall into one of two categories, one-way and two-way. The first includes storage applications such as telephone answering systems, streaming, multimedia delivery, and push-to-talk calls. The second includes real-time communications such as two-person phone calls and conference calls. In this latter category, if the delay is too large (exceeding 300 ms round-trip), humans have difficulty communicating, while for storage and playback operations delay is not a factor. The complexity of a speech coder is one of the main contributing factors to its cost and energy usage. Complexity is most often measured in terms of memory usage (both RAM and ROM) and the number of instructions executed per second. All applications are sensitive to cost, and many are sensitive to energy usage as well. The desired bit rate is determined by channel capacity or storage capacity, depending on the application.

17.
To design, optimise and deliver multimedia and virtual-reality products and services it is necessary to match performance to the capabilities of users. When a multimedia system is used, the presence of audio and video stimuli introduces significant cross-modal effects (the sensory streams interact). This paper introduces a number of cross-modal interactions that are relevant to communications systems and discusses the advanced experimental techniques required to provide data for modelling multi-modal perception. The aim of the work is to provide a multi-modal perceptual model that can be used for performance assessment and can be incorporated into coding algorithms. The current and future applications of multi-modal modelling are discussed.

18.
Neither the ISDN nor the subsequent broadband-ISDN (B-ISDN) delivered on the promise of being a network for all services. Now that mantle has been passed to IP networks born out of computer-to-computer communications. But their benefits, such as great flexibility of bit rate and resilience, are accompanied by other characteristics which are alien not only to traditional telephony but also to many of the multimedia services which telcos are now seeking to offer their customers. As part of the adaptation of IP networks to carry the widest range of services and applications, the concept of quality of service has been invoked. This paper looks at the performance characteristics which various applications require. It also describes techniques they can incorporate to reach a satisfactory overall result in combination with the network.

19.
The restricted audio quality of today's telephone networks is mainly due to the narrowband (NB) limitation to the frequency range from about 300 Hz to 3.4 kHz. Meanwhile, codecs for wideband (WB) telephony (50 Hz to 7 kHz) exist with significantly improved speech intelligibility and naturalness. However, the broad introduction of wideband speech coding requires strong efforts of both network operators and their customers because many elements of the networks (i.e., terminals and network nodes) have to be modified. An intermediate step to overcome the narrowband limitation can be achieved by applying artificial bandwidth extension (BWE) in the receiver. In this article we review the basic principles of bandwidth extension, and discuss several application scenarios in which both wideband coding and BWE complement each other. The introduction of BWE methods in terminals and networks may help to speed up the introduction of true wideband speech coding in the near future.

20.
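General-purpose perceptual audio coding can deliver transparent sound quality at very low bit rates, but its codec algorithmic delay is large, which makes it difficult to meet the requirements of two-way real-time communication. Speech coding can provide good speech service at low delay, but it is not suited to complex audio signals. This paper analyzes the main factors that affect the algorithmic delay of perceptual audio codecs, gives a quantitative method for calculating it, and focuses on the key techniques, performance, and applications of MPEG-4 AAC-LD.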
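A rough sketch of such a delay calculation is given below. It is not the method from the paper, whose details the abstract does not give. It assumes a simplified model in which the algorithmic delay of a transform coder is the framing delay plus the overlap-add delay of the filterbank plus any encoder look-ahead, and it ignores other contributions such as a bit reservoir. The frame lengths and the 48 kHz sampling rate are illustrative assumptions, and the function name is hypothetical.

```python
def algorithmic_delay_ms(frame_samples, sample_rate_hz, lookahead_samples=0):
    """Delay in ms under a simplified model of a transform coder.

    The model counts one frame of framing delay, one frame of
    overlap-add delay, and an optional encoder look-ahead.
    """
    total_samples = 2 * frame_samples + lookahead_samples
    return 1000.0 * total_samples / sample_rate_hz


# Assumed low-delay configuration: 480-sample frames at 48 kHz, no look-ahead.
low_delay = algorithmic_delay_ms(480, 48000)

# For comparison: 1024-sample frames at the same rate, look-ahead ignored.
long_frame = algorithmic_delay_ms(1024, 48000)

print(low_delay, round(long_frame, 2))
```

Under this model the frame length dominates the delay, so shortening the frame is the most direct way to bring a perceptual coder within reach of two-way communication.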
