Similar Documents
1.
Feedback-based error control for mobile video transmission   Times cited: 25 (self-citations: 0; by others: 25)
We review feedback-based low bit-rate video coding techniques for robust transmission in mobile multimedia networks. For error control on the source coding level, each decoder has to make provisions for error detection, resynchronization, and error concealment, and we review techniques suitable for that purpose. Further, techniques are discussed for intelligent processing of acknowledgment information by the coding control to adapt the source coder to the channel. We review and compare error tracking, error confinement, and reference picture selection techniques for channel-adaptive source coding. For comparison of these techniques, a system for transmitting low bit-rate video over a wireless channel is presented and the performance is evaluated for a range of transmission conditions. We also show how feedback-based source coding can be employed in conjunction with precompressed video stored on a media server. The techniques discussed are applicable to a wide variety of interframe video schemes, including various video coding standards. Several of the techniques have been incorporated into the H.263 video compression standard, and this standard is used as an example throughout.
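The reference picture selection idea reviewed above can be illustrated with a minimal Python sketch of a NACK-driven selector. It assumes the encoder keeps a small multi-frame buffer, records which frame each new frame was predicted from, and treats a frame as unusable if it was reported lost or was predicted, directly or transitively, from a lost frame. The class and method names are hypothetical, and this is not H.263 bitstream syntax; it only shows the selection logic.

```python
from collections import deque


class ReferencePictureSelector:
    """Hypothetical NACK-driven reference picture selection (logic only)."""

    def __init__(self, buffer_size=5):
        # Frame numbers still held in the encoder's multi-frame buffer.
        self.buffer = deque(maxlen=buffer_size)
        # frame number -> frame it was predicted from (None = intra-coded).
        self.ref_of = {}
        # Frames the decoder reported as lost via negative acknowledgment.
        self.lost = set()

    def on_nack(self, frame_no):
        self.lost.add(frame_no)

    def _damaged(self, frame_no):
        # Walk the prediction chain: any lost ancestor corrupts this frame.
        while frame_no is not None:
            if frame_no in self.lost:
                return True
            frame_no = self.ref_of.get(frame_no)
        return False

    def encode(self, frame_no):
        """Pick the newest undamaged buffered frame; None means intra-code."""
        ref = next((f for f in reversed(self.buffer) if not self._damaged(f)), None)
        self.ref_of[frame_no] = ref
        self.buffer.append(frame_no)
        return ref


sel = ReferencePictureSelector()
print([sel.encode(n) for n in range(4)])  # [None, 0, 1, 2]
sel.on_nack(1)                            # decoder reports frame 1 lost
print(sel.encode(4))                      # 0: frames 1-3 all depend on frame 1
```

Falling back to the newest clean frame, rather than forcing an intra frame, is what lets feedback stop error propagation at a lower bit-rate cost.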

2.
Su Yingjun, Lu Jianhua, Wang Jing. Acta Electronica Sinica (电子学报), 2001, 29(Z1): 1803-1806
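This paper proposes a method for jointly optimizing source coding, channel coding, and error concealment based on a generalized rate-distortion function. This generalized rate-distortion function captures the overall rate-distortion behavior of a video source after source coding, channel coding, and error concealment, and can therefore be used to jointly optimize the transmitter and receiver of a video communication system. Simulation results show that, compared with conventional joint source-channel coding algorithms, this joint optimization based on generalized rate-distortion characteristics achieves better results.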

3.
Joint source-channel coding is an effective approach for the design of bandwidth efficient and error resilient communication systems with manageable complexity. An interesting research direction within this framework is the design of source decoders that exploit the residual redundancy for effective signal reconstruction at the receiver. Such source decoders are expected to replace the traditionally heuristic error concealment units that are elements of most multimedia communication systems. In this paper, we consider the reconstruction of signals encoded with a multistage vector quantizer (MSVQ) and transmitted over a noisy communications channel. The MSVQ maintains a moderate complexity and, due to its successive refinement feature, is a suitable choice for the design of layered (progressive) source codes. An approximate minimum mean squared error source decoder for MSVQ is presented, and its application to the reconstruction of the linear predictive coefficient (LPC) parameters in mixed excitation linear prediction (MELP) speech codec is analyzed. MELP is a low-rate standard speech codec suitable for bandwidth-limited communications and wireless applications. Numerical results demonstrate the effectiveness of the proposed schemes.
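To make the multistage structure concrete, here is a small pure-Python sketch of MSVQ encoding and decoding with hypothetical two-dimensional codebooks. Each stage quantizes the residual left by the previous stages, so decoding only the first k indices yields a coarser reconstruction, which is the successive-refinement property mentioned above. The `mmse_decode` function is a simplified stage-by-stage conditional-mean estimate that assumes independent, uniformly distributed indices and a known index transition matrix per stage; it illustrates the idea of a soft source decoder and is not the approximation derived in the paper.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def msvq_encode(x, stages):
    """Greedy sequential search: each stage quantizes the current residual."""
    residual, indices = list(x), []
    for codebook in stages:
        i = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], residual))
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, stages):
    """Sum the selected codevectors; fewer indices give a coarser estimate."""
    y = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        y = [a + b for a, b in zip(y, codebook[i])]
    return y


def mmse_decode(received, stages, channels):
    """Stage-wise E[codevector | received index] under uniform index priors.

    channels[s][i][j] = P(receive j | sent i) for stage s. With uniform
    priors, the posterior P(i | j) is proportional to channels[s][i][j].
    """
    y = [0.0] * len(stages[0][0])
    for codebook, j, channel in zip(stages, received, channels):
        weights = [channel[i][j] for i in range(len(codebook))]
        total = sum(weights)
        for d in range(len(y)):
            y[d] += sum(w * c[d] for w, c in zip(weights, codebook)) / total
    return y


stages = [[[0.0, 0.0], [4.0, 4.0]], [[-1.0, 0.0], [1.0, 0.0]]]
idx = msvq_encode([5.0, 4.2], stages)
print(idx)                           # [1, 1]
print(msvq_decode(idx, stages))      # [5.0, 4.0]
print(msvq_decode(idx[:1], stages))  # [4.0, 4.0] (first layer only)
```

Unlike a hard table lookup, the soft decoder pulls the estimate toward codevectors that the channel could plausibly have confused, which is what lets it stand in for a heuristic concealment unit.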

4.
Transmission of block-coded images through error-prone wireless channels often results in lost blocks. In this study, we investigate a novel error concealment method for concealing these heavy packet losses and reconstructing a close approximation of the image. Our scheme is a modified discrete wavelet transform (DWT) technique, namely subbands based image error concealment (SIEC), for embedding downsized replicas of the original image into itself. We propose that this technique can be implemented for wireless channels to combat degradations in a backward-compatible scheme. We show that the proposed error concealment technique is promising, especially for error-prone channels causing a wider range of packet losses, at the expense of computational burden.

5.
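A digital audio transmission scheme for a 2.4 GHz wireless audio/video transmission system is proposed. The AES/EBU digital audio interface standard is introduced, and the key technologies of the system, namely error concealment and antenna switching, are described in detail. Error concealment eliminates audible "crackling" or popping noises, while antenna switching improves the quality of the received audio signal, effectively suppresses interference, and improves overall system performance.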
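The abstract does not say which concealment method is used, so the sketch below assumes one common approach: flag corrupted samples (a receiver could, for instance, use the parity bit carried in each AES/EBU subframe) and replace each run of flagged samples by linear interpolation between the nearest good samples, holding the edge value when a run touches the start or end of the buffer. The function name and the integer-PCM input are hypothetical.

```python
def conceal(samples, bad):
    """Replace runs of flagged PCM samples by linear interpolation.

    samples: list of ints; bad: list of bools of the same length.
    Runs touching either end of the buffer are filled by holding the nearest
    good sample; a buffer with no good samples is muted.
    """
    n = len(samples)
    if all(bad):
        return [0] * n
    out = list(samples)
    i = 0
    while i < n:
        if not bad[i]:
            i += 1
            continue
        start = i
        while i < n and bad[i]:
            i += 1
        left, right = start - 1, i  # nearest good neighbours (may be out of range)
        for k in range(start, i):
            if left < 0:
                out[k] = out[right]
            elif right >= n:
                out[k] = out[left]
            else:
                t = (k - left) / (right - left)
                out[k] = round(out[left] + t * (out[right] - out[left]))
    return out


print(conceal([0, 10, 999, -999, 40, 50], [False, False, True, True, False, False]))
# [0, 10, 20, 30, 40, 50]
```

Replacing a corrupted sample with a value on a smooth path between its neighbours removes the large discontinuity that would otherwise be heard as a click.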

6.
The majority of automatic speech recognition (ASR) systems are trained with neutral speech, and their performance is degraded by the presence of emotional content in speech. Recognizing these emotions in human speech is considered a crucial aspect of human-machine interaction. In the first stage, combined spectral and differenced prosody features are used for emotion recognition. The emotion recognition task does not serve solely to improve the performance of an ASR system. In the second stage, the adapted emotive ASR model corresponding to the emotion recognized in the input speech is selected for evaluation. This adapted emotive ASR model is built from existing neutral speech and emotive speech synthetically generated with a prosody modification method. In this work, the importance of a front-end emotion recognition block, together with adaptation of the ASR system models to emotive speech, was studied. Speech samples from the IIIT-H Telugu speech corpus were used to build the large-vocabulary ASR systems, and emotional speech samples from the IITKGP-SESC Telugu corpus were used for evaluation. The adapted emotive speech models yielded better performance than the existing neutral speech models.

7.
Robust image and video communications have become more imperative due to the ubiquitous proliferation of multimedia applications over wireless sensor networks. In this work, the transmission distortions on the image data induced by both channel and instant node failures for Wireless Sensor Networks (WSN) are considered. The effect of two techniques and their integration with multipath transmission are investigated to compensate the multimedia distortions at the expense of incurring additional energy consumption and/or wasting bandwidth resources. The first technique is watermarking-based error concealment utilizing the discrete wavelet transform to embed downsized replicas of the original image into itself. The other is conventional Reed–Solomon (RS) coding utilizing additional information bits to correct bit/symbol errors. Performance results obtained from extensive simulations utilizing a communication and energy model applicable to WSN show that error concealment (EC) integrated schemes, especially EC with multipath fusion (ECMF), are more promising to compensate losses than RS-coding-integrated and pure multipath transmission techniques in WSN.

8.
We study TCP performance over the wireless links deploying a wireless rate-control technique, whose link characteristics are identified by variable link rate and bursty transmission error. We present a TCP enhancement scheme, called rate-adaptive snoop (RA-Snoop). RA-Snoop caches TCP packets selectively based on the wireless channel condition and the cached packets are retransmitted locally over the wireless link in case corruption loss is detected. In addition, for effective adaptation to variable bandwidth, RA-Snoop calculates the window feedback based on the bandwidth-delay product estimation and the queue level, then conveys this feedback information on the receiver's advertised window field in the acknowledgements returning to TCP sources. We compare the performance of RA-Snoop with that of existing schemes in the aspect of goodput and fairness. Results from simulations reveal that RA-Snoop achieves significant improvements over the existing schemes for various traffic scenarios.
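The window feedback step can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes the simplest reading: the target window is the estimated bandwidth-delay product minus the data already queued at the base station, floored at one segment, and the base station only ever lowers the receiver's advertised window when it rewrites returning ACKs. The function names, the 1460-byte MSS, and the example numbers are hypothetical.

```python
def bdp_bytes(link_rate_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return link_rate_bps * rtt_s / 8


def window_feedback(link_rate_bps, rtt_s, queued_bytes, mss=1460):
    """Assumed target window: BDP minus the backlog already buffered, >= 1 MSS."""
    return max(mss, int(bdp_bytes(link_rate_bps, rtt_s) - queued_bytes))


def rewrite_advertised_window(ack_rwnd, feedback):
    """Put the feedback in the ACK's advertised-window field, never enlarging it."""
    return min(ack_rwnd, feedback)


# 2 Mbit/s current wireless rate, 100 ms RTT -> BDP of 25,000 bytes.
fb = window_feedback(2_000_000, 0.1, queued_bytes=5_000)
print(fb)                                     # 20000
print(rewrite_advertised_window(65_535, fb))  # 20000
```

Taking the minimum keeps the receiver's own flow control intact. A real TCP header also applies the window-scale option to its 16-bit window field, which this sketch ignores.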

9.
This letter studies the performance of indoor wireless communication systems operating at 60 GHz with different polarization schemes. Circular polarization is known to reduce multipath effects in line-of-sight (LOS) environments in the 60 GHz band. We propose a modified channel model based on the IEEE 802.15.3c channel model to incorporate the polarization effects. We then use this model to evaluate the error performance of a wireless communication system that uses circular polarization. The results are compared with linear polarization for LOS environments.

10.
We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web (WWW). We compare a server-only processing model where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly. We find that the required bit rate to achieve the recognition performance of high-quality unquantized speech is just 2000 bits per second.

11.
This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add‐on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two‐stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.

12.
A cross-layer packet scheduling scheme that streams pre-encoded video over wireless downlink packet access networks to multiple users is presented. The scheme can be used with the emerging wireless standards such as HSDPA and IEEE 802.16. A gradient based scheduling scheme is used in which user data rates are dynamically adjusted based on channel quality as well as the gradients of a utility function. The user utilities are designed as a function of the distortion of the received video. This enables distortion-aware packet scheduling both within and across multiple users. The utility takes into account decoder error concealment, an important component in deciding the received quality of the video. We consider both simple and complex error concealment techniques. Simulation results show that the gradient based scheduling framework combined with the content-aware utility functions provides a viable method for downlink packet scheduling as it can significantly outperform current content-independent techniques. Further tests determine the sensitivity of the system to the initial video encoding schemes, as well as to non-real-time packet ordering techniques.
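The core of a gradient-based scheduler is a per-slot rule that serves the user maximizing the product of its utility gradient and its currently feasible rate. The sketch below assumes one user per slot (HSDPA and IEEE 802.16 can serve several) and approximates each user's utility gradient by the distortion reduction per bit of the packet at the head of its queue, where that reduction is measured against what decoder error concealment would show if the packet were lost. The `Packet` type, its fields, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size_bits: int
    # Decrease in received-video MSE if this packet arrives, measured against
    # the concealed frame the decoder would display if it were lost.
    distortion_reduction: float


def pick_user(queues, feasible_rates):
    """Index of the user maximizing gradient * rate, or None if nobody qualifies."""
    best, best_metric = None, 0.0
    for user, (queue, rate) in enumerate(zip(queues, feasible_rates)):
        if not queue:
            continue
        head = queue[0]
        gradient = head.distortion_reduction / head.size_bits  # utility per bit
        metric = gradient * rate
        if metric > best_metric:
            best, best_metric = user, metric
    return best


queues = [
    [Packet(8000, 40.0)],   # good channel, packet that concealment hides well
    [Packet(8000, 120.0)],  # weaker channel, packet that is hard to conceal
    [],                     # nothing to send
]
print(pick_user(queues, [1_000_000, 500_000, 2_000_000]))  # 1
```

A content-independent rule that looked only at channel rate would serve user 0 here; weighting by concealment-aware distortion shifts the slot to the packet whose loss would hurt more.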

13.
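Template matching is a relatively successful technique for recognizing Mandarin Chinese initials (syllable-initial consonants), but its shortcomings limit its ability to recover from errors and to improve recognition performance. Combining neural networks (NN) with fuzzy systems preserves the advantages of both and makes full use of the good fault tolerance, computational, classification, and decision-making capabilities of fuzzy neural networks. This paper focuses on two initial-recognition schemes based on fuzzy neural networks. An analysis of their structure, recognition rates, and characteristics shows that fuzzy neural networks clearly outperform template matching for initial recognition and are better suited to speech recognition.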

14.
15.
Articulation errors seriously reduce speech intelligibility and the ease of spoken communication. Speech-language pathologists manually identify articulation error patterns based on their clinical experience, which is a time-consuming and expensive process. This study proposes an automatic pronunciation error identification system that uses a novel dependence network (DN) approach. In order to derive a subject's articulatory information, a photo naming task is performed to obtain the subject's speech patterns. Based on clinical knowledge about speech evaluation, a DN scheme was used to model the relationships of a test word, a subject, a speech pattern, and an articulation error pattern. To integrate DN into automatic speech recognition (ASR), a pronunciation confusion network is proposed to model the probability of DN and is then used to guide the search space of the ASR. Further, to increase the accuracy of the ASR, an appropriate threshold based on a histogram of pronunciation errors is selected in order to disregard rare pronunciation errors. Finally, the articulation error patterns were well identified by integrating the likelihoods of the DNs of each phoneme. The results of this study indicate that it is feasible to clinically implement this dynamic network approach to achieve satisfactory performance in articulation evaluation.

16.
We consider a basic scenario in wireless data access: a number of mobile clients are interested in a set of data items kept at a common server. Each client independently sends requests to inform the server of its desired data items and the server replies with a broadcast channel. We are interested in studying the energy consumption characteristics in such a scenario. First, we define a utility function for quantifying performance. Based on the utility function, we formulate the wireless data access scenario as a noncooperative game - wireless data access (WDA) game. Although our proposed probabilistic data access scheme does not rely on client caching, game theoretical analysis shows that clients do not always need to send requests to the server. Simulation results also indicate that our proposed scheme, compared with a simple always-request one, increases the utility and lifetime of every client while reducing the number of requests sent, with a cost of slightly larger average query delay. We also compare the performance of our proposed scheme with two popular schemes that employ client caching. Our results show that caching only benefits clients with high query rates, at the expense of both shorter lifetime and smaller utility in other clients.

17.
There has been progress in improving speech recognition using a tightly-coupled modality such as lip movement, and in using additional input interfaces to improve recognition of commands in multimodal human-computer interfaces such as speech and pen-based systems. However, there has been little work that attempts to improve the recognition of spontaneous, conversational speech by adding information from a loosely-coupled modality. The study investigated this idea by integrating information from gaze into an automatic speech recognition (ASR) system. A probabilistic framework for multimodal recognition was formalised and applied to the specific case of integrating gaze and speech. Gaze-contingent ASR systems were developed from a baseline ASR system by redistributing language model probability mass according to the visual attention. These systems were tested on a corpus of matched eye movement and related spontaneous conversational British English speech segments (n = 1355) for a visual-based, goal-driven task. The best performing systems had similar word error rates to the baseline ASR system and showed an increase in keyword spotting accuracy. The core values of this work may be useful for developing robust speech-centric multimodal decoding system functions.
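One simple way to redistribute language-model probability mass according to visual attention, assumed here rather than taken from the paper, is to linearly interpolate the model's word distribution with a gaze distribution built from fixation durations on objects whose names are in the vocabulary. Because both distributions sum to one over the same vocabulary, the interpolated one does too. The function names, the weight `lam`, and the toy vocabulary are hypothetical.

```python
def gaze_distribution(vocab, fixation_ms):
    """Normalize fixation time on named objects into a distribution over vocab.

    Returns an empty dict when nothing in the vocabulary was fixated.
    """
    mass = {w: fixation_ms.get(w, 0.0) for w in vocab}
    total = sum(mass.values())
    if total == 0:
        return {}
    return {w: m / total for w, m in mass.items()}


def redistribute(lm_probs, gaze_probs, lam=0.2):
    """p'(w) = (1 - lam) * p_lm(w) + lam * p_gaze(w)."""
    if not gaze_probs:  # no usable gaze information: keep the LM unchanged
        return dict(lm_probs)
    return {w: (1 - lam) * p + lam * gaze_probs[w] for w, p in lm_probs.items()}


lm = {"the": 0.5, "red": 0.25, "box": 0.25}
gaze = gaze_distribution(lm, {"box": 300.0, "red": 100.0})
adapted = redistribute(lm, gaze)
print({w: round(p, 3) for w, p in adapted.items()})
# {'the': 0.4, 'red': 0.25, 'box': 0.35}
```

Mass moves toward words naming the objects the speaker looked at longest, which is consistent with the keyword-spotting gain reported above while leaving overall word error rate roughly unchanged.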

18.
Parallel sub-band HMM maximum a posteriori adaptive nonlinear class estimation algorithm   Times cited: 1 (self-citations: 0; by others: 1)
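Automatic speech recognition (ASR) systems currently achieve high recognition rates under laboratory conditions, but in real environments their performance deteriorates sharply because of background noise and transmission channel effects. Drawing on auditory experiments, this paper proposes a new nonlinear class estimation algorithm based on parallel maximum a posteriori (MAP) probabilities in independent sub-bands, to improve the robustness of the recognition system. The algorithm exploits the differences between the power spectra of various noises and of the speech to be recognized, as well as the differing influence of noise on the HMM in different frequency bands, and uses a multilayer perceptron (MLP) to nonlinearly map the MAP probabilities under noisy conditions, reducing the performance degradation caused by environmental mismatch. Experiments show that the algorithm clearly outperforms the maximum a posteriori linear regression algorithm and the sub-band speech recognition algorithm proposed by Sangita.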

19.
We propose a modified motion estimation algorithm that is adequate for error localization and temporal error concealment in transmitting videos over unreliable channels. In order to achieve good error concealment performance, the proposed algorithm implicitly imposes spatial correlations on motion vectors by extending the block size and overlapping blocks in motion estimation. Thereby, the obtained motion vectors can be used to improve error concealment performance while keeping the encoding efficiency with negligible overhead. In addition, the proposed motion estimation can provide a new error detection measure so that we can maximally utilize uncorrupted data rather than simply discarding all data in a corrupted packet. Simulation results show that the proposed motion estimation scheme provides significant improvements in error concealment performance over the existing schemes and improves the bit utility over a wide range of error conditions.
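The block-extension idea can be sketched with a full-search block matcher in pure Python: the sum of absolute differences (SAD) is computed over the block enlarged by a margin on every side, so the windows of neighbouring blocks overlap and their motion vectors tend to agree, while the resulting vector is still assigned to the core block. A lost block can then be concealed by copying the motion-compensated block from the reference frame, using the vector of a correctly received neighbour. The block size, margin, search range, and function names are illustrative assumptions, and the paper's error-detection measure is not modeled.

```python
def sad(cur, ref, top, left, height, width, dy, dx):
    """SAD between a window of cur and the same window of ref shifted by (dy, dx)."""
    return sum(
        abs(cur[y][x] - ref[y + dy][x + dx])
        for y in range(top, top + height)
        for x in range(left, left + width)
    )


def extended_block_me(cur, ref, by, bx, block=4, margin=2, search=2):
    """Motion vector (dy, dx) for the block at (by, bx), matched on an extended window."""
    h, w = len(cur), len(cur[0])
    top, left = max(0, by - margin), max(0, bx - margin)
    bottom, right = min(h, by + block + margin), min(w, bx + block + margin)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if top + dy < 0 or left + dx < 0 or bottom + dy > h or right + dx > w:
                continue  # shifted window would leave the reference frame
            cost = sad(cur, ref, top, left, bottom - top, right - left, dy, dx)
            key = (cost, abs(dy) + abs(dx))  # prefer shorter vectors on ties
            if best is None or key < best[0]:
                best = (key, (dy, dx))
    return best[1]  # (0, 0) is always a valid candidate, so best is set


def conceal_block(frame, ref, by, bx, mv, block=4):
    """Temporal concealment: copy the motion-compensated block from ref into frame."""
    dy, dx = mv
    for y in range(by, by + block):
        for x in range(bx, bx + block):
            frame[y][x] = ref[y + dy][x + dx]


ref = [[(7 * y + 3 * x * x) % 17 for x in range(12)] for y in range(12)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(12)] for y in range(12)]  # pan right by 1
mv = extended_block_me(cur, ref, 4, 8)  # vector of the received right-hand neighbour
print(mv)                               # (0, -1)
lost = [row[:] for row in cur]
for y in range(4, 8):
    for x in range(4, 8):
        lost[y][x] = 0                  # simulate losing the block at (4, 4)
conceal_block(lost, ref, 4, 4, mv)
print(lost == cur)                      # True
```

Enlarging the matching window costs extra SAD computations at the encoder but adds no bits to the stream, which matches the "negligible overhead" claim.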

20.
Dynamic time warping (DTW) is a nonlinear time-alignment technique for automatic speech recognition (ASR) systems. It has been widely used in many commercial and industrial products, ranging from electronic dailies/dictionaries to wireless voice digit dialers. DTW has the advantages of fast training and searching times, which makes it more popular than other available ASR techniques. However, DTW has some limitations, such as the stringent rule on slope weighting, the nontrivial computation of the K-best paths, and the significant increase in computational time when the endpoint constraint is relaxed or the variation in pattern length increases. In this paper, a stochastic method called the genetic algorithm (GA) is presented for solving the nonlinear time-alignment problem. Experimental results show that the GA performs better than DTW. In addition, two derivatives of the GA, the hybrid GA and the parallel GA, are also presented.
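For reference, the classic DTW recurrence that the genetic-algorithm approach is compared against fits in a few lines. This sketch uses fixed endpoints, the three standard local steps without slope weighting, and an absolute-difference local distance on scalar sequences (real recognizers compare feature vectors such as cepstra); `recognize` is a hypothetical nearest-template classifier.

```python
import math


def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum cumulative distance aligning a to b (endpoints fixed)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # a advances, b repeats
                D[i][j - 1],      # b advances, a repeats
                D[i - 1][j - 1],  # both advance
            )
    return D[n][m]


def recognize(utterance, templates):
    """Label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw(utterance, templates[label]))


print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
templates = {"up": [1, 3, 5, 7], "down": [7, 5, 3, 1]}
print(recognize([1, 2, 3, 5, 6, 7], templates))  # up
```

The full table costs O(nm) time per template, and returning only the single best cost is what makes K-best paths or relaxed endpoints awkward, the limitations the GA approach targets.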


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号