Subscription full text: 26 articles
Free: 0 articles
Radio: 16 articles
Automation technology: 10 articles
By year: 2014 (1), 2011 (1), 2010 (2), 2009 (1), 2008 (3), 2007 (1), 2003 (3), 2002 (1), 2001 (2), 1998 (5), 1997 (1), 1995 (3), 1994 (1), 1991 (1)
Sort order: 26 results in total (search time: 15 ms)
1.
Multimedia signal processing is more than simply “putting together” text, audio, images, and video. It is the integration and interaction among these different media that creates new systems and new research challenges and opportunities. In multimodal communication where speech is involved, audio-visual interaction is particularly significant.
2.
A multi-user 3-D virtual environment allows remote participants to communicate transparently, as if they were meeting face-to-face. The sense of presence in such an environment can be established by representing each participant with a vivid, human-like character called an avatar. We review several immersive technologies, including directional sound, eye gaze, hand gestures, lip synchronization, and facial expressions, that facilitate multimodal interaction among participants in the virtual environment using speech processing and animation techniques. Interactive collaboration can be further encouraged by the ability to share and manipulate 3-D objects in the virtual environment. A shared whiteboard makes it easy for participants in the virtual environment to convey their ideas graphically. We survey various kinds of capture devices used to provide the input for the shared whiteboard. Efficient storage of the whiteboard session and its precise archival at a later time raise interesting research topics in information retrieval.
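As an illustration of the kind of state such an environment must exchange so that remote clients can animate avatars with gaze, lip synchronization, and gestures, here is a minimal Python sketch; every field name and the JSON encoding are assumptions made for illustration, not details taken from the systems surveyed above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class AvatarState:
    user_id: str
    position: tuple          # (x, y, z) world coordinates
    head_orientation: tuple  # (yaw, pitch, roll) in degrees; drives eye gaze
    mouth_openness: float    # 0.0-1.0, derived from the speech signal for lip sync
    gesture: str             # symbolic hand-gesture label, e.g. "wave"
    expression: str          # facial-expression label, e.g. "smile"


def encode_update(state: AvatarState) -> bytes:
    """Serialize one avatar update for broadcast to the other participants."""
    return json.dumps(asdict(state)).encode("utf-8")


# Example update for one participant.
packet = encode_update(AvatarState("alice", (1.0, 0.0, 2.5), (15.0, -5.0, 0.0),
                                   0.4, "wave", "smile"))
```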
3.
Audio-visual interaction is a very important issue in personal communication applications. The research and development of multimedia communication systems should account for this interaction. In this paper, we address a number of areas related to audio-visual interaction, such as automatic lipreading, speech-driven talking heads, and lip synchronization. In particular, we will discuss a new trend in video coding research: joint audio-video coding. Given that mouth movements are very difficult to code because of their rapid, complex, and non-rigid motion (so conventional block-based motion-compensation methods fail), we will explain how extra help from the acoustic signal can enable us to code the mouth movements more efficiently.
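To make the joint audio-video coding idea concrete, the following is a minimal sketch of one way acoustic information could assist mouth coding: a mouth-opening parameter is predicted from the short-time audio energy and only the residual is quantized. The linear predictor and the quantizer step are illustrative assumptions, not the coding scheme of the paper.

```python
import numpy as np


def audio_energy(frame_audio: np.ndarray) -> float:
    """Short-time log energy of the audio samples aligned with one video frame."""
    return float(np.log10(np.sum(frame_audio ** 2) + 1e-12))


def encode_mouth(openness: np.ndarray, energy: np.ndarray, step: float = 0.02):
    """Quantize the residual between the measured mouth openness and an
    audio-driven linear prediction (coefficients fitted per sequence)."""
    a, b = np.polyfit(energy, openness, 1)   # simple linear audio-to-mouth model
    residual = openness - (a * energy + b)
    return (a, b), np.round(residual / step).astype(int)


def decode_mouth(model, q_residual: np.ndarray, energy: np.ndarray, step: float = 0.02):
    """Reconstruct mouth openness from the model, the quantized residual, and the audio."""
    a, b = model
    return a * energy + b + q_residual * step
```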
4.
This paper presents a probabilistic framework for discovering objects in video. The video can switch between different shots, the unknown objects can leave or enter the scene at multiple times, and the background can be cluttered. The framework consists of an appearance model and a motion model. The appearance model exploits the consistency of object parts in appearance across frames. We use maximally stable extremal regions as observations in the model and hence provide robustness to object variations in scale, lighting, and viewpoint. The appearance model provides location and scale estimates of the unknown objects through a compact probabilistic representation. The compact representation contains knowledge of the scene at the object level, thus allowing us to augment it with motion information using a motion model. This framework can be applied to a wide range of different videos and object types, and provides a basis for higher-level video content analysis tasks. We present applications of video object discovery to video content analysis problems such as video segmentation and threading, and demonstrate superior performance to methods that exploit global image statistics and frequent itemset data mining techniques.
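Below is a minimal sketch of collecting maximally stable extremal regions as per-frame observations, in the spirit of the appearance model described above; the OpenCV calls are standard, but the video path is a placeholder and the grouping of regions across frames into objects (the paper's probabilistic model) is not shown.

```python
import cv2
import numpy as np


def frame_observations(frame_bgr: np.ndarray) -> np.ndarray:
    """Return MSER bounding boxes (x, y, w, h) detected in one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()
    _regions, boxes = mser.detectRegions(gray)
    return boxes


# Example: collect observations over a clip ("clip.mp4" is a placeholder path).
cap = cv2.VideoCapture("clip.mp4")
observations = []
ok, frame = cap.read()
while ok:
    observations.append(frame_observations(frame))
    ok, frame = cap.read()
cap.release()
```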
5.
Accurate models for variable bit rate (VBR) video traffic need to allow for the different frame types present in the video, different activity levels for different frames, and a variable group of pictures (GOP) structure. Both the temporal and the stochastic properties of the trace data need to be captured by any model. We propose models that capture the temporal properties of the data using doubly Markov processes and autoregressive models. We highlight the importance of capturing the stochastic properties of the data accurately, as this leads to significant improvement in the performance of the model. To capture the stochastic properties of the traces, the probability density function of the trace data needs to be modeled accurately. Hence, the focus of this paper is on creating autoregressive processes with arbitrary probability densities. We relate this to work in wavelet theory on the solutions of two-scale dilation equations. The performance of our model is evaluated in terms of the stochastic properties of the generated traces as well as through network simulations.
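The key building block, an autoregressive process with a prescribed marginal density, can be illustrated with a common memoryless-transform construction: generate a Gaussian AR(1) sequence, map it through the Gaussian CDF to uniform marginals, and then through the inverse CDF of the target density. This is only a sketch of the general idea (the lognormal target and its parameters are arbitrary examples), not the wavelet/dilation-equation method developed in the paper.

```python
import numpy as np
from scipy import stats


def ar1_with_marginal(n: int, rho: float,
                      target=stats.lognorm(s=0.6, scale=2000.0),
                      seed: int = 0) -> np.ndarray:
    """Correlated samples whose marginal density matches `target`."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.standard_normal()                      # unit-variance stationary start
    noise = rng.standard_normal(n) * np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):                             # Gaussian AR(1), unit marginal variance
        x[t] = rho * x[t - 1] + noise[t]
    u = stats.norm.cdf(x)                             # uniform marginals, correlation largely preserved
    return target.ppf(u)                              # e.g., synthetic frame sizes in bytes


frame_sizes = ar1_with_marginal(10_000, rho=0.9)
```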
6.
A method is presented by which any multidimensional filter with a parallelepiped passband can be designed and implemented efficiently, derived from an appropriate one-dimensional filter. With this method, the Nyquist constraint and the zero-phase requirement can be satisfied easily. A design example is also given.
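One way to picture the derivation from a 1-D prototype is to evaluate the prototype's magnitude response along two sheared frequency directions and take the product, which turns the rectangular passband of a separable design into a parallelogram; the spatial-domain filter then follows from an inverse FFT. The shear value and this frequency-sampling construction are illustrative assumptions, not the design procedure proven in the paper.

```python
import numpy as np
from scipy.signal import firwin, freqz


def parallelogram_filter(num_taps: int = 31, size: int = 64,
                         shear: float = 0.5) -> np.ndarray:
    """2-D zero-phase FIR filter (size x size) with a parallelogram passband."""
    h1 = firwin(num_taps, cutoff=0.5)                  # 1-D lowpass prototype
    w = 2 * np.pi * np.fft.fftfreq(size)               # frequencies in [-pi, pi), FFT order
    W1, W2 = np.meshgrid(w, w, indexing="ij")
    U1 = W1 + shear * W2                               # sheared frequency coordinates:
    U2 = W2                                            # rectangle -> parallelogram
    _, H1 = freqz(h1, worN=U1.ravel())
    _, H2 = freqz(h1, worN=U2.ravel())
    H = (np.abs(H1) * np.abs(H2)).reshape(size, size)  # zero-phase target response
    return np.real(np.fft.fftshift(np.fft.ifft2(H)))   # centered 2-D impulse response
```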
7.
This paper presents a method for synthesizing a novel view from two sets of differently focused images taken by an aperture camera array, for a scene consisting of two approximately constant depths. The proposed method consists of two steps. The first step is a view interpolation that reconstructs an all-in-focus dense light field of the scene. The second step synthesizes a novel view from the reconstructed dense light field using a light-field rendering technique. The view interpolation in the first step can be achieved simply with linear filters that are designed to shift different object regions separately, without region segmentation. The proposed method effectively creates a dense array of pin-hole cameras (i.e., all-in-focus images), so that the novel view can be synthesized with better quality.
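The two-layer structure can be illustrated with a crude shift-and-blend interpolation: warp both input views by the disparity of each hypothesized depth layer and average, without segmenting the regions. The integer pixel shifts and the equal blending weights below are illustrative assumptions, not the linear filters designed in the paper.

```python
import numpy as np


def interpolate_view(left: np.ndarray, right: np.ndarray, alpha: float,
                     d_near: int, d_far: int) -> np.ndarray:
    """Rough estimate of the view at fractional baseline position alpha in [0, 1]."""
    def shift(img: np.ndarray, d: int) -> np.ndarray:
        return np.roll(img, d, axis=1)                 # horizontal shift by d pixels

    # Warp both views toward the target position once per depth layer, then blend.
    near = 0.5 * (shift(left, round(-alpha * d_near))
                  + shift(right, round((1 - alpha) * d_near)))
    far = 0.5 * (shift(left, round(-alpha * d_far))
                 + shift(right, round((1 - alpha) * d_far)))
    return 0.5 * (near + far)
```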
8.
The article provides a comprehensive overview of the history of how signal-processing researchers have been transforming signal-processing algorithms into efficient implementations. Starting from the early days of analog circuits for signal processing, to digital signal processors (DSPs), to application-specific and programmable DSPs, and to the trend of integrating a complete system on a single chip, this article provides thorough coverage of the past and present of design and implementation technology for signal-processing systems. Moreover, it presents the exciting challenges faced in the design and implementation of current and future signal-processing applications. Topics covered include milestones in signal-processing integrated circuits, the past and future of the signal processor, signal processing in consumer applications, and design automation for signal processing.
9.
Although humans rely primarily on hearing to process speech, they can also extract a great deal of information with their eyes through lipreading. This skill becomes extremely important when the acoustic signal is degraded by noise. It would, therefore, be beneficial to find methods of reinforcing acoustic speech with a synthesized visual signal in high-noise environments. This paper addresses the interaction between acoustic speech and visible speech. Algorithms for converting audible speech into visible speech are examined, and applications that can utilize this conversion process are presented. Our results demonstrate that it is possible to animate a natural-looking talking head using acoustic speech as input.
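Here is a minimal sketch of the audio-to-visual conversion idea: frame the acoustic signal, compute simple spectral features, and map each frame to mouth-shape parameters with a regressor trained on paired audio/video data. The log filter-bank features and the k-nearest-neighbour regressor are illustrative assumptions, not the algorithms examined in the paper, and the training data names are placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def frame_features(audio: np.ndarray, frame_len: int = 512, hop: int = 256,
                   n_bands: int = 16) -> np.ndarray:
    """Log energies in linearly spaced frequency bands, one row per audio frame."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        spectrum = np.abs(np.fft.rfft(audio[i * hop:i * hop + frame_len] * window)) ** 2
        bands = np.array_split(spectrum, n_bands)
        feats[i] = [np.log(band.sum() + 1e-12) for band in bands]
    return feats


# Training: paired acoustic features and mouth parameters (e.g. width, height)
# measured from video of the same speaker; `audio_train` and `mouth_train`
# are placeholders for real data.
#   model = KNeighborsRegressor(n_neighbors=5).fit(frame_features(audio_train), mouth_train)
# Synthesis: drive the talking head from new audio alone.
#   mouth_params = model.predict(frame_features(new_audio))
```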