Found 20 similar articles (search time: 10 ms)
1.
M. Liu C. Wan L. Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(5):357-364
In recent years, available audio corpora are rapidly increasing from fast growing Internet and digital libraries. How to
classify and retrieve sound files relevant to the user's interest from large databases is crucial for building multimedia
web search engines. In this paper, content-based technology has been applied to classify and retrieve audio clips using a
fuzzy logic system, which is intuitive due to the fuzzy nature of human perception of audio, especially audio clips with mixed
types. Two features selected from various extracted features are used as input to a constructed fuzzy inference system (FIS).
The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the
distributions of extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and
male speech can be separated by another FIS, whereas percussion can be distinguished from other music instruments. In addition,
we can use multiple FISs to form a “fuzzy tree” for retrieval of more types of audio clips. With this approach, we can classify
and retrieve generic audios more accurately, using fewer features and less computation time, compared to other existing approaches.
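The abstract above describes a fuzzy inference system that maps two audio features to a class through membership functions and rules. The following is a minimal sketch of that idea, not the authors' system: the two features (`f1`, `f2`, assumed normalised to [0, 1]), the triangular membership breakpoints, and the two rules are all hypothetical, chosen only to illustrate how rule firing strengths lead to a speech/music decision.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def classify(f1, f2):
    """Classify a clip as speech or music from two hypothetical features."""
    # Fuzzify each feature into "low" and "high" sets (breakpoints are assumptions).
    low1, high1 = tri(f1, -0.5, 0.0, 0.6), tri(f1, 0.4, 1.0, 1.5)
    low2, high2 = tri(f2, -0.5, 0.0, 0.6), tri(f2, 0.4, 1.0, 1.5)
    # Rule 1: IF f1 is high AND f2 is high THEN speech (min implements AND).
    speech = min(high1, high2)
    # Rule 2: IF f1 is low OR f2 is low THEN music (max implements OR).
    music = max(low1, low2)
    label = "speech" if speech > music else "music"
    return label, {"speech": speech, "music": music}


label_a, strengths_a = classify(0.9, 0.8)
label_b, strengths_b = classify(0.1, 0.2)
```

A "fuzzy tree" in the sense of the abstract could then be approximated by passing clips labelled as speech to a second function of the same shape (for example, one separating female from male speech), with its own features and rules.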
2.
Nicolas Staelens Jonas De Meulenaere Lizzy Bleumers Glenn Van Wallendael Jan De Cock Koen Geeraert Nick Vercammen Wendy Van den Broeck Brecht Vermeulen Rik Van de Walle Piet Demeester 《Multimedia Systems》2012,18(6):445-457
Lip synchronization is considered a key parameter during interactive communication. In the case of video conferencing and television broadcasting, the differential delay between audio and video should remain below certain thresholds, as recommended by several standardization bodies. However, further research has also shown that these thresholds can be relaxed, depending on the targeted application and use case. In this article, we investigate the influence of lip sync on the ability to perform real-time language interpretation during video conferencing. Furthermore, we are also interested in determining proper lip sync visibility thresholds applicable to this use case. Therefore, we conducted a subjective experiment using expert interpreters, who were required to perform a simultaneous translation, and non-experts. Our results show that significant differences are obtained when conducting subjective experiments with expert interpreters. As interpreters are primarily focused on performing the simultaneous translation, lip sync detectability thresholds are higher compared with existing recommended thresholds. As such, primary focus and the targeted application and use case are important factors to be considered when selecting proper lip sync acceptability thresholds.
3.
4.
Based on the analysis of temporal slices, we propose novel approaches for clustering and retrieval of video shots. Temporal slices are a set of two-dimensional (2-D) images extracted along the time dimension of an image volume. They encode a rich set of visual patterns for similarity measurement. In this paper, we first demonstrate that tensor histogram features extracted from temporal slices are suitable for motion retrieval. Subsequently, we integrate both tensor and color histograms for constructing a two-level hierarchical clustering structure. Each cluster in the top level contains shots with similar color while each cluster in the bottom level consists of shots with similar motion. The constructed structure is then used for the cluster-based retrieval. The proposed approaches are found to be useful particularly for sports games, where motion and color are important visual cues when searching and browsing the desired video shots.
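The two-level structure described above (color at the top level, motion at the bottom level) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the greedy "leader" clustering, the L1 histogram distance, the thresholds, and the tiny two-bin histograms standing in for color and tensor (motion) histograms are all hypothetical.

```python
def l1(p, q):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def leader_cluster(items, key, threshold):
    """Greedy clustering: an item joins the first cluster whose first member
    (its leader) lies within the threshold, otherwise it starts a new cluster."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if l1(key(item), key(cluster[0])) <= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def two_level(shots, color_thr=0.5, motion_thr=0.5):
    """Top level groups shots by color; each group is then split by motion."""
    top = leader_cluster(shots, lambda s: s["color"], color_thr)
    return [leader_cluster(group, lambda s: s["motion"], motion_thr) for group in top]


# Toy shots with made-up two-bin histograms (assumed data, for illustration only).
shots = [
    {"id": "a", "color": [0.9, 0.1], "motion": [0.8, 0.2]},
    {"id": "b", "color": [0.8, 0.2], "motion": [0.1, 0.9]},
    {"id": "c", "color": [0.1, 0.9], "motion": [0.8, 0.2]},
    {"id": "d", "color": [0.85, 0.15], "motion": [0.7, 0.3]},
]
tree = two_level(shots)
ids = [[[s["id"] for s in sub] for sub in group] for group in tree]
print(ids)  # [[['a', 'd'], ['b']], [['c']]]
```

Cluster-based retrieval would then compare a query first against the top-level color clusters and only afterwards against the motion sub-clusters of the best match, which keeps the number of comparisons small.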
5.
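Based on a study of the protocol model for Bluetooth audio/video remote control applications, a remote control application model for the controller and the target device was designed. The application model uses CSR's BlueCore3-MultiMedia chip as its hardware platform. Following the communication procedures defined in the protocol specification, a software entity for the Bluetooth Audio/Video Remote Control Profile was built on top of the Audio/Video Control Transport Protocol layer, and remote control between Bluetooth audio/video devices was finally implemented on the basis of an advanced audio distribution application.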
6.
7.
The development of digital libraries has enhanced the integration of textual and multimedia information in many document collections. The World Wide Web provides the connectivity for many digital library users. Studies exploring the searching characteristics of Web users are an important and a growing area of research. Most Web user studies have focused on general Web searching, regardless of subject matter or format. Little research has examined how Web users search for multimedia information. Our study examines users' multimedia searching on a major Web search service. The data set examined consisted of 1,025,908 queries from 211,058 users of Excite®, a major Web search service. From this data set, we identified and analyzed audio, image, and video queries. Our findings were compared to results from general Web searching studies. Implications for the design of Web searching services and interfaces are discussed.
8.
Xabiel García Pañeda David Melendi Manuel Vilas Roberto García Víctor García Isabel Rodríguez 《Multimedia Tools and Applications》2008,39(3):379-412
Due to the elevated consumption of resources, the high cost of the production of contents and the quality of service required
in audio/video streaming services, it is extremely important to optimize all the elements involved in the deployment of these
services. With this goal in mind, provider companies have developed their management and presentation tools. At the same time,
some specific tools for audio/video streaming analysis have appeared. Data are collected from servers and proxies by analyzing
their log files in order to generate different types of reports. In spite of their utility, there is a disconnection between
these types of tools. In this way, several important relationships between collected data are lost and the influence of other
important aspects such as the behaviour of the users and their relationship with the subject or the length of the contents
is not considered. This generates inaccurate analyses and the impossibility to improve the presentation, for example by generating
recommendations using the information gathered from the analysis tool. Fesoria is a system which combines both characteristics.
It is an analysis tool and, at the same time, a system to manage the whole audio/video service. Fesoria is able to process
the logs gathered from the streaming servers and proxies, and combine the extracted information with other types of data,
such as content metadata, content distribution networks architecture, user preferences, etc. All this information is analyzed
in order to generate reports on service performance, access evolution and users’ preferences, and thus to improve the presentation
of the services. The system has been used in real audio/video services since 2001 with satisfactory results.
9.
One of the new applications evolving in the Internet is streaming audio/video. A major reason for its growing popularity is interest in the compelling new services that become possible. Prototype services are being developed which are new to the Internet but offer the same look, feel, and functionality that have traditionally only been found in services delivered via other communication medium, e.g. broadcast television. In addition, the Internet is evolving to offer ‘value‐added’ services, like streaming audio/video with VCR‐style interactivity and embedded hyperlinks. We are poised both on seeing the development of new paradigms for interacting with audio/video, and on seeing the merging of broadcast television and Internet‐based broadcasts. Before this process can be considered successful, a number of technical challenges, derived from the various ways in which content is physically delivered, must be solved. In this paper, we focus on the value‐added service of VCR interactivity. VCR interactivity has long been a challenge for both broadcast television and streamed Internet audio/video. The challenge is how to provide individualized playout for content being streamed to a large group of users using one‐to‐many delivery. While some new companies are starting to offer devices which provide this kind of service for broadcast television, there are still numerous technical challenges for the Internet‐based version of a similar service. This paper has a three‐fold objective. First, we describe the types of services available in the traditional broadcast infrastructure and compare these to the types of services that are deployed or possible in Internet‐based services. Second, we describe our attempts to implement some of the more challenging and novel service types. In particular, we examine client‐based control of programs streamed over the Internet to tens, thousands, or even millions of users. 
Finally, we discuss the impact of these services on the protocols and applications used to support Internet‐based, multi‐party conferencing. Copyright © 2001 John Wiley & Sons, Ltd.
10.
This paper introduces an effective interactive video retrieval system named VisionGo. It jointly exploits human and computer to accomplish video retrieval with high effectiveness and efficiency. It assists the interactive video retrieval process in different aspects: (1) it maximizes the interaction efficiency between human and computer by providing a user interface that supports highly effective user annotation and an intuitive visualization of retrieval results; (2) it employs a multiple feedback technique that assists users in choosing a proper method to enhance relevance feedback performance; and (3) it helps users assess the retrieval results of motion-related queries by using motion-icons instead of static keyframes. Experimental results based on over 160 h of news video demonstrate the effectiveness of the VisionGo system.
11.
12.
13.
ZHOU Lei JI Xiu-li SUN Yu-qiang WANG Hai-yan 《通讯和计算机》2007,4(5):62-65
The retrieval of video information is a key multimedia technology that has received wide attention recently. This paper introduces the verification and analysis of video information results in the video retrieval system from three aspects: first, browsing and navigation; second, video summary; third, the display and interaction interface.
14.
Content-based representation and retrieval of visual media: A state-of-the-art review (total citations: 5; self-citations: 0; citations by others: 5)
Philippe Aigrain Hongjiang Zhang Dragutin Petkovic 《Multimedia Tools and Applications》1996,3(3):179-202
This paper reviews a number of recently available techniques in content analysis of visual media and their application to the indexing, retrieval, abstracting, relevance assessment, interactive perception, annotation and re-use of visual documents. This work was performed while this author was with the Institute of Systems Science, Singapore.
15.
K.Y. Lin S.H. Hsieh H.P. Tserng K.W. Chou H.T. Lin C.P. Huang K.F. Tzeng 《Advanced Engineering Informatics》2008,22(3):350-361
The increasing importance of text-based information retrieval (IR) developments in the architecture, engineering, and construction industries (AEC) and the lack of sharable testing resources to support these developments call for an approach that can be used to generate domain-specific reference collections. To address this need, the authors investigated the characteristics of the testing environment in AEC and ways to adapt dominant collection preparation methods for the domain. This paper presents the authors’ collection generation approach through the preparation process of the Taiwanese National Center for Research on Earthquake Engineering (NCREE) collection. The collection’s Chinese-to-English translation instruments are also discussed as matching semantic/linguistic resources are highly valued in AEC’s text-based IR developments. The paper also includes a use case for the NCREE collection to show how a collection generated by the proposed approach could be applied to support research experiment and validation. The direct outputs, the NCREE collection and its translation instruments, are sharable and reusable testing resources, while mechanisms for seeking collections from other researchers are part of the extended research endeavors.
16.
17.
18.
Andreas Classen Quentin Boucher Patrick Heymans 《Science of Computer Programming》2011,76(12):1130-1143
In the scientific community, feature models are the de-facto standard for representing variability in software product line engineering. This is different from industrial settings where they appear to be used much less frequently. We and other authors found that in a number of cases, they lack concision, naturalness and expressiveness. This is confirmed by industrial experience. When modelling variability, an efficient tool for making models intuitive and concise are feature attributes. Yet, the semantics of feature models with attributes is not well understood and most existing notations do not support them at all. Furthermore, the graphical nature of feature models’ syntax also appears to be a barrier to industrial adoption, both psychological and rational. Existing tool support for graphical feature models is lacking or inadequate, and inferior in many regards to tool support for text-based formats. To overcome these shortcomings, we designed TVL, a text-based feature modelling language. In terms of expressiveness, TVL subsumes most existing dialects. The main goal of designing TVL was to provide engineers with a human-readable language with a rich syntax to make modelling easy and models natural, but also with a formal semantics to avoid ambiguity and allow powerful automation.
19.
Alexandra Weilenmann Roger Säljö Arvid Engström 《Personal and Ubiquitous Computing》2014,18(3):737-752
In this article, we examine the practice of learning to produce video using a new visual technology. Drawing upon a design intervention at a science centre, where a group of teenagers tried a new prototype technology for live mobile video editing, we show how the participants struggle with both the content and the form of producing videos, i.e., what to display and how to do it in a comprehensible manner. We investigate the ways in which video literacy practices are negotiated as ongoing accomplishments and explore the communicative and material resources relied upon by participants as they create videos. Our results show that the technology is instrumental in this achievement and that as participants begin to master the prototype, they start to focus more on the narrative aspects of communicating the storyline of a science centre exhibit. The participants are explicitly concerned with such issues as how to create a comprehensible storyline for an assumed audience, what camera angles to use, how to cut and other aspects of the production of a video. We consider these observed activities to be candidate steps in an emerging mobile video literacy trajectory that involves developing a capacity to document and argue by means of this specific medium.
20.
《Computer Speech and Language》2014,28(2):607-618
In this paper we study the production and perception of speech in diverse conditions for the purposes of accurate, flexible and highly intelligible talking face animation. We recorded audio, video and facial motion capture data of a talker uttering a set of 180 short sentences, under three conditions: normal speech (in quiet), Lombard speech (in noise), and whispering. We then produced an animated 3D avatar with similar shape and appearance as the original talker and used an error minimization procedure to drive the animated version of the talker in a way that matched the original performance as closely as possible. In a perceptual intelligibility study with degraded audio we then compared the animated talker against the real talker and the audio alone, in terms of audio-visual word recognition rate across the three different production conditions. We found that the visual intelligibility of the animated talker was on par with the real talker for the Lombard and whisper conditions. In addition we created two incongruent conditions where normal speech audio was paired with animated Lombard speech or whispering. When compared to the congruent normal speech condition, Lombard animation yields a significant increase in intelligibility, despite the AV-incongruence. In a separate evaluation, we gathered subjective opinions on the different animations, and found that some degree of incongruence was generally accepted.