Similar Literature
 20 similar documents found (search time: 10 ms)
1.
 In recent years, the volume of available audio corpora has grown rapidly with the fast-growing Internet and digital libraries. Classifying and retrieving sound files relevant to the user's interest from large databases is crucial for building multimedia web search engines. In this paper, content-based technology is applied to classify and retrieve audio clips using a fuzzy logic system, an intuitive choice given the fuzzy nature of human perception of audio, especially of clips with mixed types. Two features selected from the various extracted features are used as input to a constructed fuzzy inference system (FIS). The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the distributions of the extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and male speech can be separated by another FIS, and percussion can be distinguished from other musical instruments. In addition, multiple FISs can be combined into a “fuzzy tree” to retrieve more types of audio clips. With this approach, generic audio can be classified and retrieved more accurately, using fewer features and less computation time, than with other existing approaches.
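The two-feature fuzzy inference idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the abstract does not name the two features, so `f1` and `f2` are hypothetical values normalized to [0, 1], and the breakpoints (0.3, 0.7) and the two rules are invented rather than derived from feature distributions as the authors describe.

```python
def falling(x, lo, hi):
    """Membership in 'low': 1 at or below lo, 0 at or above hi, linear between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Membership in 'high': the complement of 'low'."""
    return 1.0 - falling(x, lo, hi)


def classify(f1, f2, lo=0.3, hi=0.7):
    """Tiny two-rule fuzzy classifier (hypothetical rules, not the paper's).

    Rule 1: IF f1 is high AND f2 is high THEN speech.
    Rule 2: IF f1 is low  AND f2 is low  THEN music.
    AND is modelled with min, a common choice in fuzzy systems.
    """
    speech = min(rising(f1, lo, hi), rising(f2, lo, hi))
    music = min(falling(f1, lo, hi), falling(f2, lo, hi))
    if speech > music:
        label = "speech"
    elif music > speech:
        label = "music"
    else:
        label = "mixed"
    return {"speech": speech, "music": music, "label": label}


# Example: a clip whose two (hypothetical) features are both high.
result = classify(0.9, 0.8)
```

Returning the degrees alongside the label keeps the fuzzy information available, which matters for clips of mixed type; a "fuzzy tree" could pass such a clip on to a further classifier of the same shape.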

2.
Lip synchronization is considered a key parameter during interactive communication. In video conferencing and television broadcasting, the differential delay between audio and video should remain below certain thresholds, as recommended by several standardization bodies. However, further research has shown that these thresholds can be relaxed, depending on the targeted application and use case. In this article, we investigate the influence of lip sync on the ability to perform real-time language interpretation during video conferencing. We are also interested in determining proper lip sync visibility thresholds applicable to this use case. We therefore conducted a subjective experiment using expert interpreters, who were required to perform a simultaneous translation, and non-experts. Our results show that significant differences are obtained when conducting subjective experiments with expert interpreters. As interpreters are primarily focused on performing the simultaneous translation, lip sync detectability thresholds are higher than existing recommended thresholds. As such, the primary focus, the targeted application, and the use case are important factors to consider when selecting proper lip sync acceptability thresholds.

3.
4.
Based on the analysis of temporal slices, we propose novel approaches for clustering and retrieval of video shots. Temporal slices are a set of two-dimensional (2-D) images extracted along the time dimension of an image volume. They encode a rich set of visual patterns for similarity measures. In this paper, we first demonstrate that tensor histogram features extracted from temporal slices are suitable for motion retrieval. Subsequently, we integrate both tensor and color histograms to construct a two-level hierarchical clustering structure. Each cluster in the top level contains shots with similar color, while each cluster in the bottom level consists of shots with similar motion. The constructed structure is then used for cluster-based retrieval. The proposed approaches are found to be particularly useful for sports games, where motion and color are important visual cues when searching and browsing for the desired video shots.
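The two-level structure described above (color at the top level, motion at the bottom level) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the clustering algorithm, so a simple greedy "leader" clustering with an L1 histogram distance is swapped in, and the shot records, their `color` and `motion` histograms, and the thresholds are all hypothetical.

```python
def l1(a, b):
    """L1 distance between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def leader_cluster(items, key, threshold):
    """Greedy clustering: each item joins the first cluster whose
    first member (the leader) is within `threshold`, else starts a new one."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if l1(key(item), key(cluster[0])) <= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def two_level(shots, color_thr, motion_thr):
    """Top level groups shots by color; each group is then split by motion."""
    top = leader_cluster(shots, lambda s: s["color"], color_thr)
    return [leader_cluster(group, lambda s: s["motion"], motion_thr)
            for group in top]


# Hypothetical shots with 2-bin color and motion histograms.
shots = [
    {"id": "a", "color": [0.80, 0.20], "motion": [0.90, 0.10]},
    {"id": "b", "color": [0.75, 0.25], "motion": [0.10, 0.90]},
    {"id": "c", "color": [0.10, 0.90], "motion": [0.90, 0.10]},
    {"id": "d", "color": [0.80, 0.20], "motion": [0.85, 0.15]},
]
tree = two_level(shots, color_thr=0.2, motion_thr=0.3)
ids = [[[s["id"] for s in sub] for sub in group] for group in tree]
```

With these values, shots a, b, and d share a color cluster, which then splits by motion into {a, d} and {b}; shot c forms its own color cluster. Retrieval could compare a query against cluster leaders first and descend only into the closest cluster.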

5.
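Rui Yanan (芮亚楠). Microcomputer Information (《微计算机信息》), 2006, 22(20): 243-245
By studying the protocol model of Bluetooth audio/video remote control applications, a remote control application model for the controller and the target device is designed. The application model uses CSR's BlueCore3-MultiMedia chip as its hardware platform. Following the communication procedures defined in the protocol specification, a software entity for the Bluetooth audio/video remote control application framework is built on top of the audio/video control transport protocol layer. Finally, remote control between Bluetooth audio/video devices is implemented on the basis of the advanced audio distribution application.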

6.
7.
Jansen, Bernard J.; Goodrum, Abby; Spink, Amanda. World Wide Web, 2000, 3(4): 249-254
The development of digital libraries has enhanced the integration of textual and multimedia information in many document collections. The World Wide Web provides the connectivity for many digital library users. Studies exploring the searching characteristics of Web users are an important and growing area of research. Most Web user studies have focused on general Web searching, regardless of subject matter or format. Little research has examined how Web users search for multimedia information. Our study examines users' multimedia searching on a major Web search service. The data set examined consisted of 1,025,908 queries from 211,058 users of Excite®, a major Web search service. From this data set, we identified and analyzed audio, image, and video queries. Our findings were compared to results from general Web searching studies. Implications for the design of Web searching services and interfaces are discussed.

8.
Due to the elevated consumption of resources, the high cost of the production of contents and the quality of service required in audio/video streaming services, it is extremely important to optimize all the elements involved in the deployment of these services. With this goal in mind, provider companies have developed their management and presentation tools. At the same time, some specific tools for audio/video streaming analysis have appeared. Data are collected from servers and proxies by analyzing their log files in order to generate different types of reports. In spite of their utility, there is a disconnection between these types of tools. In this way, several important relationships between collected data are lost and the influence of other important aspects such as the behaviour of the users and their relationship with the subject or the length of the contents is not considered. This generates inaccurate analyses and the impossibility to improve the presentation, for example by generating recommendations using the information gathered from the analysis tool. Fesoria is a system which combines both characteristics. It is an analysis tool and, at the same time, a system to manage the whole audio/video service. Fesoria is able to process the logs gathered from the streaming servers and proxies, and combine the extracted information with other types of data, such as content metadata, content distribution networks architecture, user preferences, etc. All this information is analyzed in order to generate reports on service performance, access evolution and users’ preferences, and thus to improve the presentation of the services. The system has been used in real audio/video services since 2001 with satisfactory results.
Isabel Rodríguez

9.
One of the new applications evolving on the Internet is streaming audio/video. A major reason for its growing popularity is interest in the compelling new services that become possible. Prototype services are being developed that are new to the Internet but offer the same look, feel, and functionality traditionally found only in services delivered via other communication media, e.g. broadcast television. In addition, the Internet is evolving to offer ‘value-added’ services, such as streaming audio/video with VCR-style interactivity and embedded hyperlinks. We are poised to see both the development of new paradigms for interacting with audio/video and the merging of broadcast television and Internet-based broadcasts. Before this process can be considered successful, a number of technical challenges, derived from the various ways in which content is physically delivered, must be solved. In this paper, we focus on the value-added service of VCR interactivity. VCR interactivity has long been a challenge for both broadcast television and streamed Internet audio/video. The challenge is how to provide individualized playout for content being streamed to a large group of users using one-to-many delivery. While some new companies are starting to offer devices that provide this kind of service for broadcast television, there are still numerous technical challenges for the Internet-based version of a similar service. This paper has a three-fold objective. First, we describe the types of services available in the traditional broadcast infrastructure and compare these to the types of services that are deployed or possible in Internet-based services. Second, we describe our attempts to implement some of the more challenging and novel service types. In particular, we examine client-based control of programs streamed over the Internet to tens, thousands, or even millions of users. Finally, we discuss the impact of these services on the protocols and applications used to support Internet-based, multi-party conferencing. Copyright © 2001 John Wiley & Sons, Ltd.

10.
This paper introduces an effective interactive video retrieval system named VisionGo. It combines human and computer effort to accomplish video retrieval with high effectiveness and efficiency. It assists the interactive video retrieval process in three ways: (1) it maximizes the efficiency of interaction between human and computer by providing a user interface that supports highly effective user annotation and an intuitive visualization of retrieval results; (2) it employs a multiple feedback technique that helps users choose a proper method to enhance relevance feedback performance; and (3) it helps users assess the retrieval results of motion-related queries by using motion-icons instead of static keyframes. Experimental results on over 160 hours of news video demonstrate the effectiveness of the VisionGo system.

11.
12.
13.
The retrieval of video information is a key multimedia technology that has received wide attention recently. This paper introduces the verification and analysis of video information results in a video retrieval system from three aspects: first, browsing and navigation; second, video summaries; third, the display and interaction interface.

14.
This paper reviews a number of recently available techniques in content analysis of visual media and their application to the indexing, retrieval, abstracting, relevance assessment, interactive perception, annotation and re-use of visual documents. This work was performed while this author was with the Institute of Systems Science, Singapore.

15.
The increasing importance of text-based information retrieval (IR) developments in the architecture, engineering, and construction (AEC) industries and the lack of sharable testing resources to support these developments call for an approach that can be used to generate domain-specific reference collections. To address this need, the authors investigated the characteristics of the testing environment in AEC and ways to adapt dominant collection preparation methods for the domain. This paper presents the authors’ collection generation approach through the preparation process of the Taiwanese National Center for Research on Earthquake Engineering (NCREE) collection. The collection’s Chinese-to-English translation instruments are also discussed, as matching semantic/linguistic resources are highly valued in AEC’s text-based IR developments. The paper also includes a use case for the NCREE collection to show how a collection generated by the proposed approach could be applied to support research experiments and validation. The direct outputs, the NCREE collection and its translation instruments, are sharable and reusable testing resources, while mechanisms for seeking collections from other researchers are part of the extended research endeavors.

16.
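To enable real-time content inspection of P2P-TV applications, this paper briefly introduces how a P2P-TV monitoring system performs fine-grained identification of P2P-TV platforms and channels. PPTV transmits its data streams in the ASF streaming media format, and peers obtain data from one another over UDP. On the basis of accurate platform and channel identification, the system identifies the A/V data packets in transit and obtains each packet's sequence number as well as the length and the start and end positions of its A/V data. The A/V data are then extracted online and restored into media files for content inspection.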

17.
18.
In the scientific community, feature models are the de-facto standard for representing variability in software product line engineering. This is different from industrial settings, where they appear to be used much less frequently. We and other authors found that in a number of cases, they lack concision, naturalness and expressiveness. This is confirmed by industrial experience. When modelling variability, feature attributes are an efficient tool for making models intuitive and concise. Yet, the semantics of feature models with attributes is not well understood and most existing notations do not support them at all. Furthermore, the graphical nature of feature models’ syntax also appears to be a barrier to industrial adoption, both psychological and rational. Existing tool support for graphical feature models is lacking or inadequate, and inferior in many regards to tool support for text-based formats. To overcome these shortcomings, we designed TVL, a text-based feature modelling language. In terms of expressiveness, TVL subsumes most existing dialects. The main goal of designing TVL was to provide engineers with a human-readable language with a rich syntax to make modelling easy and models natural, but also with a formal semantics to avoid ambiguity and allow powerful automation.

19.
In this article, we examine the practice of learning to produce video using a new visual technology. Drawing upon a design intervention at a science centre, where a group of teenagers tried a new prototype technology for live mobile video editing, we show how the participants struggle with both the content and the form of producing videos, i.e., what to display and how to do it in a comprehensible manner. We investigate the ways in which video literacy practices are negotiated as ongoing accomplishments and explore the communicative and material resources relied upon by participants as they create videos. Our results show that the technology is instrumental in this achievement and that as participants begin to master the prototype, they start to focus more on the narrative aspects of communicating the storyline of a science centre exhibit. The participants are explicitly concerned with such issues as how to create a comprehensible storyline for an assumed audience, what camera angles to use, how to cut and other aspects of the production of a video. We consider these observed activities to be candidate steps in an emerging mobile video literacy trajectory that involves developing a capacity to document and argue by means of this specific medium.

20.
In this paper we study the production and perception of speech in diverse conditions for the purposes of accurate, flexible and highly intelligible talking face animation. We recorded audio, video and facial motion capture data of a talker uttering a set of 180 short sentences under three conditions: normal speech (in quiet), Lombard speech (in noise), and whispering. We then produced an animated 3D avatar similar in shape and appearance to the original talker and used an error minimization procedure to drive the animated version of the talker in a way that matched the original performance as closely as possible. In a perceptual intelligibility study with degraded audio, we then compared the animated talker against the real talker and the audio alone, in terms of audio-visual word recognition rate across the three production conditions. We found that the visual intelligibility of the animated talker was on par with the real talker for the Lombard and whisper conditions. In addition, we created two incongruent conditions where normal speech audio was paired with animated Lombard speech or whispering. When compared to the congruent normal speech condition, Lombard animation yields a significant increase in intelligibility, despite the AV-incongruence. In a separate evaluation, we gathered subjective opinions on the different animations, and found that some degree of incongruence was generally accepted.
