Similar documents
Found 20 similar documents; search took 15 ms.
1.
2.
Detecting multimedia events in web videos is an emerging research area in multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we recently developed for our Event Labeling through Analytic Media Processing (E-LAMP) system, which addresses different aspects of the overall event detection problem. First, we have developed efficient methods for feature extraction so that we can handle large collections of video data with thousands of hours of video. Second, we represent the extracted raw features in a spatial bag-of-words model with more effective tilings, so that the spatial layout information of different features and different events is better captured and overall detection performance improves. Third, in contrast to the widely used early and late fusion schemes, we develop a novel algorithm that learns a more robust and discriminative intermediate feature representation from multiple features, so that better event models can be built on it. Finally, to tackle the additional challenge of event detection with only very few positive exemplars, we have developed a novel algorithm that effectively adapts knowledge learnt from auxiliary sources to assist event detection. Both our empirical results and the official evaluation results on TRECVID MED’11 and MED’12 demonstrate the excellent performance of the integration of these ideas.

3.
Most multimedia surveillance and monitoring systems nowadays use multiple types of sensors to detect events of interest as they occur in the environment. However, because the sensors are asynchronous and diverse, information assimilation (how to combine the information obtained from asynchronous and multifarious sources) is an important and challenging research problem. In this paper, we propose a framework for information assimilation that addresses “when”, “what” and “how” to assimilate the information obtained from different media sources in order to detect events in multimedia surveillance systems. The proposed framework adopts a hierarchical probabilistic assimilation approach to detect atomic and compound events. To detect an event, our framework uses not only the media streams available at the current instant but also two important properties of those streams: first, the accumulated history of whether they have provided concurring or contradictory evidence, and second, the system designer’s confidence in them. The experimental results show the utility of the proposed framework.

4.
5.
6.
Multimedia Tools and Applications - An efficient way of extracting useful information from multiple sources of data is to use data fusion technology. This paper introduces a data fusion approach in...

7.
We present a system for multimedia event detection. The system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building mid-level and high-level features upon low-level features, often in an unsupervised manner, to enable semantic understanding. Second, we present a novel Latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology for improving fusion learning when training data is limited. Thorough evaluation on a large TRECVID MED 2011 dataset showcases the benefits of the presented system.

8.
In video sequences, edges in 2D images (frames) produce 3D surfaces in the spatio-temporal volume. In this paper, we propose to treat temporal collisions between edges, and thus between objects, as 3D ridges in the spatio-temporal volume. Edge collisions (i.e. ridge points) can be located using the maximum principal curvature and the principal curvature direction. Using the detected ridges, we then propose a technique to identify object-overlap events in an image sequence without computing either depth or optical flow. We present successful experiments on real image sequences.

9.
10.
Multimedia information systems, supplied on CD-ROM, are fast becoming a popular consumer product. A huge and growing range of titles is available from high street computer, electrical goods and book shops. In an attempt to provide a compact set of evaluation criteria for these products, established methods in the fields of human-computer interaction (HCI), computer-assisted learning (CAL) and information retrieval are considered. The needs and desires of the home user are substantially different from those of the workplace or education user. Observations from product use, and an interview study with home multimedia users, suggest that factors such as aesthetics, levels of interactivity and information content may be crucially important in user satisfaction. Factors such as interface clarity and consistency may be less important than in workplace systems.

11.
12.
Multimedia Tools and Applications - Nowadays, dictionary learning has become an important tool in many classification tasks, especially for images. The tailor-made atoms in a dictionary are trained...

13.
Wireless multimedia sensors have frequently been used for detecting events in acoustically rich environments such as protected area networks. Such areas have diverse habitats and frequently varying terrain, and are a source of a very large number of acoustic events. This work aims to detect the tree cutting event in a forest area by identifying the acoustic pattern generated by an axe hitting a tree bole, with the help of wireless multimedia sensors. A series of operations using the Hamming window, Wiener filter, Otsu thresholding and mathematical morphology is used to remove unwanted clutter from the spectrogram obtained from such events. Using the sparse nature of the acoustic signals, a compressed sensing based, energy efficient data gathering scheme is devised for accurate event reporting. A network of Mica2 motes is deployed in a real forest area to test the validity of the proposed scheme. Analytical and experimental results prove the efficacy of the proposed event detection scheme.
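
As an illustration of just one step named in the abstract above, the following is a minimal sketch of Otsu thresholding. It assumes the spectrogram has already been quantized to integer intensities in 0..255 and flattened into a list; the function name `otsu_threshold` and that input layout are assumptions made for this sketch, not details taken from the paper.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Values <= t form the background class, values > t the foreground.
    `values` is a flat list of integers in range(levels) (an assumption).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]              # number of samples at or below t
        sum_bg += t * hist[t]        # intensity mass at or below t
        if w_bg == 0:
            continue                 # background class still empty
        w_fg = total - w_bg
        if w_fg == 0:
            break                    # foreground class empty from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:       # strict '>' keeps the first maximum
            best_t, best_var = t, between
    return best_t


# Two well-separated intensity clusters: faint clutter and a strong event.
cells = [10] * 50 + [200] * 50
t = otsu_threshold(cells)
mask = [v > t for v in cells]        # True where the cell is kept
```

In a pipeline like the one described, a mask of this kind would typically be cleaned up further by morphological operations; that step is not shown here.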

14.
15.
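An evaluation platform for data fusion algorithms   Total citations: 3, self-citations: 0, citations by others: 3
The diversity of sensors and the complexity of the battlefield environment make it difficult to choose a data fusion algorithm. To address this, the paper introduces a test platform for multi-sensor data fusion algorithms that evaluates fusion algorithms through quantitative analysis, providing a reference for selecting and applying them.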

16.
This survey aims to provide multimedia researchers with a state-of-the-art overview of fusion strategies, which are used to combine multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described in terms of their basic concept, advantages, weaknesses, and usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process, such as the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and optimal modality selection, are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.

17.
We develop, in this paper, a representation of time and events that supports a range of reasoning tasks such as monitoring and detection of event patterns which may facilitate the explanation of root cause(s) of faults. We shall compare two approaches to event definition: the active database approach in which events are defined in terms of the conditions for their detection at an instant, and the knowledge representation approach in which events are defined in terms of the conditions for their occurrence over an interval. We shall show the shortcomings of the former definition and employ a three-valued temporal first-order nonmonotonic logic, extended with events, in order to integrate both definitions.

18.
19.
Silence detection and removal is an essential building block of any multimedia video conferencing system. It reduces the bandwidth requirements of the underlying network transport service and helps to maintain an acceptable end-to-end delay for audio. We analyze the requirements for a silence detection algorithm hosted on a multimedia communication system, and propose a novel low-complexity algorithm operating in the non-linear domain. After discussing the constraints which are imposed by the architecture of the system hardware (computer, packet-based network), we show that several recently proposed silence detection algorithms fail to meet all of these constraints. A new approach is then introduced, based on the small- and large-signal behavior of the speech waveform in the μ-law domain. The new algorithm is compared with a recent design that meets several of our requirements; experimental results indicate that it performs significantly better in the particular environment at hand.
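
To make the "non-linear domain" in the abstract above concrete, here is a minimal sketch of the standard μ-law compression curve, followed by a deliberately naive frame-level silence test. The threshold rule, the function names and the default threshold of 0.3 are assumptions for illustration only; this is not the algorithm the paper proposes.

```python
import math

MU = 255.0  # value used by 8-bit mu-law companding


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to the mu-law domain (also in [-1, 1])."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def is_silent(frame, threshold=0.3):
    """Hypothetical rule: a frame is silent if its mean compressed
    magnitude falls below `threshold` (an arbitrary illustrative value)."""
    if not frame:
        return True
    level = sum(abs(mu_law_compress(s)) for s in frame) / len(frame)
    return level < threshold


quiet_frame = [0.001, -0.001] * 80   # low-level background noise
loud_frame = [0.5, -0.5] * 80        # speech-level amplitude
```

Because the curve is logarithmic, small amplitudes are spread over a wide part of the output range, which is why a simple threshold behaves differently in this domain than on linear samples.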

20.
This paper addresses how to automatically generate code for multimedia extension architectures in the presence of conditionals. We evaluate the costs and benefits of exploiting branches on the aggregate condition codes associated with the fields of a superword (an aggregate object larger than a machine word) such as the branch-on-any instruction of the AltiVec. Branch-on-superword-condition-codes (BOSCC) instructions allow fast detection of aggregate conditions, an optimization opportunity often found in multimedia applications. This paper presents compiler analyses and techniques for generating efficient parallel code using BOSCC instructions. We evaluate our approach, which has been implemented in the SUIF compiler, through a set of experiments with multimedia benchmarks, and compare it with the default approach previously implemented in our compiler. Our experimental results show that using BOSCC instructions can result in better performance for applications where the aggregate condition codes of a superword often evaluate to the same value.
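
The idea of branching on an aggregate condition can be modelled in a few lines. The sketch below simulates a superword as a slice of `LANES` list elements and skips the conditional body whenever no lane satisfies the condition. The lane width, the function names and the clamp-negatives workload are all assumptions chosen for illustration; the sketch does not reproduce the paper's compiler analyses or any real AltiVec intrinsic.

```python
LANES = 4  # hypothetical number of fields per superword


def clamp_negatives_scalar(values):
    """Reference version: one conditional per element."""
    return [0 if v < 0 else v for v in values]


def clamp_negatives_boscc(values):
    """Superword version with a branch-on-any style early exit.

    Returns the result and the number of superwords whose conditional
    body was skipped entirely.
    """
    out = list(values)
    skipped = 0
    for start in range(0, len(out), LANES):
        lanes = out[start:start + LANES]
        mask = [v < 0 for v in lanes]     # one condition code per field
        if not any(mask):                 # aggregate test: no field is true
            skipped += 1                  # skip the whole conditional body
            continue
        # At least one field is true: apply a per-field select.
        out[start:start + LANES] = [0 if m else v for m, v in zip(mask, lanes)]
    return out, skipped


data = [1, 2, 3, 4, -1, 5, -2, 6, 7, 8, 9, 10, 11]
result, skipped = clamp_negatives_boscc(data)
```

In this model the early exit pays off only when most superwords take the same path, which matches the abstract's observation about condition codes that often evaluate to the same value.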
