1.
2.
Wei Tong, Yi Yang, Lu Jiang, Shoou-I Yu, ZhenZhong Lan, Zhigang Ma, Waito Sze, Ehsan Younessian, Alexander G. Hauptmann. Machine Vision and Applications, 2014, 25(1): 5-15
Detecting multimedia events in web videos is an emerging research area in multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we recently developed for our Event Labeling through Analytic Media Processing (E-LAMP) system, which addresses different aspects of the overall event detection problem. First, we have developed efficient methods for feature extraction so that we can handle large collections containing thousands of hours of video. Second, we represent the extracted raw features in a spatial bag-of-words model with more effective tilings, so that the spatial layout information of different features and different events is better captured and overall detection performance improves. Third, unlike the widely used early and late fusion schemes, a novel algorithm is developed to learn a more robust and discriminative intermediate feature representation from multiple features, so that better event models can be built upon it. Finally, to tackle the additional challenge of event detection with only very few positive exemplars, we have developed a novel algorithm that effectively adapts knowledge learnt from auxiliary sources to assist event detection. Both our empirical results and the official evaluation results on TRECVID MED’11 and MED’12 demonstrate the excellent performance of the integration of these ideas.
3.
Information assimilation framework for event detection in multimedia surveillance systems (cited by: 1; self-citations: 0; citations by others: 1)
Most multimedia surveillance and monitoring systems nowadays use multiple types of sensors to detect events of interest as and when they occur in the environment. However, because the sensors are asynchronous and diverse, information assimilation, that is, how to combine the information obtained from asynchronous and multifarious sources, is an important and challenging research problem. In this paper, we propose a framework for information assimilation that addresses the issues of “when”, “what” and “how” to assimilate the information obtained from different media sources in order to detect events in multimedia surveillance systems. The proposed framework adopts a hierarchical probabilistic assimilation approach to detect atomic and compound events. To detect an event, our framework uses not only the media streams available at the current instant but also two important properties of those streams: first, the accumulated history of whether they have been providing concurring or contradictory evidence, and second, the system designer’s confidence in them. The experimental results show the utility of the proposed framework.
4.
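Objective: Image change detection is an important problem in computer vision. Traditional change detection methods are overly sensitive to illumination changes and camera pose differences, so they perform poorly in real scenes. Because convolutional neural networks (CNNs) can extract deep semantic features from images, a change detection model based on multi-scale deep feature fusion is proposed, which overcomes detection noise by extracting and fusing high-level semantic features of the images. Method: VGG (Visual Geometry Group) 16 is used as the base network in a Siamese architecture, and deep features from different network layers are extracted from the reference image and the query image respectively. The deep features from corresponding layers of the two images are concatenated and fed into an encoding layer. The encoding layers progressively fuse high-level and low-level network features across scales, combining high-level semantic features with low-level texture features to detect accurate change regions. A convolutional layer is applied to the features of each encoding layer to produce a prediction at the corresponding scale. The predictions at different scales are then fused to obtain a further refined detection result. Results: The method is compared with four detection methods: SC_SOBS (SC-self-organizing background subtraction), SuBSENSE (self-balanced sensitivity segmenter), FGCD (fine-grained change detection) and the fully convolutional network (FCN). Compared with FCN, the second-best model, the proposed method improves the F1 score and precision by 12.2% and 24.4% on the VL_CMU_CD (visual localization of Carnegie Mellon University for change detection) dataset, by 2.1% and 17.7% on the PCD (panoramic change detection) dataset, and by 8.5% and 5.8% on the CDnet (change detection net) dataset. Conclusion: The proposed change detection method based on multi-scale deep feature fusion uses features from different layers of a convolutional neural network to overcome illumination and camera pose differences effectively, and yields fairly robust change detection results on different datasets.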
5.
6.
7.
Multimedia Tools and Applications - An efficient way of extracting useful information from multiple sources of data is to use data fusion technology. This paper introduces a data fusion approach in...
8.
Sangmin Oh, Scott McCloskey, Ilseo Kim, Arash Vahdat, Kevin J. Cannons, Hossein Hajimirsadeghi, Greg Mori, A. G. Amitha Perera, Megha Pandey, Jason J. Corso. Machine Vision and Applications, 2014, 25(1): 49-69
We present a system for multimedia event detection. The system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building mid-level and high-level features upon low-level features, often in an unsupervised manner, to enable semantic understanding. Second, we show a novel Latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology for improving fusion learning when training data is limited. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
9.
In video sequences, edges in 2D images (frames) produce 3D surfaces in the spatio-temporal volume. In this paper, we propose to consider temporal collisions between edges, and thus between objects, as 3D ridges in the spatio-temporal volume. Edge collisions (i.e. ridge points) can be located using the maximum principal curvature and the principal curvature direction. Using the detected ridges, we then propose a technique to identify overlapping-object events in an image sequence without computing either depth or optical flow. We present successful experiments on real image sequences.
10.
11.
Multimedia information systems, supplied on CD-ROM, are fast becoming a popular consumer product. A huge and growing range of titles is available from high street computer, electrical goods and book shops. In an attempt to provide a compact set of evaluation criteria for these products, established methods in the fields of human-computer interaction (HCI), computer-assisted learning (CAL) and information retrieval are considered. The needs and desires of the home user are substantially different from those of the work place or education user. Observations from product use, and an interview study with home multimedia users, suggest that factors such as aesthetics, levels of interactivity and information content may be crucially important in user satisfaction. Factors such as interface clarity and consistency may be less important than in work place systems.
12.
13.
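To address the cluttered and blurry edge lines produced by current deep-learning-based edge detection techniques, an end-to-end edge detection model with cross-layer fusion of multi-scale features (CFF), based on RCF, is proposed. The model uses RCF as the baseline, adds CBAM to the backbone network, adopts a translation-invariant downsampling technique, and removes some of the downsampling operations in the backbone to preserve image detail, while using dilated convolution to enlarge the model's receptive field. In addition, feature maps are fused across layers so that high-level and low-level features are fully combined. To balance the per-stage losses against the fusion loss, and to avoid excessive loss of low-level detail after multi-scale feature fusion, a weight is added to each loss. The model is trained on the Berkeley Segmentation Dataset (BSDS500) and the PASCAL VOC Context dataset, and an image pyramid is used at test time to improve the quality of the edge maps. Experimental results show that the contours extracted by the CFF model are sharper than those of the baseline network and that the model resolves the edge-blurring problem. Evaluation on the BSDS500 benchmark shows that the model raises the optimal dataset scale (ODS) and optimal image scale (OIS) metrics to 0.818 and 0.839, respectively.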
14.
Multimedia Tools and Applications - Nowadays, dictionary learning has become an important tool in many classification tasks, especially for images. The tailor-made atoms in a dictionary are trained...
15.
16.
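Traditional intrusion detection models based on anomalies or signatures have coarse detection granularity, poor accuracy, and high system resource usage. To address these problems, a distributed anomalous-event fusion intrusion detection model is proposed. The model reduces detection granularity through methods such as event tracking, and uses a distributed multi-node grey relational degree algorithm to fuse information about anomalous events and to analyze and process them. Simulation experiments show that the model achieves high intrusion detection accuracy while consuming fewer system resources.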
17.
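Objective: Video object detection aims to localize moving objects in image sequences and to assign each object a specified class label. It suffers from problems such as blurred objects and occlusion among multiple objects. Most existing video object detection methods build on static-image object detection and improve accuracy on moving objects by considering spatio-temporal consistency, but because moving objects are subject to occlusion and blur, current video object detection is not very robust. This paper therefore proposes a video object detection model that combines the single shot multibox detector (SSD) with spatio-temporal feature fusion. Method: Within the single-stage SSD detection framework, an optical flow network estimates the flow field between the current frame and neighboring frames, and the features of several neighboring frames are used to apply motion compensation to the features of the current frame. A feature pyramid network extracts multi-scale features for detecting objects of different sizes, and finally high-level and low-level features are fused to strengthen the semantic information of the low-level features. Results: Experiments show that the model achieves a mean average precision (mAP) of 72.0% on the ImageNet VID (ImageNet for video object detection) dataset, an improvement of 24.5%, 3.6% and 2.5% over the TCN (temporal convolutional networks) model, the TPN+LSTM (tubelet proposal network and long short term memory network) model and the SSD + Siamese network model, respectively. Ablation experiments on different network structures further verify the effectiveness of the model. Conclusion: The model exploits the temporal and spatial correlations specific to video and improves video object detection accuracy through spatio-temporal feature fusion, substantially reducing missed and false detections.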
18.
Vishal Krishna Singh, Gajendra Sharma, Manish Kumar. Multimedia Tools and Applications, 2017, 76(18): 18531-18555
Wireless multimedia sensors have frequently been used for detecting events in acoustically rich environments such as protected area networks. Such areas have diverse habitats and frequently varying terrain, and are a source of a very large number of acoustic events. This work aims to detect the tree cutting event in a forest area, by identifying the acoustic pattern generated by an axe hitting a tree bole, with the help of wireless multimedia sensors. A series of operations using the Hamming window, Wiener filter, Otsu thresholding and mathematical morphology is used to remove unwanted clutter from the spectrogram obtained from such events. Using the sparse nature of the acoustic signals, a compressed-sensing-based, energy-efficient data gathering scheme is devised for accurate event reporting. A network of Mica2 motes is deployed in a real forest area to test the validity of the proposed scheme. Analytical and experimental results prove the efficacy of the proposed event detection scheme.
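As a rough illustration of one step named in the abstract above, Otsu thresholding, the following Python sketch picks the threshold that maximizes between-class variance over a list of integer intensities. It is not the authors' implementation: the function name `otsu_threshold`, the 256-level assumption, and the toy data are assumptions made here purely for illustration, and a real spectrogram would first have to be quantized to integer levels.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t maximizing between-class variance.

    Values <= t form the background class, values > t the foreground class.
    `values` is assumed to be a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of samples in the background class
    sum0 = 0    # intensity sum of the background class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # background class still empty
        w1 = total - w0
        if w1 == 0:
            break               # foreground class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Toy data (hypothetical): a dark cluster and a bright cluster.
samples = [10] * 50 + [200] * 50
t = otsu_threshold(samples)
mask = [v > t for v in samples]   # True marks the bright class
```

On well-separated data like this, any threshold between the two clusters is equally good; the sketch returns the lowest such value because it only replaces the best candidate on a strict improvement.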
19.
Research on topic event fusion based on an event framework (cited by: 1; self-citations: 0; citations by others: 1)
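Because the individual meta-events obtained through event extraction cannot fully describe a topic event, a topic event fusion method is proposed that integrates all meta-events related to the same topic and represents them hierarchically. First, an event fusion framework, TEFF (topic event fusion framework), is defined. The framework represents a topic event hierarchically according to the role each type of meta-event plays in it. A method for computing the relevance between a meta-event and a topic is also given, and is used to evaluate how relevant each meta-event is to the topic. Under the guidance of TEFF, topic event fusion is carried out through relevance computation. In an experiment on the topic of the financial crisis that began in 2008, the method achieved an F-value of 7...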
20.
Pradeep K. Atrey, M. Anwar Hossain, Abdulmotaleb El Saddik, Mohan S. Kankanhalli. Multimedia Systems, 2010, 16(6): 345-379
This survey aims to provide multimedia researchers with a state-of-the-art overview of fusion strategies, which are used to combine multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described in terms of their basic concept, advantages, weaknesses, and usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process, such as the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and optimal modality selection, are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
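The distinction the survey draws between feature-level and decision-level fusion can be sketched minimally in Python as follows. This is an illustrative toy, not a method taken from the survey: the function names `early_fusion` and `late_fusion`, the dictionary layout, and the weighted-average rule are assumptions chosen here for clarity.

```python
def early_fusion(features_by_modality):
    """Feature-level fusion: concatenate per-modality feature vectors.

    Modalities are visited in sorted name order so the layout of the
    fused vector is deterministic.
    """
    fused = []
    for name in sorted(features_by_modality):
        fused.extend(features_by_modality[name])
    return fused


def late_fusion(scores_by_modality, weights=None):
    """Decision-level fusion: weighted average of per-modality scores.

    `weights` maps modality name to a non-negative confidence; equal
    weights are assumed when it is omitted.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores_by_modality}
    total_weight = sum(weights[name] for name in scores_by_modality)
    weighted_sum = sum(weights[name] * score
                       for name, score in scores_by_modality.items())
    return weighted_sum / total_weight


# Hypothetical inputs: two modalities describing the same video clip.
fused_vector = early_fusion({"audio": [0.1, 0.2], "video": [0.3]})
fused_score = late_fusion({"audio": 0.2, "video": 0.8},
                          weights={"audio": 1.0, "video": 3.0})
```

In the feature-level case a single classifier would then be trained on `fused_vector`, whereas in the decision-level case each modality has its own classifier and only the scores are combined; a hybrid scheme uses both.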