首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported.  相似文献   

2.
Concept detection stands as an important problem for efficient indexing and retrieval in large video archives. In this work, the KavTan System, which performs high-level semantic classification in one of the largest TV archives of Turkey, is presented. In this system, concept detection is performed using generalized visual and audio concept detection modules that are supported by video text detection, audio keyword spotting and specialized audio-visual semantic detection components. The performance of the presented framework was assessed objectively over a wide range of semantic concepts (5 high-level, 14 visual, 9 audio, 2 supplementary) by using a significant amount of precisely labeled ground truth data. KavTan System achieves successful high-level concept detection performance in unconstrained TV broadcast by efficiently utilizing multimodal information that is systematically extracted from both spatial and temporal extent of multimedia data.  相似文献   

3.
Semantic filtering and retrieval of multimedia content is crucial for efficient use of the multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedia pattern recognition problem. We propose a probabilistic framework for semantic video indexing, which call support filtering and retrieval and facilitate efficient content-based access. To map low-level features to high-level semantics we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music etc. Semantic concepts in videos interact and to model this interaction explicitly, we propose a network of multijects (multinet). Using probabilistic models for six site multijects, rocks, sky, snow, water-body forestry/greenery and outdoor and using a Bayesian belief network as the multinet we demonstrate the application of this framework to semantic indexing. We demonstrate how detection performance can be significantly improved using the multinet to take interconceptual relationships into account. We also show how the multinet can fuse heterogeneous features to support detection based on inference and reasoning  相似文献   

4.
基于深度学习的图像检索系统   总被引:2,自引:0,他引:2  
基于内容的图像检索系统关键的技术是有效图像特征的获取和相似度匹配策略.在过去,基于内容的图像检索系统主要使用低级的可视化特征,无法得到满意的检索结果,所以尽管在基于内容的图像检索上花费了很大的努力,但是基于内容的图像检索依旧是计算机视觉领域中的一个挑战.在基于内容的图像检索系统中,存在的最大的问题是“语义鸿沟”,即机器从低级的可视化特征得到的相似性和人从高级的语义特征得到的相似性之间的不同.传统的基于内容的图像检索系统,只是在低级的可视化特征上学习图像的特征,无法有效的解决“语义鸿沟”.近些年,深度学习技术的快速发展给我们提供了希望.深度学习源于人工神经网络的研究,深度学习通过组合低级的特征形成更加抽象的高层表示属性类别或者特征,以发现数据的分布规律,这是其他算法无法实现的.受深度学习在计算机视觉、语音识别、自然语言处理、图像与视频分析、多媒体等诸多领域取得巨大成功的启发,本文将深度学习技术用于基于内容的图像检索,以解决基于内容的图像检索系统中的“语义鸿沟”问题.  相似文献   

5.
The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user’s day-to-day activities. It captures on average 3,000 images in a typical day, equating to almost 1 million images per year. It can be used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer’s life. However the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the domain of visual lifelogs. Our concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept’s presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were evaluated on a subset of 95,907 images, to determine the accuracy for detection of each semantic concept. We conducted further analysis on the temporal consistency, co-occurance and relationships within the detected concepts to more extensively investigate the robustness of the detectors within this domain.  相似文献   

6.
7.
8.
The recent popularity of smart mobile devices has led to a significant increase in the needs of multimedia services. Finding new more efficient methods for automatic classification and retrieval of a large number of multimedia files will significantly reduce manpower costs. However, most current video content analysis methods adopt low-level features to analyze video frame by frame, and need to improve high-level semantic analysis on a number of issues. Hence, this study presents a storyboard-based accurate automatic summary video editing system that uses storyboard information, such as character dialogue, narration, caption, background music and shot changes, to enable accurate video content retrieval and automatic render summary videos. The proposed system can be applied to the course video trailer and the commercial video trailer for quick preview video content or suitable viewing configuration for smart mobile devices. Consequently, the audience can quickly understand the whole video story and the video editors can substantially reduce the time taken to publish videos.  相似文献   

9.
To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods put the focus on either the analysis of low-level features or simple semantic concepts, and they cannot reduce the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of visual features, speech features, and frequent semantic patterns existing in the video. The proposed method mainly consists of two main phases: 1) Construction of four kinds of predictive annotation models, namely speech-association, visual-association, visual-sequential, and statistical models from annotated videos. 2) Fusion of these models for annotating un-annotated videos automatically. The main advantage of the proposed method lies in that all visual features, speech features, and semantic patterns are considered simultaneously. Moreover, the utilization of high-level rules can effectively complement the insufficiency of statistics-based methods in dealing with complex and broad keyword identification in video annotation. Through empirical evaluation on NIST TRECVID video datasets, the proposed approach is shown to enhance the performance of annotation substantially in terms of precision, recall, and F-measure.  相似文献   

10.
This paper presents a probabilistic Bayesian belief network (BBN) method for automatic indexing of excitement clips of sports video sequences. The excitement clips from sports video sequences are extracted using audio features. The excitement clips are comprised of multiple subclips corresponding to the events such as replay, field-view, close-ups of players, close-ups of referees/umpires, spectators, players’ gathering. The events are detected and classified using a hierarchical classification scheme. The BBN based on observed events is used to assign semantic concept-labels to the excitement clips, such as goals, saves, and card in soccer video, wicket and hit in cricket video sequences. The BBN based indexing results are compared with our previously proposed event-association based approach and found BBN is better than the event-association based approach. The proposed scheme provides a generalizable method for linking low-level video features with high-level semantic concepts. The generic nature of the proposed approach in the sports domain is validated by demonstrating successful indexing of soccer and cricket video excitement clips. The proposed scheme offers a general approach to the automatic tagging of large scale multimedia content with rich semantics. The collection of labeled excitement clips provide a video summary for highlight browsing, video skimming, indexing and retrieval.  相似文献   

11.
Describing visual contents in videos by semantic concepts is an effective and realistic approach that can be used in video applications such as annotation, indexing, retrieval and ranking. In these applications, video data needs to be labelled with some known set of labels or concepts. Assigning semantic concepts manually is not feasible due to the large volume of ever-growing video data. Hence, automatic semantic concept detection of videos is a hot research area. Recently Deep Convolutional Neural Networks (CNNs) used in computer vision tasks are showing remarkable performance. In this paper, we present a novel approach for automatic semantic video concept detection using deep CNN and foreground driven concept co-occurrence matrix (FDCCM) which keeps foreground to background concept co-occurrence values, built by exploiting concept co-occurrence relationship in pre-labelled TRECVID video dataset and from a collection of random images extracted from Google Images. To deal with the dataset imbalance problem, we have extended this approach by making a fusion of two asymmetrically trained deep CNNs and used FDCCM to further improve concept detection. The performance of the proposed approach is compared with state-of-the-art approaches for the video concept detection over the widely used TRECVID data set and is found to be superior to existing approaches.  相似文献   

12.
为了克服图像底层特征与高层语义之间的语义鸿沟,降低自顶向下的显著性检测方法对特定物体先验的依赖,提出一种基于高层颜色语义特征的显著性检测方法。首先从彩色图像中提取结构化颜色特征并在多核学习框架下,实现对图像进行颜色命名获取像素的颜色语义名称;接着利用图像颜色语义名称分布计算高层颜色语义特征,再将其与底层的Gist特征融合,通过线性支持向量机训练生成显著性分类器,实现像素级的显著性检测。实验结果表明,本文算法能够更加准确地检测出人眼视觉关注点。且与传统的底层颜色特征相比,本文颜色语义特征能够获得更好的显著性检测结果。  相似文献   

13.
14.
This paper presents the semantic pathfinder architecture for generic indexing of multimedia archives. The semantic pathfinder extracts semantic concepts from video by exploring different paths through three consecutive analysis steps, which we derive from the observation that produced video is the result of an authoring-driven process. We exploit this authoring metaphor for machine-driven understanding. The pathfinder starts with the content analysis step. In this analysis step, we follow a data-driven approach of indexing semantics. The style analysis step is the second analysis step. Here, we tackle the indexing problem by viewing a video from the perspective of production. Finally, in the context analysis step, we view semantics in context. The virtue of the semantic pathfinder is its ability to learn the best path of analysis steps on a per-concept basis. To show the generality of this novel indexing approach, we develop detectors for a lexicon of 32 concepts and we evaluate the semantic pathfinder against the 2004 NIST TRECVID video retrieval benchmark, using a news archive of 64 hours. Top ranking performance in the semantic concept detection task indicates the merit of the semantic pathfinder for generic indexing of multimedia archives.  相似文献   

15.
The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing.  相似文献   

16.
目的 海量数据的快速增长给多媒体计算带来了深刻挑战。与传统以手工构造为核心的媒体计算模式不同,数据驱动下的深度学习(特征学习)方法成为当前媒体计算主流。方法 重点分析了深度学习在检索排序与标注、多模态检索与语义理解、视频分析与理解等媒体计算方面的最新进展和所面临的挑战,并对未来的发展趋势进行展望。结果 在检索排序与标注方面, 基于深度学习的神经编码等方法取得了很好的效果;在多模态检索与语义理解方面,深度学习被用于弥补不同模态间的“异构鸿沟“以及底层特征与高层语义间的”语义鸿沟“,基于深度学习的组合语义学习成为研究热点;在视频分析与理解方面, 深度神经网络被用于学习视频的有效表示方式及动作识别,并取得了很好的效果。然而,深度学习是一种数据驱动的方法,易受数据噪声影响, 对于在线增量学习方面还不成熟,如何将深度学习与众包计算相结合是一个值得期待的问题。结论 该综述在深入分析现有方法的基础上,对深度学习框架下为解决异构鸿沟和语义鸿沟给出新的思路。  相似文献   

17.
Recent advances in digital video compression and networks have made video more accessible than ever. However, the existing content-based video retrieval systems still suffer from the following problems. 1) Semantics-sensitive video classification problem because of the semantic gap between low-level visual features and high-level semantic visual concepts; 2) Integrated video access problem because of the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we have proposed a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used for selecting the discriminating visual features with suitable importances. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy which is also used for video classification. The presentation of visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with efficient database indexing tree structure has provided great opportunity for supporting more powerful video search engines.  相似文献   

18.
基于内容的视频查询系统研究   总被引:1,自引:0,他引:1       下载免费PDF全文
由于多媒体数据库管理和检索的效率直接决定了人们利用多媒体数据信息的效率,因此随着MPEG-7标准的提出,基于内容的图象/视频存储和检索已成为研究的热点.为了快速地对视频进行浏览和检索,在研究基于内容的视频数据库管理和检索等热点问题的基础上,首先使用MPEG-7视觉内容描述子和语义描述子来构建视频数据库的语义结构,并结合底层视觉特征和高层语义特征,采用相关反馈机制和半自动权重更新体制来对视频数据库进行管理和检索;然后采用语法分析器来支持自然语言查询;最后在此基础上实现了基于内容的视频数据库的管理和查询系统.实验证明,该系统能够有效地对视频数据进行管理和检索,并且具有一定的智能性和适应性.  相似文献   

19.
Multimedia understanding is a fast emerging interdisciplinary research area. There is tremendous potential for effective use of multimedia content through intelligent analysis. Diverse application areas are increasingly relying on multimedia understanding systems. Advances in multimedia understanding are related directly to advances in signal processing, computer vision, pattern recognition, multimedia databases, and smart sensors. We review the state-of-the-art techniques in multimedia retrieval. In particular, we discuss how multimedia retrieval can be viewed as a pattern recognition problem. We discuss how reliance on powerful pattern recognition and machine learning techniques is increasing in the field of multimedia retrieval. We review the state-of-the-art multimedia understanding systems with particular emphasis on a system for semantic video indexing centered around multijects and multinets. We discuss how semantic retrieval is centered around concepts and context and the various mechanisms for modeling concepts and context.  相似文献   

20.
We propose a complementary relevance feedback-based content-based image retrieval (CBIR) system. This system exploits the synergism between short-term and long-term learning techniques to improve the retrieval performance. Specifically, we construct an adaptive semantic repository in long-term learning to store retrieval patterns of historical query sessions. We then extract high-level semantic features from the semantic repository and seamlessly integrate low-level visual features and high-level semantic features in short-term learning to effectively represent the query in a single retrieval session. The high-level semantic features are dynamically updated based on users’ query concept and therefore represent the image’s semantic concept more accurately. Our extensive experimental results demonstrate that the proposed system outperforms its seven state-of-the-art peer systems in terms of retrieval precision and storage space on a large scale imagery database.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号