Similar Documents
20 similar documents found.
1.
This paper addresses the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system that performs automatic semantic annotation of news video archives and provides access to the archives via these annotations. The system relies on video texts as the information source and exploits several information extraction techniques on these texts to derive representative semantic information about the underlying videos. These techniques include named entity recognition, person entity extraction, coreference resolution, and semantic event extraction. Apart from the information extraction components, the proposed system also encompasses modules for news story segmentation, text extraction, and video retrieval, along with a news video database, making it a full-fledged system for practical settings. The system is generic, employing a wide range of techniques to automate the semantic video indexing process and to bridge the semantic gap between what can be automatically extracted from videos and what people perceive as video semantics. Based on this design, a novel automatic semantic annotation and retrieval system is built for Turkish and evaluated on a broadcast news video collection, providing evidence for its feasibility and convenience for news videos with satisfactory overall performance.
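A minimal sketch of the entity-indexing idea described above, assuming OCR/ASR text has already been extracted per shot; spaCy's English pipeline stands in for the system's Turkish NER component, and the shot data is hypothetical:

```python
# Sketch: index named entities extracted from video text (OCR/ASR output).
# spaCy's pretrained English pipeline is a stand-in for the paper's own
# Turkish NER component; the shot metadata below is hypothetical.
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")  # placeholder; the paper processes Turkish

def index_shots(shots):
    """Map each recognized entity to the shots whose text mentions it."""
    index = defaultdict(list)
    for shot_id, text in shots:
        for ent in nlp(text).ents:
            index[(ent.text.lower(), ent.label_)].append(shot_id)
    return index

shots = [(1, "President Smith visited Ankara on Monday."),
         (2, "Ankara hosted the summit.")]
entity_index = index_shots(shots)
print(entity_index[("ankara", "GPE")])  # e.g. [1, 2] if both mentions tag as GPE
```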

2.
This paper designs and implements a semantics-enabled distributed video retrieval system, "语寻" (YuXun). The system uses an improved video semantic processing tool (based on the IBM VideoAnnEx annotation tool, extended with shot semantic-graph annotation and natural language processing) to perform semantic analysis and annotation of videos, generating MPEG-7 description files that contain the semantic information. A distributed index is then built over the videos' MPEG-7 description files, while the video files themselves are stored in a distributed fashion. The system provides rich Web query interfaces, including keyword queries with semantic expansion, semantic-graph queries, and natural-language queries; once a user submits a semantic query, the videos and segments of interest are retrieved quickly and can be browsed or played on demand. The whole system adopts a distributed architecture with good scalability and can support indexing and retrieval of massive video collections.
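A rough sketch of the indexing step, assuming the MPEG-7 description files use the standard TextAnnotation elements; the exact layout produced by the modified VideoAnnEx tool may differ:

```python
# Sketch: build an inverted index over keywords pulled from MPEG-7
# description files, roughly the step 语寻 performs before distributing
# the index. The namespace/element names follow the standard MPEG-7
# TextAnnotation convention; the actual descriptor files may differ.
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"mpeg7": "urn:mpeg:mpeg7:schema:2001"}

def index_mpeg7(files):
    inverted = defaultdict(set)
    for path in files:
        root = ET.parse(path).getroot()
        for ann in root.iterfind(".//mpeg7:FreeTextAnnotation", NS):
            for term in (ann.text or "").lower().split():
                inverted[term].add(path)
    return inverted

# index = index_mpeg7(["clip1.mp7.xml", "clip2.mp7.xml"])  # hypothetical files
# print(index.get("goal"))
```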

3.
4.
Video annotation labels video content with semantic index information so that the video can be retrieved conveniently. Existing video annotation work relies on low-level visual features, which are difficult to apply directly to annotating professional human actions in sports video. To address this, the method uses 2-D human joint-point features from video image sequences and builds a knowledge base of professional actions to annotate such actions in sports video. A dynamic programming algorithm compares the differences between human actions across videos, and a co-training learning algorithm is incorporated for semi-automatic annotation of sports video. Experiments on tennis match videos show that the algorithm reaches an action-annotation accuracy of 81.4%, an improvement of 30.5% over existing professional-action annotation algorithms.
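A minimal sketch of the dynamic-programming comparison, assuming each frame is represented by a flattened vector of 2-D joint coordinates (the paper's exact feature layout and distance are not specified here):

```python
# Sketch: DTW-style dynamic-programming distance between two videos'
# 2-D joint-point sequences, as used to compare professional actions.
# The per-frame representation (flattened x,y joints) is an assumption.
import numpy as np

def dtw_distance(seq_a, seq_b):
    """seq_a: (m, d), seq_b: (n, d) arrays of per-frame joint features."""
    m, n = len(seq_a), len(seq_b)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[m, n]

a = np.random.rand(40, 28)  # 40 frames, 14 joints * (x, y)
b = np.random.rand(55, 28)
print(dtw_distance(a, b))   # smaller distance = more similar actions
```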

5.
6.
To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods focus either on the analysis of low-level features or on simple semantic concepts, and they cannot reduce the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of visual features, speech features, and frequent semantic patterns existing in the video. The proposed method consists of two main phases: 1) construction of four kinds of predictive annotation models, namely speech-association, visual-association, visual-sequential, and statistical models, from annotated videos; and 2) fusion of these models for annotating un-annotated videos automatically. The main advantage of the proposed method lies in that visual features, speech features, and semantic patterns are all considered simultaneously. Moreover, the utilization of high-level rules can effectively complement the insufficiency of statistics-based methods in dealing with complex and broad keyword identification in video annotation. Through empirical evaluation on NIST TRECVID video datasets, the proposed approach is shown to enhance annotation performance substantially in terms of precision, recall, and F-measure.
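A toy sketch of phase 2, fusing keyword scores from the four model types by weighted summation; the weights, scores, and fusion rule are illustrative assumptions, not the paper's actual method:

```python
# Sketch: late fusion of the four predictive models' keyword scores.
# The per-model weights and score dictionaries are hypothetical.
def fuse(scores_per_model, weights):
    fused = {}
    for model, scores in scores_per_model.items():
        w = weights[model]
        for keyword, s in scores.items():
            fused[keyword] = fused.get(keyword, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

scores = {
    "speech_assoc": {"goal": 0.7, "crowd": 0.2},
    "visual_assoc": {"goal": 0.5, "grass": 0.6},
    "visual_seq":   {"goal": 0.4},
    "statistical":  {"crowd": 0.3, "grass": 0.1},
}
weights = {"speech_assoc": 0.3, "visual_assoc": 0.3,
           "visual_seq": 0.2, "statistical": 0.2}
print(fuse(scores, weights)[:3])  # top-ranked annotation keywords
```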

7.
Describing the visual content of videos with semantic concepts is an effective and realistic approach for video applications such as annotation, indexing, retrieval and ranking. In these applications, video data needs to be labelled with some known set of labels or concepts, and assigning semantic concepts manually is not feasible given the large volume of ever-growing video data. Hence, automatic semantic concept detection in videos is a hot research area. Deep Convolutional Neural Networks (CNNs) have recently shown remarkable performance in computer vision tasks. In this paper, we present a novel approach for automatic semantic video concept detection using a deep CNN and a foreground-driven concept co-occurrence matrix (FDCCM), which keeps foreground-to-background concept co-occurrence values, built by exploiting concept co-occurrence relationships in the pre-labelled TRECVID video dataset and in a collection of random images extracted from Google Images. To deal with the dataset imbalance problem, we extend this approach by fusing two asymmetrically trained deep CNNs and use the FDCCM to further improve concept detection. The performance of the proposed approach is compared with state-of-the-art approaches for video concept detection on the widely used TRECVID dataset and is found to be superior to existing approaches.
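An illustrative sketch of re-ranking CNN concept scores with a co-occurrence matrix; the matrix values and the linear blending rule are assumptions, not the paper's FDCCM formula:

```python
# Sketch: refine per-concept CNN scores with contextual evidence from a
# concept co-occurrence matrix (FDCCM-like). All numbers are invented.
import numpy as np

concepts = ["person", "car", "road", "indoor"]
cnn_scores = np.array([0.9, 0.2, 0.3, 0.6])

# co_occur[i, j]: how often foreground concept i co-occurs with concept j,
# estimated offline from a labelled corpus (here: made-up values).
co_occur = np.array([[1.0, 0.4, 0.5, 0.6],
                     [0.4, 1.0, 0.8, 0.1],
                     [0.5, 0.8, 1.0, 0.1],
                     [0.6, 0.1, 0.1, 1.0]])

alpha = 0.7  # weight of the raw detector vs. contextual evidence
context = co_occur @ cnn_scores / co_occur.sum(axis=1)
refined = alpha * cnn_scores + (1 - alpha) * context
for c, s in zip(concepts, refined):
    print(f"{c}: {s:.2f}")
```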

8.
Research on an ontology-based video semantic retrieval system
徐峰, 郑烇. 《计算机应用》 (Journal of Computer Applications), 2010, 30(3): 835-837
Retrieving video content at the semantic level can overcome the "semantic gap" and improve the utilisation of video content. Exploiting the annotation and reasoning capabilities of ontologies, this work studies video semantic retrieval, fully mines the structural and semantic information of video content, and builds a hierarchical semantic index, greatly improving the system's semantic retrieval capability. The ontology architecture of the video semantic retrieval system (OVSR) integrates a domain ontology, a video ontology, and a core ontology, and is highly extensible and interoperable. The paper mainly describes OVSR's ontology architecture, its video semantic model and index model, and its query rewriting and ontology reasoning algorithms.
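A toy sketch of ontology-driven query expansion in the spirit of OVSR's query rewriting; the concept hierarchy is invented, and OVSR itself performs full ontology reasoning rather than this dictionary lookup:

```python
# Sketch: expand a keyword query over a toy subclass hierarchy so that a
# query for "vehicle" also matches clips annotated with its subconcepts.
SUBCLASSES = {
    "vehicle": ["car", "bus", "truck"],
    "car": ["taxi"],
}

def expand(term, depth=2):
    terms = {term}
    frontier = [term]
    for _ in range(depth):
        frontier = [c for t in frontier for c in SUBCLASSES.get(t, [])]
        terms.update(frontier)
    return terms

print(expand("vehicle"))  # {'vehicle', 'car', 'bus', 'truck', 'taxi'}
```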

9.
Automatic annotation of semantic events allows effective retrieval of video content. In this work, we present solutions for highlights detection in sports videos. This application is particularly interesting for broadcasters, since they extensively use manual annotation to select interesting highlights that are edited to create new programmes. The proposed approach exploits the typical structure of a wide class of sports videos, namely those related to sports which are played in delimited venues with playfields of well-known geometry, like soccer, basketball, swimming, track and field disciplines, and so on. For this class of sports, a modeling scheme based on a limited set of visual cues and on finite state machines (FSM) that encode the temporal evolution of highlights is presented. Algorithms for model checking and for visual cues estimation are discussed, as well as applications of the representation to different sport domains.

10.
Multimedia analysis and reuse of raw, un-edited audio-visual content, known as rushes, is gaining acceptance by a large number of research labs and companies. A set of research projects is considering multimedia indexing, annotation, search and retrieval in the context of European-funded research, but only the FP6 project RUSHES is focusing on automatic semantic annotation, indexing and retrieval of raw and un-edited audio-visual content. Both professional content creators and providers and home users deal with this type of content, so novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main features of the final RUSHES search engine.

11.
This work addresses the development of a computational model of visual attention to perform automatic summarization of digital videos from television archives. Although the television system represents one of the most fascinating media phenomena ever created, we still observe the absence of effective solutions for content-based information retrieval from video recordings of programs produced by this media universe. This fact relates to the high complexity of the content-based video retrieval problem, which involves several challenges, among which we may highlight the usual demand for video summaries to facilitate indexing, browsing and retrieval operations. To achieve this goal, we propose a new computational visual attention model, inspired by the human visual system and based on computer vision methods (face detection, motion estimation and saliency map computation), to estimate static video abstracts, that is, collections of salient images or key frames extracted from the original videos. Experimental results with videos from the Open Video Project show that our approach represents an effective solution to the problem of automatic video summarization, producing video summaries of similar quality to the ground-truth manually created by a group of 50 users.
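A crude sketch of attention-driven key-frame selection using only inter-frame change; the actual model also fuses face detection and saliency maps, and the file name and threshold here are hypothetical:

```python
# Sketch: pick candidate key frames at peaks of inter-frame change, a
# simple stand-in for the paper's full visual attention model.
import cv2

def key_frames(path, thresh=30.0):
    cap, prev, keys, idx = cv2.VideoCapture(path), None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None and cv2.absdiff(gray, prev).mean() > thresh:
            keys.append(idx)  # large appearance change -> candidate key frame
        prev, idx = gray, idx + 1
    cap.release()
    return keys

# print(key_frames("news.mp4"))  # hypothetical input file
```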

12.
Recent advances in digital video compression and networks have made video more accessible than ever. However, existing content-based video retrieval systems still suffer from the following problems: 1) the semantics-sensitive video classification problem, caused by the semantic gap between low-level visual features and high-level semantic visual concepts; and 2) the integrated video access problem, caused by the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we propose a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used to select discriminating visual features with suitable importance weights, and the Expectation-Maximization (EM) algorithm is used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of the indexing scheme is determined by the same domain-dependent concept hierarchy used for video classification, and the presentation of visual summaries is integrated with the inherent hierarchical indexing tree structure. Integrating video access with an efficient database indexing tree structure provides great opportunity for supporting more powerful video search engines.
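A minimal sketch of EM-trained classification rules, using a Gaussian mixture per concept node as a stand-in for the paper's classifier; the features are random placeholders:

```python
# Sketch: one EM-fitted Gaussian mixture per visual concept node; a shot
# is assigned to the concept under which its features are most likely.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train = {
    "sports": rng.normal(0.0, 1.0, (200, 16)),  # placeholder shot features
    "news":   rng.normal(2.0, 1.0, (200, 16)),
}
models = {c: GaussianMixture(n_components=3, random_state=0).fit(X)
          for c, X in train.items()}

def classify(x):
    # score() returns the average log-likelihood of the sample.
    return max(models, key=lambda c: models[c].score(x.reshape(1, -1)))

print(classify(rng.normal(2.0, 1.0, 16)))  # most likely: 'news'
```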

13.
A video retrieval system user hopes to find relevant information even when the submitted queries are ambiguous. A retrieval process based on detecting concepts remains ineffective in such a situation. Potential relationships between concepts have been shown to be a valuable knowledge resource that can enhance retrieval effectiveness, even for ambiguous queries. Recent research in multimedia retrieval has focused on ontology modeling as a common framework to manage knowledge, but handling these ontologies has to cope with issues of generic knowledge management and processing scalability. Considering these issues, we suggest a context-based fuzzy ontology framework for video content analysis and indexing. In this paper, we focus on how we model our fuzzy ontology: first, the generated ontology is automatically populated by gathering various available video annotation datasets; then, the ontology content is used to infer an enhanced semantic interpretation of the video; finally, the ontology content is improved based on user feedback. Experimental results showed that our approach achieves the goal of scalability while at the same time allowing better semantic interpretation of video content.
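A toy sketch of fuzzy inference over concept relations, giving the flavour of a fuzzy ontology; the relations, degrees, and max-min rule are illustrative assumptions:

```python
# Sketch: propagate detector confidences through fuzzy concept relations
# using max-min composition. Degrees are invented for illustration.
RELATED = {  # membership degree of "concept A suggests concept B"
    ("beach", "sea"): 0.9,
    ("sea", "boat"): 0.6,
    ("beach", "sand"): 0.8,
}

def infer(detected, hops=2):
    degrees = dict(detected)  # concept -> confidence from the detectors
    for _ in range(hops):
        for (a, b), r in RELATED.items():
            if a in degrees:
                # evidence cannot exceed its weakest link in the chain
                degrees[b] = max(degrees.get(b, 0.0), min(degrees[a], r))
    return degrees

print(infer({"beach": 0.7}))
# -> {'beach': 0.7, 'sea': 0.7, 'boat': 0.6, 'sand': 0.7}
```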

14.
Automatic annotation of semantic events allows effective retrieval of video content. In this work, we present solutions for highlights detection in sports videos. The proposed approach exploits the typical structure of a wide class of sports videos, namely those related to sports which are played in delimited venues with playfields of well-known geometry, like soccer, basketball, swimming, track and field disciplines, and so on. For these sports, a modeling scheme based on a limited set of visual cues and on finite state machines that encode the temporal evolution of highlights is presented; it is of general applicability to this class of sports. Visual cues encode position and speed information coming from the camera and from the objects/athletes present in the scene, and are estimated automatically from the video stream. Algorithms for model checking and for visual cues estimation are discussed, as well as applications of the representation to different sport domains.
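A minimal sketch of a highlight-detecting finite state machine over per-frame visual cues; the states, cues, and thresholds are invented for illustration, not taken from the paper's models:

```python
# Sketch: a tiny FSM over per-frame cues (camera pan speed, playfield
# zone) detecting a "shot on goal"-like highlight.
def detect_highlight(cues):
    """cues: iterable of (pan_speed, zone) per frame; zone in {'mid','box'}."""
    state = "idle"
    for pan_speed, zone in cues:
        if state == "idle" and pan_speed > 5.0:
            state = "fast_attack"   # rapid camera motion toward the goal
        elif state == "fast_attack" and zone == "box":
            state = "highlight"     # action reaches the penalty box
        elif pan_speed < 1.0:
            state = "idle"          # play slows down, reset
        if state == "highlight":
            return True
    return False

frames = [(6.0, "mid"), (7.5, "mid"), (6.8, "box")]
print(detect_highlight(frames))  # True
```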

15.
With the rapid development of Internet technology, video data is growing explosively. Most traditional video search engines use a single text-based retrieval method which, for unstructured data such as video, suffers from missing content and a semantic gap, leading to low relevance in the retrieval results. This paper proposes a video retrieval calibration method based on a bag of visual words, combining visual feature extraction from video data, TF-IDF, and open data technology to give users calibrated, optimised video retrieval results. First, a clustering algorithm based on the HSV model extracts a video's key-frame set and key-frame weight vector. Speeded-Up Robust Features (SURF) of the key-frame images then represent the video's content features, addressing the missing-content problem. Next, TF-IDF weighs the query keywords, and open data supplies the keywords' visual features and semantic information, addressing the semantic-gap problem. Finally, the proposed bag-of-visual-words retrieval calibration algorithm is applied to the Internet Archive dataset. Experimental results show that, compared with traditional text-based video retrieval, the method improves the average relevance of retrieval results by 15%.
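A minimal sketch of the TF-IDF weighting step for query keywords; the documents and smoothing variant are illustrative:

```python
# Sketch: TF-IDF weight of a query keyword over video text metadata.
import math
from collections import Counter

docs = [["tennis", "serve", "match"],
        ["tennis", "player"],
        ["cooking", "recipe"]]

def tf_idf(term, doc, corpus):
    tf = Counter(doc)[term] / len(doc)          # term frequency in this doc
    df = sum(term in d for d in corpus)         # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1  # smoothed inverse doc freq
    return tf * idf

print(tf_idf("tennis", docs[0], docs))
```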

16.
In semantics-based video retrieval systems, a temporal probabilistic hypergraph model is proposed to bridge the gap between low-level video features and high-level user needs. The model incorporates temporal ordering into its construction, and on this basis a video multi-semantic-label annotation framework based on the temporal probabilistic hypergraph (TPH-VMLAF) is proposed. Exploiting temporal correlation in video, the framework annotates video shots with multiple semantic labels using a multi-label semi-supervised classification learning algorithm over the temporal probabilistic hypergraph. The annotation process simultaneously addresses the shortage of labelled video data and the multi-label annotation problem. Experimental results show that the framework improves annotation precision and exhibits good performance.
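A rough sketch of semi-supervised multi-label shot annotation, using scikit-learn's graph-based label spreading as a stand-in; it only gestures at the temporal probabilistic hypergraph, and the features and labels are random:

```python
# Sketch: one graph-based semi-supervised model per semantic label;
# -1 marks unlabelled shots, mimicking the scarce-labels setting.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))                 # placeholder shot features
labels = {"indoor": np.full(100, -1), "crowd": np.full(100, -1)}
labels["indoor"][:10] = rng.integers(0, 2, 10)  # few labelled shots
labels["crowd"][:10] = rng.integers(0, 2, 10)

predictions = {}
for name, y in labels.items():
    model = LabelSpreading(kernel="rbf", gamma=0.5).fit(X, y)
    predictions[name] = model.transduction_   # inferred labels for all shots
print({k: v[:5] for k, v in predictions.items()})
```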

17.
Semantic filtering and retrieval of multimedia content is crucial for efficient use of multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval; the difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedia pattern recognition problem. We propose a probabilistic framework for semantic video indexing, which can support filtering and retrieval and facilitate efficient content-based access. To map low-level features to high-level semantics we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact, and to model this interaction explicitly we propose a network of multijects (a multinet). Using probabilistic models for six site multijects (rocks, sky, snow, water-body, forestry/greenery, and outdoor) and using a Bayesian belief network as the multinet, we demonstrate the application of this framework to semantic indexing. We show how detection performance can be significantly improved by using the multinet to take interconceptual relationships into account, and how the multinet can fuse heterogeneous features to support detection based on inference and reasoning.
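A minimal sketch of multinet-style evidence fusion on a two-node Bayesian network, computing the posterior of "outdoor" from a "sky" detection by enumeration; all probabilities are invented:

```python
# Sketch: fuse a "sky" detection into the posterior for "outdoor" via
# Bayes' rule on a two-node slice of a multinet. Numbers are invented.
P_OUTDOOR = 0.4                         # prior P(outdoor)
P_SKY_GIVEN = {True: 0.8, False: 0.1}   # P(sky | outdoor)

def posterior_outdoor(sky_detected: bool) -> float:
    like = {o: (P_SKY_GIVEN[o] if sky_detected else 1 - P_SKY_GIVEN[o])
            for o in (True, False)}
    num = like[True] * P_OUTDOOR
    return num / (num + like[False] * (1 - P_OUTDOOR))

print(posterior_outdoor(True))  # ~0.84: sky evidence raises P(outdoor)
```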

18.
A fast and simple method for content-based retrieval using the DC-pictures of H.264 coded video, without full decompression, is presented. Compressed-domain retrieval is very desirable for content analysis and retrieval of compressed images and video. Although DC-pictures are among the most widely used compressed-domain indexing and retrieval methods for pre-H.264 coded video, they are not generally used with H.264 coded video. This is due to two main facts: first, I-frames in the H.264 standard are spatially predictively coded; and second, the H.264 standard employs the Integer Discrete Cosine Transform. In this paper we apply a color histogram indexing method to the DC-pictures derived from H.264 coded I-frames. Since the method is based on independently coded I-frame pictures, it can be used either for video analysis of H.264 coded videos or for image retrieval of I-frame based coded images such as Advanced Image Coding. The retrieval performance of the proposed algorithm is compared with that of fully decoded images. Simulation results indicate that the performance of the proposed method is very close to that of fully decompressed image systems, while its computational load is much lower.
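A minimal sketch of colour-histogram indexing as it would be applied to decoded DC-pictures, using OpenCV; the file names are hypothetical:

```python
# Sketch: HSV colour-histogram matching between two DC-pictures.
import cv2

def hist(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
    return cv2.normalize(h, h).flatten()

def similarity(path_a, path_b):
    a, b = cv2.imread(path_a), cv2.imread(path_b)
    # correlation: 1.0 = identical colour distribution
    return cv2.compareHist(hist(a), hist(b), cv2.HISTCMP_CORREL)

# print(similarity("dc_frame_001.png", "dc_frame_042.png"))  # hypothetical
```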

19.
A method for selecting episode representative frames based on sub-shot clustering
Selecting episode representative frames is an important technique for video semantic analysis and content-based video retrieval. Representative frames greatly reduce the amount of data in a video index and also provide a fast route to video summarisation and retrieval. Building on key-frame extraction within sub-shots, this paper uses the fuzzy C-means clustering algorithm to implement an episode representative-frame selection method based on sub-shot clustering. Experiments show that the method is computationally simple and represents video episodes well.
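A minimal sketch of fuzzy C-means over key-frame features, picking the frame nearest each cluster centre as a representative; the features are random stand-ins:

```python
# Sketch: plain fuzzy C-means over key-frame feature vectors; the frame
# nearest each cluster centre becomes an episode representative frame.
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))           # closer -> higher membership
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.random.rand(50, 12)                       # 50 key frames, 12-D features
centers, U = fcm(X)
reps = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=0)
print(reps)  # index of the representative frame for each cluster
```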

20.
In order to analyse surveillance video, we need to efficiently explore large datasets containing videos of walking humans. Effective analysis of such data relies on retrieval of video data that has been enriched using semantic annotations. A manual annotation process is time-consuming and prone to error due to subject bias; however, at surveillance-image resolution, the human walk (gait) can be analysed automatically. We explore the content-based retrieval of videos containing walking subjects, using semantic queries. We evaluate current research in gait biometrics, unique in its effectiveness at recognising people at a distance. We introduce a set of semantic traits discernible by humans at a distance, outlining their psychological validity. Working under the premise that similarity of the chosen gait signature implies similarity of certain semantic traits, we perform a set of semantic retrieval experiments using popular Latent Semantic Analysis techniques. We perform experiments on a dataset of 2000 videos of people walking in laboratory conditions and achieve promising retrieval results for features such as sex (mAP = 14% above random), age (mAP = 10% above random) and ethnicity (mAP = 9% above random).
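A minimal sketch of LSA-style retrieval over gait signatures using truncated SVD; the signature matrix is random stand-in data, not real gait signatures:

```python
# Sketch: project gait signatures into a latent semantic space with
# truncated SVD, then rank videos by cosine similarity to a query video.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(2)
signatures = rng.random((2000, 64))       # one gait signature per video

svd = TruncatedSVD(n_components=20, random_state=0)
latent = svd.fit_transform(signatures)    # videos in latent semantic space

query = latent[0:1]
ranked = cosine_similarity(query, latent)[0].argsort()[::-1]
print(ranked[:5])                         # videos most similar to video 0
```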
