1.

吴开兴 闫丽颖 王欢 《微计算机信息》2006,22(18):274-276

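This paper proposes a hierarchical approach to content-based audio retrieval. At the first level, HMMs model the statistical properties of audio events; at the second level, an SVM combines several audio events to model specific semantic scenes, which completes retrieval of the semantic scene. Experiments show that combining HMM and SVM achieves fairly good results for semantic-level audio scene retrieval.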

2.

A Deep Semantic Representation Method for Real-Time Video Comments

涂荣成 毛先领 孔伟杰 蔡成飞 赵文哲 王红法 黄河燕 《计算机研究与发展》2023,56(9):2169-2179

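Video-text retrieval, a multimodal retrieval technique widely used in everyday applications, is attracting growing research attention. Most recent work improves cross-modal retrieval between text and video by exploiting the vision-language matching learned by large-scale pretrained models. However, these methods overlook the fact that both video and text are composed of individual events. Capturing the fine-grained similarity between video events and text events would help a model compute a more accurate semantic similarity between text and video, and thus improve cross-modal retrieval. The paper therefore proposes CLIP based multi-event representation generation for video-text retrieval (CLIPMERG). First, the video encoder (ViT) and text encoder (Transformer) of the large-scale image-text pretrained model CLIP convert the video and the text into a sequence of video frame tokens and a sequence of word tokens, respectively. Next, a video event generator (text event generator) converts the frame token sequence (word token sequence) into k video event representations (k text event representations). Finally, the semantic similarity between video and text is defined by mining the fine-grained relations between the video event representations and the text event representations. Experiments on three widely used public video-text retrieval datasets, MSR-VTT, DiDeMo, and LSMDC, show that CLIPMERG outperforms existing video-text retrieval methods.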
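The abstract does not say how the k video event representations and k text event representations are compared. As a rough illustration only, the sketch below assumes one plausible choice: match each text event to its most similar video event by cosine similarity and average the results. The function names and the max-then-mean aggregation are assumptions, not the CLIPMERG definition.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def event_similarity(video_events, text_events):
    """Assumed aggregation: for every text event take the best-matching
    video event, then average those best scores."""
    best_scores = [
        max(cosine(text_event, video_event) for video_event in video_events)
        for text_event in text_events
    ]
    return sum(best_scores) / len(best_scores)

# Toy example with k = 2 events of dimension 2 (hypothetical values).
video = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 2.0]]
score = event_similarity(video, text)
```

Other aggregations (for example a learned attention over event pairs) would fit the same description; this one is chosen only because it is short and easy to check by hand.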

3.

A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video  Total citations: 5, self-citations: 0, citations by others: 5

Changsheng Xu Jinjun Wang Hanqing Lu Yifan Zhang 《Multimedia, IEEE Transactions on》2008,10(3):421-436

Sports video annotation is important for sports video semantic analysis such as event detection and personalization. In this paper, we propose a novel approach for sports video semantic annotation and personalized retrieval. Different from the state of the art sports video analysis methods which heavily rely on audio/visual features, the proposed approach incorporates web-casting text into sports video analysis. Compared with previous approaches, the contributions of our approach include the following. 1) The event detection accuracy is significantly improved due to the incorporation of web-casting text analysis. 2) The proposed approach is able to detect exact event boundary and extract event semantics that are very difficult or impossible to be handled by previous approaches. 3) The proposed method is able to create personalized summary from both general and specific point of view related to particular game, event, player or team according to user's preference. We present the framework of our approach and details of text analysis, video analysis, text/video alignment, and personalized retrieval. The experimental results on event boundary detection in sports video are encouraging and comparable to the manually selected events. The evaluation on personalized retrieval is effective in helping meet users' expectations.

4.

Context-based environmental audio event recognition for scene understanding

Tong Lu Gongyou Wang Feng Su 《Multimedia Systems》2015,21(5):507-524

Automatic audio content recognition has attracted increasing attention for developing multimedia systems, for which the most popular approaches combine frame-based features with statistical models or discriminative classifiers. The existing methods are effective for clean single-source event detection but may not perform well for unstructured environmental sounds, which have a broad noise-like flat spectrum and a diverse variety of compositions. We present an automatic acoustic scene understanding framework that detects audio events through two hierarchies, acoustic scene recognition and audio event recognition, in which the former is preceded by following dominant audio sources and in turn helps infer non-dominant audio events within the same scene through modeling their occurrence correlations. On the scene recognition hierarchy, we perform adaptive segmentation and feature extraction for every input acoustic scene stream through Eigen-audiospace and an optimized feature subspace, respectively. After filtering background, scene streams are recognized by modeling the observation density of dominant features using a two-level hidden Markov model. On the audio event recognition hierarchy, scene knowledge is characterized by an audio context model that essentially describes the occurrence correlations of dominant and non-dominant audio events within this scene. Monte Carlo integration and gradient descent techniques are employed to maximize the likelihood and correctly tag each audio event. To the best of our knowledge, this is the first work that models event correlations as scene context for robust audio event detection from complex and noisy environments. Note that according to the recent report, the mean accuracy for the acoustic scene classification task by human listeners is only around 71% on the data collected in office environments from the DCASE dataset. None of the existing methods performs well on all scene categories and the average accuracy of the best performances of the recent 11 methods is 53.8%. The proposed method achieves an average accuracy of 62.3% on the same dataset. Additionally, we create a 10-CASE dataset by manually collecting 5,250 audio clips of 10 scene types and 21 event categories. Our experimental results on 10-CASE show that the proposed method achieves an improved average accuracy of 78.3%, and the average accuracy of audio event recognition can be effectively improved by capturing dominant audio sources and reasoning non-dominant events from the dominant ones through acoustic context modeling. In future work, exploring the interactions between acoustic scene recognition and audio event detection, and incorporating other modalities to improve the accuracy, are required to further advance the proposed framework.

5.

ZemPod: A semantic web approach to podcasting

Òscar Celma Yves Raimond 《Journal of Web Semantics》2008,6(2):162-169

6.

Semantic representation of multimedia content: Knowledge representation and semantic indexing

Phivos Mylonas Thanos Athanasiadis Manolis Wallace Yannis Avrithis Stefanos Kollias 《Multimedia Tools and Applications》2008,39(3):293-327

In this paper we present a framework for unified, personalized access to heterogeneous multimedia content in distributed repositories. Focusing on semantic analysis of multimedia documents, metadata, user queries and user profiles, it contributes to the bridging of the gap between the semantic nature of user queries and raw multimedia documents. The proposed approach utilizes as input visual content analysis results, as well as analyzes and exploits associated textual annotation, in order to extract the underlying semantics, construct a semantic index and classify documents to topics, based on a unified knowledge and semantics representation model. It may then accept user queries, and, carrying out semantic interpretation and expansion, retrieve documents from the index and rank them according to user preferences, similarly to text retrieval. All processes are based on a novel semantic processing methodology, employing fuzzy algebra and principles of taxonomic knowledge representation. The first part of this work presented in this paper deals with data and knowledge models, manipulation of multimedia content annotations and semantic indexing, while the second part will continue on the use of the extracted semantic information for personalized retrieval.

7.

Segmentation, Indexing, and Retrieval for Environmental and Natural Sounds

Wichern G. Xue J. Thornburg H. Mechtley B. Spanias A. 《IEEE transactions on audio, speech, and language processing》2010,18(3):688-707

We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. Regarding segmentation, we present a dynamic Bayesian network (DBN) that jointly infers onsets and end times of the most prominent sound events in the space, along with an extension of the algorithm for covering large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries that a user would employ to retrieve the event (or similar events). In order to increase the efficiency of the retrieval search, we recursively apply a modified spectral clustering algorithm to group similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevancy decisions necessary for evaluation of our retrieval algorithm on both automatically and manually segmented sound clips. Furthermore, our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.

8.

Enterprise Data Retrieval Based on a Semantic Model

董小峰 张树生 赵寒 周竞涛 冯赟 田占强 《计算机应用研究》2006,23(12):217-219

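To help enterprises retrieve meaningful data from enterprise data when making decisions, this paper proposes a semantic retrieval method based on a semantic model. The method first describes the semantic model with a concept tree and semantically links data sources to the model through concept mapping. On that basis, it establishes a mapping between the semantic model and a knowledge model that supports description logic reasoning, and performs semantic retrieval by invoking a description logic reasoner. The retrieval results are then mapped back to the corresponding data source information through the semantic model, finally returning a semantically consistent data view that supports decision-making.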

9.

Modeling Content for Semantic-Level Querying of Multimedia  Total citations: 2, self-citations: 0, citations by others: 2

Harry W. Agius Marios C. Angelides 《Multimedia Tools and Applications》2001,15(1):5-37

Many semantic content-based models have been developed for modeling video and audio in order to enable information retrieval based on semantic content. The level of querying of the media depends upon the semantic aspects modeled. This paper proposes a semantic content-based model for semantic-level querying that makes full use of the explicit media structure, objects, spatial relationships between objects, events and actions involving objects, temporal relationships between events and actions, and integration between syntactic and semantic information.

10.

A novel approach for semantic event extraction from sports webcast text

Chun-Min Chen Ling-Hwei Chen 《Multimedia Tools and Applications》2014,71(3):1937-1952

11.

Exploring context and content links in social media: a latent space method

Qi GJ Aggarwal C Tian Q Ji H Huang TS 《IEEE transactions on pattern analysis and machine intelligence》2012,34(5):850-862

Social media networks contain both content and context-specific information. Most existing methods work with either of the two for the purpose of multimedia mining and retrieval. In reality, both content and context information are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only with the use of a combination of the two. This paper proposes a new algorithm which mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of the multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithms. Compared to the state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images. We illustrate the advantages of our approach over the state-of-the-art multimedia retrieval techniques.

12.

Audio-visual sports highlights extraction using Coupled Hidden Markov Models

Ziyou Xiong 《Pattern Analysis & Applications》2005,8(1-2):62-71

We present our studies on the application of Coupled Hidden Markov Models (CHMMs) to sports highlights extraction from broadcast video using both audio and video information. First, we generate audio labels using audio classification via Gaussian mixture models, and video labels using quantization of the average motion vector magnitudes. Then, we model sports highlights using discrete-observation CHMMs on audio and video labels classified from a large training set of broadcast sports highlights. Our experimental results on unseen golf and soccer content show that CHMMs outperform Hidden Markov Models (HMMs) trained on audio-only or video-only observations. Next, we study how the coupling between the two single-modality HMMs offers improvement on modelling capability by making refinements on the states of the models. We also show that the number of states optimized in this fashion gives better classification results than other numbers of states. We conclude that CHMMs provide a promising tool for information fusion techniques in the sports domain for audio-visual event detection and analysis.

13.

A framework for flexible summarization of racquet sports video using multiple modalities

《Computer Vision and Image Understanding》2009,113(3):415-424

While most existing sports video research focuses on detecting events in soccer, baseball, and similar sports, little work has addressed flexible content summarization of racquet sports video, e.g. tennis and table tennis. By taking advantage of the periodicity of video shot content and audio keywords in racquet sports video, we propose a novel flexible video content summarization framework. Our approach combines the structure event detection method with the highlight ranking algorithm. First, unsupervised shot clustering and supervised audio classification are performed to obtain the visual and audio mid-level patterns, respectively. Then, a temporal voting scheme for structure event detection is proposed by utilizing the correspondence between audio and video content. Finally, by using the affective features extracted from the detected events, a linear highlight model is adopted to rank the detected events in terms of their degree of excitement. Experimental results show that the proposed approach is effective.

14.

A Semantic Annotation Method Based on Document Content

李维勇 《微计算机信息》2011,(1):298-300

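The Internet holds massive amounts of data, so finding useful information among it has become a critical problem. The Semantic Web offers a promising way to address this. However, there is a large gap between today's Web and the Semantic Web: massive unstructured page content is hard to convert directly into semantic knowledge. This paper proposes a semantic annotation method based on document content, which annotates documents using the semantic environment expressed by an ontology, namely the frequency with which ontology-related terms and their semantic context appear in the document. Experiments show that the method performs well, but is strongly affected by two factors: the quality of the ontology knowledge and the quality of the annotated documents.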

15.

A new sketch-based 3D model retrieval approach by using global and local features

《Graphical Models》2014,76(3):128-139

16.

A Survey of Semantic Video Retrieval  Total citations: 4, self-citations: 1, citations by others: 4

魏维 游静 刘凤玉 许满武 《计算机科学》2006,33(2):1-7

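Video content retrieval is an active research direction in multimedia applications, and most existing content retrieval techniques are based on low-level features. These non-semantic low-level features are hard to interpret and far from the high-level semantic concepts in human thinking, which seriously harms the usability of video content retrieval systems. The semantic gap between low-level features and high-level semantic concepts is difficult to bridge. How to cross the semantic gap and retrieve video content by semantic concepts is currently the most challenging direction in content-based video retrieval. This paper introduces the background of semantic video retrieval, analyzes the causes of the semantic gap, and surveys the main existing methods that attempt to bridge it; it reviews the strengths and weaknesses of the related techniques, and discusses possible future research directions for each method as well as likely near-term and long-term technical breakthroughs in semantic video retrieval.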

17.

Highlights modeling and detection in sports videos

M. Bertini A. Del Bimbo W. Nunziati 《Pattern Analysis & Applications》2004,7(4):411-421

Automatic annotation of semantic events allows effective retrieval of video content. In this work, we present solutions for highlights detection in sports videos. This application is particularly interesting for broadcasters, since they extensively use manual annotation to select interesting highlights that are edited to create new programmes. The proposed approach exploits the typical structure of a wide class of sports videos, namely, those related to sports which are played in delimited venues with playfields of well known geometry, like soccer, basketball, swimming, track and field disciplines, and so on. For this class of sports, a modeling scheme based on a limited set of visual cues and on finite state machines (FSM) that encode the temporal evolution of highlights is presented. Algorithms for model checking and for visual cues estimation are discussed, as well as applications of the representation to different sport domains.
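To make the idea of a finite state machine over visual cues concrete, here is a minimal sketch. The cue names, states, and transitions are entirely hypothetical and are not taken from the paper; the sketch only shows how a sequence of per-shot cues could be checked against a highlight model.

```python
# Hypothetical transition table: (current state, observed cue) -> next state.
TRANSITIONS = {
    ("idle", "play_near_goal"): "attack",
    ("attack", "fast_camera_motion"): "shot",
    ("shot", "crowd_closeup"): "highlight",
}

def detect_highlight(cues):
    """Return True if the cue sequence drives the machine to 'highlight'."""
    state = "idle"
    for cue in cues:
        # Follow a matching transition; otherwise restart from 'idle',
        # allowing the current cue to begin a new attempt.
        next_state = TRANSITIONS.get((state, cue))
        if next_state is None:
            next_state = TRANSITIONS.get(("idle", cue), "idle")
        state = next_state
        if state == "highlight":
            return True
    return False

# A full cue sequence reaches the accepting state; a broken one does not.
full = detect_highlight(["play_near_goal", "fast_camera_motion", "crowd_closeup"])
broken = detect_highlight(["play_near_goal", "crowd_closeup"])
```

A separate table per highlight type (and per sport) would be one way to reuse the same checking loop across domains.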

18.

Cross-Media Deep Fine-Grained Correlation Learning Method

卓昀侃 綦金玮 彭宇新 《软件学报》2019,30(4):884-895

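With the rapid development of the Internet and multimedia technology, data on the Web has expanded from text alone to multiple media types, including images, video, text, audio, and 3D models, making cross-media retrieval a new trend in information retrieval. However, the "heterogeneity gap" means that data from different media have inconsistent representations, so their similarity is hard to measure directly, and cross-retrieval among multiple media faces great challenges. With the rise of deep learning, the nonlinear modeling capability of deep neural networks promises to break through the barriers of cross-media information representation, but existing deep-learning-based cross-media retrieval methods generally consider only pairwise correlations between two media types, image and text, and struggle to support cross-retrieval over more media types. To address these problems, the paper proposes a cross-media deep fine-grained correlation learning method that supports cross-retrieval over up to five media types (image, video, text, audio, and 3D model). First, it proposes a cross-media recurrent neural network that jointly models the fine-grained information of up to five media types, fully exploiting the detailed information and contextual correlations within each medium. Then, it proposes a cross-media joint correlation loss function that combines distribution alignment and semantic alignment to mine fine-grained intra-media and inter-media correlations more accurately, while using semantic category information to strengthen the semantic discriminability of correlation learning and improve retrieval accuracy. Experimental comparisons with existing methods on two cross-media datasets containing five media types, PKU XMedia and PKU XMediaNet, demonstrate the effectiveness of the proposed method.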

19.

HMM based soccer video event detection using enhanced mid-level semantic

Xueming Qian Huan Wang Guizhong Liu Xingsong Hou 《Multimedia Tools and Applications》2012,60(1):233-255

Highlight detection is a fundamental step in semantics based video retrieval and personalized sports video browsing. In this paper, an effective hidden Markov models (HMMs) based soccer video event detection method based on a hierarchical video analysis framework is proposed. Soccer video shots are classified into four coarse mid-level semantics: global, median, close-up and audience. Global and local motion information is utilized for the refinement of coarse mid-level semantics. Sequential soccer video is segmented into event clips. Both the temporal transitions of the mid-level semantics and the overall features of an event clip are fused using HMMs to determine the type of event. Highlight detection performance of dynamic Bayesian networks (DBN), conditional random fields (CRF) and the proposed HMM based approach are compared. The average F-score of our highlights (including goal, shoot, foul and placed kick) detection approach is 82.92%, which outperforms that of DBN and CRF by 9.85% and 11.12% respectively. The effects of number of hidden states, overall features, and the refinement of mid-level semantics on the event detection performance are also discussed.

20.

Efficient audio-driven multimedia indexing through similarity-based speech / music discrimination

Nikolaos Tsipas Lazaros Vrysis Charalampos Dimoulas George Papanikolaou 《Multimedia Tools and Applications》2017,76(24):25603-25621

In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this context, a two-step segmentation approach is employed in order to initially identify transition points between the homogeneous regions and subsequently classify the derived segments using a supervised binary classifier. The transition point detection mechanism is based on the analysis and composition of multiple self-similarity matrices, generated using different audio feature sets. The implemented technique aims at discriminating events focusing on transition point detection with high temporal resolution, a target that is also reflected in the adopted assessment methodology. Thereafter, multimedia indexing can be efficiently deployed (for both audio and video sequences), incorporating the processes of high resolution temporal segmentation and semantic annotation extraction. The system is evaluated against three publicly available datasets and experimental results are presented in comparison with existing implementations. The proposed algorithm is provided as an open source software package in order to support reproducible research and encourage collaboration in the field.
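The abstract mentions transition detection from self-similarity matrices without giving details. The sketch below illustrates the general idea with a single matrix and a checkerboard-kernel novelty score (a standard technique for this kind of analysis, swapped in here for illustration; it is not claimed to be the authors' algorithm). Feature vectors, function names, and the kernel half-width are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def self_similarity(frames):
    """Pairwise cosine similarity between all frame feature vectors."""
    return [[cosine(a, b) for b in frames] for a in frames]

def novelty(ssm, half):
    """Slide a checkerboard kernel along the diagonal of the matrix.
    Same-side blocks count positively, cross blocks negatively, so the
    score peaks where two homogeneous regions meet."""
    n = len(ssm)
    scores = [0.0] * n
    for t in range(half, n - half):
        total = 0.0
        for i in range(-half, half):
            for j in range(-half, half):
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * ssm[t + i][t + j]
        scores[t] = total
    return scores

# Toy stream: four frames of one class followed by four of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
scores = novelty(self_similarity(frames), half=2)
transition = scores.index(max(scores))  # index of the first frame after the change
```

Combining several such matrices, one per feature set, could be done by averaging the novelty curves before peak picking; that composition step is likewise an assumption here.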