Similar literature
 20 similar documents found (search time: 31 ms)
1.
Most semantic video search methods use text-keyword queries or example video clips and images. But such methods have limitations. To address the problems of example-based video search approaches and avoid the use of specialized models, we conduct semantic video searches using a reranking method that automatically reorders the initial text search results based on visual cues and associated context. We developed two general reranking methods that explore the recurrent visual patterns in many contexts, such as the returned images or video shots from initial text queries, and video stories from multiple channels.

2.
Models for motion-based video indexing and retrieval   Total citations: 9, self-citations: 0, citations by others: 9
With the rapid proliferation of multimedia applications that require video data management, it is becoming more desirable to provide proper video data indexing techniques capable of representing the rich semantics in video data. In real-time applications, the need for efficient query processing is another reason for the use of such techniques. We present models that use object motion information to characterize events and allow subsequent retrieval. Algorithms for different spatiotemporal search cases in terms of spatial and temporal translation and scale invariance have been developed using various signal and image processing techniques. We have developed a prototype video search engine, PICTURESQUE (pictorial information and content transformation unified retrieval engine for spatiotemporal queries), to verify the proposed methods. Development of such technology will enable true multimedia search engines that index and search digital video data based on its true content.

3.
4.
5.
6.
A new approach to image retrieval is presented in the domain of museum and gallery image collections. Specialist algorithms, developed to address specific retrieval tasks, are combined with more conventional content and metadata retrieval approaches, and implemented within a distributed architecture to provide cross-collection searching and navigation in a seamless way. External systems can access the different collections using interoperability protocols and open standards, which were extended to accommodate content-based as well as text-based retrieval paradigms. After a brief overview of the complete system, we describe the novel design and evaluation of some of the specialist image analysis algorithms including a method for image retrieval based on sub-image queries, retrievals based on very low quality images and retrieval using canvas crack patterns. We show how effective retrieval results can be achieved by real end-users consisting of major museums and galleries, accessing the distributed but integrated digital collections.

7.
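Liu Qiang, Zhang Wenying, Chen Enqing. Journal of Signal Processing (《信号处理》), 2020, 36(9): 1422-1428
Human action recognition has many applications in areas such as human-computer interaction and video content retrieval, and is an important research direction in multimedia information processing. Most existing action recognition methods based on two-stream networks use the same convolutional network on both streams to process RGB and optical flow data; they make little use of multimodal information and are prone to network redundancy and to misclassifying similar actions. In recent years, depth video has also been used increasingly for action recognition, but most methods consider only the spatial information of actions in depth video and ignore temporal information. To address these problems, this paper proposes a multimodal action recognition method based on heterogeneous multi-stream networks. The method first extracts a temporal feature representation of the action from depth video, namely depth optical flow data, then selects suitable heterogeneous networks for spatiotemporal feature extraction and classification, and finally performs multimodal fusion of the recognition results from the RGB data, the optical flow extracted from RGB, the depth video, and the depth optical flow. Experiments on NTU RGB+D, a widely used large-scale action recognition dataset, show that the proposed method outperforms existing state-of-the-art methods.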

8.
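A vector retrieval model for XML text documents based on partial matching   Total citations: 3, self-citations: 2, citations by others: 1
Wu Jin, Chen Zelin. Acta Electronica Sinica (《电子学报》), 2002, 30(Z1): 2169-2171
This paper proposes a vector retrieval model for XML text documents based on a partial matching scheme. It gives vector representations of XML document trees and their sub-document trees, and of queries and their sub-queries. On this basis it proposes a partial-matching retrieval approach in which ancestor-descendant relationships in the query are mapped to ancestor-descendant relationships in the document, and gives a similarity computation based on this matching process to judge how relevant a document is to a query. Experiments on a prototype retrieval system show that the model achieves good recall and precision.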

9.
10.
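Zhang Tian, Jin Cong, Tie Yun, Li Xiaobing. Journal of Signal Processing (《信号处理》), 2020, 36(6): 966-976
Cross-modal retrieval is a retrieval approach in which data of one modality is used as the query to obtain related results in other modalities; it has become an interesting research problem in multimedia and information retrieval. However, most current work focuses on cross-modal tasks such as text-to-image, text-to-video, and lyrics-to-audio retrieval, while research on how to retrieve suitable music for a given video is very limited. In addition, most existing work on video-audio cross-modal retrieval relies on metadata (for example keywords, tags, or descriptions). This paper presents a cross-modal retrieval method based on the content of the audio and video modalities. The method uses a novel two-stream processing network as its framework and learns, through neural networks, feature representations of the two modalities in a common subspace in order to compute the similarity between audio and video data. The main contributions are threefold: 1) an attention mechanism is added to the existing feature extraction models for each modality, yielding feature selection models for video and audio that pick out the corresponding feature representations; 2) a sample mining mechanism removes invalid samples, making training more efficient; 3) a loss function is designed for training that both measures inter-modal similarity and preserves intra-modal structure. The proposed model achieves high accuracy on both the VEGAS dataset and a self-built dataset.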

11.
To overcome the barrier of storage and computation, the hashing technique has recently been widely used for nearest neighbor search in multimedia retrieval applications. In particular, cross-modal retrieval, which searches across different modalities, has become an active but challenging problem. Although numerous cross-modal hashing algorithms have been proposed to yield compact binary codes, exhaustive search is impractical for large-scale datasets, and Hamming distance computation suffers from inaccurate results. In this paper, we propose a novel search method that utilizes a probability-based index scheme over binary hash codes in cross-modal retrieval. The proposed indexing scheme employs a few binary bits from the hash code as the index code. We construct an inverted index table based on the index codes, and train a neural network for ranking and indexing to improve the retrieval accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated and compared with several state-of-the-art cross-modal hashing methods. Results show the proposed method effectively boosts the performance on search accuracy, computation cost, and memory consumption in these datasets and hashing methods. The source code is available on https://github.com/msarawut/HCI.
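
The idea of using a few bits of each hash code as an index code for an inverted table can be illustrated with a minimal Python sketch. It is not the authors' implementation: taking the leading bits as the index code, the exact-bucket lookup, the Hamming-distance ordering, and all names (`index_code`, `build_inverted_index`, `search`, the toy `database`) are assumptions made for illustration, and the probability-based scheme and the neural ranking network from the abstract are not modeled.

```python
from collections import defaultdict
from typing import Dict, List


def index_code(hash_code: str, n_bits: int) -> str:
    """Assumption: the index code is simply the first n_bits of the hash code."""
    return hash_code[:n_bits]


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_inverted_index(codes: Dict[str, str], n_bits: int) -> Dict[str, List[str]]:
    """Map each index code to the list of item ids whose hash code shares it."""
    table = defaultdict(list)
    for item_id, code in codes.items():
        table[index_code(code, n_bits)].append(item_id)
    return dict(table)


def search(query_code: str, codes: Dict[str, str],
           table: Dict[str, List[str]], n_bits: int, top_k: int = 3) -> List[str]:
    """Look up only the query's bucket, then order candidates by Hamming distance."""
    candidates = table.get(index_code(query_code, n_bits), [])
    ranked = sorted(candidates, key=lambda i: (hamming(query_code, codes[i]), i))
    return ranked[:top_k]


# Toy database of 8-bit hash codes (hypothetical values).
database = {
    "img_a": "10110010",
    "img_b": "10111100",
    "img_c": "01100011",
    "img_d": "10100010",
}
table = build_inverted_index(database, n_bits=3)
results = search("10110011", database, table, n_bits=3)
print(results)
```

Only the bucket that shares the query's index code is scanned, which is what avoids an exhaustive pass over the whole database; an item whose index bits differ from the query's is never considered in this simplified version.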

12.
13.
The main idea of interactive search is to gradually improve the search quality of a retrieval system through user interaction. While a large amount of work has been done in the past, most existing approaches require labeling effort to update the query model. Unfortunately, it is time-consuming and tedious to label a large number of training examples. We aim to develop a novel text-driven cooperative learning scheme that offers users a natural way to query and significantly reduces the burden on users without compromising search performance. Starting with an advanced text-driven video search engine, a multi-view cooperative training strategy is proposed for learning a refined ranking function from feedback data. The main merit of the proposed framework is its ability to mine training samples automatically from the previous answer set and to implicitly combine multiple modalities for effectively learning users' query intent. Evaluation on the TRECVID'06 video corpus shows that the proposed scheme with few training seeds achieves performance comparable to classic interactive schemes.

14.
Video compression using mosaic representations   Total citations: 8, self-citations: 0, citations by others: 8
We describe a technique for video compression based on a mosaic image representation obtained by aligning all frames of a video sequence, giving a panoramic view of the scene. We describe two types of mosaics, static and dynamic, which are suited for storage and transmission applications, respectively. In each case, the mosaic construction process aligns the images using a global parametric motion transformation, usually canceling the effect of camera motion on the dominant portion of the scene. The residual motions that are not compensated by the parametric motion are then analyzed for their significance and coded. The mosaic representation exploits large scale spatial and temporal correlations in image sequences. In many applications where there is significant camera motion (e.g., remote surveillance), it performs substantially better than traditional interframe compression methods and offers the potential for very low bit-rate transmission. In storage applications, such as digital libraries and video editing environments, it has the additional benefit of enabling direct access and retrieval of single frames at a time.

15.
The steep rise in music downloading over CD sales has created a major shift in the music industry away from physical media formats and towards online products and services. Music is one of the most popular types of online information and there are now hundreds of music streaming and download services operating on the World-Wide Web. Some of the music collections available are approaching the scale of ten million tracks and this has posed a major challenge for searching, retrieving, and organizing music content. Research efforts in music information retrieval have involved experts from music perception, cognition, musicology, engineering, and computer science engaged in truly interdisciplinary activity that has resulted in many proposed algorithmic and methodological solutions to music search using content-based methods. This paper outlines the problems of content-based music information retrieval and explores the state-of-the-art methods using audio cues (e.g., query by humming, audio fingerprinting, content-based music retrieval) and other cues (e.g., music notation and symbolic representation), and identifies some of the major challenges for the coming years.

16.
This paper presents the use of generative probabilistic models for multimedia retrieval. Gaussian mixture models are estimated to describe the visual content of images (or video), and different ways of using them for retrieval are explored. Two approaches are considered, so-called query generation (how likely is the query given the document model) and document generation (how likely is the document given the query model), and the paper explains how both fit in a common probabilistic framework. Query generation is shown to be theoretically superior, and this is confirmed experimentally on the TRECVID search task. However, it is found that in some cases a document generation approach gives better results. Especially in the cases where queries are narrow and visual results are combined with textual results, the document generation approach seems to be better at setting a visual context than the query generation variant.
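
The two scoring directions described in this abstract can be written down as follows. This is a sketch in assumed notation, not the paper's own formulation: $Q$ and $D$ denote the sets of visual feature vectors of the query and the document, $\theta_d$ and $\theta_Q$ are Gaussian mixture models with $K$ components estimated from the document and the query respectively, and the feature vectors are assumed to be independent given the model.

```latex
\begin{align}
\text{score}_{\mathrm{QG}}(d) &= P(Q \mid \theta_d)
  = \prod_{x \in Q} \sum_{k=1}^{K} \pi_{d,k}\,
    \mathcal{N}\!\left(x;\ \mu_{d,k}, \Sigma_{d,k}\right) \\
\text{score}_{\mathrm{DG}}(d) &= P(D \mid \theta_Q)
  = \prod_{x \in D} \sum_{k=1}^{K} \pi_{Q,k}\,
    \mathcal{N}\!\left(x;\ \mu_{Q,k}, \Sigma_{Q,k}\right)
\end{align}
```

Query generation ranks documents by how well each document model explains the query samples, while document generation ranks them by how well the single query model explains each document's samples.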

17.
Neural networks for intelligent multimedia processing   Total citations: 6, self-citations: 0, citations by others: 6
This paper reviews key attributes of neural processing essential to intelligent multimedia processing (IMP). The objective is to show why neural networks (NNs) are a core technology for the following multimedia functionalities: (1) efficient representations for audio/visual information, (2) detection and classification techniques, (3) fusion of multimodal signals, and (4) multimodal conversion and synchronization. It also demonstrates how the adaptive NN technology presents a unified solution to a broad spectrum of multimedia applications. As substantiating evidence, representative examples where NNs are successfully applied to IMP applications are highlighted. The examples cover a broad range, including image visualization, tracking of moving objects, image/video segmentation, texture classification, face-object detection/recognition, audio classification, multimodal recognition, and multimodal lip reading.

18.
Learning multimodal dictionaries.   Total citations: 1, self-citations: 0, citations by others: 1
Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrating, at each instant, perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. An algorithm for iteratively learning multimodal generating functions that can be shifted to all positions in the signal is also proposed. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences and is able to discover underlying structures in the data. The detection of such audio-video patterns in audiovisual clips makes it possible to effectively localize the sound source in the video in the presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.

19.
Video text information plays an important role in semantic-based video analysis, indexing and retrieval. Video texts are closely related to the content of a video. Usually, the fundamental steps of text-based video analysis, browsing and retrieval consist of video text detection, localization, tracking, segmentation and recognition. Video sequences are commonly stored in compressed formats where MPEG coding techniques are often adopted. In this paper, a unified framework for text detection, localization, and tracking in compressed videos using the discrete cosine transform (DCT) coefficients is proposed. A coarse-to-fine text detection method is used to find text blocks in terms of the block DCT texture intensity information. The DCT texture intensity of an 8×8 block of an intra-frame is approximately represented by seven AC coefficients. The candidate text block regions are further verified and refined. The text block region localization and tracking are carried out by virtue of the horizontal and vertical block texture intensity projection profiles. The appearing and disappearing frames of each text line are determined by the text tracking. The final experimental results show the effectiveness of the proposed methods.
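
A block's DCT texture intensity, as described in this abstract, can be sketched in Python. The abstract does not say which seven AC coefficients are used or how they are combined, so the positions in `AC_POSITIONS` and the sum of absolute values are assumptions made for illustration; the names `dct_coefficient` and `texture_intensity` are likewise hypothetical. In a real compressed-domain system the coefficients would be read from the MPEG stream rather than recomputed from pixels as done here.

```python
import math

N = 8  # block size used by MPEG intra-frame coding

# Assumption: seven low-frequency AC positions (u = vertical, v = horizontal frequency).
AC_POSITIONS = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0), (1, 1)]


def dct_coefficient(block, u, v):
    """Standard 2-D DCT-II coefficient F(u, v) of an 8x8 block of pixel values."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    total = 0.0
    for x in range(N):
        for y in range(N):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
    return 0.25 * cu * cv * total


def texture_intensity(block):
    """Assumption: intensity is the sum of absolute values of the chosen AC coefficients."""
    return sum(abs(dct_coefficient(block, u, v)) for u, v in AC_POSITIONS)


# A uniform block has no texture; alternating columns mimic strong stroke-like edges.
flat = [[128] * N for _ in range(N)]
stripes = [[255 if y % 2 == 0 else 0 for y in range(N)] for _ in range(N)]

print(texture_intensity(flat), texture_intensity(stripes))
```

A uniform block yields an intensity of essentially zero, while the striped block yields a large value, which is the kind of contrast a coarse text-block detector could threshold on.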

20.
Segmentation of moving objects in image sequence: A review   Total citations: 6, self-citations: 0, citations by others: 6
Segmentation of objects in image sequences is very important in many aspects of multimedia applications. In second-generation image/video coding, images are segmented into objects to achieve efficient compression by coding the contour and texture separately. As the purpose is to achieve high compression performance, the objects segmented may not be semantically meaningful to human observers. The more recent applications, such as content-based image/video retrieval and image/video composition, require that the segmented objects be semantically meaningful. Indeed, the recent multimedia standard MPEG-4 specifies that a video is composed of meaningful video objects. Although many segmentation techniques have been proposed in the literature, fully automatic segmentation tools for general applications are currently not achievable. This paper provides a review of this important and challenging area of segmentation of moving objects. We describe common approaches including temporal segmentation, spatial segmentation, and the combination of temporal-spatial segmentation. As an example, a complete segmentation scheme, which is an informative part of MPEG-4, is summarized.
