首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
基于多模态信息挖掘融合的视频检索技术   总被引:1,自引:0,他引:1  
基于内容的多媒体检索特别是视频检索,由于多媒体数据本身具有复杂的语义,所以极大地提高了检索的难度.算法着眼于视频本身挖掘出充分的资源信息并且将这些信息加以融合来提高视频检索的性能.基于这种思想,提出一种多模态视频检索模型以及相应的手动式搜索和交互式搜索的算法方案.搜索策略在TRECVID视频检索比赛中取得了不错的成绩.  相似文献   

2.
Most existing content-based video retrieval (CBVR) systems are now amenable to support automatic low-level feature extraction, but they still have limited effectiveness from a user's perspective because of the semantic gap. Automatic video concept detection via semantic classification is one promising solution to bridge the semantic gap. To speed up SVM video classifier training in high-dimensional heterogeneous feature space, a novel multimodal boosting algorithm is proposed by incorporating feature hierarchy and boosting to reduce both the training cost and the size of training samples significantly. To avoid the inter-level error transmission problem, a novel hierarchical boosting scheme is proposed by incorporating concept ontology and multitask learning to boost hierarchical video classifier training through exploiting the strong correlations between the video concepts. To bridge the semantic gap between the available video concepts and the users' real needs, a novel hyperbolic visualization framework is seamlessly incorporated to enable intuitive query specification and evaluation by acquainting the users with a good global view of large-scale video collections. Our experiments in one specific domain of surgery education videos have also provided very convincing results.  相似文献   

3.
Video Annotation Based on Kernel Linear Neighborhood Propagation   总被引:1,自引:0,他引:1  
The insufficiency of labeled training data for representing the distribution of the entire dataset is a major obstacle in automatic semantic annotation of large-scale video database. Semi-supervised learning algorithms, which attempt to learn from both labeled and unlabeled data, are promising to solve this problem. In this paper, a novel graph-based semi-supervised learning method named kernel linear neighborhood propagation (KLNP) is proposed and applied to video annotation. This approach combines the consistency assumption, which is the basic assumption in semi-supervised learning, and the local linear embedding (LLE) method in a nonlinear kernel-mapped space. KLNP improves a recently proposed method linear neighborhood propagation (LNP) by tackling the limitation of its local linear assumption on the distribution of semantics. Experiments conducted on the TRECVID data set demonstrate that this approach outperforms other popular graph-based semi-supervised learning methods for video semantic annotation.  相似文献   

4.
5.
本文针对互联网视频检索技术的发展,阐述了目前主流视频搜索引擎的技术现状,分析了互联网视频检索的关键技术,特别是对于视频特征的提取技术。本文的创新点是提出了一种通用的基于内容的静态语义视频检索方法,该方法可以弥补基于文本视频检索的有关不足,并且在TRECVID的视频概念检索数据的静态语义概念中得到验证,运行稳定。  相似文献   

6.
7.
Automatic semantic concept detection in video is important for effective content-based video retrieval and mining and has gained great attention recently. In this paper, we propose a general post-filtering framework to enhance robustness and accuracy of semantic concept detection using association and temporal analysis for concept knowledge discovery. Co-occurrence of several semantic concepts could imply the presence of other concepts. We use association mining techniques to discover such inter-concept association relationships from annotations. With discovered concept association rules, we propose a strategy to combine associated concept classifiers to improve detection accuracy. In addition, because video is often visually smooth and semantically coherent, detection results from temporally adjacent shots could be used for the detection of the current shot. We propose temporal filter designs for inter-shot temporal dependency mining to further improve detection accuracy. Experiments on the TRECVID 2005 dataset show our post-filtering framework is both efficient and effective in improving the accuracy of semantic concept detection in video. Furthermore, it is easy to integrate our framework with existing classifiers to boost their performance.  相似文献   

8.
许源  薛向阳 《计算机科学》2006,33(11):134-138
准确提取视频高层语义特征,有助于更好地进行基于内容的视频检索。视频局部高层语义特征描述的是图像帧中的物体。考虑到物体本身以及物体所处的特定场景所具有的特点,我们提出一种将图像帧的局部信息和全局信息结合起来提取视频局部高层语义特征的算法。在TRECVID2005数据集上的实验结果表明,与单独基于局部或者单独基于全局的方法相比,此方法具有较好的性能。  相似文献   

9.
Active concept learning in image databases.   总被引:2,自引:0,他引:2  
Concept learning in content-based image retrieval systems is a challenging task. This paper presents an active concept learning approach based on the mixture model to deal with the two basic aspects of a database system: the changing (image insertion or removal) nature of a database and user queries. To achieve concept learning, we a) propose a new user directed semi-supervised expectation-maximization algorithm for mixture parameter estimation, and b) develop a novel model selection method based on Bayesian analysis that evaluates the consistency of hypothesized models with the available information. The analysis of exploitation versus exploration in the search space helps to find the optimal model efficiently. Our concept knowledge transduction approach is able to deal with the cases of image insertion and query images being outside the database. The system handles the situation where users may mislabel images during relevance feedback. Experimental results on Corel database show the efficacy of our active concept learning approach and the improvement in retrieval performance by concept transduction.  相似文献   

10.
Insufficiency of labeled training data is a major obstacle for automatic video annotation. Semi-supervised learning is an effective approach to this problem by leveraging a large amount of unlabeled data. However, existing semi-supervised learning algorithms have not demonstrated promising results in large-scale video annotation due to several difficulties, such as large variation of video content and intractable computational cost. In this paper, we propose a novel semi-supervised learning algorithm named semi-supervised kernel density estimation (SSKDE) which is developed based on kernel density estimation (KDE) approach. While only labeled data are utilized in classical KDE, in SSKDE both labeled and unlabeled data are leveraged to estimate class conditional probability densities based on an extended form of KDE. It is a non-parametric method, and it thus naturally avoids the model assumption problem that exists in many parametric semi-supervised methods. Meanwhile, it can be implemented with an efficient iterative solution process. So, this method is appropriate for video annotation. Furthermore, motivated by existing adaptive KDE approach, we propose an improved algorithm named semi-supervised adaptive kernel density estimation (SSAKDE). It employs local adaptive kernels rather than a fixed kernel, such that broader kernels can be applied in the regions with low density. In this way, more accurate density estimates can be obtained. Extensive experiments have demonstrated the effectiveness of the proposed methods.  相似文献   

11.
基于内容的通用视频检索系统框架设计   总被引:2,自引:0,他引:2  
设计了一个支持所有种类的基于内容视频检索方式的通用视频检索系统框架,提出了设计通用系统框架关键问题的解决方案:基于内容的视频分析和检索的组件设计;遵循MPEG7标准的内容索引表示;Web服务体系结构和技术。通用系统框架具有分布、互操作、开放、可扩展的优点。  相似文献   

12.
A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval   总被引:2,自引:0,他引:2  
Effective video retrieval is the result of interplay between interactive query selection, advanced visualization of results, and a goal-oriented human user. Traditional interactive video retrieval approaches emphasize paradigms, such as query-by-keyword and query-by-example, to aid the user in the search for relevant footage. However, recent results in automatic indexing indicate that query-by-concept is becoming a viable resource for interactive retrieval also. We propose in this paper a new video retrieval paradigm. The core of the paradigm is formed by first detecting a large lexicon of semantic concepts. From there, we combine query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. To measure the impact of increasing lexicon size on interactive video retrieval performance, we performed two experiments against the 2004 and 2005 NIST TRECVID benchmarks, using lexicons containing 32 and 101 concepts, respectively. The results suggest that from all factors that play a role in interactive retrieval, a large lexicon of semantic concepts matters most. Indeed, by exploiting large lexicons, many video search questions are solvable without using query-by-keyword and query-by-example. In addition, we show that the lexicon-driven search engine outperforms all state-of-the-art video retrieval systems in both TRECVID 2004 and 2005  相似文献   

13.
胡志军  徐勇 《计算机科学》2020,47(1):117-123
视频是携带信息量最大的媒体,随着抖音短视频等APP的兴起,网络以及数据库的视频数量急剧增加,人工标注的方法已经无法胜任视频检索的任务。视频检索通过提取视频帧的空间特征或者帧与帧之间的时间特征,使得用户能够更客观、更高效地进行视频查找与归类。文中概述了基于内容的视频检索算法,归纳总结了视频检索的一些经典算法,并总结了深度学习在基于内容的视频检索中的研究与应用,最后分析了深度学习在视频检索中的发展前景。  相似文献   

14.
Hashing methods aim to learn a set of hash functions which map the original features to compact binary codes with similarity preserving in the Hamming space. Hashing has proven a valuable tool for large-scale information retrieval. We propose a column generation based binary code learning framework for data-dependent hash function learning. Given a set of triplets that encode the pairwise similarity comparison information, our column generation based method learns hash functions that preserve the relative comparison relations within the large-margin learning framework. Our method iteratively learns the best hash functions during the column generation procedure. Existing hashing methods optimize over simple objectives such as the reconstruction error or graph Laplacian related loss functions, instead of the performance evaluation criteria of interest—multivariate performance measures such as the AUC and NDCG. Our column generation based method can be further generalized from the triplet loss to a general structured learning based framework that allows one to directly optimize multivariate performance measures. For optimizing general ranking measures, the resulting optimization problem can involve exponentially or infinitely many variables and constraints, which is more challenging than standard structured output learning. We use a combination of column generation and cutting-plane techniques to solve the optimization problem. To speed-up the training we further explore stage-wise training and propose to optimize a simplified NDCG loss for efficient inference. We demonstrate the generality of our method by applying it to ranking prediction and image retrieval, and show that it outperforms several state-of-the-art hashing methods.  相似文献   

15.
This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relatively to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm, widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus, (ie TRECVID-05), indexed by visual, audio and text features is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The obtained results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.  相似文献   

16.
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.  相似文献   

17.
This work addresses the development of a computational model of visual attention to perform the automatic summarization of digital videos from television archives. Although the television system represents one of the most fascinating media phenomena ever created, we still observe the absence of effective solutions for content-based information retrieval from video recordings of programs produced by this media universe. This fact relates to the high complexity of the content-based video retrieval problem, which involves several challenges, among which we may highlight the usual demand on video summaries to facilitate indexing, browsing and retrieval operations. To achieve this goal, we propose a new computational visual attention model, inspired on the human visual system and based on computer vision methods (face detection, motion estimation and saliency map computation), to estimate static video abstracts, that is, collections of salient images or key frames extracted from the original videos. Experimental results with videos from the Open Video Project show that our approach represents an effective solution to the problem of automatic video summarization, producing video summaries with similar quality to the ground-truth manually created by a group of 50 users.  相似文献   

18.
Graph-based semi-supervised learning approaches have been proven effective and efficient in solving the problem of the inefficiency of labeled training data in many real-world application areas, such as video annotation. As a significant factor of these algorithms, however, pair-wise similarity metric of samples has not been fully investigated. Specifically, for existing approaches, the estimation of pair-wise similarity between two samples relies on the spatial property of video data. On the other hand, temporal property, an essential characteristic of video data, is not embedded into the pair-wise similarity measure. Accordingly, in this paper, a novel framework for video annotation, called Joint Spatio-Temporal Correlation Learning (JSTCL) is proposed. This framework is characterized by simultaneously taking into account both the spatial and temporal property of video data to improve the estimation of pair-wise similarity. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches over the benchmark TRECVID data set.  相似文献   

19.
Search and retrieval is gaining importance in the ink domain due to the increase in the availability of online handwritten data. However, the problem is challenging due to variations in handwriting between various writers, digitizers and writing conditions. In this paper, we propose a retrieval mechanism for online handwriting, which can handle different writing styles, specifically for Indian languages. The proposed approach provides a keyboard-based search interface that enables to search handwritten data from any platform, in addition to pen-based and example-based queries. One of the major advantages of this framework is that information retrieval techniques such as ranking relevance, detecting stopwords and controlling word forms can be extended to work with search and retrieval in the ink domain. The framework also allows cross-lingual document retrieval across Indian languages.  相似文献   

20.
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research sought to address a variant of the conventional clustering problem called semi-supervised clustering, which performs clustering in the presence of some background knowledge or supervisory information expressed as pairwise similarity or dissimilarity constraints. However, existing metric learning methods for semi-supervised clustering mostly perform global metric learning through a linear transformation. In this paper, we propose a new metric learning method that performs nonlinear transformation globally but linear transformation locally. In particular, we formulate the learning problem as an optimization problem and present three methods for solving it. Through some toy data sets, we show empirically that our locally linear metric adaptation (LLMA) method can handle some difficult cases that cannot be handled satisfactorily by previous methods. We also demonstrate the effectiveness of our method on some UCI data sets. Besides applying LLMA to semi-supervised clustering, we have also used it to improve the performance of content-based image retrieval systems through metric learning. Experimental results based on two real-world image databases show that LLMA significantly outperforms other methods in boosting the image retrieval performance.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号