Similar literature
 Found 20 similar documents; search took 31 ms
1.
Content-based indexing of multimedia databases   Total citations: 1, self-citations: 0, citations by others: 1
Content-based retrieval from multimedia databases calls for content-based indexing techniques. Unlike conventional databases, where data items are represented by a set of attributes of elementary data types, multimedia objects in multimedia databases are represented by a collection of features; similarity of object contents depends on context and frame of reference; and features of objects are characterized by multimodal feature measures. These lead to great challenges for content-based indexing. On the other hand, there are special requirements on content-based indexing: to support visual browsing, similarity retrieval, and fuzzy retrieval, nodes of the index should represent certain meaningful categories. That is to say, certain semantics must be added when performing indexing. ContIndex, the context-based indexing technique presented in this paper, is proposed to meet these challenges and special requirements. The indexing tree is formally defined by adapting a classification-tree concept. Horizontal links among nodes in the same level enhance the flexibility of the index. A special neural-network model, called Learning based on Experiences and Perspectives (LEP), has been developed to create node categories by fusing multimodal feature measures. It brings into the index the capability of self-organizing nodes with respect to certain context and frames of reference. An icon image is generated for each intermediate node to facilitate visual browsing. Algorithms have been developed to support multimedia object archival and retrieval using ContIndex.

2.
3.
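A Video Database Retrieval System Based on Color Features   Total citations: 2, self-citations: 0, citations by others: 2
To provide effective video retrieval and browsing in a video database, an efficient index must be built. Because video data has a hierarchical structure, once shot boundaries have been detected, clustering methods can be applied to the shot key frames at different similarity scales to build an index over the video data. The system uses color features: it applies the Twin Comparison algorithm for shot detection and histogram averaging for key-frame extraction, then processes the key frames with the K-means clustering algorithm to build the video database index. Experimental results show that the system supports fast video browsing and retrieval well.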
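
The Twin Comparison step named in this abstract can be illustrated with a minimal sketch. It is not the system's actual code: the function names, the L1 histogram difference, and both threshold values are assumptions chosen for illustration, and the gradual-transition test is simplified so that a transition is reported as soon as the accumulated difference from the candidate start frame reaches the high threshold.

```python
from typing import List, Sequence, Tuple

def hist_diff(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two color histograms (an assumed difference measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def twin_comparison(hists: List[Sequence[float]],
                    t_high: float,
                    t_low: float) -> List[Tuple[str, int, int]]:
    """Simplified twin-comparison shot detection.

    t_high flags abrupt cuts; t_low flags the possible start of a gradual
    transition, which is confirmed once the difference accumulated since
    the candidate start frame reaches t_high.
    """
    boundaries = []
    start = None  # frame index where a candidate gradual transition began
    for i in range(1, len(hists)):
        d = hist_diff(hists[i - 1], hists[i])
        if d >= t_high:
            boundaries.append(("cut", i - 1, i))
            start = None
        elif d >= t_low:
            if start is None:
                start = i - 1
            if hist_diff(hists[start], hists[i]) >= t_high:
                boundaries.append(("gradual", start, i))
                start = None
        else:
            start = None  # difference fell back below t_low: drop the candidate
    return boundaries
```

With normalized two-bin histograms, a hard switch between frames is reported as a cut, while a slow cross-fade is reported as a single gradual boundary spanning several frames.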

4.
Automatic text segmentation and text recognition for video indexing   Total citations: 13, self-citations: 0, citations by others: 13
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.

5.
6.
7.
Multimodal Video Indexing: A Review of the State-of-the-art   Total citations: 5, self-citations: 7, citations by others: 5
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time and resource consuming process. Good reviews on single modality based video indexing have appeared in literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in literature. It furthermore forms the basis for categorizing these different methods.

8.
Similarity search is important in information retrieval applications where objects are usually represented as vectors of high dimensionality. This leads to the increasing need for supporting the indexing of high-dimensional data. On the other hand, indexing structures based on space partitioning are ineffective because of the well-known “curse of dimensionality”. Linear scan of the data with approximation is more efficient in the high-dimensional similarity search. However, approaches so far have concentrated on reducing I/O, and ignored the computation cost. For an expensive distance function such as the L_p norm with fractional p, the computation cost becomes the bottleneck. We propose a new technique to address expensive distance functions by “indexing the function” by pre-computing some key values of the function once. Then, the values are used to develop the upper/lower bounds of the distance between a data vector and the query vector. The technique is extremely efficient since it avoids most of the distance function computations; moreover, it does not involve any extra secondary storage because no index is constructed and stored. The efficiency is confirmed by cost analysis, as well as experiments on synthetic and real data.
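
The idea of bounding an expensive distance from a few pre-computed function values can be sketched as follows. This is only one possible reading of the abstract, not the paper's algorithm: it assumes coordinates scaled to [0, 1], a uniform grid of k + 1 pre-computed values of t^p, and hypothetical function names. Because t^p is non-decreasing for t >= 0, the table entries on either side of each coordinate difference bracket its true power.

```python
import math
from typing import List, Sequence, Tuple

def build_power_table(p: float, k: int) -> List[float]:
    """Pre-compute t**p once at the grid points t = 0, 1/k, ..., 1."""
    return [(j / k) ** p for j in range(k + 1)]

def power_sum_bounds(x: Sequence[float], q: Sequence[float],
                     table: List[float]) -> Tuple[float, float]:
    """Lower and upper bounds on sum(|x_i - q_i|**p) using table lookups only.

    Assumes every |x_i - q_i| lies in [0, 1].
    """
    k = len(table) - 1
    lo = hi = 0.0
    for a, b in zip(x, q):
        t = abs(a - b) * k
        lo += table[math.floor(t)]  # grid point at or below the difference
        hi += table[math.ceil(t)]   # grid point at or above the difference
    return lo, hi

def exact_power_sum(x: Sequence[float], q: Sequence[float], p: float) -> float:
    """The expensive exact value, needed only when the bounds are inconclusive."""
    return sum(abs(a - b) ** p for a, b in zip(x, q))
```

In a linear scan, a vector whose lower bound already exceeds the current k-th best distance can be skipped without calling `exact_power_sum`. Taking the 1/p-th root is monotone, so comparing the power sums gives the same ranking as comparing the L_p distances themselves.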

9.
In order to analyse surveillance video, we need to efficiently explore large datasets containing videos of walking humans. Effective analysis of such data relies on retrieval of video data which has been enriched using semantic annotations. A manual annotation process is time-consuming and prone to error due to subject bias; however, at surveillance-image resolution, the human walk (their gait) can be analysed automatically. We explore the content-based retrieval of videos containing walking subjects, using semantic queries. We evaluate current research in gait biometrics, unique in its effectiveness at recognising people at a distance. We introduce a set of semantic traits discernible by humans at a distance, outlining their psychological validity. Working under the premise that similarity of the chosen gait signature implies similarity of certain semantic traits, we perform a set of semantic retrieval experiments using popular Latent Semantic Analysis techniques. We perform experiments on a dataset of 2000 videos of people walking in laboratory conditions and achieve promising retrieval results for features such as Sex (mAP = 14% above random), Age (mAP = 10% above random) and Ethnicity (mAP = 9% above random).

10.
Indexing and Integrating Multiple Features for WWW Images   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) to index an image's multi-features into a single one-dimensional structure. For both text and visual feature spaces, the similarity between a point and a local partition's center in the individual space is used as the indexing key, where similarity values in different features are distinguished by different scales. Then a single indexing tree can be built on these keys. Based on the property that relevant images have similar similarity values from the center of the same local partition in any feature space, a certain number of irrelevant images can be quickly pruned based on the triangle inequality on indexing keys. To remove the “dimensionality curse” existing in high-dimensional structures, we propose a new technique called Local Bit Stream (LBS). LBS transforms an image's text and visual feature representations into simple, uniform and effective bit stream (BS) representations based on the local partition's center. Such BS representations are small in size and fast to compare since only bit operations are involved. By comparing common bits existing in two BSs, most irrelevant images can be immediately filtered. To effectively integrate multi-features, we also investigated the following evidence combination techniques: Certainty Factor, Dempster Shafer Theory, Compound Probability, and Linear Combination. Our extensive experiments showed that a single one-dimensional index on multi-features greatly outperforms multi-indices on multi-features. Our LBS method outperforms sequential scan on high-dimensional space by an order of magnitude. Certainty Factor and Dempster Shafer Theory perform best in combining multiple similarities from corresponding multiple features.
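
The pruning step based on the triangle inequality can be sketched in one feature space. This is not MSI itself: it assumes Euclidean distance instead of the paper's similarity values, a single partition center, and hypothetical names. The key of each point is its distance to the center; any point x with |d(x, c) - d(q, c)| > r cannot lie within r of the query q, so only a contiguous range of sorted keys needs to be checked.

```python
import bisect
import math
from typing import List, Tuple

Point = Tuple[float, ...]

def build_index(points: List[Point], center: Point) -> List[Tuple[float, Point]]:
    """One-dimensional index: points sorted by their distance to the center."""
    return sorted((math.dist(p, center), p) for p in points)

def range_query(index: List[Tuple[float, Point]], center: Point,
                q: Point, r: float) -> Tuple[List[Point], int]:
    """Return the points within r of q and the number of candidates examined."""
    dq = math.dist(q, center)
    keys = [k for k, _ in index]
    # Triangle inequality: only keys in [dq - r, dq + r] can match.
    lo = bisect.bisect_left(keys, dq - r)
    hi = bisect.bisect_right(keys, dq + r)
    candidates = [p for _, p in index[lo:hi]]
    hits = [p for p in candidates if math.dist(p, q) <= r]
    return hits, len(candidates)
```

The second return value shows how many exact distance computations survived the pruning, which is the quantity such a key-based index tries to keep small.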

11.
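An Image Retrieval Method Combining Color and Spatial Information   Total citations: 5, self-citations: 1, citations by others: 5
This paper adopts the color correlogram method, which combines spatial information with image color features. For feature measurement, four distance metrics are compared using a purpose-designed evaluation experiment. The selected method is applied in the medical image database system developed by the Institute of Visualization at Northwest University (西北大学可视化研究所). Experiments show that the described image retrieval method achieves good recall and precision.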
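
A color correlogram records how colors co-occur at given pixel distances. The sketch below computes only the auto-correlogram, a commonly used reduced form, and is not taken from the paper: the function name, the chessboard (L-infinity) distance, and the input layout (a 2D list of quantized color indices) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

def auto_correlogram(img: Sequence[Sequence[int]], n_colors: int,
                     distances: Sequence[int]) -> Dict[int, List[float]]:
    """For each distance d and color c, estimate the probability that a pixel
    at chessboard distance d from a pixel of color c also has color c."""
    h, w = len(img), len(img[0])
    result = {}
    for d in distances:
        same = [0] * n_colors   # neighbor pairs sharing the center pixel's color
        total = [0] * n_colors  # all in-bounds neighbor pairs per color
        for y in range(h):
            for x in range(w):
                c = img[y][x]
                for dy in range(-d, d + 1):
                    for dx in range(-d, d + 1):
                        if max(abs(dy), abs(dx)) != d:
                            continue  # keep only the ring at distance exactly d
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            total[c] += 1
                            if img[ny][nx] == c:
                                same[c] += 1
        result[d] = [s / t if t else 0.0 for s, t in zip(same, total)]
    return result
```

Two images can then be compared by applying a distance measure to their correlogram vectors, which is where a comparison of several distance metrics, as described in the abstract, would come in.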

12.
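Content-based video retrieval offers a new way to retrieve video data with similar content, and motion information, a kind of information specific to video content, is one of the key research problems in video retrieval. Based on a study of motion feature extraction algorithms, a practical module for extracting global and local motion features was designed and implemented. Experiments show that the module can effectively separate global motion from local motion, and that the extracted motion features can serve as an important index for content-based video similarity retrieval systems.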

13.
Subspace and similarity metric learning are important issues for image and video analysis in both the computer vision and multimedia fields. Many real-world applications, such as image clustering/labeling and video indexing/retrieval, involve feature space dimensionality reduction as well as feature matching metric learning. However, the loss of information from dimensionality reduction may degrade the accuracy of similarity matching. In practice, such basic conflicting requirements for both feature representation efficiency and similarity matching accuracy need to be appropriately addressed. In the style of “Thinking Globally and Fitting Locally”, we develop Locally Embedded Analysis (LEA) based solutions for visual data clustering and retrieval. LEA reveals the essential low-dimensional manifold structure of the data by preserving the local nearest neighbor affinity, and allowing a linear subspace embedding through solving a graph embedded eigenvalue decomposition problem. A visual data clustering algorithm, called Locally Embedded Clustering (LEC), and a local similarity metric learning algorithm for robust video retrieval, called Locally Adaptive Retrieval (LAR), are both designed upon the LEA approach, with variations in local affinity graph modeling. For large database applications, instead of learning a global metric, we localize the metric learning space with a kd-tree partition to localities identified by the indexing process. Simulation results demonstrate that the proposed solutions perform well in both accuracy and speed.

14.
Recent advances in digital video compression and networks have made video more accessible than ever. However, the existing content-based video retrieval systems still suffer from the following problems. 1) Semantics-sensitive video classification problem because of the semantic gap between low-level visual features and high-level semantic visual concepts; 2) Integrated video access problem because of the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we propose a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used for selecting the discriminating visual features with suitable importance. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy which is also used for video classification. The presentation of the visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with an efficient database indexing tree structure provides a great opportunity for supporting more powerful video search engines.

15.

Computer vision techniques enhanced by the advent of deep learning have become a quintessential part of our day-to-day life. The application of such computer vision techniques in image retrieval can be termed a query-based image retrieval process. Conventional methods have limitations such as increased dimensionality, reduced accuracy, high time consumption, and dependence on indexing for retrieval. In order to overcome these limitations, this research work aims to develop a new image retrieval system by developing an image preprocessing mechanism via a target prediction technique, which isolates the object from the background. Further, a Micro-structure based Pattern Extraction (MPE) technique is implemented to extract the patterns from the preprocessed image, where the diagonal patterns are generated for increasing the accuracy of the retrieval process. Consequently, the Convolutional Neural Network (CNN) is utilized to reduce the dimensionality of the features, and the similarity learning approach is utilized to map the selected features with trained features based on the distance metric. The performance of the proposed system is evaluated by using various measures. Thereby, the efficiency of the proposed technique is ascertained by comparing it with the existing techniques.


16.
This paper presents a framework for multimodal retrieval with relevance feedback based on genetic programming. In this supervised learning-to-rank framework, genetic programming is used for the discovery of effective combination functions of (multimodal) similarity measures using the information obtained throughout the user relevance feedback iterations. With these new functions, several similarity measures, including those extracted from different modalities (e.g., text, and content), are combined into one single measure that properly encodes the user preferences. This framework was instantiated for multimodal image retrieval using visual and textual features and was validated using two image collections, one from the Washington University and another from the ImageCLEF Photographic Retrieval Task. For this image retrieval instance several multimodal relevance feedback techniques were implemented and evaluated. The proposed approach has produced statistically significantly better results for multimodal retrieval over single modality approaches and superior effectiveness when compared to the best submissions of the ImageCLEF Photographic Retrieval Task 2008.

17.
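Concepts and Applications of Video Structure Mining   Total citations: 3, self-citations: 0, citations by others: 3
A conceptual framework and a system framework for video structure mining are proposed. The conceptual framework gives standardized definitions of the concepts related to video structure mining. The video structure mining framework mainly comprises basic video structure mining, video syntactic structure mining, and video semantic structure mining. Finally, concrete applications of the structural patterns and knowledge discovered through video structure mining are discussed, including guiding the organization and management of video, enabling content-based personalized video recommendation, and improving video summarization systems.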

18.
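Li Yingxin, Zhang Ming, Lu Peng. 《现代计算机》 (Modern Computer), 2007, (2): 94-97, 100
In a content-based image retrieval (CBIR) system, the search engine retrieves images by querying for images that meet a similarity criterion; it should be fast enough and achieve high retrieval accuracy. Indexing is used to improve system response time, while relevance feedback helps improve retrieval accuracy. This paper mainly describes a similarity measure based on human perception and discusses an indexing scheme that integrates relevance feedback. The indexing scheme uses primary and secondary keys derived by analyzing feature entropy, and the relevance feedback is based on the Mann-Whitney test, which is commonly used to identify features that differ between relevant and irrelevant images in the same search set; the characteristics of these differing features are then used to improve retrieval performance. The relevance feedback scheme is run against two different similarity criteria, and the tests confirm the effectiveness of the method. Finally, the indexing mechanism and the relevance feedback mechanism are combined to build the search engine.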
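
The role of the Mann-Whitney test in relevance feedback can be illustrated with a small sketch. It is not the authors' scheme: the function names and the normalized separation score are assumptions, and only the U statistic is computed, with no significance level. For one feature, U counts how often a value from the relevant images exceeds a value from the irrelevant ones, with ties counted as one half.

```python
from typing import Sequence

def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U statistic of sample a against sample b, with ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def separation_score(relevant: Sequence[float],
                     irrelevant: Sequence[float]) -> float:
    """Map U to [0, 1]: 0 means the two groups are indistinguishable on this
    feature, 1 means every relevant value lies on one side of every
    irrelevant value."""
    n_pairs = len(relevant) * len(irrelevant)
    return abs(2.0 * mann_whitney_u(relevant, irrelevant) / n_pairs - 1.0)
```

Features with a high score separate the images the user marked relevant from those marked irrelevant, so a feedback loop could give them more weight in the next retrieval round.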

19.
20.
In content-based image retrieval (CBIR), relevant images are identified based on their similarities to query images. Most CBIR algorithms are hindered by the semantic gap between the low-level image features used for computing image similarity and the high-level semantic concepts conveyed in images. One way to reduce the semantic gap is to utilize the log data of users' feedback that CBIR systems have collected over time, which is also called “collaborative image retrieval.” In this paper, we present a novel metric learning approach, named “regularized metric learning,” for collaborative image retrieval, which learns a distance metric by exploring the correlation between low-level image features and the log data of users' relevance judgments. Compared to the previous research, a regularization mechanism is used in our algorithm to effectively prevent overfitting. Meanwhile, we formulate the proposed learning algorithm into a semidefinite programming problem, which can be solved very efficiently by existing software packages and is scalable to the size of log data. An extensive set of experiments has been conducted to show that the new algorithm can substantially improve the retrieval accuracy of a baseline CBIR system using Euclidean distance metric, even with a modest amount of log data. The experiment also indicates that the new algorithm is more effective and more efficient than two alternative algorithms, which exploit log data for image retrieval.

