Similar Documents
20 similar documents found
1.
Web image retrieval using majority-based ranking approach   Times cited: 1 (self-citations: 0; by others: 1)
Web image retrieval has characteristics different from typical content-based image retrieval: web images have associated textual cues. However, a web image retrieval system often yields undesirable results because it relies on limited text information such as surrounding text, URLs, and image filenames. In this paper, we propose a new retrieval approach that uses the image content of the retrieved results without relying on assistance from the user. Our basic hypothesis is that more popular images have a higher probability of being the ones the user wishes to retrieve. Following this hypothesis, we propose a retrieval approach based on the majority of the images under consideration. We define four methods for finding the visual features of the majority of images: (1) the majority-first method, (2) the centroid-of-all method, (3) the centroid-of-top-K method, and (4) the centroid-of-largest-cluster method. In addition, we implement a graph/picture classifier to improve the effectiveness of web image retrieval. We evaluate the retrieval effectiveness of both our methods and conventional ones using precision-recall graphs. Experimental results show that the proposed methods are more effective than conventional keyword-based retrieval methods.
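The centroid-of-all and centroid-of-top-K ideas can be sketched as a re-ranking step over the keyword search results. This is a minimal sketch, not the authors' implementation. It assumes that each result is already described by a fixed-length visual feature vector and that Euclidean distance is an adequate dissimilarity measure. `rerank_by_centroid` and its `k` parameter are hypothetical names.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rerank_by_centroid(features: Sequence[Vector], k: Optional[int] = None) -> List[int]:
    """Reorder keyword-search results by distance to a "majority" centroid.

    features: visual feature vectors in the original keyword ranking order.
    k=None -> centroid of all results (centroid-of-all).
    k=int  -> centroid of the first k results (centroid-of-top-K).
    Returns result indices, closest to the centroid first.
    """
    pool = features if k is None else features[:k]
    c = centroid(pool)
    # Images near the centroid are assumed to be the "popular" majority.
    return sorted(range(len(features)), key=lambda i: math.dist(features[i], c))
```

For example, `rerank_by_centroid([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 8.0]])` moves the visually atypical fourth result to the end of the list.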

2.
The visual vocabulary representation approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind it is that an image can be represented as a collection of visual words derived from its local features. In this work, we develop a new scheme for constructing a visual vocabulary based on an analysis of visual word contents. Taking into account the content homogeneity of visual words, we design a visual vocabulary that contains macro-sense and micro-sense visual words. The two types of visual words are then combined to describe an image effectively. We also apply the visual vocabulary to build image retrieval and categorization systems. Performance evaluations of the two systems indicate that the proposed visual vocabulary achieves promising results.

3.
A scene graph provides a powerful intermediate knowledge structure for various visual tasks, including semantic image retrieval, image captioning, and visual question answering. In this paper, predicting a scene graph for an image is formulated as two connected problems: recognizing the relationship triplets, structured as <subject-predicate-object>, and constructing the scene graph from the recognized triplets. For relationship triplet recognition, we develop a novel hierarchical recurrent neural network with a visual attention mechanism. The model is composed of two attention-based recurrent neural networks in a hierarchical organization. The first network generates a topic vector for each relationship triplet, and the second predicts each word in that triplet given the topic vector. This approach captures the compositional structure and contextual dependencies of an image and of the relationship triplets describing its scene. For scene graph construction, we present an entity localization approach that determines the graph structure with the help of the available attention information. We also give an algorithm for automatically converting the generated relationship triplets into a scene graph. Extensive experiments on two widely used datasets verify the feasibility of the proposed approach.
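The final conversion step, from recognized triplets to a graph, can be illustrated with a short sketch. It simplifies the paper's approach under one stated assumption: every distinct entity label is treated as a single object. The paper instead uses attention-based entity localization to decide which mentions refer to the same object. `build_scene_graph` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]   # (subject, predicate, object)
Edge = Tuple[int, str, int]      # (subject node id, predicate, object node id)


def build_scene_graph(triplets: Sequence[Triplet]) -> Tuple[List[str], List[Edge]]:
    """Turn <subject-predicate-object> triplets into node and edge lists.

    Assumption: identical entity labels denote the same object, so they are
    merged into one node. Duplicate triplets produce a single edge.
    """
    nodes: List[str] = []
    node_id: Dict[str, int] = {}
    edges: List[Edge] = []
    for subj, pred, obj in triplets:
        for label in (subj, obj):
            if label not in node_id:
                node_id[label] = len(nodes)
                nodes.append(label)
        edge = (node_id[subj], pred, node_id[obj])
        if edge not in edges:
            edges.append(edge)
    return nodes, edges
```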

4.
An image retrieval method based on visual words   Times cited: 1 (self-citations: 0; by others: 1)
刁蒙蒙 (Diao Mengmeng), 张菁 (Zhang Jing), 卓力 (Zhuo Li), 隋磊 (Sui Lei). Measurement & Control Technology (测控技术), 2012, 31(5): 17-20
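The central problem in content-based image retrieval is the "semantic gap" between low-level image features and high-level semantics. Inspired by text content analysis, researchers have borrowed the idea behind conventional dictionaries, which explain terms through combinations of words. In this view, an image is a combination of visual words, and a series of visual-word combinations describes its semantic content. Following this idea, SIFT is used to extract visual-word features from images, a visual-word vocabulary is then constructed, and finally an image retrieval system based on visual words is implemented. Experimental results show that the method improves retrieval precision to some extent.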
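The visual-word pipeline described above can be sketched in three steps: take local descriptors, quantize them against a vocabulary, and compare the resulting histograms. This is an illustrative sketch, not the authors' system. It assumes that the vocabulary has already been built, typically by clustering SIFT descriptors (for example with k-means). It also uses toy 2-D descriptors in place of 128-D SIFT vectors. Finally, it ranks images by cosine similarity, a choice the abstract does not specify. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word (cluster centre) closest to a descriptor."""
    return min(range(len(vocabulary)), key=lambda w: math.dist(descriptor, vocabulary[w]))


def bow_histogram(descriptors: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[int]:
    """Count how many local descriptors fall into each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either histogram is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_database(query_hist: Sequence[float], database_hists: Sequence[Sequence[float]]) -> List[int]:
    """Database indices ordered from most to least similar to the query."""
    return sorted(range(len(database_hists)),
                  key=lambda i: cosine(query_hist, database_hists[i]), reverse=True)
```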

5.
6.
Image retrieval using multiple evidence ranking   Times cited: 5 (self-citations: 0; by others: 5)
The World Wide Web is the largest publicly available image repository and therefore a natural focus of attention. Consequently, searching for images on the Web has become a timely and important task. The most direct approach to searching for images of interest is keyword-based search. However, because images on the Web are poorly labeled, directly applying standard keyword-based image search techniques frequently yields poor results. We propose a comprehensive solution to this problem in which multiple sources of evidence related to the images are considered. To combine these distinct sources of evidence, we introduce an image retrieval model based on Bayesian belief networks. We evaluate our approach experimentally on a reference collection of 54,000 Web images. Our results indicate that retrieval using the text passages surrounding an image is as effective as standard retrieval based on HTML tags. This result is interesting because current Web image search engines usually do not take text passages into consideration. Most importantly, our results show that combining information derived from text passages with information derived from HTML tags improves retrieval. The relative gain in average precision is roughly 50 percent compared with using each source of evidence in isolation.
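The paper combines evidence sources with a Bayesian belief network. The sketch below is a much-simplified stand-in, not the authors' network. It merges per-source relevance estimates with a noisy-OR rule, a common way to combine independent causes in belief networks: an image counts as relevant if any source supports it. The per-source probabilities (for example, from surrounding text and from HTML tags) are assumed to be given. `noisy_or` and `rank_by_evidence` are hypothetical names.

```python
from typing import Dict, List, Sequence


def noisy_or(probabilities: Sequence[float]) -> float:
    """P(relevant) = 1 - prod(1 - p_i), assuming independent evidence sources."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p
    return 1.0 - miss


def rank_by_evidence(evidence: Dict[str, Sequence[float]]) -> List[str]:
    """Rank image ids by the combined relevance of all their evidence sources."""
    scores = {image: noisy_or(probs) for image, probs in evidence.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Under this rule, an image with two moderate sources (0.5 each) scores 0.75 and outranks an image with a single source at 0.6. This mirrors the abstract's finding that combined evidence beats either source alone.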

7.
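Image retrieval systems face a gap between low-level visual features and high-level semantics, which makes retrieval difficult. To address this, relevance feedback has in recent years been borrowed from information retrieval, and it has achieved some success in this field. This paper proposes a retrieval model that uses feedback information to build "query subspaces." The user's semantic query is classified according to visual features, yielding multiple query subspaces. Each subspace has its own query model and retrieval model, and the final result is derived from the retrieval results of all the subspaces. The model is flexible and extensible and makes effective use of the user's feedback. It dynamically builds a mapping between low-level visual features and high-level semantics and adapts to the queries of different users. Negative feedback is also incorporated into the model, improving retrieval efficiency. Experimental results show that a system using this retrieval model performs considerably better than existing retrieval systems.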

8.
9.
Pattern Recognition, 2014, 47(2): 705-720
We present word spatial arrangement (WSA), an approach for representing the spatial arrangement of visual words under the bag-of-visual-words model. It rests on a simple idea: encoding the relative positions of visual words by splitting the image space into quadrants, using each detected point as the origin. WSA generates compact feature vectors and is flexible. It can be used for both image retrieval and classification, works with either hard or soft assignment, and requires no pre- or post-processing for spatial verification. Experiments in the retrieval scenario show that WSA outperforms Spatial Pyramids. Experiments in the classification scenario show a reasonable trade-off between the two methods: Spatial Pyramids generate larger feature vectors, while WSA provides adequate performance with much more compact features. Since WSA encodes only the spatial information of visual words and not their frequency of occurrence, the results indicate the importance of such information for visual categorization.
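The quadrant encoding can be sketched as follows. This sketch reflects a reading of the abstract, not the published definition of WSA, and rests on three assumptions. First, for each visual word, every occurrence serves in turn as the origin. Second, only other occurrences of the same word are counted in the four quadrants. Third, each word's four counts are normalized to sum to 1, so the vector reflects arrangement rather than raw frequency. Points lying exactly on an axis are assigned to the non-negative side. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float, int]  # (x, y, visual word id)


def quadrant(ox: float, oy: float, x: float, y: float) -> int:
    """Quadrant of (x, y) relative to origin (ox, oy).

    0: dx >= 0, dy >= 0    1: dx < 0, dy >= 0
    2: dx >= 0, dy < 0     3: dx < 0, dy < 0
    """
    return (1 if x < ox else 0) + (2 if y < oy else 0)


def wsa_vector(keypoints: Sequence[Keypoint], vocab_size: int) -> List[float]:
    """Compact 4 * vocab_size spatial-arrangement descriptor (sketch)."""
    positions: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for x, y, word in keypoints:
        positions[word].append((x, y))

    feature = [0.0] * (4 * vocab_size)
    for word, points in positions.items():
        counts = [0, 0, 0, 0]
        for i, (ox, oy) in enumerate(points):      # each occurrence as origin
            for j, (x, y) in enumerate(points):
                if i != j:
                    counts[quadrant(ox, oy, x, y)] += 1
        total = sum(counts)
        if total:                                  # words seen once stay all-zero
            for q in range(4):
                feature[4 * word + q] = counts[q] / total
    return feature
```

The output length is 4 × vocabulary size, independent of the number of keypoints. This is consistent with the compactness the abstract claims relative to Spatial Pyramids.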

10.
Typical content-based image retrieval solutions usually cannot achieve satisfactory performance because of the semantic gap. With the popularity of social media applications, large numbers of social images with user tags are available, and these can be leveraged to boost image retrieval. In this paper, we propose a sparse semantic metric learning (SSML) algorithm that discovers knowledge from these social media resources, and we apply the learned metric to search for relevant images for users. Traditional metric learning approaches use similarity or dissimilarity constraints over a homogeneous visual space. In contrast, the proposed method exploits heterogeneous information from two views of images and formulates the learning problem according to the following principles. First, the semantic structure of the text space should be preserved in the transformed space. Second, to avoid overfitting to the noisy, incomplete, or subjective tags of images, the space mapped by the learned metric should not deviate from the original visual space. In addition, the metric is constrained to be row-wise sparse via the ℓ2,1-norm to suppress noisy or redundant visual feature dimensions. We present an iterative algorithm with proven convergence to solve the optimization problem. Using the learned metric for image retrieval, we conduct extensive experiments on a real-world dataset and validate the effectiveness of our approach compared with related work.
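The ℓ2,1-norm of a matrix M is the sum of the Euclidean norms of its rows, ||M||_{2,1} = Σ_i ||m_i||_2. Penalizing it therefore drives whole rows to zero. If rows index visual feature dimensions (an assumption about how the metric is laid out), a zero row switches that dimension off. The sketch below computes the norm. For illustration, it also shows the standard row-wise shrinkage (proximal) step often used with this penalty. This is a generic building block, not the SSML solver from the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_norms(m: Matrix) -> List[float]:
    """Euclidean norm of each row."""
    return [math.sqrt(sum(v * v for v in row)) for row in m]


def l21_norm(m: Matrix) -> float:
    """||M||_{2,1}: sum over rows of each row's Euclidean norm."""
    return sum(row_norms(m))


def prox_l21(m: Matrix, t: float) -> List[List[float]]:
    """Row-wise shrinkage: row * max(0, 1 - t / ||row||).

    Rows whose norm is at most t become exactly zero, which is how the
    penalty suppresses entire feature dimensions.
    """
    out = []
    for row, norm in zip(m, row_norms(m)):
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


def active_rows(m: Matrix, tol: float = 1e-12) -> List[int]:
    """Indices of rows (feature dimensions) that survive the sparsity penalty."""
    return [i for i, n in enumerate(row_norms(m)) if n > tol]
```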

11.
Retrieving similar images based on their visual content is an important yet difficult problem. In this paper, we propose a new method to improve the accuracy of content-based image retrieval systems. Given a query image, existing retrieval methods typically return a ranked list based on the similarity scores between the query and individual images in the database. Our method goes further by analyzing the underlying connections among images in the database to improve this list. First, we treat each database image as a query and use an existing baseline method to search for its likely similar images. The database is then modeled as a graph in which images are nodes and connections between possibly similar images are edges. Next, we introduce an algorithm that splits this graph into stronger subgraphs, based on our notion of graph strength, so that the images in each subgraph are expected to be truly similar to one another. For each subgraph, we create a structure called an integrated image, which contains the visual features of all images in the subgraph. At query time, we compute similarity scores between the query and individual database images, and also between the query and the integrated images. The final similarity score of a database image is computed from both its individual score and the score of the integrated image to which it belongs, which effectively re-ranks the retrieved images. We evaluate our method on a common image retrieval benchmark and demonstrate a significant improvement over the traditional bag-of-words retrieval model.
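The query-time re-ranking can be sketched as below. Several details are assumptions, not taken from the paper:

- images are bag-of-words histograms and similarity is cosine;
- an integrated image is the element-wise sum of its members' histograms;
- the subgraphs are already computed;
- the final score is a linear blend with a hypothetical weight `alpha`.

Images outside every subgraph simply keep their individual score.

```python
import math
from typing import Dict, List, Sequence

Histogram = Sequence[float]


def cosine(a: Histogram, b: Histogram) -> float:
    """Cosine similarity; 0.0 if either histogram is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def integrated_image(histograms: Sequence[Histogram]) -> List[float]:
    """Pool the visual words of all images in one subgraph."""
    return [sum(column) for column in zip(*histograms)]


def rerank(query: Histogram, database: Dict[str, Histogram],
           subgraphs: Sequence[Sequence[str]], alpha: float = 0.5) -> List[str]:
    """Blend each image's own score with its subgraph's integrated-image score."""
    individual = {img: cosine(query, h) for img, h in database.items()}
    # Start from each image's own score; subgraph members inherit their
    # integrated image's score instead.
    group = dict(individual)
    for members in subgraphs:
        score = cosine(query, integrated_image([database[m] for m in members]))
        for m in members:
            group[m] = score
    final = {img: alpha * individual[img] + (1 - alpha) * group[img] for img in database}
    return sorted(final, key=final.get, reverse=True)
```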

12.
Scale-Invariant Visual Language Modeling for Object Categorization   Times cited: 2 (self-citations: 0; by others: 2)
In recent years, "bag-of-words" models, which treat an image as a collection of unordered visual words, have been widely applied in multimedia and computer vision. However, because these models ignore the spatial structure among visual words, they cannot discriminate between objects with similar word frequencies but different spatial distributions of words. In this paper, we propose a visual language modeling (VLM) method, which incorporates the spatial context of local appearance features into a statistical language model. To represent object categories, we explore models with different orders of statistical dependency. In addition, a multilayer extension of the VLM makes it more robust to scale variations of objects. The model is effective and applicable to large-scale image categorization. We train scale-invariant visual language models on images grouped by Flickr tags and use these models for object categorization. Experimental results show that they outperform single-layer visual language models and "bag-of-words" models. They also achieve performance comparable to 2-D MHMM and SVM-based methods at a much lower computational cost.
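To make the idea of a visual language model concrete, the sketch below trains a first-order (bigram) model per category over horizontally adjacent visual words. It then classifies a new image by log-likelihood. This is a hypothetical, single-layer illustration. The grid layout, horizontal-only context, add-one smoothing, and the class and function names are all assumptions, and the paper's multilayer, scale-invariant extension is not shown.

```python
import math
from collections import Counter
from typing import Dict, Sequence

Grid = Sequence[Sequence[int]]  # rows of visual-word ids on a regular grid


class BigramVLM:
    """First-order visual language model: P(word | left neighbour)."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.pair_counts: Counter = Counter()
        self.context_counts: Counter = Counter()

    def fit(self, grids: Sequence[Grid]) -> "BigramVLM":
        for grid in grids:
            for row in grid:
                for left, right in zip(row, row[1:]):
                    self.pair_counts[(left, right)] += 1
                    self.context_counts[left] += 1
        return self

    def log_likelihood(self, grid: Grid) -> float:
        total = 0.0
        for row in grid:
            for left, right in zip(row, row[1:]):
                # Add-one (Laplace) smoothing keeps unseen pairs at non-zero probability.
                p = ((self.pair_counts[(left, right)] + 1)
                     / (self.context_counts[left] + self.vocab_size))
                total += math.log(p)
        return total


def classify(grid: Grid, models: Dict[str, BigramVLM]) -> str:
    """Pick the category whose model assigns the grid the highest likelihood."""
    return max(models, key=lambda c: models[c].log_likelihood(grid))
```

A plain bag-of-words histogram cannot separate `[[0, 1, 0, 1]]` from `[[0, 0, 1, 1]]`, because both contain the same word counts. The bigram counts differ, which is the discriminative gap the abstract describes.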

13.
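Image question answering based on fusing visual and textual features has become a hot research topic in automatic question answering. Most existing models use attention mechanisms to mine the associations between the image and the question sentence. They ignore the associations among image regions, and among question words, within the same modality, as well as associations seen from different perspectives. To address this, we propose an image question answering model based on a multi-path semantic graph network (MSGN), which mines the semantic associations between images and questions from multiple perspectives. MSGN uses graph neural networks to mine fine-grained intra-modal and inter-modal associations between image regions and question words, thereby improving the accuracy of answer prediction. Experiments on public image question answering datasets show that mining these semantic associations from multiple perspectives improves answer prediction performance.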

14.
Ontologies have been intensively applied to improve multimedia search and retrieval by giving explicit meaning to visual content. Several multimedia ontologies have recently been proposed as knowledge models suitable for narrowing the well-known semantic gap and enabling the semantic interpretation of images. Because these ontologies were created in different application contexts, establishing links between them (a task known as ontology matching) promises to fully unlock their potential for multimedia search and retrieval. This paper proposes and empirically compares two extensional ontology matching techniques applied to an important semantic image retrieval problem: automatically associating common-sense knowledge with multimedia concepts. First, we extend a previously introduced textual concept matching approach to use both textual and visual representations of images. Second, we propose a novel matching technique based on a multi-modal graph. We argue that the textual and visual modalities should be seen as complementary, rather than mutually exclusive, sources of extensional information. Treating them this way improves the efficiency of applying ontology matching in the multimedia domain. An experimental evaluation is included in the paper.

15.
16.
With the rapid development of the Internet, recent years have seen explosive growth in social media. This poses great challenges for efficient and accurate large-scale image retrieval. Recent work shows that using hashing methods to embed high-dimensional image features and tag information into Hamming space is a powerful way to index large collections of social images. By learning hash codes through a spectral graph partitioning algorithm, spectral hashing (SH) has shown promising performance among hashing approaches. However, modeling the relations among images only with pairwise simple graphs is incomplete, because it ignores higher-order relationships. In this paper, we use a probabilistic hypergraph model to learn hash codes for social image retrieval. A probabilistic hypergraph offers a higher-order representation of social images by connecting more than two images in one hyperedge. Unlike a normal hypergraph model, a probabilistic hypergraph model considers both the grouping information and the similarities between vertices in hyperedges. Experiments on Flickr image datasets verify the performance of the proposed approach.
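The difference between a binary and a probabilistic hypergraph lies in the incidence matrix. The sketch below follows a common construction, which is an assumption here since the abstract gives no details. Each image spawns one hyperedge containing itself and its k nearest neighbours. Membership is weighted by a Gaussian similarity to the hyperedge's centre image, with a hypothetical bandwidth `sigma`, instead of being 0/1. A small Hamming-distance helper shows how learned binary codes would then be compared. Learning the codes themselves is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def probabilistic_incidence(points: Sequence[Vector], k: int,
                            sigma: float = 1.0) -> List[List[float]]:
    """H[v][e] = soft membership of vertex v in hyperedge e (0.0 if absent)."""
    n = len(points)
    H = [[0.0] * n for _ in range(n)]
    for e in range(n):
        H[e][e] = 1.0  # the centre image belongs fully to its own hyperedge
        others = sorted((v for v in range(n) if v != e),
                        key=lambda v: math.dist(points[e], points[v]))
        for v in others[:k]:
            d = math.dist(points[e], points[v])
            H[v][e] = math.exp(-(d * d) / (sigma * sigma))
    return H


def hamming(code_a: int, code_b: int) -> int:
    """Number of differing bits between two binary hash codes."""
    return bin(code_a ^ code_b).count("1")
```

In a plain hypergraph every non-zero entry of `H` would be 1.0. The soft weights are what let the model take within-hyperedge similarity into account.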

17.
顾文娇 (Gu Wenjiao), 张化祥 (Zhang Huaxiang). Computer Engineering (计算机工程), 2014, (6): 238-240, 246
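Most current image retrieval is content-based. To improve retrieval accuracy, this paper integrates textual and visual information. It proposes a method that automatically converts a text query into a visual representation, enabling image retrieval based on a cross-media dictionary. An annotated image set is used to mine the relationships between text and images. From it, a cross-media dictionary, analogous to a bilingual dictionary, is trained to convert text queries into visual queries automatically. Text-based and visual image retrieval are then performed separately, and the images retrieved by the two methods are merged into the final result. Experimental results show that the method effectively improves image retrieval precision.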
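The two steps of the cross-media approach can be sketched as follows: translating a text query into a visual query, and merging two result lists. The abstract does not say how the dictionary is stored or how the lists are merged. This sketch therefore assumes a dictionary mapping each term to weighted visual words, and an interleaved, duplicate-free merge. All names are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence


def text_to_visual_query(terms: Sequence[str],
                         dictionary: Dict[str, Dict[int, float]],
                         vocab_size: int) -> List[float]:
    """Sum the visual-word weights of every query term found in the dictionary."""
    query = [0.0] * vocab_size
    for term in terms:
        for word, weight in dictionary.get(term, {}).items():
            query[word] += weight
    return query


def merge_results(text_ranked: Sequence[str], visual_ranked: Sequence[str]) -> List[str]:
    """Interleave two ranked lists, keeping each image's first appearance."""
    merged: List[str] = []
    seen = set()
    for pair in zip_longest(text_ranked, visual_ranked):
        for image in pair:
            if image is not None and image not in seen:
                seen.add(image)
                merged.append(image)
    return merged
```

Terms missing from the dictionary are silently ignored. A real system would need a policy for such out-of-vocabulary terms.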

18.
林泽琦 (Lin Zeqi), 邹艳珍 (Zou Yanzhen), 赵俊峰 (Zhao Junfeng), 曹英魁 (Cao Yingkui), 谢冰 (Xie Bing). Journal of Software (软件学报), 2019, 30(12): 3714-3729
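Documents written in natural-language text are an important part of software projects. Helping developers locate information efficiently and accurately in a large body of documents is an important research problem in software reuse. This paper proposes a semantic search method for software documents based on code-structure knowledge. The method parses a code-structure graph from a project's source code and uses it as domain-specific knowledge to help machines understand the semantics of natural-language text. This semantic information is combined with information retrieval techniques to achieve semantic retrieval of software documents. Experiments on a StackOverflow question-and-answer document dataset show that, compared with several text retrieval methods, the method improves mean average precision (MAP) by at least 13.77%.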
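The result above is reported as a MAP improvement, so a short sketch of how MAP is usually computed may help. It uses the common definition in which average precision is divided by the total number of relevant documents for the query. The paper's exact evaluation script is not described in the abstract, and the function names are illustrative.

```python
from typing import Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@i over the ranks i at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i
    # Relevant documents that are never retrieved contribute a precision of 0.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of the per-query AP values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a ranking `["a", "b", "c", "d"]` with relevant set `{"a", "c"}` has AP = (1/1 + 2/3) / 2 = 5/6.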

19.
20.
Searching for relevant images given a query term is an important task in today's large-scale community databases. The image ranking approach presented in this work represents an image collection as a graph built using a multimodal similarity measure based on visual features and user tags. We perform a random walk on this graph to find the most common images. We further discuss several scalability issues of the proposed approach and show how queries can be answered quickly in this framework. Experimental results validate the effectiveness of the presented algorithm.
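A random walk that surfaces the "most common" images can be sketched as a PageRank-style power iteration over the similarity graph. Three parameters are assumptions for illustration: the blend of visual and tag similarity via a hypothetical weight `beta`, the damping factor of 0.85, and the uniform teleport. The abstract does not give the paper's exact similarity measure or walk parameters.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def combine_similarities(visual: Matrix, tags: Matrix, beta: float = 0.5) -> List[List[float]]:
    """Blend two similarity matrices; the diagonal is zeroed to avoid self-loops."""
    n = len(visual)
    return [[beta * visual[i][j] + (1 - beta) * tags[i][j] if i != j else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_scores(weights: Matrix, damping: float = 0.85,
                       max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Stationary distribution of a random walk with uniform teleportation."""
    n = len(weights)
    transition = []
    for row in weights:
        total = sum(row)
        # Rows without edges jump uniformly, so every row sums to 1.
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [(1 - damping) / n
                   + damping * sum(scores[i] * transition[i][j] for i in range(n))
                   for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores
```

Sorting images by these scores in descending order gives the ranking. Images strongly connected to many similar images accumulate the most probability mass.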


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号