Similar Literature
20 similar references retrieved (search took 572 ms)
1.
Image auto-annotation, which annotates images according to their semantic content, has become a research focus in computer vision, as it helps people edit, retrieve and understand large image collections. Over the past decades, researchers have proposed many approaches to this task and achieved remarkable performance on several standard image datasets. In this paper, we train neural networks with a visual and semantic ranking loss to learn a visual-semantic embedding. This embedding can easily be applied to nearest-neighbor based models to boost their performance on image auto-annotation. We test our method on four challenging image datasets and report comparisons with existing work. Experimental results show that our method can be applied to several state-of-the-art nearest-neighbor based models, including TagProp and 2PKNN, and significantly improves their performance.
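The embedding described here pairs a visual branch with a tag branch trained under a margin-based ranking loss. Below is a minimal, hedged sketch of that idea in PyTorch; the dimensions, layer choices, and the single sampled negative tag are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a visual-semantic embedding with a ranking loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticEmbedding(nn.Module):
    def __init__(self, img_dim=4096, tag_vocab=1000, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)        # visual branch
        self.tag_embed = nn.Embedding(tag_vocab, embed_dim)  # semantic branch

    def forward(self, img_feat, tag_idx):
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.tag_embed(tag_idx), dim=-1)
        return v, t

def ranking_loss(v, t_pos, t_neg, margin=0.2):
    # Require a matching image/tag pair to be closer (higher cosine
    # similarity on unit vectors) than a mismatched pair by the margin.
    pos = (v * t_pos).sum(-1)
    neg = (v * t_neg).sum(-1)
    return F.relu(margin - pos + neg).mean()
```

A nearest-neighbor annotator such as TagProp or 2PKNN would then measure distances in this learned space instead of in the raw feature space.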

2.
Objective: Because of the "semantic gap" between low-level features and high-level semantics in image retrieval, automatic image annotation has become a key problem. To narrow the semantic gap, this paper proposes an image auto-annotation method that hybridizes generative and discriminative models. Method: In the generative learning stage, images are modeled with continuous probabilistic latent semantic analysis (PLSA), yielding the model parameters and a topic distribution for each image. Taking this topic distribution as an intermediate representation vector of each image turns auto-annotation into a classification problem based on multi-label learning. In the discriminative learning stage, an ensemble of classifier chains is learned over the intermediate representation vectors; building the chains also integrates contextual information among annotation keywords, which yields higher annotation accuracy and better retrieval results. Results: Experiments on two benchmark datasets show that the method reaches mean precision and mean recall of 0.28 and 0.32 on Corel5k, and 0.29 and 0.18 on IAPR-TC12, outperforming most state-of-the-art image auto-annotation methods; on precision-recall curves it also surpasses several typical, representative annotation methods. Conclusion: The proposed hybrid-learning image auto-annotation method integrates the respective strengths of generative and discriminative models and shows good effectiveness and robustness in semantic image retrieval. Beyond image retrieval and recognition, with suitable adaptation it could also play an important role in cross-media retrieval and data mining.
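The discriminative stage above builds an ensemble of classifier chains over PLSA topic distributions. The following sketch illustrates that stage with scikit-learn's ClassifierChain; the random topic mixtures and label matrix are stand-ins for the paper's PLSA output and keyword sets.

```python
# Hedged sketch: classifier chains over topic-distribution features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

np.random.seed(0)
n_images, n_topics, n_keywords = 500, 50, 20
X = np.random.dirichlet(np.ones(n_topics), size=n_images)     # stand-in PLSA topic mixtures
Y = (np.random.rand(n_images, n_keywords) > 0.8).astype(int)  # stand-in keyword indicators

# Each link in the chain sees the earlier keywords' predictions as extra
# features, which is how inter-keyword context enters the model.
chain = ClassifierChain(LogisticRegression(max_iter=1000), order='random', random_state=0)
chain.fit(X, Y)
keyword_probs = chain.predict_proba(X[:5])  # annotation scores for 5 images
```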

3.
Automatic image annotation is a fundamental yet challenging task in image retrieval. Since their introduction, deep learning algorithms have achieved great success in image and text recognition and offer an effective way to address the "semantic gap". The annotation problem can be decomposed into two stages: basic annotation based on image-label correlations, and annotation refinement based on the co-occurrence of annotation keywords. This paper treats basic annotation as a multi-label learning problem, using the images' label prior knowledge as the supervision signal for a deep neural network. Given the basic annotation keywords, the dependency relations and prior distribution of the original label vocabulary are then used to refine the annotation results. Finally, the improved deep learning model is applied to the Corel and ESP image datasets, validating the effectiveness of the framework and the proposed solution.
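Treating the basic annotation stage as multi-label learning typically amounts to one sigmoid output per keyword trained with binary cross-entropy. A minimal PyTorch sketch follows; the layer sizes and the random stand-in data are invented for illustration.

```python
# Hedged sketch of a multi-label annotation head with per-keyword sigmoids.
import torch
import torch.nn as nn

n_tags = 260  # illustrative vocabulary size
head = nn.Sequential(nn.Linear(4096, 512), nn.ReLU(), nn.Linear(512, n_tags))
criterion = nn.BCEWithLogitsLoss()  # one binary decision per keyword

features = torch.randn(8, 4096)                      # stand-in image features
targets = torch.randint(0, 2, (8, n_tags)).float()   # stand-in tag indicators
loss = criterion(head(features), targets)
loss.backward()
```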

4.
Zhang Hongjiang, Chen Zheng, Li Mingjing, Su Zhong. World Wide Web, 2003, 6(2):131-155
A major bottleneck in content-based image retrieval (CBIR) systems or search engines is the large gap between the low-level image features used to index images and the high-level semantic content of images. One solution to this bottleneck is to apply relevance feedback to refine the query or similarity measures in the image search process. In this paper, we first address the key issues involved in relevance feedback for CBIR systems and present a brief overview of a set of commonly used relevance feedback algorithms. We then present a framework of relevance feedback and semantic learning in CBIR; almost all previously proposed methods fit well into this framework. In this framework, low-level features and keyword annotations are integrated in the retrieval and feedback processes to improve retrieval performance. We have also extended the framework to a content-based web image search engine, in which hosting web pages are used to collect relevant annotations for images and users' feedback logs are used to refine annotations. A prototype system has been developed to evaluate the proposed schemes, and our experimental results indicate that the approach outperforms traditional CBIR systems and relevance feedback approaches.
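Query refinement from relevance feedback is commonly formulated as moving the query vector toward examples the user marked relevant and away from non-relevant ones. The classic Rocchio update below is one standard member of this family (the abstract's own algorithms may differ); the weights are conventional defaults, not values from the paper.

```python
# A compact Rocchio-style relevance-feedback sketch.
import numpy as np

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """query: (d,) feature vector; relevant/nonrelevant: (n, d) arrays."""
    q = alpha * query
    if len(relevant):
        q += beta * relevant.mean(axis=0)     # pull toward relevant images
    if len(nonrelevant):
        q -= gamma * nonrelevant.mean(axis=0) # push away from non-relevant ones
    return q
```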

5.
Robotic advances and developments in sensors and acquisition systems facilitate the collection of survey data in remote and challenging scenarios. Semantic segmentation, which attempts to provide per-pixel semantic labels, is an essential task when processing such data. Recent advances in deep learning have boosted this task's performance. Unfortunately, these methods need large amounts of labeled data, which is usually a challenge in many domains. In many environmental monitoring settings, such as the coral reef example studied here, data labeling demands expert knowledge and is costly. Therefore, many datasets present only scarce and sparse image annotations or remain untouched in image libraries. This study proposes and validates an effective approach for learning semantic segmentation models from sparsely labeled data. By augmenting sparse annotations with the proposed adaptive superpixel segmentation propagation, we obtain results similar to training with dense annotations while significantly reducing the labeling effort. We perform an in-depth analysis of our labeling augmentation method as well as of different neural network architectures and loss functions for semantic segmentation. We demonstrate the effectiveness of our approach on publicly available datasets from different real domains, with emphasis on underwater scenarios, specifically coral reef semantic segmentation. We release new labeled data as well as an encoder trained on half a million coral reef images, which is shown to facilitate generalization to new coral scenarios.
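One simple way to realize the sparse-to-dense augmentation described above is to spread each point label over the superpixel that contains it. Below is a hedged sketch using scikit-image's SLIC (the paper's adaptive superpixel scheme is more elaborate); the point-label format and the segmentation parameters are assumptions.

```python
# Hedged sketch: propagate sparse point labels over SLIC superpixels.
import numpy as np
from skimage.segmentation import slic

def propagate_point_labels(image, points):
    """image: RGB array; points: list of (row, col, class_id) annotations."""
    segments = slic(image, n_segments=500, compactness=10)  # superpixel map
    dense = np.zeros(segments.shape, dtype=int)             # 0 = unlabeled
    for r, c, cls in points:
        dense[segments == segments[r, c]] = cls  # spread label over its superpixel
    return dense
```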

6.
7.
Narrowing the gap between low-level visual features and high-level semantics, so as to improve the precision of automatic semantic annotation, is key to managing large-scale image data. This paper proposes a deep learning image auto-annotation method that fuses multiple features: the visual features of an image are combined into a bag of words with different weights, and a deep belief network is optimized with respect to its input and output variables to annotate large-scale image data automatically. Experiments on the widely used Corel image dataset show that, by accounting for the influence of the different image features, the multi-feature deep learning method improves annotation precision.
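The weighted multi-feature combination step might look like the sketch below, where several descriptors are normalized and concatenated with per-feature weights before entering the deep model; the descriptor names and weights are invented for illustration, not the paper's configuration.

```python
# Hedged sketch: weighted fusion of several visual descriptors.
import numpy as np

def fuse_features(feature_dict, weights):
    parts = [weights[name] * (vec / (np.linalg.norm(vec) + 1e-8))
             for name, vec in feature_dict.items()]  # normalize, then weight
    return np.concatenate(parts)

fused = fuse_features(
    {"color_hist": np.random.rand(64),
     "texture": np.random.rand(32),
     "sift_bow": np.random.rand(500)},
    weights={"color_hist": 0.3, "texture": 0.2, "sift_bow": 0.5})
```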

8.
Visual understanding tasks such as object detection, semantic and instance segmentation, and action recognition are widely used and play a crucial role in fields such as human-computer interaction and autonomous driving. In recent years, deep visual understanding networks based on fully supervised learning have achieved remarkable performance gains. However, annotating data for object detection, semantic and instance segmentation, and video action recognition is labor- and time-intensive, which has become a key factor limiting their wide application. Weakly supervised learning, an effective way to reduce annotation cost, promises a feasible remedy and has therefore attracted considerable attention. Centered on weakly supervised visual learning, this paper surveys research progress at home and abroad, taking object detection, semantic and instance segmentation, and action recognition as examples, and discusses future directions and application prospects. After briefly reviewing generic weakly supervised learning models such as multiple instance learning (MIL) and the expectation-maximization (EM) algorithm, it summarizes object detection and localization from the perspectives of multiple instance learning and class attention map mechanisms, with emphasis on self-training and supervision-form conversion methods; for semantic segmentation, progress is summarized and analyzed according to weak supervision of different granularities, such as bounding-box annotations, image-level class annotations, and scribble or point annotations, with a main focus on image-level class ...

9.
Multispectral pedestrian detection is an important functionality in various computer vision applications such as robot sensing, security surveillance, and autonomous driving. In this paper, our goal is to automatically adapt a generic pedestrian detector trained in a visible source domain to a new multispectral target domain without any manual annotation effort. For this purpose, we present an auto-annotation framework that iteratively labels pedestrian instances in the visible and thermal channels by leveraging the complementary information of multispectral data. A distinct target is temporally tracked through image sequences to generate more confident labels. The pedestrians predicted in the two individual channels are merged through a label fusion scheme to generate multispectral pedestrian annotations. The obtained annotations are then fed to a two-stream region proposal network (TS-RPN) that learns multispectral features on both visible and thermal images for robust pedestrian detection. Experimental results on the KAIST multispectral dataset show that our unsupervised approach using auto-annotated training data can achieve performance comparable to state-of-the-art deep neural network (DNN) based pedestrian detectors trained using manual labels.
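The label-fusion step merges pedestrian boxes predicted independently in the visible and thermal channels. A toy sketch of one plausible rule, agreement by box overlap, is below; the IoU threshold and the keep-the-visible-box choice are my assumptions, not the paper's exact scheme.

```python
# Hedged sketch: fuse detections from two channels by IoU agreement.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def fuse_labels(visible_boxes, thermal_boxes, thr=0.5):
    fused = []
    for v in visible_boxes:
        for t in thermal_boxes:
            if iou(v, t) >= thr:  # cross-channel agreement -> confident label
                fused.append(v)
                break
    return fused
```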

10.
With the fast-growing number of images on photo-sharing websites such as Flickr and Picasa, there is an urgent need for scalable multi-label propagation algorithms for image indexing, management and retrieval. It is well acknowledged that analysis at the semantic region level can greatly improve image annotation performance compared with analysis at the holistic image level. However, region-level approaches increase the data scale by several orders of magnitude and pose new challenges to most existing algorithms. In this work, we present a novel framework that effectively computes pairwise image similarity by accumulating information from semantic image regions. Firstly, each image is encoded as a Bag-of-Regions built from multiple image segmentations. Secondly, all image regions are separated into buckets with an efficient locality-sensitive hashing (LSH) method, which guarantees high collision probabilities for similar regions. The k-nearest neighbors of each image and the corresponding similarities can then be efficiently approximated with these indexed patches. Lastly, the sparse and region-aware image similarity matrix is fed into the multi-label extension of the entropic graph regularized semi-supervised learning algorithm [1]. In combination, these steps naturally yield the capability to handle large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the proposed framework for region-aware and scalable multi-label propagation.
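Random-hyperplane LSH is a standard way to implement the bucketing step described above: descriptors that fall on the same side of a set of random hyperplanes share a hash code, so similar regions collide with high probability. A compact sketch with illustrative sizes (the paper's exact LSH family may differ):

```python
# Hedged sketch: bucket region descriptors with random-hyperplane LSH.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n_bits, dim = 16, 128
planes = rng.standard_normal((n_bits, dim))  # random projection directions

def lsh_code(x):
    return tuple((planes @ x > 0).astype(int))  # one sign bit per hyperplane

buckets = defaultdict(list)
regions = rng.standard_normal((10000, dim))   # stand-in region descriptors
for i, r in enumerate(regions):
    buckets[lsh_code(r)].append(i)            # k-NN candidates share a bucket
```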

11.

In recent years, the rapid growth of multimedia content has made image retrieval a challenging research task. Content-Based Image Retrieval (CBIR) is a technique that uses image features to find, in a large image dataset, the images a user requires, given the user's request in the form of a query image. Effective feature representation and similarity measures are crucial to CBIR retrieval performance. The key challenge has been attributed to the well-known semantic gap issue. Machine learning has been actively investigated as a possible way to bridge the semantic gap, and the recent success of deep learning inspires hope for bridging it in CBIR. In this paper, we investigate deep learning approaches for CBIR tasks under varied settings; from our empirical studies, we draw some encouraging conclusions and insights for future research.


12.
13.
14.
Zhang Haofeng, Long Yang, Shao Ling. Multimedia Tools and Applications, 2019, 78(17):24147-24165

Conventional zero-shot learning methods usually learn mapping functions that project image features into semantic embedding spaces, in which the nearest neighbors with predefined attributes are found. The predefined attributes, covering both seen and unseen classes, are often annotated with high-dimensional real values by experts, which costs a great deal of human labor. In this paper, we propose a simple but effective method to reduce the annotation work: in our strategy, only the unseen classes need to be annotated, with several binary codes, which reduces the annotation effort to roughly one percent of the original. In addition, we design a Visual Similes Annotation System (ViSAS) to annotate the unseen classes, and we build both linear and deep mapping models and test them on four popular datasets. The experimental results show that our method outperforms the state-of-the-art methods in most circumstances.
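With classes described by short binary codes, prediction can reduce to projecting an image feature into code space and taking the nearest class in Hamming distance. The sketch below shows that pipeline with an untrained linear mapping standing in for the learned one; all sizes and the random "expert" codes are invented for illustration.

```python
# Hedged sketch: zero-shot prediction with binary class codes.
import numpy as np

n_classes, code_len, feat_dim = 10, 8, 512
class_codes = np.random.randint(0, 2, (n_classes, code_len))  # stand-in annotated codes
W = np.random.randn(feat_dim, code_len) * 0.01                # linear mapping (would be learned)

def predict(feature):
    code = (feature @ W > 0).astype(int)       # binarize the projection
    dists = (class_codes != code).sum(axis=1)  # Hamming distance to each class
    return int(np.argmin(dists))
```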


15.
Automatic semantic annotation of real-world web images
As the number of web images increases at a rapid rate, searching them semantically presents a significant challenge. Many raw images are constantly uploaded with few meaningful direct annotations of semantic content, limiting their search and discovery. In this paper, we present a semantic annotation technique based on the use of image parametric dimensions and metadata. Using decision trees and rule induction, we develop a rule-based approach to formulate explicit annotations for images fully automatically, so that with our method a semantic query such as "sunset by the sea in autumn in New York" can be answered and indexed purely by machine. Our system is evaluated quantitatively using more than 100,000 web images. Experimental results indicate that this approach delivers highly competent performance, attaining good recall and precision rates of sometimes over 80%. This approach enables a new degree of semantic richness to be automatically associated with images, which previously could only be achieved manually.
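Rule induction from image metadata can be prototyped with an ordinary decision tree whose paths read as explicit annotation rules. In the sketch below, the metadata features (capture hour, a GPS flag, mean brightness) and the tiny dataset are invented stand-ins for the paper's parametric dimensions.

```python
# Hedged sketch: induce human-readable annotation rules from metadata.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: capture hour, has_gps, mean brightness (all illustrative).
X = np.array([[19, 1, 0.3], [12, 0, 0.8], [18, 1, 0.4], [9, 0, 0.7]])
y = ["sunset", "daytime", "sunset", "daytime"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["hour", "has_gps", "brightness"]))  # rules as text
```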

16.
A survey of multiple instance learning algorithms for image semantic analysis
Multiple instance learning (MIL), regarded as a fourth machine learning framework, has been widely applied in image semantic analysis. This paper first introduces the origin, characteristics, related concepts, and datasets of MIL. Then, with image semantic analysis as the application background, it surveys the relevant MIL algorithms in detail, classifying them by the learning mechanism they adopt and analyzing in depth the ideas behind and the main traits of each class of algorithm. Finally, future research directions for MIL are discussed.

17.
To address the gap between low-level features and high-level semantics in automatic image annotation, an image annotation refinement algorithm based on random dot product graphs is proposed. The algorithm first builds a semantic relation graph over an image's candidate annotation keywords using low-level image features; it then randomly reconstructs the graph with a random dot product graph, thereby recovering semantic relations missing from the training image set; finally, it applies a random walk with restart to refine the annotations. By combining low-level image features with high-level semantics, the algorithm effectively reduces the impact of a shrinking image-collection size on annotation. Experiments on three common image datasets show that the algorithm effectively improves image annotation, with macro-F and micro-average F values reaching up to 0.784 and 0.743, respectively.
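The final refinement step is a random walk with restart over the semantic relation graph. A minimal implementation of that iteration follows; the damping factor, iteration count, and the use of initial annotation confidences as the restart distribution are assumptions.

```python
# Hedged sketch: random walk with restart for re-ranking candidate tags.
import numpy as np

def random_walk_with_restart(P, seed, alpha=0.85, iters=100):
    """P: column-stochastic transition matrix over candidate tags.
    seed: restart distribution (e.g., initial annotation confidences)."""
    r = seed.copy()
    for _ in range(iters):
        r = alpha * (P @ r) + (1 - alpha) * seed  # walk, then restart
    return r  # stationary relevance score for each candidate tag
```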

18.
Recently, large-scale image annotation datasets have been collected, with millions of images and thousands of possible annotations. Latent variable models, or embedding methods, that simultaneously learn semantic representations of object labels and image representations can provide tractable solutions for such tasks. In this work, we are interested in jointly learning representations both for the objects in an image and for the parts of those objects, because such deeper semantic representations could bring a leap forward in image retrieval or browsing. Despite the size of these datasets, annotated data for objects and parts can be costly to obtain and may not be available. In this paper, we propose to bypass this cost with a method able to learn to jointly label objects and parts without requiring exhaustively labeled data. We design a model architecture that can be trained under a proxy supervision obtained by combining standard image annotation (from ImageNet) with semantic part-based within-label relations (from WordNet). The model itself is designed to capture both object-image-to-object-label similarities and object-label-to-object-part-label similarities in a single joint system. Experiments conducted on our combined data and a precisely annotated evaluation set demonstrate the usefulness of our approach.

19.
One of the challenges in image retrieval is dealing with concepts that have no visual appearance in the images or are not used as keywords in their annotations. To address this problem, this paper proposes an unsupervised concept-based image indexing technique that uses a lexical ontology to extract semantic signatures, called 'semantic chromosomes', from image annotations. A semantic chromosome is an information structure that carries the semantic information of an image; it is the semantic signature of an image in a collection, expressed through a set of semantic DNA (SDNA), each representing a concept. Central to this concept-based indexing technique is the proposed concept disambiguation algorithm, which identifies the most relevant SDNA by measuring the semantic importance of each word or phrase in the annotation. The concept disambiguation algorithm is evaluated using crowdsourcing. The experiments show that the algorithm achieves better accuracy (79.4%) than other unsupervised algorithms (73%) in the 2007 SemEval competition. It is also comparable with the accuracy achieved in the same competition by supervised algorithms (82-83%), which, unlike the approach proposed in this paper, must be trained on large corpora. The approach is currently applied to the automated generation of mood boards used as an inspirational tool in concept design.
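Disambiguating an annotation word against a lexical ontology can be prototyped with the classic Lesk algorithm over WordNet, a standard technique in the same spirit as, but distinct from, the paper's SDNA-based algorithm. The sketch uses NLTK and assumes the WordNet corpus has been downloaded.

```python
# Hedged sketch: pick a WordNet sense for a word given annotation context.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.wsd import lesk

annotation = "bank of the river at dawn".split()
sense = lesk(annotation, "bank")  # simplified Lesk; may return None
if sense is not None:
    print(sense.name(), "-", sense.definition())
```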

20.
With the explosive growth of multimedia data such as unlabeled images on the Web, image auto-annotation has been receiving increasing research interest. By automatically assigning a set of concepts to unlabeled images, image retrieval can be performed over labeled concepts. Most existing studies focus on the relations between images and concepts and ignore the interdependencies between labeled concepts. In this paper, we propose a novel image auto-annotation model that utilizes the concept interdependency network to achieve better image auto-annotation. When a concept and its interdependent concepts have a high co-occurrence frequency in the training set, we boost the chance of predicting this concept if there is strong visual evidence for the interdependent concepts in an unlabeled image. Additionally, we combine global and local concept interdependency to enhance auto-annotation performance. Extensive experiments on the Corel and IAPR datasets show that the proposed approach outperforms almost all existing methods.
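The boosting rule can be sketched as adding, to each concept's visual score, support from concepts that frequently co-occur with it. A minimal version is below; the blend weight and the row-normalized co-occurrence matrix are assumptions about one plausible formulation, not the paper's exact model.

```python
# Hedged sketch: boost concept scores with co-occurrence evidence.
import numpy as np

def boost_with_cooccurrence(visual_scores, cooccur, beta=0.3):
    """visual_scores: (n_concepts,) classifier outputs in [0, 1].
    cooccur: (n_concepts, n_concepts) row-normalized co-occurrence matrix."""
    support = cooccur @ visual_scores  # evidence from interdependent concepts
    return np.clip(visual_scores + beta * support, 0.0, 1.0)
```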

