Similar literature
20 similar documents found (search time: 31 ms)
1.
The semantic gap has become a bottleneck of content-based image retrieval in recent years. To bridge the gap and improve retrieval performance, automatic image annotation has emerged as a crucial problem. In this paper, a hybrid approach is proposed to learn the semantic concepts of images automatically. First, we present continuous probabilistic latent semantic analysis (PLSA) and derive its corresponding Expectation–Maximization (EM) algorithm. Continuous PLSA assumes that elements are sampled from a multivariate Gaussian distribution given a latent aspect, instead of the multinomial distribution used in traditional PLSA. Furthermore, we propose a hybrid framework that employs continuous PLSA to model the visual features of images in the generative learning stage and uses ensembles of classifier chains to classify the multi-label data in the discriminative learning stage. The framework can therefore learn the correlations between features as well as the correlations between words. Since the hybrid approach combines the advantages of generative and discriminative learning, it can predict semantic annotations precisely for unseen images. Finally, we conduct experiments on three baseline datasets, and the results show that our approach outperforms many state-of-the-art approaches.
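As a rough illustration of the Gaussian assumption described in the abstract above, the E-step of such a model can be sketched as computing the posterior over latent aspects for one feature vector. This is a minimal sketch, not the authors' derivation: it assumes diagonal covariances for simplicity, and the names `gaussian_pdf`, `e_step`, and all numeric values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x (assumed form)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def e_step(x, p_z_given_d, means, variances):
    """Posterior P(z | d, x), proportional to P(z | d) * N(x; mean_z, var_z)."""
    weights = [p * gaussian_pdf(x, m, v)
               for p, m, v in zip(p_z_given_d, means, variances)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy setup: two latent aspects over 2-D visual features.
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
posterior = e_step([2.9, 3.1], [0.5, 0.5], means, variances)
print(posterior)
```

A feature vector close to the second aspect's mean receives almost all of the posterior mass for that aspect; an M-step would then re-estimate the means, variances, and P(z | d) from these posteriors.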

2.
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer that constitutes the semantic concepts to be discovered, explicitly exploiting the synergy among the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework, so that the confidence of the association can be provided. (3) We report an extensive evaluation of the prototype system based on the model, using a large-scale, visually and semantically diverse image collection crawled from the Web. In the proposed probabilistic model, a hidden concept layer that connects the visual feature layer and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7736 annotation words automatically extracted from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.

3.
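Objective: Because of the "semantic gap" between low-level features and high-level semantics in image retrieval, automatic image annotation has become a key problem. To narrow the gap, an automatic image annotation method that combines generative and discriminative models is proposed. Method: In the generative learning stage, a continuous probabilistic latent semantic analysis model is used to model the images, yielding the model parameters and the topic distribution of each image. Taking this topic distribution as the intermediate representation vector of each image turns automatic annotation into a multi-label classification problem. In the discriminative learning stage, ensembles of classifier chains are trained on the intermediate representation vectors; building the chains also integrates contextual information among the annotation keywords, which yields higher annotation accuracy and better retrieval results. Results: Experiments on two benchmark datasets show that the method achieves mean precision and mean recall of 0.28 and 0.32 on the Corel5k dataset and 0.29 and 0.18 on the IAPR-TC12 dataset, outperforming most current state-of-the-art automatic image annotation methods. Its precision-recall curves are also better than those of several representative annotation methods. Conclusion: An automatic image annotation method based on a hybrid learning strategy is proposed. It integrates the respective advantages of generative and discriminative models and shows good effectiveness and robustness in semantic image retrieval. The method and techniques can be applied to image retrieval and recognition and, with suitable modification, can also play an important role in cross-media retrieval and data mining.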

4.
5.
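Owing to its low storage cost, efficient retrieval, and low annotation cost, unsupervised hashing has attracted growing attention from the research community and has been widely applied to large-scale database retrieval. Most previous unsupervised methods rely on the semantic structure of the dataset itself as guidance, requiring that the semantic information of the data be preserved in the hash space in order to learn the hash codes. Therefore, how to accurately represent the semantic structure and the hash codes has become...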

6.
7.
8.
In this paper, we propose an image semantic model based on the knowledge and criteria in the field of linguistics and taxonomy. Our work bridges the "semantic gap" by seamlessly exploiting the synergy of both visual feature processing and semantic relevance computation in a new way, and provides improved query efficiency and effectiveness for large general image databases. Our main contributions are as follows: We design novel data structures, namely a Lexical Hierarchy, an Image-Semantic Hierarchy, and a number of Atomic Semantic Domains, to capture the semantics and the features of the database, and to provide the indexing scheme. We present a novel image query algorithm based on the proposed structures. In addition, we propose a novel term expansion mechanism to improve the lexical processing. Our extensive experiments indicate that our proposed techniques are effective in achieving high run-time performance with improved retrieval accuracy. The experiments also show that the proposed method has good scalability.

9.
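In practice, it is increasingly difficult to provide classification models with large numbers of manual labels, so semi-supervised image classification has attracted growing attention in recent years. Many experiments have shown that introducing a small amount of labeled data into the training of generative adversarial networks (GANs) yields better classification results, but the frameworks of such models do not include a structure for extracting image features. To further exploit the learning capacity of these models, this paper proposes a new semi-supervised classification model. The model adds an encoder to the original GAN to extract image features directly and constructs a new semi-supervised training scheme, achieving strong classification results. Numerical experiments were carried out on the standard handwritten digit dataset MNIST, the street view house number dataset SVHN, and the natural image dataset CIFAR-10, with comparisons against other semi-supervised models; the results show that the proposed model achieves higher classification accuracy when only a small amount of labeled data is used.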

10.
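Tian Jialin (田加林), Xu Xing (徐行), Shen Fumin (沈复民), Shen Hengtao (申恒涛). Journal of Software (《软件学报》), 2022, 33(9): 3152-3164
Zero-shot sketch-based image retrieval uses sketches of unseen classes as queries to retrieve images of unseen classes. The task therefore faces two challenges at once: the modality gap between sketches and images, and the inconsistency between seen and unseen classes. Previous methods eliminate the modality gap by projecting sketches and images into a common space, and bridge the semantic inconsistency between seen and unseen classes by using semantic embeddings (such as word vectors and word similarity). In this paper, we propose a cross-modal self-distillation method that studies generalizable features from the perspective of knowledge distillation, without involving semantic embeddings in training. Specifically, we first transfer the knowledge of a pre-trained image recognition network to a student network through traditional knowledge distillation. Then, through the cross-modal correlation between sketches and images, cross-modal self-distillation indirectly transfers this knowledge to recognition in the sketch modality, improving the discriminability and generalization of sketch features. To further improve the integration and propagation of knowledge within the sketch modality, we additionally propose sketch self-distillation. By learning discriminative and generalizable features for the data, the student network eliminates the modality gap and the semantic inconsistency. Extensive experiments on three benchmark datasets, Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of the proposed cross-modal self-distillation method over existing methods.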
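The abstract above does not state the exact loss it uses for the teacher-to-student step. As a hedged illustration, a commonly used form of knowledge distillation compares temperature-softened class distributions with a KL divergence. The sketch below assumes that form; the function names, the temperature value, and the logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 (assumed form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

# Hypothetical logits over three classes.
teacher = [4.0, 1.0, 0.5]
matched_loss = distillation_loss(teacher, teacher)
mismatched_loss = distillation_loss(teacher, [0.5, 1.0, 4.0])
print(matched_loss, mismatched_loss)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a student network would be trained to minimize.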

11.
Content-based image retrieval aims to replace traditional indexing based on manual annotation with automatically extracted visual indexing features. Novel techniques are needed, however, to deal efficiently with the semantic gap (i.e., the partial match between the low-level features and the visual content). Here, we investigate a query-free retrieval approach first proposed by Ferecatu and Geman. This approach relies solely on an iterative relevance feedback mechanism that drives a heuristic sampling of the collection, and aims to take the semantic gap explicitly into account. Our contributions relate to three complementary aspects. First, we formalize a large-scale approach based on a hierarchical tree-like organization of the images computed off-line. Second, we propose a versatile modulation of the exploration/exploitation trade-off based on the consistency of the system's internal states between successive iterations. Third, we elaborate a long-term optimization of the similarity metric based on the user search session logs accumulated off-line. We implemented a web application that integrates all our contributions, and we distribute it under the AGPL Version 3 free software license. We organized user-based evaluation campaigns using the ImageNet dataset, and show empirically that our contributions significantly improve the retrieval performance of the original framework, that they are complementary to each other, and that their overall integration is consistently beneficial.

12.
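Current mainstream Web image retrieval methods consider only visual features; they do not make full use of the text that accompanies Web images and ignore the valuable semantics in the related text, which weakens their image representations. To address this problem, a new unsupervised image hashing method, semantic transfer deep visual hashing (STDVH), is proposed. The method first uses spectral clustering to mine the semantic information of the training text; it then builds a deep convolutional neural network that transfers the text semantics into the learning of image hash codes; finally, it trains the image hash codes and hash functions in a unified framework, enabling effective retrieval of large-scale Web image data in a low-dimensional Hamming space. Experiments on two public Web image collections, Wiki and MIR Flickr, demonstrate the superiority of the method over other state-of-the-art hashing algorithms.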
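Retrieval in a low-dimensional Hamming space, as mentioned in the abstract above, reduces to ranking stored hash codes by the number of bits in which they differ from the query code. The following is a minimal sketch of that ranking step only, not of STDVH itself; the 8-bit codes, image identifiers, and function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")

def retrieve(query_code, database, top_k=2):
    """Return the top_k (image_id, distance) pairs nearest to the query in Hamming space."""
    ranked = sorted(database.items(),
                    key=lambda item: hamming_distance(query_code, item[1]))
    return [(image_id, hamming_distance(query_code, code))
            for image_id, code in ranked[:top_k]]

# Hypothetical 8-bit hash codes for three images.
database = {"img_a": 0b10110010, "img_b": 0b10110011, "img_c": 0b01001101}
results = retrieve(0b10110110, database)
print(results)
```

Because the comparison is an XOR followed by a bit count, it is much cheaper than a floating-point distance, which is the usual motivation for hashing-based retrieval at scale.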

13.
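Convolutional neural networks based on supervised learning have been shown to have strong feature learning ability in image recognition tasks. However, image retrieval with supervised deep learning requires large amounts of labeled data; otherwise overfitting occurs easily. To solve this problem, a novel image hashing retrieval method based on deep self-taught learning is proposed. First, an unsupervised autoencoder network learns a discriminative feature representation function. This reduces the complexity of learning, frees the training samples from dependence on semantically annotated images, and forces the algorithm to learn more robust features from large amounts of unlabeled data. Second, to speed up retrieval, the traditional Euclidean-distance similarity computation is abandoned in favor of a perceptual hashing algorithm for measuring similarity. Combining the two techniques yields better feature representations together with faster retrieval. Experimental results show that the proposed method outperforms several state-of-the-art image retrieval methods.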
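The abstract above does not say which perceptual hashing variant it uses or what input the hash is applied to. As one simple illustration of the general idea, an average hash thresholds each value against the mean and compares the resulting bit strings. The sketch below applies it to a tiny grayscale grid; this is an assumption made for illustration, and the pixel values and function names are hypothetical.

```python
def average_hash(pixels):
    """Bit per pixel: 1 if the value is above the image mean, else 0 (average-hash variant)."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]

def bit_difference(hash_a, hash_b):
    """Count of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(hash_a, hash_b))

# Hypothetical 2x2 grayscale image, a uniformly brightened copy, and an inverted copy.
image = [[10, 200], [220, 30]]
brighter = [[value + 20 for value in row] for row in image]
inverted = [[255 - value for value in row] for row in image]

same = bit_difference(average_hash(image), average_hash(brighter))
different = bit_difference(average_hash(image), average_hash(inverted))
print(same, different)
```

The brightened copy keeps the same hash because every value shifts together with the mean, while the inverted copy flips every bit, which is why such hashes tolerate small global changes but separate dissimilar content.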

14.
In content-based image retrieval (CBIR), relevance feedback has been proven to be a powerful tool for bridging the gap between low-level visual features and high-level semantic concepts. Traditionally, relevance-feedback-driven CBIR is considered a supervised learning problem in which the user-provided feedback is used to learn a distance metric or classification function. However, CBIR is intrinsically a semi-supervised learning problem in which the testing samples (images in the database) are present during the learning process. Moreover, when feedback is insufficient, these methods may suffer from overfitting. In this paper, we propose a novel neighborhood preserving regression algorithm that makes efficient use of both labeled and unlabeled images. By using the unlabeled images, the geometrical structure of the image space can be incorporated into the learning system through a regularizer. Specifically, from all the functions that minimize the empirical loss on the labeled images, we select the one that best preserves the local neighborhood structure of the image space. In this way, our method can obtain a regression function that respects both the semantic and the geometrical structure of the image database. We present experimental evidence suggesting that our algorithm is able to use unlabeled data effectively for image retrieval.

15.
While people compare images using semantic concepts, computers compare images using low-level visual features that sometimes have little to do with these semantics. To reduce the gap between the high-level semantics of visual objects and the low-level features extracted from them, in this paper we develop a framework of learning pseudo metrics (LPM) using neural networks for semantic image classification and retrieval. Performance analysis and comparative studies, by experimenting on an image database, show that the LPM has potential application to multimedia information processing.

16.
Semantic filtering and retrieval of multimedia content is crucial for efficient use of multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedia pattern recognition problem. We propose a probabilistic framework for semantic video indexing, which can support filtering and retrieval and facilitate efficient content-based access. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact, and to model this interaction explicitly we propose a network of multijects (multinet). Using probabilistic models for six site multijects (rocks, sky, snow, water-body, forestry/greenery, and outdoor) and a Bayesian belief network as the multinet, we demonstrate the application of this framework to semantic indexing. We demonstrate how detection performance can be significantly improved by using the multinet to take interconceptual relationships into account. We also show how the multinet can fuse heterogeneous features to support detection based on inference and reasoning.
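To make the idea of exploiting interconceptual relationships concrete, the smallest possible belief network has two concepts, here "outdoor" and "sky", where detecting one updates the belief in the other through Bayes' rule. This is a toy sketch, not the multinet from the abstract above; the function name and every probability are hypothetical.

```python
def posterior_outdoor_given_sky(p_outdoor, p_sky_given_outdoor, p_sky_given_indoor):
    """P(outdoor | sky detected) in a two-node network outdoor -> sky, via Bayes' rule."""
    joint_outdoor = p_outdoor * p_sky_given_outdoor
    joint_indoor = (1.0 - p_outdoor) * p_sky_given_indoor
    return joint_outdoor / (joint_outdoor + joint_indoor)

# Hypothetical values: an uninformative prior and a sky detector that
# fires far more often in outdoor shots than in indoor ones.
belief = posterior_outdoor_given_sky(
    p_outdoor=0.5, p_sky_given_outdoor=0.8, p_sky_given_indoor=0.1)
print(round(belief, 3))
```

With these assumed numbers the belief in "outdoor" rises from 0.5 to about 0.889 once sky is detected; a larger network would propagate evidence among many such concepts in the same way.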

17.
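Objective: With the rapid development of hyperspectral imaging, hyperspectral data are used ever more widely, and applications of hyperspectral images across scenes increasingly demand precise, detailed annotation. Existing hyperspectral classification models mostly rely on supervised learning, and most methods are trained and evaluated within a single hyperspectral data cube. Because different hyperspectral datasets are acquired in different scenes and their land-cover classes are inconsistent, a trained model cannot be transferred directly to a new dataset to obtain reliable annotations, which limits further development of hyperspectral image classification models. This paper proposes training and evaluating hyperspectral classification models across datasets. Method: Inspired by zero-shot learning, the paper introduces the semantic information of hyperspectral class labels. The raw data and the label information of the different datasets are each mapped into a common feature space to relate known and unknown classes; the two kinds of features from the training dataset are then mapped into a unified embedding space to learn the correspondence between hyperspectral visual features and class-label semantic features, and this correspondence is applied to the test dataset for label inference. Results: Experiments were conducted on a pair of datasets acquired by the same sensor. They compare the semantic-to-visual and visual-to-semantic mapping directions and five zero-shot-learning-based feature mapping methods, and achieve training and evaluation of classification models on different datasets for hyperspectral image classification. Conclusion: The experimental results show that the proposed zero-shot-learning-based hyperspectral classification model enables cross-dataset training and evaluation and has some potential for hyperspectral image classification.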

18.
In this paper, we present a new framework for organizing image collections into structures that can be used for indexing, browsing, retrieval, and summarization. Instead of using tree-based techniques, which are not suitable for images, we develop a new solution that is specifically designed for image collections. We consider both low-level image content and high-level semantics in an attempt to alleviate the semantic gap encountered by many systems. Because our model is based on a probabilistic framework, it can be combined in a natural way with probabilistic techniques developed recently for image retrieval. The structure our model generates is applied for four purposes. The first is to provide the retrieval module with an index, which allows it to improve retrieval time and accuracy, while the second is to provide users with a hierarchical browsing catalog that allows them to navigate the image collection by subject. This represents an additional step towards facilitating human-computer interaction in the context of image retrieval and navigation. The third aim is to provide users with a summarization of the general content of each class in the collection, and the fourth is a retrieval mechanism. Related issues such as relevance feedback and feature selection are also addressed. The experiments at the end of the paper show that the proposed framework yields some significant improvements.

19.
In this paper, we propose a novel visual tracking algorithm using the collaboration of generative and discriminative trackers under the particle filter framework. Each particle denotes a single task, and we encode all the tasks simultaneously in a structured multi-task learning manner. We then implement the generative and discriminative trackers separately. The discriminative tracker uses the overall information of the object to represent its appearance, while the generative tracker takes the local information of the object into account to handle partial occlusions. The two models are therefore complementary during tracking. Furthermore, we design an effective dictionary updating mechanism. The dictionary is composed of fixed and variational parts. The variational parts are progressively updated using a Metropolis–Hastings strategy. Experiments on different challenging video sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art trackers.

20.
Many problems in information processing involve some form of dimensionality reduction, such as face recognition, image/text retrieval, and data visualization. Typical linear dimensionality reduction algorithms include principal component analysis (PCA), random projection, and locality-preserving projection (LPP). These techniques are generally unsupervised, which allows them to model data in the absence of labels or categories. In this paper, we propose a semi-supervised subspace learning algorithm for image retrieval. In a relevance-feedback-driven image retrieval system, the user-provided information can be used to better describe the intrinsic semantic relationships between images. Our algorithm is fundamentally based on LPP and can incorporate the user's relevance feedback. As the user's feedback accumulates, we can ultimately obtain a semantic subspace in which different semantic classes are best separated and retrieval performance is enhanced. We compared our proposed algorithm with PCA and standard LPP. Experimental results on a large collection of images show the effectiveness and efficiency of our proposed algorithm.
