Similar Documents
20 similar documents found.
1.
2.
3.
Objective: Copy image retrieval methods based on the bag-of-words model are currently the most effective. However, quantizing local features loses information, which weakens the discriminative power of visual words and increases false visual-word matches, degrading copy retrieval performance. To address the false-match problem, this paper proposes a copy image retrieval method based on near-neighbor context. The method uses the contextual relations of local features to disambiguate visual words and improve their discriminability, and thereby the retrieval results. Method: First, the feature points around a given local feature point are selected by distance and scale relations as that point's context; the selected points are called its near-neighbor feature points. A context descriptor is then built for the local feature from the near-neighbor points' information and their relations to it. Next, tentative local-feature matches are verified by comparing the similarity of their context descriptors. Finally, image similarity is measured by the number of correctly matched feature points, and the top-ranked candidate images are returned. Results: Experiments on the Copydays dataset compare the method against a Baseline. With 100 k distractor images, mAP improves by 63% over the Baseline. When the distractor set grows from 100 k to 1 M, the Baseline's mAP drops by 9% while the proposed method's drops by only 3%. Conclusion: The method is robust to image editing operations such as rotation, image overlay, scale change, and cropping, and can be applied effectively to image forensics and image deduplication.
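To make the verification step concrete, here is a minimal Python sketch of near-neighbor context checking: each feature's context is the set of visual-word IDs of its spatially (and scale-) nearest neighbors, and a tentative word match is accepted only if the two contexts overlap enough. The Jaccard test and all names and parameters are illustrative assumptions, not the paper's exact descriptor.

```python
# Minimal sketch (assumed form, not the paper's exact context descriptor).
import numpy as np

def context_words(positions, scales, word_ids, idx, k=5, scale_band=2.0):
    """Visual-word IDs of the k nearest neighbors of feature `idx`,
    restricted to features of comparable scale."""
    d = np.linalg.norm(positions - positions[idx], axis=1)
    d[idx] = np.inf                                   # exclude the point itself
    ratio = scales / scales[idx]
    d[(ratio > scale_band) | (ratio < 1.0 / scale_band)] = np.inf
    return set(word_ids[np.argsort(d)[:k]])

def verify_match(ctx_query, ctx_db, tau=0.3):
    """Keep a tentative visual-word match only if contexts agree (Jaccard)."""
    union = ctx_query | ctx_db
    return bool(union) and len(ctx_query & ctx_db) / len(union) >= tau
```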

4.
A class-consistent k-means clustering algorithm (CCKM) and its hierarchical extension (Hierarchical CCKM) are presented for generating discriminative visual words for recognition problems. In addition to using the labels of training data themselves, we associate a class label with each cluster center to enforce discriminability in the resulting visual words. Our algorithms encourage data points from the same class to be assigned to the same visual word, and those from different classes to be assigned to different visual words. More specifically, we introduce a class consistency term in the clustering process which penalizes assignment of data points from different classes to the same cluster. The optimization process is efficient and bounded by the complexity of k-means clustering. A very efficient and discriminative tree classifier can be learned for various recognition tasks via the Hierarchical CCKM. The effectiveness of the proposed algorithms is validated on two public face datasets and four benchmark action datasets.
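The class-consistency idea can be sketched in a few lines: a standard k-means alternation in which each center carries a class label, and the assignment cost is penalized when a point's class disagrees with it. This is a compact illustration under assumed details (penalty form, majority-vote label update, initialization), not the authors' exact optimization.

```python
# CCKM sketch: k-means with a class-consistency penalty (assumed details).
import numpy as np

def cckm(X, y, k, lam=1.0, iters=20, seed=0):
    """X: (n,d) float features; y: (n,) non-negative integer class labels."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), k, replace=False)
    centers = X[idx].astype(float)
    center_cls = y[idx].copy()                 # one class label per center
    for _ in range(iters):
        # squared distance plus a penalty for class-inconsistent assignment
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        penalty = lam * (y[:, None] != center_cls[None, :])
        assign = (d2 + penalty).argmin(1)
        for j in range(k):
            m = assign == j
            if m.any():
                centers[j] = X[m].mean(0)
                # cluster class label = majority class of its members
                center_cls[j] = np.bincount(y[m]).argmax()
    return centers, center_cls, assign
```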

5.
A Multiple Visual Phrase Learning Method for Image Classification
The bag-of-words image representation has limited semantic discriminability and descriptive power, and classification methods built on it are easily affected by background clutter and occlusion. This paper therefore proposes a multiple visual phrase learning method for image classification. Visual phrases with semantic discriminability and spatial correlation replace visual words to improve the accuracy of the bag-of-words representation. On this basis, combining the idea of multiple-instance learning, a multiple visual phrase learning method is proposed so that the final classification model reflects the regional characteristics of image categories. Experimental results on standard benchmarks such as Caltech-101 [1] and Scene-15 [2] validate the effectiveness of the proposed method, with relative gains in classification performance of about 9% and 7%, respectively.
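As a rough illustration of the phrase idea, the sketch below forms "visual phrases" as unordered pairs of visual-word IDs that co-occur within a spatial radius, yielding higher-order tokens that carry spatial correlation. The paper's discriminative phrase selection and multiple-instance learning stages are not reproduced, and the radius is an assumption.

```python
# Illustrative phrase construction from spatial co-occurrence (assumed form).
import numpy as np
from collections import Counter

def visual_phrases(positions, word_ids, radius=30.0):
    """positions: (n,2) keypoint coordinates; word_ids: (n,) visual-word IDs."""
    phrases = Counter()
    for i in range(len(word_ids)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        for j in np.nonzero((d < radius) & (d > 0))[0]:
            # an unordered pair of word IDs is one phrase token
            phrases[tuple(sorted((word_ids[i], word_ids[j])))] += 1
    return phrases   # phrase-frequency "bag of phrases" for one image
```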

6.
7.
王彦杰  刘峡壁  贾云得 《软件学报》2012,23(7):1787-1795
Based on statistical modeling and discriminative learning of visual words, an image representation called the visual word soft histogram is proposed. Local image features belonging to the same visual word are assumed to follow a Gaussian mixture distribution, which is estimated from samples by the max-min posterior pseudo-probability discriminative learning method and used to compute the similarity between a local feature and a visual word. The similarities between each visual word and its corresponding local features are accumulated and normalized over the whole visual vocabulary to obtain the image's soft histogram. Two concrete implementations are discussed: a classification-based soft histogram, which associates each local feature with the single most similar visual word, and a fully soft histogram, which matches each local feature to all visual words. Experimental results on the Caltech-4 and PASCAL VOC 2006 databases show that the method is effective.
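A stripped-down version of the fully soft histogram follows: each local feature contributes to every visual word in proportion to its log-likelihood under that word's density, and contributions are accumulated and normalized. For brevity each word is modeled by a single Gaussian with precomputed parameters; the paper uses Gaussian mixtures estimated by max-min posterior pseudo-probability learning.

```python
# Fully soft histogram sketch (single Gaussian per word; parameters assumed
# precomputed elsewhere).
import numpy as np

def soft_histogram(features, means, inv_covs, log_dets):
    """features: (n,d); means: (V,d); inv_covs: (V,d,d); log_dets: (V,)."""
    V = len(means)
    scores = np.empty((len(features), V))
    for v in range(V):
        diff = features - means[v]
        maha = np.einsum('nd,de,ne->n', diff, inv_covs[v], diff)
        scores[:, v] = -0.5 * (maha + log_dets[v])   # log-likelihood up to const
    sim = np.exp(scores - scores.max(1, keepdims=True))  # per-feature softmax
    sim /= sim.sum(1, keepdims=True)                 # soft word assignments
    hist = sim.sum(0)                                # accumulate over features
    return hist / hist.sum()                         # normalize over vocabulary
```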

8.
This paper presents a novel approach to automatic image annotation which combines global, regional, and contextual features by an extended cross-media relevance model. Unlike typical image annotation methods, which use either global or regional features exclusively and neglect the textual context among the annotated words, the proposed approach incorporates all three kinds of information, which help describe image semantics, and annotates images by estimating their joint probability. Specifically, we describe the global features as a distribution vector of visual topics and model the textual context as a multinomial distribution. The global features provide the distribution of visual topics over an image, while the textual context relaxes the assumption of mutual independence among annotated words commonly adopted in existing methods. Both the global features and the textual context are learned from the training data by a probabilistic latent semantic analysis approach. Experiments over 5k Corel images show that combining these three kinds of information is beneficial in image annotation.
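The combination step can be sketched as a joint log-probability over candidate annotation words, assuming the component models (word prior, global-topic likelihood, regional likelihood, and textual-context probability) have been trained elsewhere, e.g. by pLSA; all names here are placeholders.

```python
# Schematic scoring only; the component models are assumed trained elsewhere.
import numpy as np

def annotate(p_word, p_global_given_word, p_regions_given_word,
             p_word_given_chosen, n_tags=5):
    """Each argument is a length-V array over the candidate vocabulary;
    p_word_given_chosen encodes the textual context of already-chosen words
    (use np.ones(V) when no word has been chosen yet)."""
    log_p = (np.log(p_word)
             + np.log(p_global_given_word)     # global: visual-topic evidence
             + np.log(p_regions_given_word)    # regional: blob/region evidence
             + np.log(p_word_given_chosen))    # contextual: co-occurrence prior
    return np.argsort(log_p)[::-1][:n_tags]    # indices of the best words
```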

9.
Multimodal retrieval is a well-established approach to image retrieval. Images are usually accompanied by a text caption along with associated documents describing the image. Textual query expansion as a means of enhancing image retrieval is a relatively unexplored area. In this paper, we first study the effect of expanding the textual query on both image and associated text retrieval. Our study reveals that judicious expansion of the textual query through keyphrase extraction can lead to better results, in terms of text retrieval alone or of both image and text retrieval. To establish this, we use two well-known keyphrase extraction techniques based on tf-idf and KEA. While query expansion increases retrieval efficiency, the expansion must be semantically justified, so we propose a graph-based keyphrase extraction model that captures the relatedness between words in terms of both mutual information and relevance feedback. Most existing work has stressed bridging the semantic gap using textual and visual features, in combination or individually; how these text and image features are combined determines the efficacy of any retrieval. For this purpose, we adopt Fisher-LDA to adjudge appropriate weights for each modality, providing an intelligent decision process that favors the feature set to be infused into the final query. Our proposed algorithm is shown to significantly outperform the aforementioned keyphrase extraction algorithms for query expansion. A rigorous set of experiments on the ImageCLEF-2011 Wikipedia Retrieval task dataset validates our claim that capturing the semantic relation between words through mutual information, followed by expansion of the textual query using relevance feedback, can simultaneously enhance both text and image retrieval.
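As a concrete example of the simpler expansion baseline, the sketch below appends the top tf-idf terms of an image's associated document to the textual query. It illustrates tf-idf expansion only; the proposed graph-based model with mutual information and relevance feedback is more involved, and the names and cut-offs here are assumptions.

```python
# tf-idf query expansion sketch (baseline only, assumed parameters).
import math
from collections import Counter

def expand_query(query_terms, doc_tokens, corpus, top_k=3):
    """Append the top-k tf-idf terms of the associated document to the query.
    corpus: list of token collections, one per document, used for df."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)

    def tfidf(t):
        df = sum(1 for d in corpus if t in d)
        return tf[t] * math.log((1 + n_docs) / (1 + df))

    candidates = [t for t in tf if t not in query_terms]
    candidates.sort(key=tfidf, reverse=True)
    return list(query_terms) + candidates[:top_k]
```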

10.
The visual vocabulary representation approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind the visual vocabulary representation is that an image can be represented by visual words, a collection of local features of images. In this work, we develop a new scheme for constructing a visual vocabulary based on analyzing the content of visual words. By considering the content homogeneity of visual words, we design a visual vocabulary that contains macro-sense and micro-sense visual words; the two types are further combined as appropriate to describe an image effectively. We also apply the visual vocabulary to construct image retrieval and categorization systems. The performance evaluation for the two systems indicates that the proposed visual vocabulary achieves promising results.
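One way to realize the macro/micro distinction is sketched below: cluster features into candidate words, keep homogeneous clusters as macro-sense words, and split inhomogeneous ones (judged by intra-cluster variance) into finer micro-sense words. The homogeneity test, threshold, and split factor are our assumptions, not the authors' construction.

```python
# Two-level vocabulary sketch (assumed homogeneity test and split policy).
import numpy as np
from scipy.cluster.vq import kmeans2

def build_vocabulary(features, n_macro=100, var_thresh=0.5, n_micro=4, seed=0):
    """features: (n,d) float array of local descriptors."""
    centers, labels = kmeans2(features, n_macro, minit='++', seed=seed)
    vocab = []
    for j in range(n_macro):
        members = features[labels == j]
        if len(members) == 0:
            continue
        if members.var(axis=0).mean() <= var_thresh:
            vocab.append(centers[j])              # homogeneous: macro-sense word
        else:                                     # inhomogeneous: split finer
            sub, _ = kmeans2(members, min(n_micro, len(members)),
                             minit='++', seed=seed)
            vocab.extend(sub)                     # micro-sense words
    return np.array(vocab)
```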

11.
Objective: Existing visual question answering methods usually attend only to the visual objects in an image and ignore its key textual content, limiting the depth and precision of image understanding. Given the importance of embedded text for understanding images, researchers introduced the scene-text visual question answering task to quantify a model's comprehension of scene text, together with the benchmark datasets TextVQA (text visual question answering) and ST-VQA (scene text visual question answering). Focusing on this task, and addressing the performance bottleneck caused by the overfitting risk of existing self-attention-based methods, this paper proposes a knowledge-representation-enhanced multimodal Transformer for scene-text visual question answering that effectively improves robustness and accuracy. Method: The baseline model M4C (multimodal multi-copy mesh) is improved by modeling two complementary kinds of prior knowledge: spatial relations between visual objects and semantic relations between text words. On this basis, a general knowledge-representation-enhanced attention module is designed to encode both relation types in a unified way, yielding the KR-M4C (knowledge-representation-enhanced M4C) method. Results: KR-M4C is compared with recent methods on the TextVQA and ST-VQA benchmarks. On TextVQA, relative to the best compared result, test accuracy improves by 2.4% without extra training data and by 1.1% when ST-VQA is added as training data; on ST-VQA, the average normalized Levenshtein similarity on the test set improves by 5%. Ablation experiments on TextVQA further verify the effectiveness of the two kinds of prior knowledge, showing that KR-M4C improves answer accuracy. Conclusion: KR-M4C achieves significant gains on both the TextVQA and ST-VQA benchmarks, obtaining the best results on this task.
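The injection point for the prior knowledge can be illustrated with plain scaled dot-product attention whose logits receive an additive relation bias (spatial relations between objects, or semantic relations between words). This numpy sketch shows only where the knowledge enters; KR-M4C's actual encoding of the two relation types is richer.

```python
# Knowledge-biased attention sketch (illustrative injection point only).
import numpy as np

def kr_attention(Q, K, V, relation_bias):
    """Q, K, V: (n,d) arrays; relation_bias: (n,n) prior relation scores."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + relation_bias   # knowledge enters as a bias
    logits -= logits.max(-1, keepdims=True)         # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(-1, keepdims=True)             # row-wise softmax
    return attn @ V
```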

12.
To address the training complexity and the large time and space costs of image annotation models based on deep features, this paper proposes an annotation method that represents image visual features by intermediate-layer deep features and represents each semantic concept by the mean vector of its positive examples. First, convolution outputs from an intermediate layer of a pre-trained deep model are taken directly as low-level visual features, and images are represented by sparse coding. Then, a visual feature vector is constructed for each text word by the positive-example mean-vector method, building a library of word-level visual feature vectors. Finally, the similarity between a test image and every text word's visual feature vector is computed, and the words with the highest similarities are taken as annotations. Experiments on several datasets demonstrate the effectiveness of the method: in terms of F1 score on the IAPR TC-12 dataset, its annotation performance exceeds that of 2PKNN and JEC, which use end-to-end deep features, by 32% and 60%, respectively.
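The positive-example mean-vector step reduces to a few lines: average the features of each tag's positive training images, then rank tags by cosine similarity to the test feature. The sparse-coding representation is omitted here, and all names and defaults are illustrative.

```python
# Positive-example mean-vector annotation sketch (sparse coding omitted).
import numpy as np

def build_tag_vectors(train_feats, train_tags, vocab):
    """train_feats: (n,d) image features; train_tags: list of tag sets per
    image; vocab: list of tags. Assumes every tag has >= 1 positive image."""
    return np.array([train_feats[[t in tags for tags in train_tags]].mean(0)
                     for t in vocab])

def annotate(feat, tag_vectors, vocab, n_tags=5):
    """Return the n_tags words whose mean vectors are most cosine-similar."""
    sims = tag_vectors @ feat / (np.linalg.norm(tag_vectors, axis=1)
                                 * np.linalg.norm(feat) + 1e-12)
    return [vocab[i] for i in np.argsort(sims)[::-1][:n_tags]]
```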

13.
In this paper, we introduce backoff hierarchical class n-gram language models to better estimate the likelihood of unseen n-gram events. This multi-level class-hierarchy language modeling approach generalizes the well-known backoff n-gram language modeling technique. It uses a class hierarchy to define word contexts: each node in the hierarchy is a class that contains all the words of its descendant nodes, and the closer a node is to the root, the more general the class (and context). We investigate the effectiveness of the approach for modeling unseen events in speech recognition. Our results illustrate that the proposed technique outperforms backoff n-gram language models. We also study the effect of vocabulary size and hierarchy depth on performance. Results are presented on the Wall Street Journal (WSJ) corpus using two vocabulary sets of 5000 and 20,000 words. Experiments with the 5000-word vocabulary, whose test set contains a small number of unseen events, show up to 10% improvement in unseen-event perplexity when using the hierarchical class n-gram language models. With the 20,000-word vocabulary, characterized by a larger number of unseen events, the perplexity of unseen events decreases by 26%, while the word error rate (WER) decreases by 12% with the hierarchical approach. Our results suggest that the largest gains in performance are obtained when the test set contains a large number of unseen events.
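A toy bigram version of the hierarchical backoff is sketched below: if the successor word is unseen for a history word, back off to that word's parent class, recursively up to the root. The counts, the hierarchy, the flat discount, and the uniform floor are placeholder assumptions, and proper backoff-weight normalization is omitted.

```python
# Toy hierarchical class backoff for bigrams (assumed discounting scheme;
# alpha normalization of a real backoff model is omitted).
def hier_backoff_prob(w, h, counts, parent, discount=0.4):
    """counts[h]: dict of successor counts for context h (a word or a class);
    parent[h]: the next, more general class of h, or None at the root."""
    ctx = counts.get(h)
    if ctx and w in ctx:
        return (1 - discount) * ctx[w] / sum(ctx.values())  # seen event
    if parent.get(h) is None:
        return 1e-8                         # root reached: uniform floor
    # unseen event: fall back to the more general class context
    return discount * hier_backoff_prob(w, parent[h], counts, parent, discount)
```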

14.
This paper proposes an image-appearance-based method for the loop-closure detection problem of monocular simultaneous localization and mapping for mobile robots. A bag-of-visual-words approach is presented for building an appearance-based scene model. Subsequently, a fuzzy K-means method is proposed to build the visual vocabulary synchronously. Each image can be represented by a vector of weighted words, and the similarity between images is evaluated by the scalar product of the weighted vectors. A Bayesian filter algorithm is applied to update the detection probability, and an inverse image retrieval method is employed to eliminate wrong loop-closure results. The experimental results demonstrate the efficiency of the proposed method.
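The scoring-and-update loop can be condensed as follows: tf-idf-weighted, L2-normalized word histograms compared by scalar product, with a simple Bayesian update of the loop-closure belief. The likelihood model and its constants are assumptions; the fuzzy K-means vocabulary construction and the inverse-retrieval check are not shown.

```python
# Loop-closure scoring sketch (assumed likelihood model and constants).
import numpy as np

def tfidf_vector(word_counts, idf):
    """Weight a word-count histogram by idf and L2-normalize it."""
    v = word_counts * idf
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def bayes_update(prior, similarity, p_sim_loop=0.8, p_sim_no_loop=0.2):
    """Fuse a similarity score in [0,1] with the prior loop-closure belief."""
    like_loop = p_sim_loop * similarity + (1 - p_sim_loop) * (1 - similarity)
    like_no = p_sim_no_loop * similarity + (1 - p_sim_no_loop) * (1 - similarity)
    return like_loop * prior / (like_loop * prior
                                + like_no * (1 - prior) + 1e-12)

# usage: sim = tfidf_vector(c1, idf) @ tfidf_vector(c2, idf)  # scalar product
```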

15.
16.
Sketch retrieval is an important research topic in image processing. This paper proposes an improved feature extraction method that fuses a Gaussian pyramid with local HOG features and applies it to sketch retrieval. The Gaussian pyramid decomposes an image into a multi-scale space, interest points are extracted at all scales, and multi-scale, interest-point-based HOG features are obtained. A visual dictionary is generated from the multi-scale HOG feature set, yielding feature description vectors over the dictionary, and retrieval is performed by similarity matching. Compared with single-scale HOG and several other algorithms, the experimental results demonstrate the method's feasibility and effectiveness.
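A bare-bones version of the multi-scale descriptor: build a Gaussian pyramid, then compute a gradient-orientation histogram around sample points at every level. Full HOG cell/block normalization is reduced to one histogram per patch, interest-point detection is replaced by a grid, and all parameters are assumptions.

```python
# Multi-scale HOG sketch (grid sampling stands in for interest points).
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(img, levels=3):
    pyr = [np.asarray(img, float)]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])  # blur+halve
    return pyr

def patch_hog(img, y, x, size=16, bins=9):
    p = img[y:y + size, x:x + size]
    gy, gx = np.gradient(p)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                 # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-12)

def multiscale_hog(img, levels=3, step=16):
    descs = []
    for lvl in gaussian_pyramid(img, levels):        # every pyramid level
        for y in range(0, lvl.shape[0] - 16, step):
            for x in range(0, lvl.shape[1] - 16, step):
                descs.append(patch_hog(lvl, y, x))
    return np.array(descs)
```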

17.
18.
19.
李东艳  李绍滋  柯逍 《计算机应用》2010,30(10):2610-2613
To address the data imbalance in the datasets used for image annotation, a new auto-balancing model based on an external database is proposed. The model first identifies low-frequency words from the word-frequency distribution of the original database; then, following the auto-balancing scheme, it adds corresponding images from the external database for each low-frequency word. Features are then extracted from the images, and the 47,065 visual words from the Corel 5k dataset are clustered together with the 996 visual words extracted from the added external images. Finally, images are annotated using an image annotation refinement model based on the external database. The method overcomes the imbalance of image annotation databases and clearly improves the number of words correctly annotated at least once, as well as precision and recall.
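The balancing step itself is simple to sketch: count tag frequencies in the original database, flag the low-frequency tags, and compute how many external images each needs to reach a target count. The mean-frequency target policy here is an assumed stand-in for the paper's scheme.

```python
# Auto-balancing plan sketch (assumed mean-frequency target policy).
from collections import Counter

def balance_plan(image_tags, target=None):
    """image_tags: list of tag lists, one per image in the original database.
    Returns {tag: number of external images to fetch} for low-frequency tags."""
    freq = Counter(t for tags in image_tags for t in tags)
    if target is None:
        target = int(sum(freq.values()) / len(freq))  # mean frequency as target
    return {t: target - c for t, c in freq.items() if c < target}
```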

20.
In the article a certain class of feature extractors for face recognition is presented. The extraction is based on simple approaches: image scaling with pixel concatenation into a feature vector, selection of a small number of points from the face area, the face image's spectrum, and finally the pixel intensity histogram. The experiments performed on several facial image databases (BioID [4], ORL face database [27], FERET [30]) show that face recognition using this class of extractors is particularly efficient and fast, with straightforward implementations in software and hardware systems. The extractors can also be used in fast face recognition systems involving feature integration, and as a tool for retrieving similar faces in two-tier systems (as initial processing before exact face recognition).
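Minimal Python versions of the four extractor families described above are sketched below (downscaled pixel vector, sparse pixel samples, low-frequency magnitude spectrum, and intensity histogram); output sizes and sample counts are our choices, not the article's.

```python
# Four simple face feature extractors (assumed parameter choices).
import numpy as np

def scaled_pixels(img, size=16):
    """Nearest-neighbor downscale, then concatenate pixels into a vector."""
    ys = np.linspace(0, img.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, size).astype(int)
    return img[np.ix_(ys, xs)].ravel().astype(float)

def sampled_points(img, n=64, seed=0):
    """Intensities at a fixed random selection of face-area points."""
    rng = np.random.default_rng(seed)       # fixed seed -> fixed positions
    ys = rng.integers(0, img.shape[0], n)
    xs = rng.integers(0, img.shape[1], n)
    return img[ys, xs].astype(float)

def spectrum(img, keep=16):
    """Low-frequency block of the 2-D magnitude spectrum."""
    return np.abs(np.fft.fft2(img))[:keep, :keep].ravel()

def intensity_histogram(img, bins=64):
    """Normalized pixel-intensity histogram (assumes 8-bit input)."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / (h.sum() + 1e-12)
```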
