Similar Documents
20 similar documents found
1.
Image Classification Using a Bag-of-Words Model under Triangulation Constraints
汪荣贵  丁凯  杨娟  薛丽霞  张清杨 《软件学报》2017,28(7):1847-1861
The bag-of-visual-words model is widely used in image classification, image retrieval, and related fields. In the traditional model, the visual-word statistics ignore both the spatial relations among visual words and the shape of the object being classified, so the resulting image representation lacks discriminative power. This paper proposes an improved bag-of-visual-words method that combines salient-region extraction with the topological structure of visual words; it not only produces more representative visual words but also, to some extent, resists interference from cluttered backgrounds and positional changes. First, salient regions are extracted from the training images, and the bag-of-words model is built on those regions. Second, to describe image features more precisely and to withstand varying positions and backgrounds, the method applies a visual-word topology strategy together with triangulation, fusing global and local information. Simulation experiments comparing the proposed method with the traditional bag-of-words model and other models show that it achieves higher classification accuracy.
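The method above builds on the standard bag-of-visual-words representation: each local descriptor is quantized to the nearest entry of a learned codebook ("visual word") and the image is summarized as a word histogram. A minimal sketch of that baseline step, with a hypothetical toy codebook and descriptors (the paper's salient-region extraction and triangulation steps are not shown):

```python
def nearest_word(desc, codebook):
    """Index of the codebook center closest to a descriptor (squared Euclidean)."""
    dists = [sum((d - c) ** 2 for d, c in zip(desc, center)) for center in codebook]
    return dists.index(min(dists))

def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Toy 2-D descriptors and a 3-word codebook (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.0, 0.2)]
hist = bow_histogram(descriptors, codebook)  # one 3-bin histogram per image
```

In practice the codebook is learned by clustering (e.g. k-means over SIFT descriptors), and classifiers are trained on the histograms.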

2.
A Multi-Visual-Phrase Learning Method for Image Classification
To address the limited semantic discriminability and descriptive power of the bag-of-words image representation, and the sensitivity of traditional bag-of-words classifiers to background clutter and occlusion, this paper proposes a multi-visual-phrase learning method for image classification. Visual phrases with semantic discriminability and spatial correlation replace visual words, improving the accuracy of the bag-of-words representation. On this basis, a multi-visual-phrase learning method is developed by incorporating the multiple-instance learning idea, so that the final classification model reflects the regional characteristics of each image category. Experiments on standard benchmarks such as Caltech-101 [1] and Scene-15 [2] verify the effectiveness of the proposed method, with relative improvements in classification performance of about 9% and 7%, respectively.

3.
A Scene Classification Algorithm Based on a Global Optimization Strategy
This paper proposes a scene classification algorithm based on a global optimization strategy. The algorithm extracts a global scene feature, the spatial envelope feature, from the whole image. Visual words are extracted from image patches, hidden variables are defined to represent the semantics of each visual word, and a hidden-state structure graph describes the visual-word context of the whole image. For classification, an objective function composed of compatibility functions is constructed, where each compatibility function measures the compatibility among the global scene feature, the hidden variables, and the scene label; the scene label of an image is inferred by finding the global optimum of this objective. Comparative experiments on standard scene image databases show that the algorithm outperforms representative scene classification algorithms.

4.
To generate a visual vocabulary that effectively represents image scene semantics and to improve scene annotation performance, an image scene semantic annotation model based on formal concept analysis (FCA) is proposed. The training images and their initial visual vocabulary are first abstracted into a formal context, information entropy is used to weight each visual word, and a concept lattice is constructed for each scene category. The mean weight of the visual words in each concept's intent then measures the contribution of that word combination to annotating images, and, using a per-category vocabulary-generation threshold, vocabularies that annotate each class of scene images are extracted from the lattice structure. Finally, K-nearest neighbors is used to annotate the scene semantics of test images. Experiments on the Fei-Fei 13-class natural scene dataset show that, with β=0.05 and γ=15, the method achieves better annotation accuracy than the methods of Fei-Fei and Bai.

5.
Classifying regions of interest (ROIs) is the final step of computer-aided diagnosis for medical images. Traditional methods extract features from each ROI separately and then train a classifier by statistical learning, but the visual features contained in a single region are limited, making accurate classification difficult. This paper proposes LDAC, an improved model based on the LDA topic model that considers the area surrounding an ROI, i.e. the image context. LDA models the contextual information contained in the regions around the ROI, which is combined with the ROI's visual information and class labels to assist ROI classification and thereby improve accuracy. Experiments on mass classification in mammographic images show that the proposed method improves classification accuracy.

6.
The traditional bag-of-visual-words (BoW) model for recognizing the state of railway fasteners uses only the feature domain of fastener images and ignores contextual semantic information in the spatial domain. To remedy this, a fastener inspection model based on contextual semantics is proposed. Building on the traditional BoW model, a Gibbs random field is introduced to model the spatial correlation among pixels, combining patch similarity in the feature domain with contextual semantic constraints in the spatial domain to define visual words more accurately. Latent Dirichlet allocation (LDA) learns the topic distribution of fastener images, and a support vector machine (SVM) classifies the fasteners. Classification experiments on four types of fastener images show that the model effectively improves classification accuracy.

7.
Visual vocabulary size is one of the key factors affecting scene classification accuracy: a large vocabulary reduces classification efficiency because of its computational cost, while a small vocabulary suffers severely from polysemous words and loses accuracy. To address this, a visual vocabulary generation method based on hierarchical concept-lattice analysis is proposed. An initial visual vocabulary is first generated from the bag-of-visual-words model of the training images. Then, exploiting the concept hierarchy of the constructed lattice and dynamically adjusting the extent-size threshold, reduced vocabularies of different granularities describing each scene semantic are obtained. Finally, XOR operations on the vectors formed from each class's reduced visual words remove polysemous words, yielding a vocabulary that effectively describes image scene semantics. Experimental results show that the method is effective.

8.
王彦杰  刘峡壁  贾云得 《软件学报》2012,23(7):1787-1795
Based on statistical modeling and discriminative learning of visual words, an image representation called the soft histogram of visual words is proposed. Local features belonging to the same visual word are assumed to follow a Gaussian mixture distribution, which is estimated from samples by max-min posterior pseudo-probability discriminative learning and used to compute the similarity between a local feature and a visual word. The similarities between each visual word and its corresponding local features are accumulated and the result is normalized over the whole vocabulary, yielding the image's soft histogram. Two implementations are discussed: a classification-based soft histogram, which assigns each local feature to the single most similar visual word, and a completely soft histogram, which matches every local feature to all visual words. Experimental results on the Caltech-4 and PASCAL VOC 2006 databases demonstrate the effectiveness of the method.
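The two soft-histogram variants can be sketched as follows. For simplicity, a single isotropic Gaussian similarity stands in for the learned mixture (the paper estimates it by max-min posterior pseudo-probability learning); `sigma` and the toy data are illustrative assumptions:

```python
import math

def similarity(desc, word, sigma=0.5):
    # Gaussian-style similarity between a local feature and a visual word;
    # a single isotropic Gaussian stands in for the learned mixture density.
    d2 = sum((a - b) ** 2 for a, b in zip(desc, word))
    return math.exp(-d2 / (2 * sigma ** 2))

def soft_histogram(descriptors, words, hard=False):
    """Soft visual-word histogram.

    hard=True  -> classification-based variant: each feature votes only
                  for its most similar word (with its similarity value).
    hard=False -> completely soft variant: each feature contributes its
                  similarity to every word.
    """
    hist = [0.0] * len(words)
    for desc in descriptors:
        sims = [similarity(desc, w) for w in words]
        if hard:
            i = sims.index(max(sims))
            hist[i] += sims[i]
        else:
            for i, s in enumerate(sims):
                hist[i] += s
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Toy vocabulary and local features (illustrative values only).
words = [(0.0, 0.0), (1.0, 1.0)]
features = [(0.0, 0.0), (1.0, 1.0)]
```

Both variants return an L1-normalized vector over the vocabulary, so they drop into any pipeline that consumes hard bag-of-words histograms.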

9.
Based on statistical modeling and discriminative learning of visual words, an image representation called the soft histogram of visual words is proposed. Local features belonging to the same visual word are assumed to follow a Gaussian mixture distribution, which is estimated from samples by max-min posterior pseudo-probability discriminative learning and used to compute the similarity between a local feature and a visual word. The similarities between each visual word and its corresponding local features are accumulated and the result is normalized over the whole vocabulary, yielding the image's soft histogram. Two implementations are discussed: a classification-based soft histogram, which assigns each local feature to the single most similar visual word, and a completely soft histogram, which matches every local feature to all visual words. Experimental results on the Caltech-4 and PASCAL VOC 2006 databases demonstrate the effectiveness of the method.

10.
An Image Tag Recommendation Method Combining Relevance and Diversity
To help users organize and retrieve image resources efficiently, most image-sharing sites allow users to tag images, and image tag recommendation systems provide candidate tags to ease the tagging process. Previous methods typically recommend tags using tag co-occurrence information; however, because they ignore the visual content of the image and the diversity among recommended tags, their results often suffer from tag ambiguity and tag redundancy. To solve these problems, this paper proposes a new tag recommendation method that jointly considers the relevance and the diversity of recommended tags. First, a visual language model is used to compute the relevance between each tag and the image, and the visual distance between tags. Then, based on these quantities, a greedy search algorithm finds a tag set that reasonably balances relevance and diversity, which is returned as the final recommendation. Experiments on a Flickr dataset show that the method outperforms current representative methods in precision, topic coverage, and F1 measure.
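The greedy relevance/diversity trade-off described above can be sketched as a marginal-gain selection loop. The objective, the weight `lam`, and the toy data below are illustrative assumptions, not the paper's exact formulation:

```python
def recommend_tags(candidates, relevance, distance, k=3, lam=0.5):
    """Greedily pick k tags balancing tag-image relevance with diversity.

    candidates: list of tags; relevance: tag -> score (higher = more
    relevant); distance: (tag, tag) -> visual distance (higher = more
    diverse). lam trades off the two terms.
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def gain(t):
            # Diversity of t = distance to its closest already-selected tag.
            div = min((distance(t, s) for s in selected), default=1.0)
            return lam * relevance[t] + (1 - lam) * div
        best = max(pool, key=gain)
        selected.append(best)
        pool.remove(best)
    return selected

# Hypothetical scores: 'kitten' is near-synonymous with 'cat', so a
# diversity-aware recommender should prefer 'beach' as the second tag.
relevance = {'cat': 0.9, 'kitten': 0.85, 'beach': 0.5}
def distance(a, b):
    return 0.1 if {a, b} == {'cat', 'kitten'} else 0.9
```

Because 'kitten' is visually close to the already-chosen 'cat', its marginal gain drops and the more diverse 'beach' wins the second slot.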

11.
The visual vocabulary representation has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind the visual vocabulary representation is that an image can be represented by visual words, a collection of local features of images. In this work, we develop a new scheme for constructing a visual vocabulary based on an analysis of visual word contents. By considering the content homogeneity of visual words, we design a visual vocabulary that contains macro-sense and micro-sense visual words. The two types of visual words are then combined appropriately to describe an image effectively. We also apply the visual vocabulary to construct image retrieval and categorization systems. The performance evaluation of the two systems indicates that the proposed visual vocabulary achieves promising results.

12.
This paper presents a novel appearance-based technique for topological robot localization and place recognition. A vocabulary of visual words is formed automatically, representing local features that frequently occur in the set of training images. Using the vocabulary, a spatial pyramid representation is built for each image by repeatedly subdividing it and computing histograms of visual words at increasingly fine resolutions. An information maximization technique is then applied to build a hierarchical classifier for each class by learning informative features. While top-level features in the hierarchy are selected from the coarsest resolution of the representation, capturing the holistic statistical properties of the images, child features are selected from finer resolutions, encoding more local characteristics, redundant with the information coded by their parents. Exploiting this redundancy in the data enables the localization system to achieve greater reliability against dynamic variations in the environment. An average classification accuracy of 88.9% on a challenging topological localization database consisting of twenty-seven outdoor places demonstrates the advantages of our hierarchical framework for dealing with dynamic variations that cannot be learned during training.
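The spatial pyramid construction described above, per-cell word histograms at increasingly fine subdivisions, concatenated into one vector, can be sketched as follows; the grid indexing and toy data are illustrative assumptions:

```python
def spatial_pyramid(features, levels=2, n_words=4):
    """Concatenated per-cell visual-word histograms over a pyramid of grids.

    features: list of (x, y, word) with x, y normalized to [0, 1).
    Level l splits the image into 2**l x 2**l cells; level 0 is the
    plain (holistic) bag-of-words histogram.
    """
    out = []
    for level in range(levels + 1):
        grid = 2 ** level
        hists = [[0] * n_words for _ in range(grid * grid)]
        for x, y, w in features:
            cell = int(y * grid) * grid + int(x * grid)  # row-major cell index
            hists[cell][w] += 1
        for h in hists:  # cells in row-major order, coarse to fine
            out.extend(h)
    return out
```

A weighting per level (finer levels weighted higher, as in pyramid-match schemes) is commonly added before feeding the vector to a classifier.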

13.
14.
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local visual features. In an iterative procedure, we use language (the words) to drive a perceptual grouping process that assembles an appearance model for a named object. Results of applying our method to three data sets in a variety of conditions demonstrate that, from complex, cluttered, real-world scenes with noisy captions, we can learn both the names and appearances of objects, resulting in a set of models invariant to translation, scale, orientation, occlusion, and minor changes in viewpoint or articulation. These named models, in turn, are used to automatically annotate new, uncaptioned images, thereby facilitating keyword-based image retrieval.

15.
With the constant emergence of smart devices, the number of images is growing rapidly, but many images go underused because they are unannotated. To address this problem, a semi-supervised image annotation method based on LDA and convolutional neural networks is proposed. First, all textual information in the training image set is fed into LDA to generate textual annotation words for the images; then a convolutional neural network is used to obtain high-level visual features of the images, while an attention mechanism and a modified loss function are added…

16.
An image distribution description based on probabilistic signatures and a corresponding image classification algorithm are proposed. The algorithm first models the distribution of local features in an image with a Gaussian mixture model. Taking the mean of each mixture mode as a cluster center, and the sum of the posterior probabilities, with respect to that mode, of the local features satisfying a constraint as the cluster size, an initial probabilistic signature is formed; a compression step then determines the final probabilistic signature, and classification is completed by training an SVM with an Earth Mover's Distance (EMD) kernel. A probabilistic signature allows one local feature to respond to multiple clusters, encoding more discriminative information and capturing more perceptual similarity. Comparative experiments against other image classification methods on scene recognition and object classification tasks verify the effectiveness of the proposed method.
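The signature-building step above can be sketched as follows. An equal-weight isotropic Gaussian mixture stands in for the learned model, and the posterior threshold `tau` stands in for the paper's constraint on which features count toward a mode; both, along with the toy data, are illustrative assumptions:

```python
import math

def posteriors(desc, means, sigma=0.5):
    # Posterior responsibility of each (equal-weight, isotropic) Gaussian
    # mode for one local feature.
    lik = [math.exp(-sum((a - b) ** 2 for a, b in zip(desc, m)) / (2 * sigma ** 2))
           for m in means]
    z = sum(lik) or 1.0
    return [v / z for v in lik]

def probabilistic_signature(descriptors, means, tau=0.05):
    """Signature = (mode mean, summed posterior mass) pairs.

    Posteriors below tau are treated as noise and dropped; modes that
    collect no mass are omitted, which plays the role of compression here.
    """
    mass = [0.0] * len(means)
    for desc in descriptors:
        for i, p in enumerate(posteriors(desc, means)):
            if p >= tau:
                mass[i] += p
    return [(m, w) for m, w in zip(means, mass) if w > 0]
```

Unlike a hard histogram, a feature near two modes contributes mass to both, which is what makes the signature, paired with an EMD kernel, more expressive.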

17.
With the continuously increasing need for location information among users around the world, applications of geospatial information have gained a lot of attention in both research and commercial organizations. Extracting semantics from image content for geospatial information seeking and knowledge discovery has thus become a critical process. Unfortunately, the available geographic images may be blurred, or too light or too dark, so it is often hard to extract geographic features directly from them. In this paper, we describe our methods for applying local scale-invariant features and bag-of-keypoints techniques to annotating images, in order to carry out image categorization and geographic knowledge discovery tasks. First, local scale-invariant features are extracted from geographic images as representative geographic features. Subsequently, bag-of-keypoints methods are used to construct a visual vocabulary and generate feature vectors to support image categorization and annotation. The annotated images are classified using geographic nouns. The experimental results show that the proposed approach is sensible and can effectively enhance geographic knowledge discovery tasks.

18.
Scene Recognition Combining Structure and Texture Features
Despite considerable progress, scene recognition remains a challenging problem in computer vision. Some previous approaches require manual semantic annotation of the training images in advance, and most are based on the "bag-of-features" model, which must cluster a large number of extracted features, incurring heavy computation and memory costs, while the choice of initial cluster centers and the number of clusters strongly affects recognition. This paper therefore proposes an unsupervised scene recognition method that does not rely on the bag-of-features model. Multiple images at different resolutions are first built by subsampling. At each resolution level, structure and texture features are extracted: the gradient-orientation histogram descriptor proposed in this paper represents the structure of the image, while the responses of a Gabor filter bank and the Schmid filter set represent its texture. Structure and texture are treated as two independent feature channels, which are finally combined and classified by an SVM to recognize scenes automatically. Experiments on the 8-class (Oliva), 13-class (Li Fei-Fei), and 15-class (Lazebnik) scene image databases show that the proposed gradient-orientation histogram descriptor outperforms the classic SIFT descriptor for scene recognition, and that the method combining structure and texture features achieves good recognition results on all three common benchmarks.

19.
Many recent state-of-the-art image retrieval approaches are based on the Bag-of-Visual-Words model and represent an image as a set of visual words by quantizing local SIFT (scale-invariant feature transform) features. Feature quantization reduces the discriminative power of local features and unavoidably causes many false local matches between images, which degrades retrieval accuracy. To filter those false matches, geometric context among visual words has been widely explored for the verification of geometric consistency. However, existing studies with global or local geometric verification are either computationally expensive or achieve limited accuracy. To address this issue, in this paper we focus on partial-duplicate Web image retrieval and propose a scheme to encode the spatial context for visual matching verification. An efficient affine enhancement scheme is proposed to refine the verification results. Experiments on partial-duplicate Web image search, using a database of one million images, demonstrate the effectiveness and efficiency of the proposed approach. Evaluation on a 10-million-image database further reveals the scalability of our approach.

20.
Scale-Invariant Visual Language Modeling for Object Categorization
In recent years, "bag-of-words" models, which treat an image as a collection of unordered visual words, have been widely applied in the multimedia and computer vision fields. However, their ignorance of the spatial structure among visual words makes them indiscriminative for objects with similar word frequencies but different word spatial distributions. In this paper, we propose a visual language modeling method (VLM), which incorporates the spatial context of local appearance features into a statistical language model. To represent the object categories, models with different orders of statistical dependency are exploited. In addition, a multilayer extension makes the VLM more resistant to scale variations of objects. The model is effective and applicable to large-scale image categorization. We train scale-invariant visual language models on images grouped by Flickr tags and use these models for object categorization. Experimental results show that they achieve better performance than single-layer visual language models and "bag-of-words" models. They also achieve performance comparable to 2-D MHMM and SVM-based methods, while costing much less computational time.
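A visual language model in the spirit described above can be sketched as a bigram model over a 2-D grid of visual-word indices: the probability of each word conditioned on its left neighbor, estimated with add-alpha smoothing. The grid layout, smoothing choice, and toy data are illustrative assumptions, not the paper's exact formulation:

```python
from collections import defaultdict
import math

def train_bigram(grids, vocab_size, alpha=1.0):
    """Estimate P(word | left neighbor) from 2-D grids of visual-word
    indices, with add-alpha smoothing over the vocabulary."""
    counts = defaultdict(lambda: defaultdict(float))
    for grid in grids:
        for row in grid:
            for left, cur in zip(row, row[1:]):
                counts[left][cur] += 1
    model = {}
    for left in range(vocab_size):
        total = sum(counts[left].values()) + alpha * vocab_size
        model[left] = [(counts[left][w] + alpha) / total for w in range(vocab_size)]
    return model

def log_likelihood(grid, model):
    """Score a word grid under the bigram model (higher = better fit)."""
    ll = 0.0
    for row in grid:
        for left, cur in zip(row, row[1:]):
            ll += math.log(model[left][cur])
    return ll
```

Categorization then amounts to training one model per category and assigning a test image to the category whose model gives it the highest likelihood; higher-order or multilayer variants extend the same idea.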


Copyright©北京勤云科技发展有限公司  京ICP备09084417号