Similar Documents
20 similar documents retrieved (search time: 921 ms)
1.
The selection of canonical images that best represent a scene type is important for efficiently visualizing and re-ranking search results. In this paper, we propose selecting canonical images based on the human affects hidden in an image. The approach has two components: a probabilistic affective model (PAM), built on probabilistic latent semantic analysis (PLSA), that annotates images with human affects, and a cluster ranking algorithm that selects an informative summary from a large set of search results. PAM first extracts the dominant color compositions (CCs) that constitute an image through image segmentation and RAG analysis; PLSA, a well-known method for finding latent semantics in documents, is then employed to infer numerical ratings for the affective classes from the CCs. After the images have been mapped into the affective space with PAM, clustering is performed. To select images that are representative and distinct from one another, we identify three dominant properties: coverage, affective coherence, and distinctiveness. Cluster ranking is performed on this basis, and the representative images of each cluster are selected and displayed to the user as canonical images. Experiments were performed on Photo.Net and Google images, and the results were compared with existing methods. PAM achieved an average F1-score of 0.667, a 14% improvement over the existing method. The proposed system is also shown to be superior to two baselines in selecting canonical images in terms of representativeness and diversity scores.
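The cluster-ranking stage lends itself to a small illustration. The sketch below is not the paper's implementation: it assumes images have already been mapped into an affective feature space and clustered, and it scores each cluster with hypothetical coverage, coherence, and distinctiveness terms combined with equal weights.

```python
import numpy as np

def rank_clusters(features, labels):
    """Rank clusters by coverage, affective coherence and distinctiveness.

    features: (N, D) images mapped into the affective space (e.g. by PAM)
    labels:   (N,) cluster index of each image
    Returns cluster ids sorted from most to least informative.
    """
    ids = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in ids])
    scores = []
    for i, c in enumerate(ids):
        members = features[labels == c]
        coverage = len(members) / len(features)                # share of results covered
        coherence = 1.0 / (1.0 + members.std(axis=0).mean())   # tight clusters score high
        others = np.delete(centroids, i, axis=0)
        distinct = np.linalg.norm(others - centroids[i], axis=1).min() if len(others) else 1.0
        scores.append(coverage + coherence + distinct)         # equal weights, an assumption
    return [c for _, c in sorted(zip(scores, ids), reverse=True)]

def canonical_images(features, labels, ranked_ids, per_cluster=1):
    """Pick the image closest to each ranked cluster's centroid."""
    picks = []
    for c in ranked_ids:
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        order = np.argsort(np.linalg.norm(features[idx] - centroid, axis=1))
        picks.extend(idx[order[:per_cluster]].tolist())
    return picks
```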

2.
Many recent image retrieval methods are based on the “bag-of-words” (BoW) model with some additional spatial consistency checking. This paper proposes a more accurate similarity measurement that takes the spatial layout of visual words into account in an offline manner. The similarity measurement is embedded in the standard pipeline of the BoW model and improves two features of the model: i) latent visual words are added to a query based on spatial co-occurrence, to improve recall; and ii) the weights of reliable visual words are increased, to improve precision. The combination of these methods leads to a more accurate measurement of image similarity. This is similar in concept to the combination of query expansion and spatial verification, but requires no query-time processing, which would be too expensive to apply to the full list of ranked results. Experimental results demonstrate the effectiveness of the proposed method on three public datasets.
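A minimal sketch of the two adjustments, adding spatially co-occurring latent words to a query histogram and up-weighting reliable words, is given below. The co-occurrence matrix, the reliability weights, and the 0.5 expansion factor are assumptions for illustration, not the paper's actual components.

```python
import numpy as np

def expand_and_reweight(query_hist, cooccurrence, reliability, expand_factor=0.5):
    """Adjust a BoW query histogram using precomputed spatial statistics.

    query_hist:   (V,) tf-idf style histogram of the query image
    cooccurrence: (V, V) row-normalised spatial co-occurrence of visual words
    reliability:  (V,) per-word reliability weights (assumed precomputed offline)
    """
    present = query_hist > 0
    # i) add latent words that frequently co-occur spatially with the query's words
    latent = cooccurrence[present].sum(axis=0)
    latent[present] = 0.0                      # only add words missing from the query
    expanded = query_hist + expand_factor * latent
    # ii) increase the influence of reliable words
    return expanded * reliability

def similarity(q, db_hists):
    """Cosine similarity between the adjusted query and database histograms."""
    q = q / (np.linalg.norm(q) + 1e-12)
    db = db_hists / (np.linalg.norm(db_hists, axis=1, keepdims=True) + 1e-12)
    return db @ q
```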

3.
4.
An Image Retrieval Method Based on Sparse Canonical Correlation Analysis (cited 1 time: 0 self-citations, 1 by others)
庄凌, 庄越挺, 吴江琴, 叶振超, 吴飞. 《软件学报》 (Journal of Software), 2012, 23(5): 1295-1304
A key problem in semantic image retrieval is finding the association between low-level image features and semantics. Since text is an effective means of expressing semantics, this paper proposes building a model that captures the latent semantic association between the two modalities, text and image, by studying the relationship between them. With this model, a retrieval intent can be expressed in natural-language form (a text sentence) and the relevant images retrieved. The model is based on sparse canonical correlation analysis (sparse CCA) and is trained as follows: first, a textual semantic space is constructed with latent semantic analysis; the images corresponding to the text are then represented as bags of visual words; finally, the sparse CCA algorithm finds a semantically correlated space that maps between textual semantics and visual words. Using a sparse correlation analysis improves the interpretability of the model and keeps the retrieval results stable. Experimental results verify the effectiveness of sparse CCA and confirm the feasibility of the proposed semantic image retrieval method.
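Sparse CCA itself is not available in common Python libraries, so the sketch below uses a simple alternating-lasso heuristic to obtain one pair of sparse canonical directions linking text LSA vectors and image bag-of-visual-words histograms; the alpha value, iteration count, and retrieval scoring are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_cca_direction(X_text, Y_image, alpha=0.01, iters=20, seed=0):
    """One pair of sparse canonical directions via alternating lasso regressions.

    X_text:  (n, p) LSA vectors of the training texts (column-centred)
    Y_image: (n, q) bag-of-visual-words histograms of the paired images (column-centred)
    Returns unit-norm sparse weight vectors (u, v) so that X_text @ u correlates with Y_image @ v.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=Y_image.shape[1])
    v /= np.linalg.norm(v)
    u = np.zeros(X_text.shape[1])
    for _ in range(iters):
        u = Lasso(alpha=alpha, max_iter=5000).fit(X_text, Y_image @ v).coef_
        if not np.any(u):
            break
        u /= np.linalg.norm(u)
        v = Lasso(alpha=alpha, max_iter=5000).fit(Y_image, X_text @ u).coef_
        if not np.any(v):
            break
        v /= np.linalg.norm(v)
    return u, v

# retrieval sketch: score each database image by the product of its projection
# and the query projection, then rank images by the score
# scores = (image_bows @ v) * float(query_lsa @ u)
```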

5.
A Scene Classification Algorithm Based on a Global Optimization Strategy (cited 1 time: 0 self-citations, 1 by others)
This paper proposes a scene classification algorithm based on a global optimization strategy. The algorithm extracts a global scene feature, the spatial envelope feature, from the whole image. Visual words are extracted from image patches, and a hidden variable is defined to represent the semantics of each visual word; a hidden-state structure graph is then introduced to describe the visual-word context of the whole image. For classification, an objective function composed of compatibility functions is constructed, where the compatibility functions measure the compatibility among the global scene feature, the hidden variables, and the scene label; the scene label of an image is inferred by finding the global optimum of this objective function. Comparative experiments on standard scene image databases show that the algorithm outperforms representative scene classification algorithms.
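As a rough illustration of the inference step only, the sketch below scores each candidate scene label by a global-feature compatibility term plus per-word terms maximized over hidden semantic states. The weight arrays are assumed to have been learned beforehand, and the pairwise context edges of the paper's hidden-state structure graph are omitted.

```python
import numpy as np

def classify_scene(gist, word_feats, W_gist, W_word):
    """Pick the label with the highest total compatibility score.

    gist:       (G,) global spatial-envelope feature of the image
    word_feats: (M, D) features of the M visual words extracted from image patches
    W_gist:     (L, G) global-feature/label compatibility weights (assumed learned)
    W_word:     (L, S, D) word/hidden-state/label compatibility weights (assumed learned)
    """
    n_labels = W_gist.shape[0]
    scores = np.empty(n_labels)
    for y in range(n_labels):
        score = float(W_gist[y] @ gist)
        # each visual word contributes its best hidden semantic state for this label
        state_scores = word_feats @ W_word[y].T          # (M, S)
        score += state_scores.max(axis=1).sum()
        scores[y] = score
    return int(np.argmax(scores))
```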

6.
The effectiveness of bag-of-words image representations is limited mainly by the quantization error of local features. This paper proposes an image representation method based on multiple visual codebooks that improves both codebook construction and the encoding method. Specifically: 1) multiple-codebook construction, in which several compact and complementary visual codebooks are built iteratively; 2) image representation, in which the relevant visual words are first selected from each codebook in turn and the coding coefficients are estimated by linear regression, and the spatial pyramid structure of the image is then incorporated to form the final representation. Image classification results on several standard benchmarks verify the effectiveness of the method.
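A minimal sketch of the encoding step for a single local descriptor is given below; the per-codebook nearest-word selection and the choice of k are assumptions, and the pooling of these codes over a spatial pyramid is omitted.

```python
import numpy as np

def multi_codebook_code(x, codebooks, k=5):
    """Encode one local descriptor with words drawn from several codebooks.

    x:         (D,) local descriptor
    codebooks: list of (V_i, D) codebook matrices (assumed pre-trained and complementary)
    k:         number of nearest words taken from each codebook
    Returns one coefficient vector per codebook (zeros except for the selected words).
    """
    codes = []
    for B in codebooks:
        nn = np.argsort(np.linalg.norm(B - x, axis=1))[:k]   # nearest words in this codebook
        basis = B[nn]                                        # (k, D)
        # linear regression: coefficients c minimising ||x - basis^T c||
        c, *_ = np.linalg.lstsq(basis.T, x, rcond=None)
        code = np.zeros(len(B))
        code[nn] = c
        codes.append(code)
    return codes
```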

7.
Retrieving similar images based on their visual content is an important yet difficult problem. In this paper, we propose a new method to improve the accuracy of content-based image retrieval systems. Typically, given a query image, existing retrieval methods return a ranked list based on the similarity scores between the query and individual images in the database. Our method goes further by relying on an analysis of the underlying connections among the individual database images to improve this list. Initially, we consider each image in the database as a query and use an existing baseline method to search for its likely similar images. The database is then modeled as a graph where images are nodes and connections among possibly similar images are edges. Next, we introduce an algorithm that splits this graph into stronger subgraphs, based on our notion of graph strength, so that the images in each subgraph are expected to be truly similar to each other. For each subgraph we create a structure called an integrated image, which contains the visual features of all images in the subgraph. At query time, we compute similarity scores not only between the query and individual database images but also between the query and the integrated images. The final similarity score of a database image is computed from both its individual score and the score of the integrated image it belongs to. This effectively re-ranks the retrieved images. We evaluate our method on a common image retrieval benchmark and demonstrate a significant improvement over the traditional bag-of-words retrieval model.
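A simplified sketch of the offline graph analysis and the query-time re-ranking follows. Mutual top-K links stand in for the paper's strength-based graph splitting, and the blending weight alpha is an assumption; histograms are assumed to be L2-normalized so that dot products act as similarity scores.

```python
import numpy as np

def build_subgraphs(neighbors):
    """Group database images whose top-K retrieval lists point at each other.

    neighbors: dict image_id -> set of its top-K retrieved image_ids
    An edge is kept only if the link is mutual; connected components then play the
    role of the 'strong' subgraphs (a simplification of the strength-based split).
    """
    adj = {i: {j for j in nbrs if i in neighbors.get(j, set())}
           for i, nbrs in neighbors.items()}
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups

def rerank(query_bow, bow, base_scores, groups, alpha=0.7):
    """final(i) = alpha * own score + (1 - alpha) * score of i's integrated image."""
    final = dict(base_scores)
    for comp in groups:
        hist = np.sum([bow[i] for i in comp], axis=0)        # integrated-image BoW
        hist = hist / (np.linalg.norm(hist) + 1e-12)
        group_score = float(query_bow @ hist)
        for i in comp:
            final[i] = alpha * base_scores[i] + (1 - alpha) * group_score
    return final
```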

8.
9.
In bag-of-words based image retrieval frameworks, an image usually contains a large number of SIFT feature points, many of which are not very distinctive, which hurts both the efficiency and the performance of the retrieval system. Based on the properties of SIFT feature points and the principle of visual saliency, this paper proposes a local symmetry measure for SIFT feature points and embeds a symmetry-based SIFT filtering method and weighting strategy into the image retrieval framework to make better use of the feature points. Experimental results on the Oxford Buildings dataset show that the proposed symmetry-based SIFT selection strategy effectively improves retrieval performance.
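The paper's specific symmetry measure is not reproduced here; as a stand-in, the sketch below scores each SIFT keypoint by the correlation of its local patch with a mirrored copy, then keeps and weights the most symmetric points. It assumes an OpenCV build where SIFT is available (cv2.SIFT_create, OpenCV 4.4 or later), and the keep ratio and weighting scheme are assumptions.

```python
import cv2
import numpy as np

def symmetry_score(gray, kp, radius=16):
    """Crude local symmetry: correlation of the patch with its mirrored copy."""
    x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
    patch = gray[max(0, y - radius):y + radius, max(0, x - radius):x + radius].astype(np.float32)
    if patch.size == 0:
        return 0.0
    mirrored = patch[:, ::-1]
    p, m = patch - patch.mean(), mirrored - mirrored.mean()
    denom = np.sqrt((p * p).sum() * (m * m).sum()) + 1e-12
    return float((p * m).sum() / denom)

def filter_and_weight_sift(image_bgr, keep_ratio=0.5):
    """Keep the most symmetric SIFT keypoints and return per-keypoint weights."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kps, descs = sift.detectAndCompute(gray, None)
    if not kps:
        return [], None, None
    scores = np.array([symmetry_score(gray, kp) for kp in kps])
    order = np.argsort(-scores)[:max(1, int(keep_ratio * len(kps)))]
    weights = scores[order] - scores[order].min() + 1e-3      # simple positive weights
    return [kps[i] for i in order], descs[order], weights
```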

10.
11.
Improving Image Classification Using Semantic Attributes (cited 1 time: 0 self-citations, 1 by others)
The Bag-of-Words (BoW) model, commonly used for image classification, has two strong limitations: on the one hand, visual words lack explicit meaning; on the other hand, they are usually polysemous. This paper proposes to address these two limitations by introducing an intermediate representation based on semantic attributes. Specifically, two different approaches are proposed. Both consist in predicting a set of semantic attributes for entire images as well as for local image regions, and in using these predictions to build intermediate-level features. Experiments on four challenging image databases (PASCAL VOC 2007, Scene-15, MSRCv2 and SUN-397) show that both approaches significantly improve the performance of the BoW model. Moreover, their combination achieves state-of-the-art results on several of these databases.
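A minimal sketch of the attribute-based intermediate representation follows: one linear classifier is trained per semantic attribute on BoW features, and its confidence scores become the intermediate features of an image. The attribute label matrix and the use of LinearSVC are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_attribute_layer(bow_feats, attribute_labels):
    """Train one linear classifier per semantic attribute on BoW features.

    bow_feats:        (N, V) BoW histograms of the training images
    attribute_labels: (N, A) binary matrix; column a marks images showing attribute a
    """
    return [LinearSVC(C=1.0).fit(bow_feats, attribute_labels[:, a])
            for a in range(attribute_labels.shape[1])]

def attribute_features(bow_feats, attribute_models):
    """Intermediate representation: one attribute-confidence score per attribute."""
    return np.column_stack([m.decision_function(bow_feats) for m in attribute_models])

# the final image classifier is then trained on attribute_features(...) instead of,
# or concatenated with, the raw BoW histograms
```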

12.
Along with the rapid development of mobile terminal devices, landmark recognition applications based on mobile devices have been widely researched in recent years. Because mobile users require fast response times, an accurate and efficient landmark recognition system is needed for mobile applications. In this paper, we propose a landmark recognition framework that employs a novel discriminative feature selection method and an improved extreme learning machine (ELM) algorithm. The scalable vocabulary tree (SVT) is first used to generate a set of preliminary codewords for landmark images. An efficient codebook learning algorithm derived from word mutual information and the Visual Rank technique is proposed to filter out unimportant codewords. The selected visual words, serving as the codebook for image encoding, are then used to produce a compact Bag-of-Words (BoW) histogram. The fast ELM algorithm and an ensemble of ELM classifiers are used for landmark recognition. Experiments on the Nanyang Technological University campus landmark database and the Fifteen Scene database illustrate the advantages of the proposed framework.
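The codeword filtering step can be sketched as follows, using the mutual information between codeword presence and the landmark label as the ranking criterion; the Visual Rank component and the ELM/ELM-ensemble classifier are omitted, and the n_keep value is an assumption.

```python
import numpy as np

def select_codewords(bow, labels, n_keep=1000):
    """Rank SVT codewords by mutual information with the landmark label, keep the top ones.

    bow:    (N, V) visual-word occurrence counts of the training images
    labels: (N,) landmark id of each image
    """
    present = (bow > 0).astype(float)                  # word observed in image or not
    p_w = present.mean(axis=0)                         # P(word present)
    mi = np.zeros(bow.shape[1])
    for c in np.unique(labels):
        mask = labels == c
        p_c = mask.mean()
        p_w_c = present[mask].mean(axis=0)             # P(word present | class c)
        for p_wc, p_word in ((p_w_c, p_w), (1 - p_w_c, 1 - p_w)):
            joint = p_wc * p_c                         # P(word state, class)
            mi += joint * np.log((p_wc + 1e-12) / (p_word + 1e-12))
    return np.argsort(-mi)[:n_keep]                    # indices of the retained codewords
```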

13.
14.
Image classification assigns a category to an image, while image annotation describes the individual components of an image using annotation terms. These two learning tasks are strongly related. The main contribution of this paper is a new discriminative and sparse topic model (DSTM) for image classification and annotation that combines visual, annotation and label information from a set of training images. The essential features that distinguish DSTM from existing approaches are that (i) the label information is enforced in the generation of both visual words and annotation terms, so that each generative latent topic corresponds to a category; and (ii) a zero-mean Laplace distribution is employed to give a sparse representation of images in visual words and annotation terms, so that relevant words and terms are associated with latent topics. Experimental results demonstrate that the proposed method provides discrimination ability in classification and annotation, and that its performance is better than the other tested methods (sLDA-ann, abc-corr-LDA, SupDocNADE, SAGE and MedSTC) on the LabelMe, UIUC, NUS-WIDE and PascalVOC07 datasets.
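DSTM itself is a generative topic model with Laplace priors, which is beyond a short sketch. As a loose, hypothetical illustration of its two key ideas, category-aligned topics and sparse image representations, the snippet below sparsely codes an image's visual-word histogram over per-category topic bases with a lasso and classifies by reconstruction error; the topic_bases dictionary and the alpha value are assumptions and this is not the paper's inference procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

def classify_by_sparse_topics(x, topic_bases, alpha=0.01):
    """Assign the category whose topics reconstruct the image most accurately.

    x:           (V,) normalised visual-word histogram of a test image
    topic_bases: dict label -> (V, T) matrix whose columns are that category's
                 topic/word distributions (assumed to be learned beforehand)
    """
    best_label, best_err = None, np.inf
    for label, B in topic_bases.items():
        lasso = Lasso(alpha=alpha, positive=True, max_iter=5000)
        lasso.fit(B, x)                         # sparse mixing weights over the topics
        err = np.linalg.norm(x - B @ lasso.coef_)
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```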

15.
A Survey of Bag-of-Visual-Words Methods for Image Scene Classification (cited 1 time: 1 self-citation, 0 by others)
Objective: Survey articles on bag-of-visual-words (BoVW) methods for image scene classification are still rare in domestic and international journals. To give researchers a comprehensive picture of these methods, this paper systematically summarizes the existing work. Methods: Drawing on a large number of domestic and international publications, the various BoVW methods that have appeared in image scene classification (mainly the classification of single scene images) are summarized and compared from several perspectives: the choice of low-level features and the generation of local patch features, the construction of the visual dictionary, the histogram representation of BoVW features, and visual word optimization. Results: The paper reviews the development of the BoVW model, categorizes the existing variants, compares the strengths and weaknesses of the common methods, summarizes performance evaluation methodology, and compiles the standard scene datasets in common use together with the best accuracy reported on each. Conclusion: Research on BoVW methods for image scene classification is a vigorous and growing topic in computer vision; considerable progress has been made at home and abroad, and research no longer simply applies the model to describe image content directly, but increasingly takes the differences between images and text into account. Although many problems remain to be solved in applying the BoVW model to scene classification, this in no way diminishes the importance of this line of research.
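Since the survey revolves around the standard bag-of-visual-words pipeline, a minimal generic sketch of its two core stages, dictionary construction and histogram representation, may help orient the reader; the choice of MiniBatchKMeans and the vocabulary size are assumptions, not recommendations from the survey.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(local_descriptors, n_words=1000):
    """Visual dictionary construction: cluster local descriptors into visual words.

    local_descriptors: (M, D) descriptors (e.g. SIFT) pooled from the training images.
    """
    return MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(local_descriptors)

def bow_histogram(descriptors, codebook):
    """Histogram representation: quantise an image's descriptors and count the words."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)
```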

16.
In the Bag-of-Words (BoW) model, the relationship between visual words and local features (word structure) and the distribution among images (image structure) are both important for feature encoding to approximate the intrinsically discriminative structure of images. However, most recent methods find it difficult to capture the intrinsic invariance within intra-class images using word structure or image structure alone when images vary widely. To overcome this limitation, we propose local visual feature coding based on heterogeneous structure fusion (LVFC-HSF), which explores the nonlinear relationship between word structure and image structure in feature space, as follows. First, we use a high-order topology to describe the dependence among visual words and a local-feature-based distance measurement to represent the distribution of images. We then construct a joint optimization framework over the relevance between word structure and image structure to solve for the projection matrix of the local features and the weight coefficients, exploiting the nonlinear relationship of the heterogeneous structures to balance their interaction. Finally, we adopt the improved Fisher kernel (IFK) to fit the distribution of the projected features and obtain the image feature. Experimental results on ORL, 15 Scenes, Caltech 101 and Caltech 256 demonstrate that heterogeneous structure fusion significantly enhances the construction of the intrinsic structure and consequently improves classification performance on these datasets.
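Only the final stage of the pipeline, fitting the projected local features with the improved Fisher kernel, is easy to sketch. The snippet below is a minimal IFK-style Fisher vector (mean gradients only, diagonal GMM, power and L2 normalisation); the projection learning and the heterogeneous structure fusion are omitted, and the descriptors are assumed to be already projected.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(local_feats, gmm):
    """IFK-style Fisher vector (mean gradients only) for one image.

    local_feats: (N, D) projected local descriptors of the image
    gmm:         diagonal-covariance GaussianMixture fitted on training descriptors
    """
    q = gmm.predict_proba(local_feats)                        # (N, K) soft assignments
    n = local_feats.shape[0]
    parts = []
    for k in range(gmm.n_components):
        diff = (local_feats - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        parts.append((q[:, k, None] * diff).sum(axis=0) / (n * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalisation ("improved" FK)
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalisation

# gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(all_projected_feats)
```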

17.
18.
Scene understanding is a prerequisite for a robot to perform tasks autonomously in diverse environments, and scene discovery is an important part of scene understanding. Since a concrete scene is continuous in both space and time, a mobile robot can be assumed to remain in the same scene for a period of time, and the image sequences belonging to the same scene look visually similar. This paper therefore proposes incremental outdoor scene discovery without prior knowledge, using a hierarchical bag-of-words model to link images and scenes so that the discovery process better resembles human cognition. For every image the robot acquires in real time, the image is first divided into blocks; a dynamic clustering algorithm then incrementally builds the corresponding low-level dictionary, from which high-level bag-of-words features are extracted. Another dynamic clustering algorithm incrementally performs the scene discovery itself, deciding whether the current image belongs to a previously experienced scene or to an unexperienced one, until a new scene is discovered. Experimental results show that the method can effectively perform autonomous scene discovery without prior knowledge.
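The paper's dynamic clustering algorithms are not specified here; the sketch below is a generic leader-style incremental clusterer of the kind the description suggests, where the distance threshold for opening a new cluster is an assumed parameter.

```python
import numpy as np

class IncrementalClusterer:
    """Tiny leader-style dynamic clustering: assign to the nearest centre or open a new one."""

    def __init__(self, new_cluster_threshold=0.5):
        self.threshold = new_cluster_threshold
        self.centres, self.counts = [], []

    def assign(self, feature):
        """Return (cluster_id, is_new); is_new marks a newly discovered cluster."""
        if self.centres:
            dists = [np.linalg.norm(feature - c) for c in self.centres]
            best = int(np.argmin(dists))
            if dists[best] < self.threshold:
                # update the matched centre with a running mean
                self.counts[best] += 1
                self.centres[best] += (feature - self.centres[best]) / self.counts[best]
                return best, False                       # previously experienced scene
        self.centres.append(np.asarray(feature, dtype=float).copy())
        self.counts.append(1)
        return len(self.centres) - 1, True               # new scene discovered

# one clusterer can build the low-level block dictionary; a second one, fed with the
# resulting bag-of-words histograms, performs the scene discovery itself
```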

19.
One of the important problems in image search is how to rank the search results effectively. The ranking models of existing image search engines are generally based on the associated text and ignore the visual features of the images. Because textual features sometimes do not match the image content well, incorrectly ranked images appear in the results. Visual re-ranking has been proposed for this problem, refining the text-based results with visual information, but the improvement it brings is limited, mainly because errors in the text-based results propagate into the visual re-ranking stage. Within the learning-to-rank framework, this paper proposes an image ranking model that learns from text and visual features jointly, avoiding the error propagation of visual re-ranking. Experimental results show that the proposed ranking model is significantly better than existing re-ranking methods.
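As an illustration of learning to rank over joint text and visual features, the sketch below trains a linear pairwise (RankNet-style) scorer on feature differences; the use of logistic regression and the simple concatenation of the two feature types are assumptions for illustration, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pairwise_rank_train(text_feats, visual_feats, relevance):
    """Pairwise learning to rank over joint text + visual features.

    text_feats, visual_feats: (N, Dt) and (N, Dv) features of the candidate images
    relevance:                (N,) graded relevance labels for one query
    A linear scorer is learned from feature differences of preference pairs.
    """
    X = np.hstack([text_feats, visual_feats])
    diffs, signs = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if relevance[i] > relevance[j]:
                diffs.append(X[i] - X[j]); signs.append(1)
                diffs.append(X[j] - X[i]); signs.append(0)
    model = LogisticRegression(max_iter=1000).fit(np.array(diffs), np.array(signs))
    return model.coef_.ravel()                      # ranking weight vector w

def rank(text_feats, visual_feats, w):
    """Sort candidates by the learned joint score w . [text, visual]."""
    scores = np.hstack([text_feats, visual_feats]) @ w
    return np.argsort(-scores)
```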

20.