期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

赵永威周苑李弼程《计算机研究与发展》2016,53(5):1043-1052

基于视觉词典模型(bag of visual words model, BoVWM)的目标检索存在时间效率低、词典区分性不强的问题,以及由于空间信息的缺失及量化误差等导致的视觉语义分辨力不强的问题.针对这些问题,提出了基于词典优化与空间一致性度量的目标检索方法.首先,该方法引入E\\+2LSH(exact Euclidean locality sensitive hashing)过滤图像中的噪声和相似关键点,提高词典生成效率和质量;然后,引入卡方模型(chi-square model, CSM)移除词典中的视觉停用词增强视觉词典的区分性;最后,采用空间一致性度量准则进行目标检索并对初始结果进行K-近邻(K-nearest neighbors, K-NN)重排序.实验结果表明：新方法在一定程度上改善了视觉词典的质量,增强了视觉语义分辨能力,进而有效地提高目标检索性能. 相似文献

2.

构建近邻上下文的拷贝图像检索

下载免费PDF全文

杨醒龙姚金良王小华方小飞《中国图象图形学报》2017,22(8):1098-1105

目的以词袋模型为基础的拷贝图像检索方法是当前最有效的方法。然而,由于局部特征量化存在信息损失,导致视觉词汇区别能力不足和视觉词汇误匹配增加,从而影响了拷贝图像检索效果。针对视觉词汇的误匹配问题,提出一种基于近邻上下文的拷贝图像检索方法。该方法通过局部特征的上下文关系消除视觉词汇歧义,提高视觉词汇的区分度,进而提高拷贝图像的检索效果。方法首先,以距离和尺度关系选择图像中某局部特征点周围的特征点作为该特征点的上下文,选取的上下文中的局部特征点称为近邻特征点;再以近邻特征点的信息以及与该局部特征的关系为该局部特征构建上下文描述子;然后,通过计算上下文描述子的相似性对局部特征匹配对进行验证;最后,以正确匹配特征点的个数衡量图像间的相似性,并以此相似性选取若干候选图像作为返回结果。结果在Copydays图像库进行实验,与Baseline方法进行比较。在干扰图像规模为100 k时,相对于Baseline方法,mAP提高了63%。当干扰图像规模从100 k增加到1 M时,Baseline的mAP值下降9%,而本文方法下降3%。结论本文拷贝图像检索方法对图像编辑操作,如旋转、图像叠加、尺度变换以及裁剪有较高的鲁棒性。该方法可以有效地应用到图像防伪、图像去重等领域。相似文献

3.

基于多视觉码本的图像表示

宋彦蒋兵戴礼荣《模式识别与人工智能》2013,26(10):909-915

基于词袋模型的图像表示方法的有效性主要受限于局部特征的量化误差。文中提出一种基于多视觉码本的图像表示方法,通过综合考虑码本构建和编码方法这两个方面的因素加以改进。具体包括:1)多视觉码本构建,以迭代方式构建多个紧凑且具有互补性的视觉码本;2)图像表示,首先针对多码本的情况,依次从各码本中选择相应的视觉单词并采用线性回归估计编码系数,然后结合图像的空间金字塔结构形成最终的图像表示。在一些标准测试集合的图像分类结果验证文中方法的有效性。相似文献

4.

Improving Image Classification Using Semantic Attributes 总被引：1，自引：0，他引：1

Yu Su Frédéric Jurie 《International Journal of Computer Vision》2012,100(1):59-77

The Bag-of-Words (BoW) model??commonly used for image classification??has two strong limitations: on one hand, visual words are lacking of explicit meanings, on the other hand, they are usually polysemous. This paper proposes to address these two limitations by introducing an intermediate representation based on the use of semantic attributes. Specifically, two different approaches are proposed. Both approaches consist in predicting a set of semantic attributes for the entire images as well as for local image regions, and in using these predictions to build the intermediate level features. Experiments on four challenging image databases (PASCAL VOC 2007, Scene-15, MSRCv2 and SUN-397) show that both approaches improve performance of the BoW model significantly. Moreover, their combination achieves the state-of-the-art results on several of these image databases. 相似文献

5.

Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

《Computer Vision and Image Understanding》2009,113(3):405-414

Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology which hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes such as term frequency (TF) weighting in large-scale video database. In addition, CEMD shows excellent performance compared to cosine similarity in near-duplicate retrieval. 相似文献

6.

Latent visual context learning for web image applications

Wengang Zhou Qi Tian Yijuan Lu Linjun Yang Houqiang Li 《Pattern recognition》2011,44(10-11):2263-2273

Recently, image representation based on bag-of-visual-words (BoW) model has been popularly applied in image and vision domains. In BoW, a visual codebook of visual words is defined, usually by clustering local features, to represent any novel image with the occurrence of its contained visual words. Given a set of images, we argue that the significance of each image is determined by the significance of its contained visual words. Traditionally, the significances of visual words are defined by term frequency-inverse document frequency (tf-idf), which cannot necessarily capture the intrinsic visual context. In this paper, we propose a new scheme of latent visual context learning (LVCL). The visual context among images and visual words is formulated from latent semantic context and visual link graph analysis. With LVCL, the importance of visual words and images will be distinguished from each other, which will facilitate image level applications, such as image re-ranking and canonical image selection.We validate our approach on text-query based search results returned by Google Image. Experimental results demonstrate the effectiveness and potentials of our LVCL in applications of image re-ranking and canonical image selection, over the state-of-the-art approaches. 相似文献

7.

Hai Thanh Mai Myoung Ho Kim 《Multimedia Tools and Applications》2014,72(1):331-360

Retrieving similar images based on its visual content is an important yet difficult problem. We propose in this paper a new method to improve the accuracy of content-based image retrieval systems. Typically, given a query image, existing retrieval methods return a ranked list based on the similarity scores between the query and individual images in the database. Our method goes further by relying on an analysis of the underlying connections among individual images in the database to improve this list. Initially, we consider each image in the database as a query and use an existing baseline method to search for its likely similar images. Then, the database is modeled as a graph where images are nodes and connections among possibly similar images are edges. Next, we introduce an algorithm to split this graph into stronger subgraphs, based on our notion of graph’s strength, so that images in each subgraph are expected to be truly similar to each other. We create for each subgraph a structure called integrated image which contains the visual features of all images in the subgraph. At query time, we compute the similarity scores not only between the query and individual database images but also between the query and the integrated images. The final similarity score of a database image is computed based on both its individual score and the score of the integrated image that it belongs to. This leads effectively to a re-ranking of the retrieved images. We evaluate our method on a common image retrieval benchmark and demonstrate a significant improvement over the traditional bag-of-words retrieval model. 相似文献

8.

Toward a higher-level visual representation for content-based image retrieval

Ismail El sayad Jean Martinet Thierry Urruty Chabane Djeraba 《Multimedia Tools and Applications》2012,60(2):455-482

相似文献

9.

Building descriptive and discriminative visual codebook for large-scale image applications 总被引：1，自引：0，他引：1

Qi Tian Shiliang Zhang Wengang Zhou Rongrong Ji Bingbing Ni Nicu Sebe 《Multimedia Tools and Applications》2011,51(2):441-477

相似文献

10.

Fuzzy bag of words for social image description

Yanshan Li Weiming Liu Qinghua Huang Xuelong Li 《Multimedia Tools and Applications》2016,75(3):1371-1390

相似文献

11.

Mobile multi-view object image search

Fatih Çalışır Muhammet Baştan Özgür Ulusoy Uğur Güdükbay 《Multimedia Tools and Applications》2017,76(10):12433-12456

High user interaction capability of mobile devices can help improve the accuracy of mobile visual search systems. At query time, it is possible to capture multiple views of an object from different viewing angles and at different scales with the mobile device camera to obtain richer information about the object compared to a single view and hence return more accurate results. Motivated by this, we propose a new multi-view visual query model on multi-view object image databases for mobile visual search. Multi-view images of objects acquired by the mobile clients are processed and local features are sent to a server, which combines the query image representations with early/late fusion methods and returns the query results. We performed a comprehensive analysis of early and late fusion approaches using various similarity functions, on an existing single view and a new multi-view object image database. The experimental results show that multi-view search provides significantly better retrieval accuracy compared to traditional single view search. 相似文献

12.

Learning non-metric visual similarity for image retrieval

《Image and vision computing》2019

Measuring visual similarity between two or more instances within a data distribution is a fundamental task in image retrieval. Theoretically, non-metric distances are able to generate a more complex and accurate similarity model than metric distances, provided that the non-linear data distribution is precisely captured by the system. In this work, we explore neural networks models for learning a non-metric similarity function for instance search. We argue that non-metric similarity functions based on neural networks can build a better model of human visual perception than standard metric distances. As our proposed similarity function is differentiable, we explore a real end-to-end trainable approach for image retrieval, i.e. we learn the weights from the input image pixels to the final similarity score. Experimental evaluation shows that non-metric similarity networks are able to learn visual similarities between images and improve performance on top of state-of-the-art image representations, boosting results in standard image retrieval datasets with respect standard metric distances. 相似文献

13.

Coherent Phrase Model for Efficient Image Near-Duplicate Retrieval

《Multimedia, IEEE Transactions on》2009,11(8):1434-1445

相似文献

14.

Robust visual tracking via bag of superpixels

Heng Fan Jinhai Xiang Liang Zhao 《Multimedia Tools and Applications》2016,75(14):8781-8798

The Bag of Words (BoW) model is one of the most popular and effective image representation methods and has been drawn increasing interest in computer vision filed. However, little attention is paid on it in visual tracking. In this paper, a visual tracking method based on Bag of Superpixels (BoS) is proposed. In BoS, the training samples are oversegmented to generate enough superpixel patches. Then K-means algorithm is performed on the collected patches to form visual words of the target and a superpixel codebook is constructed. Finally the tracking is accomplished via searching for the highest likelihood between candidates and codebooks within Bayesian inference framework. In this process, an effective updating scheme is adopted to help our tracker resist occlusions and deformations. Experimental results demonstrate that the proposed method outperforms several state-of-the-art trackers. 相似文献

15.

一种基于视觉词典优化和查询扩展的图像检索方法

柯圣财李弼程陈刚赵永威魏晗《自动化学报》2018,44(1):99-105

视觉词典方法（Bag of visual words,BoVW）是当前图像检索领域的主流方法,然而,传统的视觉词典方法存在计算量大、词典区分性不强以及抗干扰能力差等问题,难以适应大数据环境.针对这些问题,本文提出了一种基于视觉词典优化和查询扩展的图像检索方法.首先,利用基于密度的聚类方法对SIFT特征进行聚类生成视觉词典,提高视觉词典的生成效率和质量;然后,通过卡方模型分析视觉单词与图像目标的相关性,去除不包含目标信息的视觉单词,增强视觉词典的分辨能力;最后,采用基于图结构的查询扩展方法对初始检索结果进行重排序.在Oxford5K和Paris6K图像集上的实验结果表明,新方法在一定程度上提高了视觉词典的质量和语义分辨能力,性能优于当前主流方法. 相似文献

16.

深度多模态融合服装风格检索

下载免费PDF全文

苏卓柯司博王若梅周凡《中国图象图形学报》2021,26(4):857-871

目的服装检索方法是计算机视觉与自然语言处理领域的研究热点,其包含基于内容与基于文本的两种查询模态。然而传统检索方法通常存在检索效率低的问题,且很少研究关注服装在风格上的相似性。为解决这些问题,本文提出深度多模态融合的服装风格检索方法。方法提出分层深度哈希检索模型,基于预训练的残差网络ResNet（residual network）进行迁移学习,并把分类层改造成哈希编码层,利用哈希特征进行粗检索,再用图像深层特征进行细检索。设计文本分类语义检索模型,基于LSTM（long short-term memory）设计文本分类网络以提前分类缩小检索范围,再以基于doc2vec提取的文本嵌入语义特征进行检索。同时提出相似风格上下文检索模型,其参考单词相似性来衡量服装风格相似性。最后采用概率驱动的方法量化风格相似性,并以最大化该相似性的结果融合方法作为本文检索方法的最终反馈。结果在Polyvore数据集上,与原始ResNet模型相比,分层深度哈希检索模型的top5平均检索精度提高11.6%,检索速度提高2.57 s/次。与传统文本分类嵌入模型相比,本文分类语义检索模型的top5查准率提高29.96%,检索速度提高16.53 s/次。结论提出的深度多模态融合的服装风格检索方法获得检索精度与检索速度的提升,同时进行了相似风格服装的检索使结果更具有多样性。相似文献

17.

The effects of multiple query evidences on social image retrieval

Zhiyong Cheng Jialie Shen Haiyan Miao 《Multimedia Systems》2016,22(4):509-523

System performance assessment and comparison are fundamental for large-scale image search engine development. This article documents a set of comprehensive empirical studies to explore the effects of multiple query evidences on large-scale social image search. The search performance based on the social tags, different kinds of visual features and their combinations are systematically studied and analyzed. To quantify the visual query complexity, a novel quantitative metric is proposed and applied to assess the influences of different visual queries based on their complexity levels. Besides, we also study the effects of automatic text query expansion with social tags using a pseudo relevance feedback method on the retrieval performance. Our analysis of experimental results shows a few key research findings: (1) social tag-based retrieval methods can achieve much better results than content-based retrieval methods; (2) a combination of textual and visual features can significantly and consistently improve the search performance; (3) the complexity of image queries has a strong correlation with retrieval results’ quality—more complex queries lead to poorer search effectiveness; and (4) query expansion based on social tags frequently causes search topic drift and consequently leads to performance degradation. 相似文献

18.

A visual word weighting scheme based on emerging itemsets for video annotation

Guiguang Ding Jianmin Wang 《Information Processing Letters》2010,110(16):692-696

The method based on Bag-of-visual-Words (BoW) deriving from local keypoints has recently appeared promising for video annotation. Visual word weighting scheme has critical impact to the performance of BoW method. In this paper, we propose a new visual word weighting scheme which is referred as emerging patterns weighting (EP-weighting). The EP-weighting scheme can efficiently capture the co-occurrence relationships of visual words and improve the effectiveness of video annotation. The proposed scheme firstly finds emerging patterns (EPs) of visual keywords in training dataset. And then an adaptive weighting assignment is performed for each visual word according to EPs. The adjusted BoW features are used to train classifiers for video annotation. A systematic performance study on TRECVID corpus containing 20 semantic concepts shows that the proposed scheme is more effective than other popular existing weighting schemes. 相似文献

19.

Visual feature coding based on heterogeneous structure fusion for image classification

《Information Fusion》2017

The relationship between visual words and local feature (words structure) or the distribution among images (images structure) is important in feature encoding to approximate the intrinsically discriminative structure of images in the Bag-of-Words (BoW) model. However, in recently most methods, the intrinsic invariance in intra-class images is difficultly captured using words structure or images structure for large variability image classification. To overcome this limitation, we propose a local visual feature coding based on heterogeneous structure fusion (LVFC-HSF) that explores the nonlinear relationship between words structure and images structure in feature space, as follows. First, we utilize high-order topology to describe the dependence of the visual words, and use the distance measurement based on the local feature to represent the distribution of images. Then, we construct the unitedly optimal framework according to the relevance between words structure and images structure to solve the projection matrix of local feature and the weight coefficient, which can exploit the nonlinear relationship of heterogeneous structure to balance their interaction. Finally, we adopt the improving fisher kernel(IFK) to fit the distribution of the projected features for obtaining the image feature. The experimental results on ORL, 15 Scenes, Caltech 101 and Caltech 256 demonstrate that heterogeneous structure fusion significantly enhances the intrinsic structure construction, and consequently improves the classification performance in these data sets. 相似文献

20.

Supervised learning of Gaussian mixture models for visual vocabulary generation

Basura Fernando Elisa Fromont Damien Muselet Marc Sebban 《Pattern recognition》2012,45(2):897-907

相似文献