Similar Literature
20 similar documents retrieved.
1.
In this paper, we propose a novel scene categorization method based on contextual visual words. The proposed method extends the traditional 'bag of visual words' model by introducing, through unsupervised learning, contextual information from the coarser scale and from neighborhood regions into the local region of interest. This contextual information provides useful cues about the region of interest and reduces the ambiguity that arises when visual words represent local regions in isolation. The improved visual word representation of the scene image in turn enhances categorization performance. The proposed method is evaluated on three scene classification datasets, with 8, 13 and 15 scene categories respectively, using 10-fold cross-validation. The experimental results show that the proposed method achieves 90.30%, 87.63% and 85.16% recognition accuracy on Datasets 1, 2 and 3 respectively, significantly outperforming methods based on visual words that capture only local information in a purely statistical manner. We also compared the proposed method with three representative scene categorization methods; the results confirm its superiority.
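A minimal sketch of the contextual-words idea, in Python: each local descriptor is augmented with a descriptor of the same region at a coarser scale and a pooled descriptor of its neighborhood before clustering into a vocabulary. The pooling scheme and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def contextual_descriptors(local, coarse, neighbors):
    """Augment each local descriptor with contextual information.

    local     : (n, d) descriptors of the regions of interest
    coarse    : (n, d) descriptors of the same regions at a coarser scale
    neighbors : (n, k, d) descriptors of the k neighboring regions
    """
    neighbor_ctx = neighbors.mean(axis=1)            # pool the neighborhood
    return np.hstack([local, coarse, neighbor_ctx])  # (n, 3d) contextual features

def build_vocabulary(features, n_words=200, seed=0):
    """Learn contextual visual words by unsupervised clustering."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(features)

def bow_histogram(km, features):
    """Represent an image as a normalized histogram over the vocabulary."""
    words = km.predict(features)
    hist = np.bincount(words, minlength=km.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```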

2.
3.
4.
5.
6.
Given today's large-scale video libraries, the practical applicability of content-based indexing algorithms is constrained by their efficiency. This paper strives for efficient large-scale video indexing by comparing various visual concept categorization techniques. In visual categorization, the popular codebook model has shown excellent categorization performance. The codebook model represents continuous visual features by discrete prototypes predefined in a vocabulary. The vocabulary size has a major impact on categorization efficiency: a more compact vocabulary is more efficient, but smaller vocabularies typically score lower on classification performance than larger ones. This paper compares four approaches to achieving a compact codebook vocabulary while retaining categorization performance, and investigates the trade-off between codebook compactness and categorization performance for each. We evaluate the methods on more than 200 hours of challenging video data with as many as 101 semantic concepts. The results allow us to create a taxonomy of the four methods based on their efficiency and categorization performance.
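The core trade-off is easy to see in code: nearest-prototype assignment costs O(n·K) per image, so a compact codebook directly buys efficiency. The sketch below, assuming a codebook is already trained, shows quantization plus one illustrative route to compactness (merging similar codewords); it is not one of the paper's four methods.

```python
import numpy as np
from scipy.cluster.vq import vq
from sklearn.cluster import AgglomerativeClustering

def quantize(descriptors, codebook):
    """Nearest-prototype assignment: O(len(descriptors) * len(codebook))."""
    words, _ = vq(descriptors, codebook)
    return words

def compact_codebook(codebook, n_compact):
    """Merge similar codewords into a smaller, faster vocabulary."""
    merge = AgglomerativeClustering(n_clusters=n_compact).fit(codebook)
    return np.vstack([codebook[merge.labels_ == c].mean(axis=0)
                      for c in range(n_compact)])
```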

7.
Recently, image representation based on the bag-of-visual-words (BoW) model has been widely applied in image and vision domains. In BoW, a codebook of visual words is defined, usually by clustering local features, and any novel image is represented by the occurrences of the visual words it contains. Given a set of images, we argue that the significance of each image is determined by the significance of its contained visual words. Traditionally, the significance of visual words is defined by term frequency-inverse document frequency (tf-idf), which does not necessarily capture the intrinsic visual context. In this paper, we propose a new scheme of latent visual context learning (LVCL). The visual context among images and visual words is formulated from latent semantic context and visual link graph analysis. With LVCL, the importance of visual words and images can be distinguished, which facilitates image-level applications such as image re-ranking and canonical image selection. We validate our approach on text-query based search results returned by Google Image. Experimental results demonstrate the effectiveness and potential of our LVCL for image re-ranking and canonical image selection, over state-of-the-art approaches.
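For reference, the tf-idf baseline that LVCL argues against can be computed in a few lines; the layout of the count matrix is an assumption.

```python
import numpy as np

def tfidf_weights(histograms):
    """histograms : (n_images, n_words) raw visual-word counts per image."""
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    df = (histograms > 0).sum(axis=0)                  # document frequency per word
    idf = np.log(len(histograms) / np.maximum(df, 1))  # rarer words weigh more
    return tf * idf                                    # (n_images, n_words) weights
```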

8.
9.
The bag-of-words model is a key technique in image retrieval. In this model, each image is represented as a frequency histogram of the visual words in a codebook. Such a retrieval scheme ignores the spatial information among visual words, which is important for image representation. This paper proposes a novel image retrieval method based on the longest common visual word string. Word strings are extracted from the topological relations among visual words and therefore carry much of the image's spatial information. Experimental results on the Holiday dataset show that the proposed method improves the retrieval performance of the bag-of-words model.
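A minimal sketch of the matching step: once each image is encoded as a string of visual word ids, the longest common subsequence gives a spatially aware similarity. How the paper orders words into strings from their topological relations is not shown here; the sequences are assumed given.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two visual word strings."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcs_similarity(a, b):
    """Normalized similarity between two images' visual word strings."""
    return lcs_length(a, b) / max(min(len(a), len(b)), 1)
```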

10.
11.
Visual word generation is a key step in bag-of-words based image retrieval: given a known visual vocabulary, the features of a query image are mapped to the corresponding visual words in the vocabulary. This paper proposes a new fast visual word generation algorithm based on spatial correlation. The co-occurrence counts of every pair of words in the vocabulary are gathered over the database to build a visual word co-occurrence table. Using this table, a new probabilistic predictor is built to help predict the neighboring words of a known word. Combining the predictor with a fast approximate nearest-neighbor search algorithm, experiments on standard image retrieval databases show that the new algorithm achieves a clear gain in time efficiency over traditional tree-based search and hashing algorithms.
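A minimal sketch of the co-occurrence idea, assuming per-image lists of visual word ids are available: pairwise co-occurrence counts are accumulated into a table, which then serves as a simple probabilistic predictor of a word's likely neighbors. The coupling with approximate nearest-neighbor search is omitted.

```python
import numpy as np

def cooccurrence_table(word_lists, n_words):
    """word_lists: per-image lists of visual word ids observed together."""
    table = np.zeros((n_words, n_words))
    for words in word_lists:
        uniq = np.unique(words)
        for w in uniq:
            table[w, uniq] += 1   # count each co-occurring pair once per image
    np.fill_diagonal(table, 0)
    return table

def predict_neighbors(table, word, top_k=5):
    """Most probable neighboring words of `word` under the co-occurrence counts."""
    probs = table[word] / max(table[word].sum(), 1)
    return np.argsort(probs)[::-1][:top_k]
```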

12.
13.
This paper proposes a method for scene categorization that integrates region contextual information into the popular Bag-of-Visual-Words approach. The Bag-of-Visual-Words approach describes an image as a bag of discrete visual words, and the frequency distributions of these words are used for image categorization. However, traditional visual words struggle with patches that have similar appearances but distinct semantic concepts. This drawback stems from constructing each visual word independently. This paper introduces a Region-Conditional Random Fields model to learn each visual word depending on the rest of the visual words in the same region. Compared with the traditional Conditional Random Fields model, there are two areas of novelty. First, the initial label of each patch is defined automatically from its visual features rather than assigned manually as a semantic label. Second, a novel potential function is built under the region contextual constraint. Experimental results on three well-known datasets show that Region Contextual Visual Words indeed improve categorization performance compared to traditional visual words.
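A loose sketch in the spirit of the region-contextual idea: patches start from their nearest codeword (the automatic initial label), and labels are then refined with a pairwise term rewarding word pairs frequently seen together in a region. The actual CRF potentials and learning procedure are simplified assumptions.

```python
import numpy as np

def relabel_region(descs, codebook, pair_affinity, n_iters=3, lam=0.5):
    """descs: (n, d) patch descriptors of one region; codebook: (K, d);
    pair_affinity: (K, K) log-affinity of word pairs within a region."""
    unary = -np.linalg.norm(descs[:, None, :] - codebook[None, :, :], axis=2)
    labels = unary.argmax(axis=1)          # initial labels from appearance alone
    for _ in range(n_iters):
        for i in range(len(descs)):
            ctx = np.delete(labels, i)     # the other words in the same region
            pairwise = pair_affinity[:, ctx].sum(axis=1)
            labels[i] = (unary[i] + lam * pairwise).argmax()
    return labels
```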

14.
15.
Natural scene categorization from images is a very useful task for automatic image analysis systems. Several methods have been proposed in the literature to address this problem, with excellent results. Typically, features of several types are clustered to generate a vocabulary able to describe the image collection in a multi-faceted way. This vocabulary consists of a discrete set of visual codewords whose co-occurrence and/or composition allows the scene category to be classified. A common drawback of these methods is that features are usually extracted from the whole image, disregarding whether they actually derive from the natural scene to be classified or from foreground objects that may be present in it but are not characteristic of the scene. As perceptual studies report, objects present in an image are not useful for natural scene categorization and indeed introduce an important source of clutter, depending on their size.

16.
In recent research, visual concept discovery has been used to fill the semantic gap in representing visual content. However, the presence of multiple concepts in an image generally degrades discovery accuracy. In this paper, a Concept-based Visual Word Clustering (CVWC) method is proposed to discover multiple concepts in an image without pre-segmented training images. CVWC is based on prior knowledge of concepts, trained from the meta-text of web images. First, concepts are obtained by clustering the visual words in the regions extracted by image segmentation. A concept-based genetic algorithm (CBGA) searches for near-optimal clusters according to the visual words (VWs) in a concept and the co-occurrence probability of two concepts. The clustering procedure is also performed on neighboring VWs to discover all the regions needed for concept representation. A concept extension (CE) method is further applied to iteratively update the discovered concepts from the clustered results. In experiments on video retrieval, the proposed CVWC method based on CBGA and CE obtained satisfactory mAP improvements of 0.04 and 0.06 over a pixel-based image segmentation approach and a conventional concept model approach for the category “nation defense,” and of 0.06 and 0.05 for the category “ecology,” respectively.
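A very loose sketch of a fitness function in the spirit of the CBGA described above: a candidate assignment of region visual words to concepts is scored by how well each VW matches its concept model and by the co-occurrence probability of the concept pairs used. All inputs are assumed given; the genetic operators themselves are omitted.

```python
import numpy as np

def fitness(assignment, vw_concept_prob, concept_cooc):
    """assignment      : (n_vw,) concept id per visual word
    vw_concept_prob : (n_vw, n_concepts) P(vw | concept), e.g. from web meta-text
    concept_cooc    : (n_concepts, n_concepts) concept co-occurrence probability"""
    match = np.log(vw_concept_prob[np.arange(len(assignment)), assignment] + 1e-9).sum()
    used = np.unique(assignment)
    cooc = sum(np.log(concept_cooc[a, b] + 1e-9)
               for i, a in enumerate(used) for b in used[i + 1:])
    return match + cooc
```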

17.
18.
Feature grouping and local soft match for mobile visual search   (cited 1 time in total: 0 self-citations, 1 by others)
Increasingly powerful mobile devices have made mobile visual search a popular and distinctive image retrieval application. Such an application raises a number of challenges resulting from appearance variations in mobile images. The performance of state-of-the-art image retrieval systems is improved by bag-of-words approaches; however, for visual search with mobile images exhibiting large variations, at least two critical issues remain unsolved: (1) the loss of discriminative power of features due to quantization; and (2) the underuse of spatial relationships among visual words. To address both issues, this paper presents a novel visual search method based on feature grouping and local soft match, which considers the properties of mobile images and couples visual and spatial information consistently. First, features of the query image are grouped using both matched visual features and their spatial relationships; the grouped features are then softly matched to alleviate quantization loss. An efficient scoring scheme is devised to exploit the inverted file index, and is compared with vocabulary-guided pyramid kernels. Finally, experiments on the Stanford mobile visual search database and a collected database of more than one million images show that the proposed method achieves a promising improvement over a vocabulary-tree approach, especially when large variations exist in the query images.
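A minimal sketch of local soft match, under the assumption of a plain Euclidean codebook: instead of committing each descriptor to a single codeword, it is softly assigned to its k nearest codewords with distance-based weights, which alleviates quantization loss. Feature grouping and the inverted-file scoring are not shown.

```python
import numpy as np

def soft_assign(desc, codebook, k=3, sigma=0.2):
    """Return (word ids, weights) for one descriptor's k nearest codewords."""
    d = np.linalg.norm(codebook - desc, axis=1)
    nearest = np.argsort(d)[:k]
    w = np.exp(-d[nearest] ** 2 / (2 * sigma ** 2))
    return nearest, w / max(w.sum(), 1e-9)

def soft_histogram(descs, codebook, k=3):
    """Soft bag-of-words histogram of an image."""
    hist = np.zeros(len(codebook))
    for desc in descs:
        ids, w = soft_assign(desc, codebook, k)
        hist[ids] += w
    return hist / max(hist.sum(), 1e-9)
```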

19.
20.
In recent years we have seen much strong work in visual recognition, dialogue interpretation and multi-modal learning, aimed at providing the building blocks that enable intelligent robots to interact with humans in a meaningful way and even to evolve continuously during this process. Building systems that unify those components under a common architecture has turned out to be challenging, as each approach comes with its own set of assumptions, restrictions, and implications. For example, the impact of recent progress in visual category recognition has been limited from the perspective of interactive systems. Reasons for this are diverse. We identify and address two major challenges to integrating modern techniques for visual categorization into an interactive learning system: reducing the number of required labelled training examples and dealing with potentially erroneous input. Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive because of their potential to use far more, unlabelled, training data. We propose a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome the inherent limitations of purely unsupervised training. The method uses a unified and scale-invariant object representation that handles labelled as well as unlabelled information in a coherent way. First experiments demonstrate the ability of the system to learn object category models from many unlabelled observations and a few dialogue interactions that can be ambiguous or even erroneous.
