Similar Literature
20 similar documents retrieved.
1.
In this paper, we propose a novel scene categorization method based on contextual visual words. The proposed method extends the traditional 'bag of visual words' model by introducing, through unsupervised learning, contextual information from the coarser scale and from neighborhood regions into the local region of interest. This contextual information provides useful cues about the region of interest and reduces the ambiguity that arises when visual words represent local regions in isolation. The improved visual word representation of the scene image in turn enhances categorization performance. The proposed method is evaluated on three scene classification datasets, with 8, 13 and 15 scene categories respectively, using 10-fold cross-validation. The experimental results show that the proposed method achieves 90.30%, 87.63% and 85.16% recognition accuracy on Datasets 1, 2 and 3 respectively, significantly outperforming methods based on visual words that capture only local information in a purely statistical manner. We also compared the proposed method with three representative scene categorization methods; the results confirm its superiority.
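A minimal sketch of the contextual-words idea, in Python: each local descriptor is augmented with a descriptor of the same region at a coarser scale and a pooled descriptor of its neighborhood before clustering into a vocabulary. The pooling scheme and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def contextual_descriptors(local, coarse, neighbors):
    """Augment each local descriptor with contextual information.

    local     : (n, d) descriptors of the regions of interest
    coarse    : (n, d) descriptors of the same regions at a coarser scale
    neighbors : (n, k, d) descriptors of the k neighboring regions
    """
    neighbor_ctx = neighbors.mean(axis=1)            # pool the neighborhood
    return np.hstack([local, coarse, neighbor_ctx])  # (n, 3d) contextual features

def build_vocabulary(features, n_words=200, seed=0):
    """Learn contextual visual words by unsupervised clustering."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(features)

def bow_histogram(km, features):
    """Represent an image as a normalized histogram over the vocabulary."""
    words = km.predict(features)
    hist = np.bincount(words, minlength=km.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```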

2.
3.
4.
5.
6.
Given today's large-scale video libraries, the practical applicability of content-based indexing algorithms is constrained by their efficiency. This paper strives for efficient large-scale video indexing by comparing various visual concept categorization techniques. In visual categorization, the popular codebook model has shown excellent categorization performance. The codebook model represents continuous visual features by discrete prototypes predefined in a vocabulary. The vocabulary size has a major impact on categorization efficiency: a more compact vocabulary is more efficient, but smaller vocabularies typically score lower on classification performance than larger ones. This paper compares four approaches to achieving a compact codebook vocabulary while retaining categorization performance, and investigates the trade-off between codebook compactness and categorization performance for each. We evaluate the methods on more than 200 hours of challenging video data with as many as 101 semantic concepts. The results allow us to create a taxonomy of the four methods based on their efficiency and categorization performance.
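The core trade-off is easy to see in code: nearest-prototype assignment costs O(n·K) per image, so a compact codebook directly buys efficiency. The sketch below, assuming a codebook is already trained, shows quantization plus one illustrative route to compactness (merging similar codewords); it is not one of the paper's four methods.

```python
import numpy as np
from scipy.cluster.vq import vq
from sklearn.cluster import AgglomerativeClustering

def quantize(descriptors, codebook):
    """Nearest-prototype assignment: O(len(descriptors) * len(codebook))."""
    words, _ = vq(descriptors, codebook)
    return words

def compact_codebook(codebook, n_compact):
    """Merge similar codewords into a smaller, faster vocabulary."""
    merge = AgglomerativeClustering(n_clusters=n_compact).fit(codebook)
    return np.vstack([codebook[merge.labels_ == c].mean(axis=0)
                      for c in range(n_compact)])
```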

7.
Recently, image representation based on the bag-of-visual-words (BoW) model has been widely applied in image and vision domains. In BoW, a codebook of visual words is defined, usually by clustering local features, and any novel image is represented by the occurrences of the visual words it contains. Given a set of images, we argue that the significance of each image is determined by the significance of its contained visual words. Traditionally, the significance of visual words is defined by term frequency-inverse document frequency (tf-idf), which does not necessarily capture the intrinsic visual context. In this paper, we propose a new scheme of latent visual context learning (LVCL). The visual context among images and visual words is formulated from latent semantic context and visual link graph analysis. With LVCL, the importance of visual words and images can be distinguished, which facilitates image-level applications such as image re-ranking and canonical image selection. We validate our approach on text-query based search results returned by Google Image. Experimental results demonstrate the effectiveness and potential of our LVCL for image re-ranking and canonical image selection, over state-of-the-art approaches.
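For reference, the tf-idf baseline that LVCL argues against can be computed in a few lines; the layout of the count matrix is an assumption.

```python
import numpy as np

def tfidf_weights(histograms):
    """histograms : (n_images, n_words) raw visual-word counts per image."""
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    df = (histograms > 0).sum(axis=0)                  # document frequency per word
    idf = np.log(len(histograms) / np.maximum(df, 1))  # rarer words weigh more
    return tf * idf                                    # (n_images, n_words) weights
```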

8.
9.
The bag-of-words model is a key technique in image retrieval. In this model, each image is represented as a frequency histogram of the visual words in a codebook. Such a retrieval scheme ignores the spatial information among visual words, which is important for image representation. This paper proposes a novel image retrieval method based on the longest common visual word string. Word strings are extracted from the topological relations among visual words and therefore carry much of the image's spatial information. Experimental results on the Holiday dataset show that the proposed method improves the retrieval performance of the bag-of-words model.
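A minimal sketch of the matching step: once each image is encoded as a string of visual word ids, the longest common subsequence gives a spatially aware similarity. How the paper orders words into strings from their topological relations is not shown here; the sequences are assumed given.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two visual word strings."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcs_similarity(a, b):
    """Normalized similarity between two images' visual word strings."""
    return lcs_length(a, b) / max(min(len(a), len(b)), 1)
```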

10.
11.
Visual word generation is a key step in bag-of-words based image retrieval: given a known visual vocabulary, the features of a query image are mapped to the corresponding visual words in the vocabulary. This paper proposes a new fast visual word generation algorithm based on spatial correlation. The co-occurrence counts of every pair of words in the vocabulary are gathered over the database to build a visual word co-occurrence table. Using this table, a new probabilistic predictor is built to help predict the neighboring words of a known word. Combining the predictor with a fast approximate nearest-neighbor search algorithm, experiments on standard image retrieval databases show that the new algorithm achieves a clear gain in time efficiency over traditional tree-based search and hashing algorithms.
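A minimal sketch of the co-occurrence idea, assuming per-image lists of visual word ids are available: pairwise co-occurrence counts are accumulated into a table, which then serves as a simple probabilistic predictor of a word's likely neighbors. The coupling with approximate nearest-neighbor search is omitted.

```python
import numpy as np

def cooccurrence_table(word_lists, n_words):
    """word_lists: per-image lists of visual word ids observed together."""
    table = np.zeros((n_words, n_words))
    for words in word_lists:
        uniq = np.unique(words)
        for w in uniq:
            table[w, uniq] += 1   # count each co-occurring pair once per image
    np.fill_diagonal(table, 0)
    return table

def predict_neighbors(table, word, top_k=5):
    """Most probable neighboring words of `word` under the co-occurrence counts."""
    probs = table[word] / max(table[word].sum(), 1)
    return np.argsort(probs)[::-1][:top_k]
```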

12.
13.
This paper proposes a method for scene categorization that integrates region contextual information into the popular Bag-of-Visual-Words approach. The Bag-of-Visual-Words approach describes an image as a bag of discrete visual words, and the frequency distributions of these words are used for image categorization. However, traditional visual words struggle with patches that have similar appearances but distinct semantic concepts. This drawback stems from constructing each visual word independently. This paper introduces a Region-Conditional Random Fields model to learn each visual word depending on the rest of the visual words in the same region. Compared with the traditional Conditional Random Fields model, there are two areas of novelty. First, the initial label of each patch is defined automatically from its visual features rather than assigned manually as a semantic label. Second, a novel potential function is built under the region contextual constraint. Experimental results on three well-known datasets show that Region Contextual Visual Words indeed improve categorization performance compared to traditional visual words.
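A loose sketch in the spirit of the region-contextual idea: patches start from their nearest codeword (the automatic initial label), and labels are then refined with a pairwise term rewarding word pairs frequently seen together in a region. The actual CRF potentials and learning procedure are simplified assumptions.

```python
import numpy as np

def relabel_region(descs, codebook, pair_affinity, n_iters=3, lam=0.5):
    """descs: (n, d) patch descriptors of one region; codebook: (K, d);
    pair_affinity: (K, K) log-affinity of word pairs within a region."""
    unary = -np.linalg.norm(descs[:, None, :] - codebook[None, :, :], axis=2)
    labels = unary.argmax(axis=1)          # initial labels from appearance alone
    for _ in range(n_iters):
        for i in range(len(descs)):
            ctx = np.delete(labels, i)     # the other words in the same region
            pairwise = pair_affinity[:, ctx].sum(axis=1)
            labels[i] = (unary[i] + lam * pairwise).argmax()
    return labels
```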

14.
15.
Natural scene categorization from images is a very useful task for automatic image analysis systems. Several methods have been proposed in the literature to address this problem, with excellent results. Typically, features of several types are clustered to generate a vocabulary able to describe the image collection in a multi-faceted way. This vocabulary consists of a discrete set of visual codewords whose co-occurrence and/or composition allows the scene category to be classified. A common drawback of these methods is that features are usually extracted from the whole image, disregarding whether they actually derive from the natural scene to be classified or from foreground objects that may be present in it but are not characteristic of the scene. As perceptual studies report, objects present in an image are not useful for natural scene categorization and indeed introduce an important source of clutter, depending on their size.

16.
In recent research, visual concept discovery has been used to fill the semantic gap in representing visual content. However, the presence of multiple concepts in an image generally degrades discovery accuracy. In this paper, a Concept-based Visual Word Clustering (CVWC) method is proposed to discover multiple concepts in an image without pre-segmented training images. CVWC is based on prior knowledge of concepts, trained from the meta-text of web images. First, concepts are obtained by clustering the visual words in the regions extracted by image segmentation. A concept-based genetic algorithm (CBGA) searches for near-optimal clusters according to the visual words (VWs) in a concept and the co-occurrence probability of two concepts. The clustering procedure is also performed on neighboring VWs to discover all the regions needed for concept representation. A concept extension (CE) method is further applied to iteratively update the discovered concepts from the clustered results. In experiments on video retrieval, the proposed CVWC method based on CBGA and CE obtained satisfactory mAP improvements of 0.04 and 0.06 over a pixel-based image segmentation approach and a conventional concept model approach for the category “nation defense,” and of 0.06 and 0.05 for the category “ecology,” respectively.
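A very loose sketch of a fitness function in the spirit of the CBGA described above: a candidate assignment of region visual words to concepts is scored by how well each VW matches its concept model and by the co-occurrence probability of the concept pairs used. All inputs are assumed given; the genetic operators themselves are omitted.

```python
import numpy as np

def fitness(assignment, vw_concept_prob, concept_cooc):
    """assignment      : (n_vw,) concept id per visual word
    vw_concept_prob : (n_vw, n_concepts) P(vw | concept), e.g. from web meta-text
    concept_cooc    : (n_concepts, n_concepts) concept co-occurrence probability"""
    match = np.log(vw_concept_prob[np.arange(len(assignment)), assignment] + 1e-9).sum()
    used = np.unique(assignment)
    cooc = sum(np.log(concept_cooc[a, b] + 1e-9)
               for i, a in enumerate(used) for b in used[i + 1:])
    return match + cooc
```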

17.
18.
Feature grouping and local soft match for mobile visual search   (cited 1 time in total: 0 self-citations, 1 by others)
Increasingly powerful mobile devices have made mobile visual search a popular and distinctive image retrieval application. Such an application raises a number of challenges resulting from appearance variations in mobile images. The performance of state-of-the-art image retrieval systems is improved by bag-of-words approaches; however, for visual search with mobile images exhibiting large variations, at least two critical issues remain unsolved: (1) the loss of discriminative power of features due to quantization; and (2) the underuse of spatial relationships among visual words. To address both issues, this paper presents a novel visual search method based on feature grouping and local soft match, which considers the properties of mobile images and couples visual and spatial information consistently. First, features of the query image are grouped using both matched visual features and their spatial relationships; the grouped features are then softly matched to alleviate quantization loss. An efficient scoring scheme is devised to exploit the inverted file index, and is compared with vocabulary-guided pyramid kernels. Finally, experiments on the Stanford mobile visual search database and a collected database of more than one million images show that the proposed method achieves a promising improvement over a vocabulary-tree approach, especially when large variations exist in the query images.
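A minimal sketch of local soft match, under the assumption of a plain Euclidean codebook: instead of committing each descriptor to a single codeword, it is softly assigned to its k nearest codewords with distance-based weights, which alleviates quantization loss. Feature grouping and the inverted-file scoring are not shown.

```python
import numpy as np

def soft_assign(desc, codebook, k=3, sigma=0.2):
    """Return (word ids, weights) for one descriptor's k nearest codewords."""
    d = np.linalg.norm(codebook - desc, axis=1)
    nearest = np.argsort(d)[:k]
    w = np.exp(-d[nearest] ** 2 / (2 * sigma ** 2))
    return nearest, w / max(w.sum(), 1e-9)

def soft_histogram(descs, codebook, k=3):
    """Soft bag-of-words histogram of an image."""
    hist = np.zeros(len(codebook))
    for desc in descs:
        ids, w = soft_assign(desc, codebook, k)
        hist[ids] += w
    return hist / max(hist.sum(), 1e-9)
```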

19.
20.
In recent years we have seen much strong work in visual recognition, dialogue interpretation and multi-modal learning, aimed at providing the building blocks that enable intelligent robots to interact with humans in a meaningful way and even to evolve continuously during this process. Building systems that unify those components under a common architecture has turned out to be challenging, as each approach comes with its own set of assumptions, restrictions, and implications. For example, the impact of recent progress in visual category recognition has been limited from the perspective of interactive systems. Reasons for this are diverse. We identify and address two major challenges to integrating modern techniques for visual categorization into an interactive learning system: reducing the number of required labelled training examples and dealing with potentially erroneous input. Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive because of their potential to use far more, unlabelled, training data. We propose a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome the inherent limitations of purely unsupervised training. The method uses a unified and scale-invariant object representation that handles labelled as well as unlabelled information in a coherent way. First experiments demonstrate the ability of the system to learn object category models from many unlabelled observations and a few dialogue interactions that can be ambiguous or even erroneous.
