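Review of bag-of-visual-words model methods in image scene classification
Cite this article: Zhao Lijun, Tang Ping, Huo Lianzhi, Zheng Ke. Review of bag-of-visual-words model methods in image scene classification[J]. Journal of Image and Graphics (中国图象图形学报), 2014, 19(3): 333-343
Authors: Zhao Lijun  Tang Ping  Huo Lianzhi  Zheng Ke
Affiliation: Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences (all four authors)
Funding: National High Technology Research and Development Program of China (863 Program)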
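Abstract: Objective: Review articles on bag-of-visual-words model methods in image scene classification have rarely appeared in journals in China or abroad. To give researchers a reasonably comprehensive understanding of these methods, this paper systematically summarizes the existing work. Method: Drawing on a large body of literature from China and abroad, the paper summarizes and compares the bag-of-visual-words model methods used in image scene classification (mainly the classification of single image scenes) from several angles: the selection of low-level features and the generation of local image patch features, the construction of the visual vocabulary, the histogram representation of bag-of-visual-words features, and the optimization of visual words. Result: The paper reviews the development of the bag-of-visual-words model, categorizes the many existing variants, compares the advantages and disadvantages of common methods, summarizes how model performance is evaluated, and compiles the commonly used standard scene datasets together with the highest accuracy reported on each. Conclusion: Bag-of-visual-words methods for image scene classification are an active and still-growing research topic in computer vision, and considerable progress has been made in China and abroad. Research no longer simply applies the model directly to describe image content; it increasingly takes the differences between images and text into account. Although many problems in applying the model to image scene classification still urgently need solving, this does not diminish the importance of the research.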

Keywords: scene classification   bag of visual words   low-level feature   histogram representation
Received: 2013-07-09
Revised: 2013-09-10

Review of the bag-of-visual-words models in image scene classification
Zhao Lijun,Tang Ping,Huo Lianzhi and Zheng Ke. Review of the bag-of-visual-words models in image scene classification[J]. Journal of Image and Graphics, 2014, 19(3): 333-343
Authors:Zhao Lijun  Tang Ping  Huo Lianzhi  Zheng Ke
Affiliation:Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences
Abstract:Objective: With the rapid development of multimedia, database, and network technologies, the number of images that need to be classified and labeled keeps growing. Computer-aided automatic image scene classification, as a replacement for manual labeling, has become an active research field. Among the many image scene classification methods, the bag-of-visual-words (BOVW) model is widely adopted; as a mid-level feature, it can narrow the gap between low-level visual features and high-level semantic features. However, reviews of the BOVW model in image scene classification are rare in journals at home and abroad, so this paper systematically summarizes these studies to give researchers in the field a comprehensive understanding of the method. Method: Based on a large number of references on the BOVW model in image scene classification published at home and abroad over roughly the past ten years, this paper divides the development of BOVW into five stages: direct application of the early bag-of-words model to images; study of latent semantic information in the BOVW model; study of spatial layout or structure information in the BOVW model; study of context information in the BOVW model; and optimization of visual word semantics together with the introduction of new methods into the BOVW model. The paper also summarizes and compares the existing BOVW models in image scene classification in terms of local feature selection, feature generation for local image patches, visual vocabulary construction, histogram representation of bag-of-visual-words features, and optimization of visual words, among other aspects.
Result: The development history of BOVW and the research status of BOVW-based image scene classification are reviewed, giving a clear picture of how the BOVW model has developed; the numerous existing BOVW models are categorized according to their working mechanisms; the advantages and disadvantages of commonly used methods are compared; the performance evaluation method for the BOVW model is described; and the commonly used standard scene databases are collected, with the best classification accuracy given for each. Conclusion: As a research field that is still on the rise, the study of BOVW methods in image scene classification has made considerable progress at home and abroad. Research in computer vision is no longer limited to directly applying the original BOVW model to describe image content; the differences between images and text are increasingly taken into account. The urgent problems to be solved are as follows: the performance of BOVW is greatly affected when a bag of visual words is applied to samples that differ substantially from the training samples, while training a new bag of visual words on new training samples is very time- and labor-consuming; there is still no theoretical guidance for determining the size of the visual vocabulary; the relationship between visual words and semantics is still not fully exploited; and the application of BOVW in specialized fields, such as land-use scene classification in high-resolution remote sensing imagery, is far from satisfactory.
In light of these problems, several research directions suggest themselves: constructing universal, self-adaptive bags of visual words for different sample sets; automatically selecting the optimal vocabulary size; adding more spatial layout and context information to BOVW and exploring latent semantic information in visual words; studying image visual grammars for image understanding; studying scene classification in images from specialized fields, such as high-resolution remote sensing images; and investigating new, well-characterized low-level feature extraction algorithms for constructing high-level bags of visual words. To conclude, although a number of urgent problems remain in BOVW-based image scene classification, they do not diminish the importance of research on the BOVW model.
Keywords:scene classification   bag-of-visual-words   low-level feature   histogram representation
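
The stages named in the abstract (local patch features, visual vocabulary construction, histogram representation) can be illustrated with a minimal Python sketch. It is not taken from the paper and rests on several assumptions: raw pixel values of densely sampled patches stand in for whatever low-level descriptor a real system would use, plain k-means is assumed as the clustering step that produces the vocabulary, each patch is hard-assigned to its nearest visual word, and all function names (`extract_patches`, `build_vocabulary`, `bovw_histogram`) are hypothetical.

```python
import random


def extract_patches(image, patch_size, stride):
    """Cut a 2-D grayscale image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch_size)
                            for dx in range(patch_size)])
    return patches


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (hard assignment)."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def build_vocabulary(descriptors, num_words, iterations=10, seed=0):
    """Assumed clustering step: plain k-means; cluster centres act as visual words."""
    rng = random.Random(seed)
    vocabulary = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                vocabulary[k] = [sum(col) / len(members) for col in zip(*members)]
    return vocabulary


def bovw_histogram(descriptors, vocabulary):
    """L1-normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Toy data: a 4x4 image whose left half is dark and right half is bright.
mixed = [[0, 0, 9, 9] for _ in range(4)]
dark = [[0, 0, 0, 0] for _ in range(4)]

patches = extract_patches(mixed, patch_size=2, stride=2)
vocabulary = build_vocabulary(patches, num_words=2)
mixed_hist = bovw_histogram(patches, vocabulary)
dark_hist = bovw_histogram(extract_patches(dark, 2, 2), vocabulary)
print(mixed_hist, dark_hist)
```

The resulting histogram is the fixed-length image feature that would be passed to a classifier. The vocabulary size (`num_words`) is chosen by hand here, which mirrors the abstract's remark that there is still no theoretical guidance for setting it.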
This article is indexed in CNKI and other databases.