
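Online Learning a Binary Classifier for Improving Google Image Search Results (Chinese title: 用于提高谷歌图像搜索结果的二分类器在线学习方法)
Cite this article: WAN Yu-Chai, LIU Xia-Bi, HAN Fei-Fei, TONG Kun-Qi, LIU Yu. Online Learning a Binary Classifier for Improving Google Image Search Results [J]. Acta Automatica Sinica (自动化学报), 2014, 40(8): 1699-1708.
Authors: WAN Yu-Chai (万玉钗), LIU Xia-Bi (刘峡壁), HAN Fei-Fei (韩菲霏), TONG Kun-Qi (童坤琦), LIU Yu (刘宇)
Affiliation: 1. School of Computer Science, Beijing Institute of Technology, Beijing 100081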
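Abstract (translated from Chinese): For keyword-based image retrieval, learning a binary classifier from the visual similarity of the retrieved results is expected to be one of the most effective ways to improve those results. To improve search engine results, this paper proposes an algorithm framework and, within it, focuses on the key problem of training data selection. Training data selection consists of two stages: 1) initialization of the training data to start classifier learning; 2) dynamic data selection during iterative classifier learning. For initial training data selection, we examine a clustering-based and a ranking-based method and compare automatic training data selection with manual annotation. For dynamic data selection, we compare the classification performance of support vector machines and a Bayesian classifier based on max-min posterior pseudo-probabilities. Combining the different methods for the two stages yields 8 algorithms, which we apply to keyword-based image retrieval with the Google search engine. The experimental results confirm that how training data are selected from noisy search results is the key issue in improving search results. The experiments show that our methods effectively improve Google search results, especially the top-ranked ones. Presenting more relevant results to users earlier reduces the effort of paging through results one by one. Making automatic training data selection comparable to manual annotation remains a problem for further study.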

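Keywords: image search engine; content-based image retrieval; search result improvement; image classifier learning; training data selection
Received: 2012-09-24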

Online Learning a Binary Classifier for Improving Google Image Search Results
WAN Yu-Chai,LIU Xia-Bi,HAN Fei-Fei,TONG Kun-Qi,LIU Yu.Online Learning a Binary Classifier for Improving Google Image Search Results[J].Acta Automatica Sinica,2014,40(8):1699-1708.
Authors:WAN Yu-Chai  LIU Xia-Bi  HAN Fei-Fei  TONG Kun-Qi  LIU Yu
Affiliation:1.Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
Abstract:Web image search results can be improved by exploiting their visual content to learn a binary classifier, which is then used to refine each result's degree of relevance to the given query. This paper proposes an algorithm framework for this problem and investigates the key issue of training data selection within the framework. The training data selection process is divided into two stages: initial selection, which triggers classifier learning, and dynamic selection during the iterations of classifier learning. We investigate two main approaches to initial training data selection, clustering-based and ranking-based, and compare automatic training data selection schemes with manual annotation. Furthermore, support vector machines and the max-min posterior pseudo-probability (MMP) based Bayesian classifier are each employed for image classification. By varying these factors in the framework, we implement eight algorithms and test them on keyword-based image search results from the Google search engine. The experimental results confirm that selecting training data from noisy search results is a key issue in this problem. They also show that the proposed algorithm effectively improves Google search results, especially at top ranks, and thus helps reduce the user's effort in finding desired images by browsing deep into the ranking. Even so, further work is needed to bring automatic training data selection closer to perfect human annotation.
Keywords:Image search engine  content-based image retrieval(CBIR)  search results improvement  image classifier learning  training data selection
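
The two-stage selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid scorer stands in for the SVM and MMP Bayesian classifiers, initialization uses only a simple ranking-based rule, and the names `rerank`, `k_init`, `k_select`, and `toy_features`, as well as the 2-D feature vectors, are hypothetical choices made for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank(features, k_init=3, k_select=4, iterations=3):
    """Re-rank results given feature vectors in the engine's original order.

    Stand-in classifier (assumption): nearest centroid, not SVM or MMP.
    """
    n = len(features)
    if n < 2 * max(k_init, k_select):
        raise ValueError("not enough results for the chosen k values")

    # Stage 1: ranking-based initialization. Assumes the top-ranked results
    # are mostly relevant and the bottom-ranked ones mostly noise.
    pos = list(range(k_init))
    neg = list(range(n - k_init, n))

    order = list(range(n))
    for _ in range(iterations):
        # "Train" the stand-in classifier on the current training data.
        pos_c = centroid([features[i] for i in pos])
        neg_c = centroid([features[i] for i in neg])
        # Higher score: closer to the relevant centroid than the noisy one.
        scores = [dist(x, neg_c) - dist(x, pos_c) for x in features]
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        # Stage 2: dynamic selection. The most confident results on each
        # side become the training data for the next iteration.
        pos = order[:k_select]
        neg = order[-k_select:]
    return order


# Toy data: relevant items lie near (0, 0), noisy items near (5, 5).
# Indices 0, 2, 3, 5, 7 are relevant; the original ranking mixes in noise.
toy_features = [
    (0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (0.1, 0.2), (5.2, 4.9),
    (0.0, 0.3), (4.8, 5.0), (0.3, 0.1), (5.1, 5.2), (4.9, 4.8),
]
new_order = rerank(toy_features)
print(new_order[:5])
```

In this toy case the initial training sets each contain a mislabeled item, yet the first round of scoring already separates the two groups, and the dynamic selection step then replaces the noisy initial sets with cleaner ones. Real image features and a stronger classifier would be needed to say anything about actual search results.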