
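Complex Chinese Document Layout Analysis Based on SVM Region Formation
Cite this article: DANG Xing, GONG Sheng-rong, LIU Quan. Complex Chinese Document Layout Analysis Based on SVM Region Formation[J]. Computer Engineering, 2010, 36(12): 200-203.
Authors: DANG Xing  GONG Sheng-rong  LIU Quan
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou 215006
Funding: National Natural Science Foundation of China (60873116); Key Project of the Scientific Research Foundation of the Ministry of Education (205059); Natural Science Foundation of the Jiangsu Higher Education Institutions (07KJD520186)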
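Abstract (translated from the Chinese): To address the parameter sensitivity and weak applicability of existing layout analysis algorithms, this paper proposes a layout analysis algorithm for complex Chinese documents based on SVM region formation. The algorithm selects the connected components that best represent the character features of a region (seed connected components) as the first feature for testing, uses a support vector machine, which has strong learning and generalization ability, to form regions, and applies projection within the formed regions to quickly determine the document's reading order. Experimental results show that the method is more adaptable and gives satisfactory analysis results on complex Chinese layouts.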

Keywords (translated from the Chinese): seed connected component; support vector machine; region formation; projection

Complex Chinese Document Layout Analysis Based on SVM Region Formation
DANG Xing, GONG Sheng-rong, LIU Quan. Complex Chinese Document Layout Analysis Based on SVM Region Formation[J]. Computer Engineering, 2010, 36(12): 200-203.
Authors: DANG Xing  GONG Sheng-rong  LIU Quan
Affiliation: School of Computer Science & Technology, Soochow University, Suzhou 215006
Abstract: Most existing algorithms for document layout analysis are sensitive to their parameters and have weak applicability. To address these deficiencies, this paper presents a region formation algorithm based on the Support Vector Machine (SVM) for analyzing complex Chinese documents. The technique selects the connected components that best represent a region, called seed connected components, as the first feature for training. An SVM is employed to form regions because of its strong learning and generalization ability. The reading order is then determined with the projection method. Experimental results show that the proposed algorithm is more effective at analyzing different kinds of document layouts than the state-of-the-art methods.
Keywords: seed connected component; Support Vector Machine (SVM); region formation; projection
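
The abstract names projection as the tool for deciding reading order but does not say how the profiles are computed or used. The following is a minimal, generic sketch of a projection profile on a binary image, not the authors' implementation. The names `projection_profile`, `ink_runs`, and the toy `page` array are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis="rows") or per column (axis="cols")."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def ink_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                    # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))      # a blank gap closes the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Toy region (an assumption for illustration): two horizontal bands of ink.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

row_runs = ink_runs(projection_profile(page, "rows"))
col_runs = ink_runs(projection_profile(page, "cols"))
print(row_runs, col_runs)
```

In this toy region the row profile breaks into two runs separated by a blank row, while the column profile forms a single run. That pattern would be consistent with horizontally set lines read top to bottom. How the paper turns such profiles into a reading-order decision is not stated in the abstract.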
This article is indexed in databases including VIP and Wanfang Data.