
朝汉混排古籍的文字切分方法
引用本文:刘星辰,金小峰. 朝汉混排古籍的文字切分方法[J]. 计算机工程与应用, 2020, 56(11): 135-141. DOI: 10.3778/j.issn.1002-8331.1902-0119
作者姓名:刘星辰  金小峰
作者单位:延边大学 计算机科学与技术学科智能信息处理研究室,吉林 延吉 133002
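Funding:Science and Technology Project of the Jilin Provincial Department of Education under the 13th Five-Year Plan (吉林省教育厅"十三五"科学技术项目); Yanbian University World-Class Discipline Construction Cultivation Project (延边大学世界一流学科建设培育项目)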
摘    要:为解决朝鲜语古籍数字化中朝汉文种混排字符切分困难的问题,提出一种朝鲜语古籍图像的文字切分算法。针对古籍列与列之间存在不连续间隔线、倾斜或者粘连等问题,提出一种基于连通域投影的列切分方法。利用连通域的删除、合并、拆分等操作对文字进行切分。使用一种多步切分法完成了具有文字大小不一,横向、纵向混合排版特点图像的字符切分工作。对于粘连字,采用改进的滴水算法进行有效切分。实验结果表明所提出的算法能够很好地完成朝、汉文种混排,文字大小不一,排版情况复杂的朝鲜语古籍图像的文字切分工作。该算法的列切分准确率为97.69%,字切分准确率为87.79%。

关 键 词:古籍数字化  朝鲜语古籍  列切分  字符切分

Characters Segmentation Method of Historical Documents Mixed in Korean and Chinese
LIU Xingchen,JIN Xiaofeng. Characters Segmentation Method of Historical Documents Mixed in Korean and Chinese[J]. Computer Engineering and Applications, 2020, 56(11): 135-141. DOI: 10.3778/j.issn.1002-8331.1902-0119
Authors:LIU Xingchen  JIN Xiaofeng
Affiliation:Intelligent Information Processing Laboratory, Department of Computer Science & Technology, Yanbian University, Yanji, Jilin 133002, China
Abstract:To address the difficulty of segmenting characters in Korean historical documents that mix Korean and Chinese scripts, this paper proposes a character segmentation algorithm for images of such documents. Because the columns in these documents may be separated by discontinuous ruling lines, skewed, or touching one another, the page is first split into columns with a method based on connected-component projection. Characters are then segmented by deleting, merging, and splitting connected components. A multi-step segmentation procedure handles images in which character sizes vary and horizontal and vertical layouts are mixed. Touching characters are separated with an improved drop-fall algorithm. Experimental results show that the proposed algorithm can segment Korean historical document images with mixed Korean and Chinese scripts, varying character sizes, and complex layouts. On the experimental dataset, the column segmentation accuracy is 97.69% and the character segmentation accuracy is 87.79%.
Keywords:ancient books digitalization  Korean historical documents  column segmentation  character segmentation  
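
The column-splitting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give their rules, so the filtering thresholds (`min_area`, `max_line_width`, `min_line_height`), the function names, and the toy page below are all assumptions made for illustration. The sketch labels connected components, discards tiny specks and tall thin components (taken here to be fragments of a broken separator line), projects the remaining bounding boxes onto the horizontal axis, and cuts columns at the empty gaps.

```python
from collections import deque


def connected_components(img):
    """Return (x0, y0, x1, y1, area) for each 8-connected foreground blob."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:  # breadth-first flood fill of one blob
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1, area))
    return boxes


def segment_columns(img, min_area=2, max_line_width=1, min_line_height=None):
    """Return (x_start, x_end) column ranges, left to right.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    h, w = len(img), len(img[0])
    if min_line_height is None:
        min_line_height = h // 3
    profile = [0] * w
    for x0, y0, x1, y1, area in connected_components(img):
        width, height = x1 - x0 + 1, y1 - y0 + 1
        if area < min_area:
            continue  # isolated speck: treated as noise
        if width <= max_line_width and height >= min_line_height:
            continue  # tall, thin blob: treated as a separator-line fragment
        for x in range(x0, x1 + 1):
            profile[x] += height  # project the bounding box onto the x axis
    columns, start = [], None
    for x, value in enumerate(profile + [0]):  # trailing 0 closes the last run
        if value and start is None:
            start = x
        elif not value and start is not None:
            columns.append((start, x - 1))
            start = None
    return columns


# Toy page: two text columns, a broken separator line at x=3, a speck at x=8.
PAGE = [
    "...#....#",
    "##.#.##..",
    "##.#.##..",
    "...#.....",
    "...#.##..",
    "##...##..",
    "##.#.....",
    "...#.##..",
    "...#.##..",
    "...#.....",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in PAGE]
print(segment_columns(img))  # [(0, 1), (5, 6)]
```

Projecting component bounding boxes rather than raw pixels lets the separator line and noise be removed before the profile is built, so a discontinuous line does not produce a spurious column. Handling skew and touching columns, which the abstract also mentions, is outside this sketch.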
This article is indexed in the 维普 (VIP) and 万方数据 (Wanfang Data) databases, among others.