
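MHW Mongolian Offline Handwritten Dataset and Its Application
Cite as: FAN Daoerji, GAO Guanglai, WU Huijuan. MHW Mongolian Offline Handwritten Dataset and Its Application[J]. Journal of Chinese Information Processing, 2018, 32(1): 89-95.
Authors: FAN Daoerji  GAO Guanglai  WU Huijuan
Affiliations: 1. College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021; 2. College of Electronic Information Engineering, Inner Mongolia University, Hohhot, Inner Mongolia 010021
Funding: Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0603)
Abstract: A public, authoritative Mongolian handwriting database is the foundation for researching and developing Mongolian handwriting recognition systems. Building on studies of Mongolian encoding, word formation, and grammar, this paper releases MHW, a large-vocabulary Mongolian offline handwriting database. The training set covers 5,000 words with 20 samples collected per word, for 100,000 samples in total; test set I contains 5,000 samples and test set II contains 14,085 samples. The paper also studies an automatic error detection algorithm that exploits the variable length of Mongolian script, which improves the reliability of the database. The database is evaluated on three commonly used handwriting recognition models. The recurrent neural network based model performs best, reaching a word error rate of 2.20% on test set I and 5.55% on test set II under a lexicon-constrained setting.

Keywords: Mongolian  handwriting recognition  handwriting database  HMM  LSTM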

MHW Mongolian Offline Handwritten Dataset and Its Application
FAN Daoerji, GAO Guanglai, WU Huijuan. MHW Mongolian Offline Handwritten Dataset and Its Application[J]. Journal of Chinese Information Processing, 2018, 32(1): 89-95.
Authors: FAN Daoerji  GAO Guanglai  WU Huijuan
Affiliation: 1. College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China; 2. College of Electronic Information Engineering, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China
Abstract: A public, well-recognized Mongolian offline handwritten database is the basis for the research and development of Mongolian handwriting recognition systems. Based on research on Mongolian coding, word formation, and grammar, a large-vocabulary Mongolian offline handwritten database (MHW) is constructed, whose training set contains 100,000 word samples, i.e., 20 samples for each of 5,000 words. Test set I contains 5,000 samples and test set II contains 14,085 samples. An automatic error detection algorithm based on the variable length of Mongolian words is applied to improve the reliability of the database. MHW is validated on three popular handwriting recognition models, among which the recurrent neural network based model performs best, with a word error rate of 2.20% on test set I and 5.55% on test set II under a constrained dictionary.
Keywords: Mongolian  handwriting recognition  dataset  HMM  LSTM
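
The abstract reports results as a word error rate on samples that each contain one handwritten word. As a minimal sketch of how such a figure can be computed, assuming each sample has exactly one reference label and one predicted label, the rate is the fraction of samples whose prediction differs from the reference. The function name `word_error_rate` and the toy labels are hypothetical and are not taken from the paper.

```python
def word_error_rate(predictions, references):
    """Fraction of samples whose predicted word differs from the reference.

    Assumes isolated-word recognition: one label per sample, so no
    alignment between predicted and reference sequences is needed.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    errors = sum(p != r for p, r in zip(predictions, references))
    return errors / len(references)


# Hypothetical toy labels, not data from MHW.
refs = ["w1", "w2", "w3", "w4"]
preds = ["w1", "wX", "w3", "w4"]
print(f"{word_error_rate(preds, refs):.2%}")  # one error in four samples
```

Under this reading, a 2.20% word error rate on the 5,000 samples of test set I would correspond to about 110 misrecognized samples.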