
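Automatic Chinese-Braille Conversion Based on a Chinese-Braille Parallel Corpus and Deep Learning
Cite this article: CAI Jia, WANG Xiangdong, TANG Lizhen, CUI Xiaojuan, LIU Hong, QIAN Yueliang. Automatic Chinese-Braille Conversion Based on a Chinese-Braille Parallel Corpus and Deep Learning[J]. Journal of Chinese Information Processing, 2019, 33(4): 60-67.
Authors: CAI Jia  WANG Xiangdong  TANG Lizhen  CUI Xiaojuan  LIU Hong  QIAN Yueliang
Affiliations: 1. Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190;
2. University of Chinese Academy of Sciences, Beijing 100049;
3. China Braille Press, Beijing 100142
Funding: National Key Technology R&D Program of China (2014BAK15B02)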
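Abstract: Chinese-Braille conversion is the automatic conversion of Chinese text into the corresponding Braille text. It has important applications in Braille publishing and education for the blind, but the performance of existing systems falls short of practical needs. This paper proposes an automatic Chinese-Braille conversion method based on a Chinese-Braille parallel corpus and deep learning, introducing deep learning techniques to this field for the first time. A bidirectional LSTM model is trained on Chinese text segmented according to Braille rules, which yields high-accuracy Braille word segmentation. To support model training, the paper also proposes a method for automatically matching and extracting corpus data from Chinese and Braille texts that are not exactly aligned, and builds a Chinese-Braille corpus aligned at the article, sentence, and word levels, comprising 270,000 sentences, 2.34 million Chinese characters, and 4.48 million Braille cells. Experimental results show that the proposed method is clearly more accurate than methods based on a Braille-only corpus and traditional machine learning models.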

Keywords: Chinese-Braille conversion  Chinese Braille  Braille corpus  deep learning

A Deep Learning Method for Chinese-Braille Conversion Based on Parallel Corpora
CAI Jia, WANG Xiangdong, TANG Lizhen, CUI Xiaojuan, LIU Hong, QIAN Yueliang. A Deep Learning Method for Chinese-Braille Conversion Based on Parallel Corpora[J]. Journal of Chinese Information Processing, 2019, 33(4): 60-67.
Authors:CAI Jia  WANG Xiangdong  TANG Lizhen  CUI Xiaojuan  LIU Hong  QIAN Yueliang
Affiliation:1.Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2.University of Chinese Academy of Sciences, Beijing 100049, China;
3.China Braille Press, Beijing 100142, China
Abstract: Chinese-Braille conversion has applications in fields such as Braille publishing and education for the blind. This paper presents a deep learning approach to automatic Chinese-Braille conversion based on parallel corpora. A bidirectional LSTM model is trained on Chinese texts segmented according to Braille segmentation rules and achieves high accuracy in Braille word segmentation. To support model training, the paper also presents a strategy for automatically generating a corpus from Chinese and Braille texts with the same content, aligned at the article, sentence, and word levels and totaling 270,000 sentences, 2.34 million Chinese characters, and 4.48 million Braille cells. The experimental results show that the proposed method outperforms methods based on Braille-only corpora and traditional machine learning models.
Keywords:Chinese-Braille conversion  Chinese Braille  Braille corpus  deep learning  
This article is indexed in VIP (维普) and other databases.