Query text proofreading method of professional courses based on trie tree language model
Citation: LI Danyang, ZHAO Yahui, LUO Mengjiang, CUI Rongyi. Query text proofreading method of professional courses based on trie tree language model[J]. Journal of Yanbian University (Natural Science), 2020, 0(3): 260-264.
Authors: LI Danyang  ZHAO Yahui  LUO Mengjiang  CUI Rongyi
Affiliation: College of Engineering, Yanbian University, Yanji 133002, Jilin, China
Abstract: To address the low accuracy of Chinese text proofreading, this paper proposes a proofreading method for professional-course query text based on a trie (dictionary tree) language model. First, erroneous keywords are fuzzy-matched by computing the edit distance between the erroneous text and the candidate matching text. Second, the trie language model is used to build a search tree, which improves query efficiency. Finally, the best text similarity threshold is selected by comparing proofreading results under different thresholds. At the best threshold (0.5), the proposed model was compared with the traditional Pinyin model and the N-gram model on question proofreading. The proposed method achieves an accuracy of 77.91%, a recall of 67%, and an F value of 72.04%, which are 5.69%, 23.67%, and 11.57% higher, respectively, than those of the Pinyin model correction method, and 0.64%, 10.33%, and 7.89% higher, respectively, than those of the N-gram model correction method. The proposed method therefore has good application value for proofreading professional-course query text.
Keywords: trie tree  text proofreading  language model  automatic correction
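
The three steps named in the abstract (edit-distance fuzzy matching, a trie used as the search tree, and a similarity threshold of 0.5) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class and function names, the sample course list, and in particular the similarity formula `1 - distance / max(len(query), len(keyword))` are assumptions, since the abstract does not define how similarity is derived from edit distance.

```python
import math
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on the node that ends a keyword


class Trie:
    """Character-level trie over a keyword list (hypothetical helper)."""

    def __init__(self, words: List[str]) -> None:
        self.root = TrieNode()
        self.longest = 0  # length of the longest stored keyword
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
        self.longest = max(self.longest, len(word))

    def search(self, query: str, max_dist: int) -> List[Tuple[str, int]]:
        """Return (keyword, edit distance) pairs with distance <= max_dist."""
        results: List[Tuple[str, int]] = []
        first_row = list(range(len(query) + 1))
        for ch, child in self.root.children.items():
            self._walk(child, ch, query, first_row, max_dist, results)
        return results

    def _walk(self, node, ch, query, prev_row, max_dist, results) -> None:
        # Extend the Levenshtein table by one row for the trie character ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,           # skip a query character
                           prev_row[i] + 1,          # skip the trie character
                           prev_row[i - 1] + cost))  # match or substitute
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: if every cell exceeds max_dist, no descendant can match.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, query, row, max_dist, results)


def correct(query: str, trie: Trie, threshold: float = 0.5) -> Optional[str]:
    """Return the most similar keyword, or None if none reaches the threshold.

    Similarity is ASSUMED to be 1 - distance / max(len(query), len(keyword)).
    """
    # Largest distance that could still satisfy the threshold for any keyword.
    bound = (1 - threshold) * max(len(query), trie.longest)
    max_dist = math.floor(bound + 1e-9)  # tolerance for float rounding
    best, best_sim = None, -1.0
    for word, dist in trie.search(query, max_dist):
        sim = 1 - dist / max(len(query), len(word))
        if sim >= threshold and sim > best_sim:
            best, best_sim = word, sim
    return best


courses = ["数据结构", "操作系统", "计算机网络"]  # hypothetical keyword list
trie = Trie(courses)
print(correct("数据解构", trie))  # → 数据结构 (one substituted character)
```

Sharing prefixes in the trie means the edit-distance table for a common prefix is computed once for all keywords beneath it, and the pruning step abandons whole subtrees early. That is one plausible reading of how a trie improves query efficiency over comparing the query against every keyword separately.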