
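Research on Chinese Word Segmentation Technology for Military Field
Citation: LI Jian-long, WANG Pan-qing, HAN Qi-yu. Research on Chinese Word Segmentation Technology for Military Field[J]. Computer and Modernization, 2018, 0(11): 115. DOI: 10.3969/j.issn.1006-2475.2018.11.020
Authors: LI Jian-long, WANG Pan-qing, HAN Qi-yu
Abstract: A word segmentation model's performance drops markedly when it is applied to a domain other than the one it was trained on. Because annotating a corpus of development documents for the army's legacy systems is laborious, this paper proposes a domain-adaptation method for Chinese word segmentation that combines n-gram features with a domain dictionary. The method extracts n-gram features from the target corpus to train a segmentation model adapted to the target domain, then corrects the segmentation output by reverse maximum matching against a domain dictionary. Experiments on a corpus of documents related to the army's legacy systems show that the segmentation model trained with this method improves the F-measure by 12.4%.

Keywords: conditional random field; n-gram features; domain dictionary
Received: 2018-11-23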
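The abstract does not list the exact feature templates used to train the model. A common way to give a character-based conditional random field (CRF) segmenter n-gram features is to label each character with a BMES tag (Begin, Middle, End, Single) and describe it by the character unigrams and bigrams in a small window around it. The sketch below assumes that scheme; the template names (`U[...]`, `B[...]`), the window size of 2, the helper names and the example sentence are illustrative assumptions, not taken from the paper. The per-character feature dictionaries are in the shape accepted by CRF toolkits such as sklearn-crfsuite; training itself is left out.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # placeholder for positions that fall outside the sentence


def bmes_tags(words: List[str]) -> Tuple[str, List[str]]:
    """Turn a pre-segmented sentence into its character string and BMES labels."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # Begin, then Middle for every inner character, then End
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return "".join(words), tags


def ngram_features(sent: str, i: int, window: int = 2) -> Dict[str, str]:
    """Character unigrams and bigrams inside a +/-window around position i."""
    def char_at(k: int) -> str:
        return sent[k] if 0 <= k < len(sent) else PAD

    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):  # unigrams C[-2] .. C[2]
        feats[f"U[{off}]"] = char_at(i + off)
    for off in range(-window, window):  # bigrams C[-2]C[-1] .. C[1]C[2]
        feats[f"B[{off},{off + 1}]"] = char_at(i + off) + char_at(i + off + 1)
    return feats


def sentence_features(sent: str) -> List[Dict[str, str]]:
    """One feature dict per character, e.g. as input rows for a CRF toolkit."""
    return [ngram_features(sent, i) for i in range(len(sent))]


# Hypothetical training sentence, already segmented.
text, tags = bmes_tags(["遗留", "系统", "的", "开发", "文档"])
features = sentence_features(text)
print(tags)  # ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E']
print(features[0])
```

Because the features are computed from raw characters only, they can be extracted from unlabeled text in the target domain as easily as from the annotated source corpus.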

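Reverse (backward) maximum matching scans a sentence from right to left and, at each position, takes the longest dictionary word that ends there. The abstract does not say exactly how these matches are reconciled with the model's output. One plausible rule, assumed here, is to keep the model's word boundaries everywhere except that each multi-character domain-dictionary word found by backward matching becomes a single token. The function names and the example lexicon are hypothetical.

```python
from typing import List, Set, Tuple


def backward_max_match(sent: str, lexicon: Set[str], max_len: int) -> List[Tuple[int, int]]:
    """Reverse maximum matching: scan right to left, greedily taking the longest
    lexicon word that ends at the current position. Returns the (start, end)
    spans of multi-character lexicon hits, in left-to-right order."""
    spans: List[Tuple[int, int]] = []
    end = len(sent)
    while end > 0:
        step = 1  # no match: consume one character and move left
        for length in range(min(max_len, end), 1, -1):
            if sent[end - length:end] in lexicon:
                step = length
                spans.append((end - length, end))
                break
        end -= step
    spans.reverse()
    return spans


def correct_with_lexicon(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Keep the model's boundaries, but force every lexicon word found by
    backward maximum matching to become exactly one token."""
    sent = "".join(tokens)
    cuts = {0}
    pos = 0
    for tok in tokens:  # boundaries proposed by the segmentation model
        pos += len(tok)
        cuts.add(pos)
    max_len = max((len(w) for w in lexicon), default=1)
    for start, end in backward_max_match(sent, lexicon, max_len):
        cuts -= set(range(start + 1, end))  # no cut inside a domain term
        cuts |= {start, end}  # a cut on both edges of it
    ordered = sorted(cuts)
    return [sent[a:b] for a, b in zip(ordered, ordered[1:])]


# Hypothetical model output that split two domain terms.
lexicon = {"遗留系统", "开发文档"}
print(correct_with_lexicon(["遗", "留", "系统", "开发", "文档"], lexicon))
# ['遗留系统', '开发文档']
```

Working on a set of cut positions rather than on the token list handles dictionary terms that straddle model tokens: applied to `["作战指", "挥系统"]` with the lexicon `{"指挥"}`, the same rule returns `["作战", "指挥", "系统"]`, because the model's cut inside the term is dropped and cuts are added at both of its edges.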