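A Thai neural network word segmentation method incorporating contextual character information
Cite this article: TAO Guang-feng, XIAN Yan-tuan, WANG Hong-bin, WANG Shu-juan. A Thai neural network word segmentation method incorporating contextual character information[J]. Computer Engineering & Science, 2018, 40(5): 943-949.
Authors: TAO Guang-feng  XIAN Yan-tuan  WANG Hong-bin  WANG Shu-juan
Funding: National Natural Science Foundation of China (61363044, 61462054); General Program of the Yunnan Provincial Department of Science and Technology (2015FB135); Scientific Research Fund of the Yunnan Provincial Department of Education (2014Z021)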
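Abstract: Automatic word segmentation is a key foundational technology for natural language processing. To address the complex feature templates and large search space of traditional statistical Thai word segmentation methods, this paper proposes a Thai neural network word segmentation model that incorporates contextual character information. Drawing on distributed word representation methods, the model trains Thai character representation vectors and uses a multi-layer neural network classifier to perform Thai word segmentation. Experiments on the InterBEST 2009 Thai word segmentation evaluation corpus show that the proposed method outperforms the conditional random field (CRF) segmentation model, the Character-Cluster Hybrid segmentation model, and the GLR and N-gram segmentation model, reaching a precision, recall, and F-value of 97.27%, 99.26%, and 98.26%, respectively, and improving segmentation speed by 112.78% relative to the CRF model.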
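The abstract describes the model only at a high level: character vectors, a window of context characters, and a multi-layer neural network classifier. The following is a minimal sketch of that general idea, not the authors' implementation. The window size `k`, the vector dimension, the hidden layer size, the tanh activation, the binary "this character ends a word" label scheme, and every function name are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def context_windows(char_ids, pad_id, k):
    """For each character, return the ids of itself and k neighbours on each side."""
    padded = [pad_id] * k + list(char_ids) + [pad_id] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(char_ids))]


def affine(x, W, b):
    """Compute x @ W + b for a vector x and a row-major matrix W."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def predict_boundaries(windows, emb, W1, b1, W2, b2):
    """Classify each character: 1 = ends a word, 0 = does not (assumed labels)."""
    labels = []
    for win in windows:
        # Concatenate the vectors of all characters in the context window.
        x = [v for cid in win for v in emb[cid]]
        h = [math.tanh(z) for z in affine(x, W1, b1)]
        scores = affine(h, W2, b2)
        labels.append(scores.index(max(scores)))
    return labels


def segment(text, labels):
    """Cut the text after every character labelled 1; the last character always closes a word."""
    words, start = [], 0
    for i, label in enumerate(labels):
        if label == 1 or i == len(text) - 1:
            words.append(text[start:i + 1])
            start = i + 1
    return words


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical demo values; id 0 is reserved for padding.
text = "ฉันกินข้าว"
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set(text)))}
k, dim, hidden = 2, 4, 8
rng = random.Random(0)
emb = random_matrix(len(vocab) + 1, dim, rng)
W1, b1 = random_matrix((2 * k + 1) * dim, hidden, rng), [0.0] * hidden
W2, b2 = random_matrix(hidden, 2, rng), [0.0] * 2

windows = context_windows([vocab[ch] for ch in text], 0, k)
labels = predict_boundaries(windows, emb, W1, b1, W2, b2)
words = segment(text, labels)
```

Because each character is classified from a fixed-size window, prediction is a single pass over the text with no search over candidate segmentations. With random weights the resulting `words` are meaningless; the sketch only shows how the pieces fit together.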

Keywords: Thai word segmentation  neural network model  contextual character information  character vector
Received: 2016-11-18
Revised: 2018-05-25

A context character feature based neural network model for Thai word segmentation
TAO Guang-feng, XIAN Yan-tuan, WANG Hong-bin, WANG Shu-juan. A context character feature based neural network model for Thai word segmentation[J]. Computer Engineering & Science, 2018, 40(5): 943-949.
Authors: TAO Guang-feng  XIAN Yan-tuan  WANG Hong-bin  WANG Shu-juan
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Abstract: Automatic word segmentation is a fundamental technology of natural language processing. To address the complex feature templates and large search space of traditional statistical Thai word segmentation methods, this paper proposes a context character feature based neural network model for Thai word segmentation. The proposed model draws on distributed word representation methods to train Thai character representation vectors, and uses a multi-layer neural network classifier for Thai word segmentation. Experimental results on the InterBEST 2009 Thai word segmentation evaluation corpus show that the proposed method outperforms the conditional random field model, the Character-Cluster Hybrid segmentation model, and the GLR and N-gram segmentation model. Its precision, recall, and F-value reach 97.27%, 99.26%, and 98.26%, respectively, and it improves segmentation speed by 112.78% compared with the conditional random field model.
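As a quick sanity check on the reported figures, the sketch below assumes the F-value is the usual balanced harmonic mean of precision and recall, and that a 112.78% speed improvement means relative throughput. Neither definition is stated in the abstract, and the function and variable names are illustrative.

```python
def f_value(precision, recall):
    """Balanced F-measure (assumed definition): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
f = f_value(97.27, 99.26)

# A 112.78% improvement read as relative throughput (assumed interpretation).
speedup_factor = 1 + 112.78 / 100
```

Under these assumptions `f` comes out at about 98.25, which agrees with the reported 98.26% up to rounding of the inputs, and `speedup_factor` is roughly 2.13, meaning a little more than twice the throughput of the conditional random field baseline.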
Keywords: Thai word segmentation  neural network model  context character feature  character vector