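Improvement on CRFs-based Chinese word segmentation algorithm
Cite this article: GU Jiao-jiao, YANG Zhi-hong, JIANG Wen-zhi, HU Wen-xuan. Improvement on CRFs-based Chinese word segmentation algorithm[J]. Journal of Terahertz Science and Electronic Information Technology, 2012, 10(2): 184-187.
Authors: GU Jiao-jiao  YANG Zhi-hong  JIANG Wen-zhi  HU Wen-xuan
Affiliations: 1. Department of Ordnance Science and Technology, Naval Aeronautical and Astronautical University, Yantai, Shandong 264001
2. Military Representatives Bureau of the Naval Equipment Department in Wuhan, Wuhan, Hubei 430064
3. Department of Foreign Training, Naval Aeronautical and Astronautical University, Yantai, Shandong 264001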
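Abstract: In Chinese word segmentation, character-based tagging is widely used: by tagging each character, the segmentation problem is converted into a sequence labeling problem. The tagging model that currently gives the best segmentation results is the one based on Conditional Random Fields (CRFs). Segmenting operational commands is the basis for automatically generating operational orders, but applying the CRFs model to operational command segmentation incurs very high time and space complexity. To improve efficiency, the model is analyzed and a feature subset is chosen with a feature selection algorithm, which effectively reduces the time and space cost of segmentation. The CRFs confidence is then used to post-process the segmentation results, further improving segmentation precision. Experimental results show that the feature selection algorithm and the post-processing method improve Chinese word segmentation performance.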

Keywords: Chinese word segmentation  Conditional Random Fields  feature selection  confidence
Received: 2011/5/24
Revised: 2011/8/23

Improvement on CRFs-based Chinese word segmentation algorithm
GU Jiao-jiao,YANG Zhi-hong,JIANG Wen-zhi and HU Wen-xuan.Improvement on CRFs-based Chinese word segmentation algorithm[J].Journal of Terahertz Science and Electronic Information Technology,2012,10(2):184-187.
Authors:GU Jiao-jiao  YANG Zhi-hong  JIANG Wen-zhi and HU Wen-xuan
Affiliation: 1a. Department of Ordnance Science and Technology; 1b. Department of Foreign Training, Naval Aeronautical and Astronautical University, Yantai, Shandong 264001, China; 2. Military Representatives Bureau of NED in Wuhan, Wuhan, Hubei 430064, China
Abstract: In Chinese word segmentation, the most widely used method is character-based tagging, which reformulates the segmentation task as a sequence tagging task. The Conditional Random Fields (CRFs) tagger currently achieves state-of-the-art performance. Segmentation of command orders is the basis for automatically generating command orders. However, applying the model to command order segmentation incurs high time and space costs. The model is analyzed and feature subsets are selected using a feature selection algorithm, which effectively cuts the time and space overhead and improves the efficiency of the model. A novel post-processing step based on CRFs confidence is then presented to further improve precision. Experimental results show that combining the feature selection method with the confidence-based post-processing improves Chinese word segmentation performance.
Keywords:Chinese word segmentation  Conditional Random Fields  feature selection  confidence
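
The abstract's central idea, recasting segmentation as per-character sequence tagging, can be illustrated with a minimal sketch. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single); the abstract does not state which tag set the authors use. The function names `words_to_tags` and `tags_to_words` are hypothetical, and no CRF training is shown.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character (assumed tag set)."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens so no stray tags are produced
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their tags; a word closes on E or S."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)  # keep a trailing unfinished word instead of dropping it
    return words


segmented = ["我", "爱", "条件随机场"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
print(tags)
print(restored)
```

In a tagger of this kind, a trained sequence model would predict the tag list from the raw characters, and a function such as `tags_to_words` would turn the predicted tags back into words.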
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.