
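Research on the Model of Integrating Chinese Word Segmentation with Part-of-speech Tagging (中文分词及词性标注一体化模型研究)
Cite this article: TONG Xiao-Jun, SONG Guo-Long, LIU Qiang, ZHANG Li, JIANG Wei. Research on the Model of Integrating Chinese Word Segmentation with Part-of-speech Tagging [J]. Computer Science (计算机科学), 2007, 34(9): 174-175
Authors: TONG Xiao-Jun (佟晓筠)  SONG Guo-Long (宋国龙)  LIU Qiang (刘强)  ZHANG Li (张俐)  JIANG Wei (姜伟)
Affiliations: School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai 264209; School of Information Science and Engineering, Northeastern University, Shenyang 110004; Computing Center, Eastern Liaoning University, Dandong 118000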
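Abstract: Using the N-shortest paths method, this paper constructs a model that integrates automatic Chinese word segmentation with automatic part-of-speech tagging. The segmentation stage retains the N best results as a candidate set; the final result is chosen from these N most promising candidates after unknown-word recognition and part-of-speech tagging. Based on this model, a Chinese lexical analyzer that performs segmentation and part-of-speech tagging in an integrated fashion was implemented. A preliminary open test shows that the analyzer achieves a segmentation accuracy of 98.1% and a part-of-speech tagging accuracy of 95.07%.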

Keywords: Chinese word segmentation  Part-of-speech tagging  N-shortest paths method

Research on the Model of Integrating Chinese Word Segmentation with Part-of-speech Tagging
TONG Xiao-Jun,SONG Guo-Long,LIU Qiang,ZHANG Li,JIANG Wei. Research on the Model of Integrating Chinese Word Segmentation with Part-of-speech Tagging[J]. Computer Science, 2007, 34(9): 174-175
Authors:TONG Xiao-Jun  SONG Guo-Long  LIU Qiang  ZHANG Li  JIANG Wei
Abstract: In this paper, we present a model that integrates Chinese word segmentation with part-of-speech (POS) tagging. In the segmentation stage, the model retains the top N segmentation results as candidates. After unknown-word recognition and POS tagging are finished, the final result is selected from these top N candidates. We also develop a Chinese lexical analyzer based on this model. A preliminary experiment shows that the analyzer achieves an accuracy of 98.1% for segmentation and 95.07% for POS tagging.
Keywords:Chinese word segmentation   Part-of-speech tagging   N-shortest paths method
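
The candidate-retention step described in the abstract (keep the N best segmentations, then choose among them later) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy lexicon, its cost values, the fallback cost for unknown characters, and the function name `n_best_segmentations` are all assumptions made for the example. Costs stand in for something like negative log probabilities, and a lower total cost means a better path through the word lattice.

```python
import heapq

# Hypothetical toy lexicon: word -> cost (e.g. a negative log probability).
# The entries and numbers are invented for illustration only.
LEXICON = {
    "研究": 2.0,
    "研究生": 3.0,
    "生命": 2.5,
    "命": 4.0,
    "生": 4.0,
    "的": 1.0,
    "起源": 2.5,
}
# Assumed fallback cost for a single character missing from the lexicon.
UNKNOWN_CHAR_COST = 8.0


def n_best_segmentations(sentence, n, lexicon=LEXICON):
    """Return up to n lowest-cost segmentations as (cost, words) pairs."""
    max_len = max(len(word) for word in lexicon)
    # best[i] holds up to n (cost, words) pairs that cover sentence[:i].
    best = [[] for _ in range(len(sentence) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(sentence) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in lexicon:
                cost = lexicon[word]
            elif len(word) == 1:
                cost = UNKNOWN_CHAR_COST
            else:
                continue  # multi-character strings outside the lexicon are skipped
            # Extend every retained partial path that ends where this word starts.
            for prev_cost, prev_words in best[start]:
                candidates.append((prev_cost + cost, prev_words + [word]))
        # Keep only the n cheapest partial paths ending at this position.
        best[end] = heapq.nsmallest(n, candidates, key=lambda c: c[0])
    return best[len(sentence)]


if __name__ == "__main__":
    for cost, words in n_best_segmentations("研究生命的起源", 2):
        print(cost, "/".join(words))
```

In a pipeline like the one the abstract outlines, these N candidates would then be rescored after unknown-word recognition and POS tagging, and the best one chosen; that later stage is not shown here.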
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.