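Two-times backtracking Chinese word segmentation method
Cite this article: YUAN Jian, ZHANG Jin-song, MA Liang. Two-times backtracking Chinese word segmentation method [J]. Application Research of Computers, 2009, 26(9): 3321-3323.
Authors: YUAN Jian, ZHANG Jin-song, MA Liang (b)
Affiliations: 1. School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093
2. Business School, University of Shanghai for Science & Technology, Shanghai 200093
Funding: Shanghai Key Discipline Construction Funding Project (T0502)
Abstract: Building on the maximum matching method (MM), this paper proposes a two-times backtracking Chinese word segmentation method. The method first preprocesses the text to be segmented, splitting it into shorter, fine-grained pieces of text. It then uses forward matching, backtracking matching, last-word matching, and debris inspection to detect ambiguous fields effectively. Crossing-ambiguity fields are segmented by giving priority to long words while also taking two-word clusters into account, and the difficult multi-linked crossing-ambiguity fields are detected and segmented effectively. Experimental results on a large amount of randomly sampled corpus text demonstrate the effectiveness of the method.

Keywords: Chinese word segmentation; backtracking matching; crossing ambiguity; multi-linked; debris inspection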
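
The abstract names two building blocks that are easy to illustrate: the preprocessing step that splits text into short segments, and the forward maximum matching (MM) baseline the method builds on. The following is a minimal sketch of those two steps only, not the authors' two-times backtracking procedure. The toy lexicon, the punctuation set, and the function names `split_fine_grained` and `forward_max_match` are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon; a real segmenter would load a large dictionary.
LEXICON = {"研究", "研究生", "生命", "起源"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def split_fine_grained(text):
    """Split text at punctuation and whitespace into short segments.

    The punctuation set here is an assumption; the abstract does not
    say which delimiters the preprocessing step uses.
    """
    parts = re.split(r"[，。！？；：、,.!?;:\s]+", text)
    return [part for part in parts if part]


def forward_max_match(segment, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Greedy forward maximum matching over one segment."""
    words = []
    i = 0
    while i < len(segment):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(segment) - i), 0, -1):
            candidate = segment[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


segments = split_fine_grained("研究生命起源，研究生。")
result = [forward_max_match(segment) for segment in segments]
print(result)
```

With this toy lexicon, plain forward matching cuts "研究生命起源" into "研究生 / 命 / 起源" and leaves the single character "命" stranded. This is the kind of error that greedy maximum matching produces on ambiguous fields, and that a backtracking step is meant to detect and repair.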

Two-times backtracking Chinese word segmentation method
YUAN Jian, ZHANG Jin-song, MA Liang. Two-times backtracking Chinese word segmentation method [J]. Application Research of Computers, 2009, 26(9): 3321-3323.
Authors: YUAN Jian, ZHANG Jin-song, MA Liang (b)
Affiliation: a. School of Optical-Electrical & Computer Engineering; b. Business School; University of Shanghai for Science & Technology, Shanghai 200093, China
Abstract: This paper proposes a two-times backtracking Chinese word segmentation method based on maximum matching (MM). The method first preprocesses the text, cutting it into shorter, fine-grained segments. It then detects ambiguous fields effectively by means of forward matching, backtracking matching, last-word matching, and debris inspection. Crossing-ambiguity fields are segmented using long-word priority together with two-word rules, and the difficult multi-linked crossing-ambiguity fields are also detected and segmented effectively. Tests on a large amount of randomly selected corpus text show that the method is effective.
Keywords:Chinese word segmentation  backtracking matching  crossing ambiguity  multi-linked  debris inspection
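
A crossing ambiguity arises when two dictionary words overlap in the text, so that a string ABC contains both AB and BC as words. The sketch below finds such overlapping pairs by brute force. It illustrates the definition only and is not the authors' detection procedure, which relies on forward, backtracking, and last-word matching. The lexicon and the function name `find_crossing_ambiguities` are hypothetical.

```python
def find_crossing_ambiguities(segment, lexicon):
    """Return pairs of lexicon words that overlap inside `segment`.

    Two occurrences [a, b) and [c, d) cross when a < c < b < d, i.e.
    the second word starts inside the first and ends beyond it.
    """
    max_len = max(len(word) for word in lexicon)
    n = len(segment)

    # Collect every occurrence of a multi-character lexicon word.
    spans = []
    for start in range(n):
        for size in range(2, min(max_len, n - start) + 1):
            if segment[start:start + size] in lexicon:
                spans.append((start, start + size))

    # Keep the pairs of occurrences that partially overlap.
    crossings = []
    for a, b in spans:
        for c, d in spans:
            if a < c < b < d:
                crossings.append((segment[a:b], segment[c:d]))
    return crossings


# Hypothetical toy lexicon, used for illustration only.
toy_lexicon = {"研究", "研究生", "生命", "起源"}
pairs = find_crossing_ambiguities("研究生命起源", toy_lexicon)
print(pairs)
```

Here "研究生" and "生命" share the character "生", so the field "研究生命" is a crossing ambiguity. "研究" and "生命" merely sit next to each other and are not reported. A multi-linked field would show up as a chain of such pairs, each word overlapping the next.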