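Word Segmentation Method Research Based on Enriching Dictionary Gradually
Cite this article: YANG Liu, YUAN Fang, HUO Liang. Word Segmentation Method Research Based on Enriching Dictionary Gradually[J]. Computer Engineering and Applications, 2006, 42(32): 164-166.
Authors: YANG Liu, YUAN Fang, HUO Liang
Affiliations: 1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002; School of Economics, Hebei University, Baoding, Hebei 071002
2. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
3. Department of Computer Science, Baoding College of Finance, Baoding, Hebei 071000
Funding: Hebei Province Key Science and Technology Research Project; Hebei Provincial Department of Education Research Project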
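Abstract: As modern society develops rapidly, new terms keep appearing. Most existing string-matching word segmentation methods rely on a fixed dictionary, so a newly coined word cannot be recognized correctly. This paper therefore proposes a segmentation method that progressively enriches the dictionary: strings that cannot be segmented correctly are recorded together with their frequency counts, and once a string's frequency reaches a given threshold it is treated as a new word and added to the dictionary, so that the dictionary grows dynamically. Experiments show that the method improves segmentation accuracy without affecting segmentation speed.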

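Keywords: progressively enriched dictionary; string-matching word segmentation method; statistical word segmentation method
Article ID: 1002-8331(2006)32-0164-03
Received: 2006-02-01
Revised: 2006-02-01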

Word Segmentation Method Research Based on Enriching Dictionary Gradually
YANG Liu, YUAN Fang, HUO Liang. Word Segmentation Method Research Based on Enriching Dictionary Gradually[J]. Computer Engineering and Applications, 2006, 42(32): 164-166.
Authors: YANG Liu, YUAN Fang, HUO Liang
Affiliation: 1. College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China; 2. Hebei University, Baoding, Hebei 071002, China; 3. Dept. of Computer Science and Tech., Baoding College of Finance, Baoding, Hebei 071000, China
Abstract: With the rapid development of modern society, new words appear continuously. Most existing word segmentation methods based on string matching use a fixed dictionary, so a new word cannot be recognized correctly. This paper therefore proposes a method that enriches the dictionary gradually. Strings that are segmented incorrectly are recorded with their frequency counts; if a string's frequency exceeds a threshold, it is taken to be a new word and added to the dictionary, so the dictionary grows dynamically. Experiments show that the method improves segmentation accuracy while retaining segmentation speed.
Keywords: enriching dictionary gradually; string-matching word segmentation method; statistical word segmentation method
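
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It assumes forward maximum matching as the string-matching step and treats each run of consecutive unmatched characters as one candidate string; the abstract does not specify either choice, so both are assumptions. The class name `EnrichingSegmenter`, the `threshold` parameter and its default, and the sample words are hypothetical and are not taken from the paper.

```python
from collections import Counter


class EnrichingSegmenter:
    """Dictionary-based segmenter whose dictionary grows over time (sketch)."""

    def __init__(self, words, threshold=3):
        self.words = set(words)
        self.threshold = threshold   # hypothetical promotion threshold
        self.max_len = max((len(w) for w in self.words), default=1)
        self.candidates = Counter()  # frequency of unmatched strings

    def _longest_match(self, text, i):
        """Length of the longest dictionary word starting at i, or 0."""
        for size in range(min(self.max_len, len(text) - i), 0, -1):
            if text[i:i + size] in self.words:
                return size
        return 0

    def _record(self, run):
        """Count an unmatched string; promote it once it reaches the threshold."""
        self.candidates[run] += 1
        if self.candidates[run] >= self.threshold:
            self.words.add(run)
            self.max_len = max(self.max_len, len(run))
            del self.candidates[run]

    def segment(self, text):
        tokens, runs, unknown, i = [], [], "", 0
        while i < len(text):
            size = self._longest_match(text, i)
            if size == 0:
                # No dictionary word starts here: extend the unmatched run.
                unknown += text[i]
                i += 1
                continue
            if unknown:
                tokens.append(unknown)
                runs.append(unknown)
                unknown = ""
            tokens.append(text[i:i + size])
            i += size
        if unknown:
            tokens.append(unknown)
            runs.append(unknown)
        # Update statistics only after the text is segmented, so the
        # dictionary does not change in the middle of one input.
        for run in runs:
            if len(run) > 1:  # assumption: single characters are not candidates
                self._record(run)
        return tokens


segmenter = EnrichingSegmenter({"我们", "喜欢", "阅读"}, threshold=2)
print(segmenter.segment("我们喜欢博客"))  # "博客" (blog) is unmatched: count 1
print(segmenter.segment("阅读博客"))      # count 2 reaches the threshold
print(segmenter.segment("博客网友"))      # "博客" is now matched as a word
```

In this sketch the third call splits the input into "博客" and "网友", whereas a fresh segmenter with the original dictionary would return the whole string as a single unmatched run. Lookup cost per position stays bounded by the longest dictionary word, which is consistent with the abstract's claim that speed is retained, although the paper's actual data structures are not described here.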
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.