
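Identification of Chinese Unknown Words Based on Genetic Algorithm
Cite this article: Yan Rong, Zhang Lei. Identification of Chinese unknown words based on genetic algorithm[J]. Computer Applications and Software, 2008, 25(7)
Authors: Yan Rong (闫蓉), Zhang Lei (张蕾)
Affiliations: School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021; Department of Computer Science, Northwest University, Xi'an, Shaanxi 710069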
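Abstract: Identifying unknown (out-of-vocabulary) words is a difficult problem in Chinese word segmentation. This paper proposes a new method for it based on a genetic algorithm. The method enlarges the capacity of the segmentation fragments and treats unknown-word identification as a binary classification problem: within the segmentation fragments produced by preprocessing, each single character belongs to one of two classes, "combinable" or "not combinable". A genetic algorithm first determines which single characters in the fragments are standalone single-character words; the remaining adjacent single characters are then combined, completing unknown-word identification. Experimental results show that the method handles unknown-word identification effectively and improves both its precision and its recall.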

Keywords: natural language processing; unknown word recognition; genetic algorithm
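The abstract describes two steps: a genetic algorithm labels each single character in a segmentation fragment as "combinable" or "not combinable", and the remaining adjacent characters are then merged into unknown-word candidates. The sketch below illustrates that idea under loudly stated assumptions. The bit encoding (1 = standalone word, 0 = combinable), the fitness function (a log-likelihood over a hypothetical `SINGLE_WORD_PROB` table plus a hypothetical `ISOLATION_PENALTY`), and the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are illustrative choices. They are not the paper's features or parameters, which the abstract does not give.

```python
import math
import random

# HYPOTHETICAL per-character probability of being used as a standalone
# single-character word; the paper's real features are not in the abstract.
SINGLE_WORD_PROB = {"王": 0.2, "晓": 0.05, "明": 0.3, "来": 0.9, "了": 0.95}
# HYPOTHETICAL cost for a combinable character with no combinable neighbour,
# since a lone character cannot be merged into a multi-character word.
ISOLATION_PENALTY = 5.0


def fitness(chars, bits):
    """Score a labelling: bit 1 = standalone word, bit 0 = combinable."""
    score = 0.0
    for i, (ch, bit) in enumerate(zip(chars, bits)):
        p = SINGLE_WORD_PROB.get(ch, 0.5)  # unlisted characters get a neutral prior
        score += math.log(p if bit else 1.0 - p)
        if bit == 0:
            left = i > 0 and bits[i - 1] == 0
            right = i + 1 < len(bits) and bits[i + 1] == 0
            if not (left or right):
                score -= ISOLATION_PENALTY
    return score


def genetic_label(chars, pop_size=40, generations=80, mutation_rate=0.1, seed=0):
    """Evolve one 0/1 label per character with a simple generational GA."""
    rng = random.Random(seed)
    n = len(chars)

    def score(bits):
        return fitness(chars, bits)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best labellings over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each label flips with probability mutation_rate.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


def merge_combinable(chars, bits):
    """Join each maximal run of adjacent combinable (0) characters into one word."""
    words, run = [], ""
    for ch, bit in zip(chars, bits):
        if bit == 0:
            run += ch
        else:
            if run:
                words.append(run)
                run = ""
            words.append(ch)
    if run:
        words.append(run)
    return words


chars = list("王晓明来了")
print(merge_combinable(chars, genetic_label(chars)))
```

With these made-up probabilities, the labelling 0 0 0 1 1 is the unique maximum of `fitness`, so the GA should return it. It then merges 王晓明 as an unknown-word candidate, while 来 and 了 remain single-character words. For five characters, brute force over all 32 labellings would suffice. A GA only pays off as fragments grow, because the search space doubles with each added character.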

IDENTIFICATION OF CHINESE UNKNOWN WORDS BASED ON GENETIC ALGORITHM
Yan Rong, Zhang Lei. IDENTIFICATION OF CHINESE UNKNOWN WORDS BASED ON GENETIC ALGORITHM[J]. Computer Applications and Software, 2008, 25(7)
Authors: Yan Rong, Zhang Lei
Affiliations: Yan Rong¹, Zhang Lei² (¹School of Computer Science, Inner Mongolia University, Huhhot 010021, Inner Mongolia, China; ²Department of Computer Science, Northwest University, Xi'an 710069, Shaanxi, China)
Abstract: This paper proposes a new genetic-algorithm-based method for recognizing Chinese unknown words, a difficult problem in word segmentation. The method expands the capacity of the segmentation fragments and treats unknown-word recognition as a binary classification problem: after preprocessing, the single-character words in the segmentation fragments are divided into two categories, 'combinable' and 'not combinable'. A genetic algorithm is used to determine the single cha...
Keywords: Natural language processing; Unknown word recognition; Genetic algorithm
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.