Applying rough sets to feature extraction in POS tagging
Citation: Jiang Wei, Wang Xiaolong, Guan Yi, Xu Zhiming. Applying rough sets to feature extraction in POS tagging[J]. High Technology Letters (高技术通讯), 2006, 16(10): 996-1000.
Authors: Jiang Wei, Wang Xiaolong, Guan Yi, Xu Zhiming
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Abstract: Part-of-speech (POS) tagging depends on complex contextual features that are hard to extract. This paper applies rough set theory to mine such features from the corpus, including long-distance ones, and to handle the noisy and inconsistent samples the corpus contains. The resulting rough rules are integrated into a maximum entropy model, and each rule's weight is assigned during training according to the model's overall performance. In open tests, adding the rough rules raised tagging precision to 96.29%, an improvement of 0.83% over the original model.
Keywords: rough sets; feature extraction; POS tagging
Received: 2005-09-06
Revised: 2005-09-06
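The abstract does not spell out the extraction procedure. The following is a minimal sketch under explicit assumptions. It uses a decision table whose condition attributes are context features (here a hypothetical previous-tag / current-word / next-word triple) and whose decision attribute is the POS tag. From that table, it computes rough-set lower and upper approximations and keeps rules only from consistent equivalence classes. Consistency can optionally be relaxed with a confidence threshold, a variable-precision variant swapped in here for illustration. Finally, when a rule fires, the sketch adds it as a binary feature in a maximum entropy distribution. The data, function names, and hand-set weight are all hypothetical. The paper learns its weights during training, which this sketch does not reproduce.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy decision table. Condition attributes: (previous tag,
# current word, next word); decision attribute: POS tag of the current word.
SAMPLES = [
    (("d", "study", "problem"), "v"),
    (("d", "study", "problem"), "v"),
    (("d", "study", "problem"), "n"),  # noisy: same context, different tag
    (("u", "study", "."), "n"),
    (("u", "study", "."), "n"),
    (("p", "develop", "of"), "v"),
]


def equivalence_classes(samples):
    """Group sample indices that are indiscernible on the condition attributes."""
    classes = defaultdict(list)
    for i, (cond, _tag) in enumerate(samples):
        classes[cond].append(i)
    return classes


def lower_approximation(samples, tag):
    """Samples whose entire equivalence class has decision `tag`."""
    result = set()
    for members in equivalence_classes(samples).values():
        if all(samples[i][1] == tag for i in members):
            result.update(members)
    return result


def upper_approximation(samples, tag):
    """Samples whose equivalence class contains at least one `tag`."""
    result = set()
    for members in equivalence_classes(samples).values():
        if any(samples[i][1] == tag for i in members):
            result.update(members)
    return result


def extract_rules(samples, min_conf=1.0, min_support=1):
    """Map a condition tuple to (majority tag, confidence).

    min_conf=1.0 keeps only certain rules (classes inside a lower approximation);
    min_conf < 1.0 tolerates noisy classes, in the spirit of variable-precision
    rough sets (an assumption, not necessarily the paper's criterion).
    """
    rules = {}
    for cond, members in equivalence_classes(samples).items():
        tag, count = Counter(samples[i][1] for i in members).most_common(1)[0]
        conf = count / len(members)
        if conf >= min_conf and len(members) >= min_support:
            rules[cond] = (tag, conf)
    return rules


def maxent_distribution(context, tags, weights, rules):
    """P(t | context) = exp(sum_k w_k * f_k(context, t)) / Z(context)."""
    scores = {}
    for t in tags:
        # Ordinary current-word/tag feature (hypothetical key format).
        s = weights.get(("word", context[1], t), 0.0)
        rule = rules.get(context)
        if rule is not None and rule[0] == t:
            # Binary rough-rule feature: fires when a rule covers the context
            # and predicts this tag.
            s += weights.get("rough_rule", 0.0)
        scores[t] = math.exp(s)
    z = sum(scores.values())
    return {t: v / z for t, v in scores.items()}


if __name__ == "__main__":
    print(sorted(lower_approximation(SAMPLES, "v")))  # [5]
    print(sorted(upper_approximation(SAMPLES, "v")))  # [0, 1, 2, 5]
    certain = extract_rules(SAMPLES, min_conf=1.0)
    print(certain)
    weights = {"rough_rule": 1.5}  # hypothetical hand-set weight
    print(maxent_distribution(("u", "study", "."), ["n", "v"], weights, certain))
```

The samples in the boundary region are those in the upper approximation but not in the lower one (indices 0 to 2 for tag `v` here). This region is where noise or genuine ambiguity shows up. The strict rules skip it, which is one way a rough-set filter can keep corpus noise out of the rule set before the rules reach the maximum entropy model.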
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.