
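Automatic part-of-speech tagging based on blocks and memory
Cite this article:Shi Jing, Dai Guo-zhong. Automatic part-of-speech tagging based on blocks and memory[J]. Journal of Jilin University (Engineering and Technology Edition), 2006, 36(4): 560-563
Authors:Shi Jing  Dai Guo-zhong
Affiliation:Human-Computer Interaction and Intelligent Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100080
Abstract:The block and memory based model (BMM) takes a tagging approach markedly different from traditional methods. It treats a whole Chinese sentence as the processing unit, starts from blocks (chunks) while focusing on individual words, and analyzes richer contextual knowledge. It uses the HowNet dictionary to memorize each word's set of possible parts of speech, and it acquires parameter values through incremental rote learning. The troublesome sparse-data problem is handled simply by setting a smoothing constant, and a small number of hand-written rules are finally applied to correct the tagging results. Experiments show that the model's accuracy is close to 99% in closed tests and above 95% in open tests.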

Keywords:artificial intelligence  automatic part-of-speech tagging  block and memory based model  incremental learning
Article ID:1671-5497(2006)04-0560-04
Received:2005-11-12
Revised:2005-11-12

Block and memory based part of speech tagging
Shi Jing,Dai Guo-zhong. Block and memory based part of speech tagging[J]. Journal of Jilin University:Eng and Technol Ed, 2006, 36(4): 560-563
Authors:Shi Jing  Dai Guo-zhong
Affiliation:Computer Human Interaction and Intelligent Information Processing Laboratory, Institute of Software, The Chinese Academy of Sciences, Beijing 100080, China
Abstract:Automatic part-of-speech tagging is widely used in natural language processing. The block and memory based model (BMM) takes an approach that differs from traditional models. BMM takes a whole Chinese sentence as the processing unit and considers each word individually within a richer, more informative context. The HowNet lexicon is used to store the tag sets, and, to improve efficiency, an incremental learning method is applied to obtain the parameters. A constant is used to smooth the sparse data, and a few handcrafted rules are used to amend the results. Experiments show that the accuracy is about 99% in the closed test and higher than 95% in the open test.
Keywords:artificial intelligence   automatic part-of-speech tagging   block and memory based model  incremental learning
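
The abstract names four ingredients: a dictionary of admissible tags per word, incremental learning of parameters, a constant to smooth sparse data, and whole-sentence processing. The Python sketch below shows one way these could fit together. It is not the authors' BMM: the class `SketchTagger`, the value of `SMOOTH`, the count-based score, the Viterbi search, the noun default for unknown words, and the toy sentences are all assumptions made for illustration, and the rule-based correction step is left out.

```python
import math
from collections import defaultdict

SMOOTH = 0.1  # assumed constant used for unseen events; the paper's value is not given


class SketchTagger:
    """Hypothetical lexicon-constrained tagger; not the BMM implementation."""

    def __init__(self, lexicon):
        # word -> set of admissible tags; stands in for a dictionary of tag sets
        self.lexicon = {word: set(tags) for word, tags in lexicon.items()}
        self.emit = defaultdict(float)   # (word, tag) -> count
        self.trans = defaultdict(float)  # (previous tag, tag) -> count

    def learn(self, tagged_sentence):
        """Incremental update: fold one tagged sentence into the counts."""
        prev = "<s>"
        for word, tag in tagged_sentence:
            self.lexicon.setdefault(word, set()).add(tag)
            self.emit[(word, tag)] += 1
            self.trans[(prev, tag)] += 1
            prev = tag

    def _score(self, prev, word, tag):
        # Unseen events receive the constant instead of a zero count.
        e = self.emit.get((word, tag), SMOOTH)
        t = self.trans.get((prev, tag), SMOOTH)
        return math.log(e) + math.log(t)

    def tag(self, words):
        """Choose the best tag sequence for the whole sentence (Viterbi search)."""
        if not words:
            return []
        best = {"<s>": (0.0, [])}  # last tag -> (score, best path ending in it)
        for word in words:
            # Unknown words default to noun "n" (an assumption of this sketch).
            candidates = sorted(self.lexicon.get(word, {"n"}))
            new_best = {}
            for tag in candidates:
                score, path = max(
                    ((s + self._score(p, word, tag), pth)
                     for p, (s, pth) in best.items()),
                    key=lambda item: item[0],
                )
                new_best[tag] = (score, path + [tag])
            best = new_best
        return max(best.values(), key=lambda item: item[0])[1]


# Toy data: "计划" (plan) and "旅行" (travel) can each be a noun or a verb.
tagger = SketchTagger({"计划": {"n", "v"}, "旅行": {"n", "v"}})
tagger.learn([("我", "r"), ("计划", "v"), ("旅行", "n")])
tagger.learn([("计划", "n"), ("很", "d"), ("好", "a")])

print(tagger.tag(["我", "计划", "旅行"]))  # ['r', 'v', 'n']
print(tagger.tag(["计划", "很", "好"]))    # ['n', 'd', 'a']
```

Because `learn` only adds to the counts, new tagged sentences can be folded in one at a time without retraining from scratch, which is the sense of "incremental" assumed here.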
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.