
Block and Memory Based Part-of-Speech Tagging (基于组块及记忆的词性自动标注)

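Cite this article:	Shi Jing, Dai Guo-zhong. Block and memory based part-of-speech tagging [J]. Journal of Jilin University (Engineering and Technology Edition), 2006, 36(4): 560-563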

Authors:	Shi Jing, Dai Guo-zhong

Affiliation:	Computer Human Interaction and Intelligent Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China (both authors)

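Abstract:	The block and memory based model (BMM) takes a tagging approach markedly different from traditional methods. It treats a whole Chinese sentence as the processing unit, starts from blocks, focuses on each individual word, and analyzes richer contextual knowledge. It relies on the HowNet dictionary to memorize the set of possible parts of speech, and obtains parameter values through incremental rote learning. The troublesome sparse-data problem is handled simply by setting a smoothing constant, and a small number of hand-written rules are finally used to correct the tagging results. Experiments show that the model's accuracy is close to 99% in the closed test and above 95% in the open test.
Keywords:	artificial intelligence; automatic part-of-speech tagging; block and memory based model; incremental learning
Article ID:	1671-5497(2006)04-0560-04
Received:	2005-11-12
Revised:	2005-11-12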
Block and memory based part of speech tagging

Shi Jing, Dai Guo-zhong. Block and memory based part of speech tagging [J]. Journal of Jilin University: Eng and Technol Ed, 2006, 36(4): 560-563

Authors:	Shi Jing, Dai Guo-zhong

Affiliation:	Computer Human Interaction and Intelligent Information Processing Laboratory, Institute of Software, The Chinese Academy of Sciences, Beijing 100080, China

Abstract:	Automatic part-of-speech tagging is widely used in natural language processing. The Block and Memory based Model (BMM) takes an approach that differs from traditional models. BMM takes a whole Chinese sentence as the processing unit and considers each word individually within a richer, more informative context. The HowNet lexicon is used to store the tag sets, and, to improve efficiency, an incremental learning method is applied to obtain the parameters. A constant is used to smooth the sparse data, and a few handcrafted rules are used to amend the results. Experiments show that accuracy is about 99% on the closed test and higher than 95% on the open test.

Keywords:	artificial intelligence; automatic part-of-speech tagging; block and memory based model; incremental learning
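
The abstract names three ingredients: a lexicon that stores the tag set of each word, parameters obtained incrementally, and a constant that smooths sparse data. The following Python sketch illustrates how these pieces might fit together. It is not the authors' BMM: the class `CountTagger`, the value of `SMOOTHING_CONSTANT`, the `UNK` tag, and the greedy left-to-right decoding using only the previous tag are all assumptions made for illustration, whereas the paper processes whole sentences with block-level context and adds hand-written correction rules.

```python
from collections import defaultdict

# Hypothetical value: the abstract says a constant is used but does not state it.
SMOOTHING_CONSTANT = 0.5


class CountTagger:
    """Toy tagger: lexicon-restricted candidates, incremental counts, constant smoothing."""

    def __init__(self, lexicon):
        # lexicon maps word -> set of allowed tags (a stand-in for a dictionary lookup).
        self.lexicon = lexicon
        # counts maps (previous tag, word, tag) -> number of times it was observed.
        self.counts = defaultdict(float)

    def learn(self, tagged_sentence):
        """Incrementally update counts from one tagged sentence; no retraining needed."""
        prev = "<s>"
        for word, tag in tagged_sentence:
            self.counts[(prev, word, tag)] += 1
            self.lexicon.setdefault(word, set()).add(tag)
            prev = tag

    def score(self, prev, word, tag):
        # Unseen events still get a non-zero score thanks to the constant.
        return self.counts.get((prev, word, tag), 0.0) + SMOOTHING_CONSTANT

    def tag(self, words):
        """Greedy decoding (an assumed simplification): pick the best tag word by word."""
        prev = "<s>"
        result = []
        for word in words:
            candidates = sorted(self.lexicon.get(word, {"UNK"}))
            best = max(candidates, key=lambda t: self.score(prev, word, t))
            result.append((word, best))
            prev = best
        return result


tagger = CountTagger({})
tagger.learn([("they", "r"), ("book", "v"), ("tickets", "n")])
tagger.learn([("the", "d"), ("book", "n")])
print(tagger.tag(["they", "book"]))
print(tagger.tag(["the", "book"]))
```

Because `learn` only increments counts, new tagged sentences can be added at any time without recomputing earlier statistics, which is one plausible reading of the incremental learning the abstract describes.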
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.