
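Semi-structured Text Information Extraction Based on Boosting Algorithm

Cite this article:	LIU Chun-nian, SONG Xia. Semi-structured Text Information Extraction Based on Boosting Algorithm[J]. Journal of Beijing Polytechnic University, 2005, 31(2): 199-203.

Authors:	LIU Chun-nian, SONG Xia

Affiliation:	1. Multimedia and Intelligent Software Technology Lab, College of Computer Science, Beijing University of Technology, Beijing 100022

Funding:	National Natural Science Foundation of China; 60173014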

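Abstract:	To extract information automatically from semi-structured text, this paper introduces an information extraction method based on the Boosting algorithm. The method automatically generates a rule from one training example, applies that rule to the set of positive examples, updates the weight distribution over that set, and then takes the positive example with the largest weight to generate the next rule. The paper also gives a pattern-matching constraint that can describe words that do not conform to English lexical rules. Experiments show that when learning extraction rules with simple features, the method reaches 100% precision and recall; when learning extraction rules with more complex features, its F₁ score still exceeds 80%.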
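The abstract reports an F₁ value without defining it. Assuming the usual balanced F-measure (the harmonic mean of precision P and recall R), which the abstract does not state explicitly, the relation is:

```latex
% Assumed definition: balanced F-measure (harmonic mean of precision and recall)
F_1 = \frac{2PR}{P + R}

% Illustrative numbers only, not taken from the paper:
% P = 0.90, R = 0.72  gives  F_1 = \frac{2 \times 0.90 \times 0.72}{0.90 + 0.72} = \frac{1.296}{1.62} = 0.80
% P = R = 1.00        gives  F_1 = 1.00
```

Under this definition, a precision and recall of 100% imply an F₁ of 100%, and an F₁ of 80% can arise from many different precision/recall pairs.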
Keywords:	Boosting algorithm; extraction rule; semi-structured text
Article ID:	0254-0037(2005)02-0199-05
Received:	2003-11-10
Revised:	2003-11-10
Semi-structured Text Information Extraction Based on Boosting Algorithm

LIU Chun-nian, SONG Xia. Semi-structured Text Information Extraction Based on Boosting Algorithm[J]. Journal of Beijing Polytechnic University, 2005, 31(2): 199-203.

Authors:	LIU Chun-nian, SONG Xia

Affiliation:	1.Multimedia and Intelligent Software Technology Lab, College of Computer Science, Beijing University of Technology, Beijing 100022, China

Abstract:	A new information extraction method based on the Boosting algorithm is presented. It automatically generates a rule from a training instance. The rule is applied to the training set, which changes the probability distribution defined by the weights of the positive examples, and the next instance is selected from the training set according to this distribution. A constraint named mode-match, which can describe words that do not follow lexical rules, is also provided. Experiments show that for texts with simple features, both recall and precision reach 100%; even for texts with complex features, the F1 measure reaches 80%.
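The loop described in the abstract (generate a rule from one positive example, re-weight the positive set, seed the next rule from the heaviest example) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule form (an exact match on a left-context token), the names `make_rule`, `matches`, `learn_rules`, and the multiplicative `shrink` update are all hypothetical, since the abstract does not give the actual rule language or weight-update formula.

```python
def make_rule(seed):
    """Hypothetical rule generator: the rule is just the seed's left-context token."""
    return seed["left"]


def matches(rule, example):
    """A rule covers an example when the left-context token is identical."""
    return example["left"] == rule


def learn_rules(positives, max_rules=10, shrink=0.1):
    """Boosting-style rule learning over a weighted set of positive examples.

    Assumption: covered examples have their weight multiplied by `shrink`
    and the weights are then renormalized; the paper's real update may differ.
    """
    n = len(positives)
    weights = [1.0 / n] * n          # start from a uniform distribution
    covered = [False] * n
    rules = []
    for _ in range(max_rules):
        # The positive example with the largest weight seeds the next rule.
        seed_idx = max(range(n), key=lambda i: weights[i])
        rule = make_rule(positives[seed_idx])
        rules.append(rule)
        # Apply the rule to the positive set and down-weight what it covers.
        for i, example in enumerate(positives):
            if matches(rule, example):
                weights[i] *= shrink
                covered[i] = True
        total = sum(weights)
        weights = [w / total for w in weights]
        if all(covered):             # every positive example is explained
            break
    return rules, weights


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        {"left": "Price:", "target": "$10"},
        {"left": "Price:", "target": "$25"},
        {"left": "Cost:", "target": "$7"},
    ]
    learned, final_weights = learn_rules(toy)
    print(learned, final_weights)
```

Because covered examples lose weight after each round, the next seed is drawn from the examples that the current rule set still fails to explain, which is the behaviour the abstract attributes to the method.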

Keywords:	Boosting algorithm; extraction rule; semi-structured text
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.