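Semi-structured Information Extraction Based on Boosting
Cite this article: LIU Chun-nian, SONG Xia. Semi-structured information extraction based on Boosting[J]. Journal of Beijing Polytechnic University, 2005, 31(2): 199-203.
Authors: LIU Chun-nian (刘椿年)  SONG Xia (宋霞)
Affiliation: 1. Multimedia and Intelligent Software Technology Lab, College of Computer Science, Beijing University of Technology, Beijing 100022
Funding: National Natural Science Foundation of China (60173014)
Abstract: To extract information from semi-structured text automatically, an information extraction method based on the Boosting algorithm is presented. It automatically generates a rule from a single training example, applies that rule to the set of positive examples, and updates the weight distribution over that set; the positive example with the largest weight is then used to generate the next rule. A pattern-matching constraint that can describe words that do not conform to English lexical rules is also given. Experiments show that when learning extraction rules over simple features, precision and recall can both reach 100%; when learning extraction rules over more complex features, the F1 score can still exceed 80%.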

Keywords: Boosting algorithm  extraction rule  semi-structured text
Article ID: 0254-0037(2005)02-0199-05
Received: 2003-11-10
Revised: 2003-11-10

Semi-structured Text Information Extraction Based on Boosting Algorithm
LIU Chun-nian, SONG Xia. Semi-structured Text Information Extraction Based on Boosting Algorithm[J]. Journal of Beijing Polytechnic University, 2005, 31(2): 199-203.
Authors: LIU Chun-nian  SONG Xia
Affiliation: 1. Multimedia and Intelligent Software Technology Lab, College of Computer Science, Beijing University of Technology, Beijing 100022, China
Abstract: A new information extraction method based on the Boosting algorithm is presented. It automatically generates a rule from a single training instance. The rule is then applied to the training set, which changes the weight distribution over the positive examples, and the next instance is selected from the training set according to this distribution. A constraint named mode-match, which can describe words that do not conform to lexical rules, is also provided. Experiments show that for texts with simple features, both recall and precision can reach 100%. Even for texts with complex features, the F1 score can reach 80% or more.
Keywords: Boosting algorithm  extraction rule  semi-structured text
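
The loop described in the abstract (generate a rule from one training instance, reweight the positive examples, pick the heaviest one as the next seed) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the rule language nor the weight-update formula, so the `Rule` form (one context token on each side of the target), the multiplicative `decay` update, and the names `Example`, `Rule`, `learn_rules`, and `extract` are all hypothetical.

```python
from dataclasses import dataclass


def _context(tokens, i):
    """Return the tokens just before and just after position i, with boundary markers."""
    before = tokens[i - 1] if i > 0 else "<s>"
    after = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return before, after


@dataclass(frozen=True)
class Example:
    tokens: tuple   # tokenised text fragment
    target: int     # index of the token that should be extracted


@dataclass(frozen=True)
class Rule:
    left: str       # token required immediately before the target (assumed rule form)
    right: str      # token required immediately after the target (assumed rule form)

    def matches(self, tokens, i):
        return _context(tokens, i) == (self.left, self.right)

    def covers(self, ex):
        return self.matches(ex.tokens, ex.target)


def rule_from(ex):
    """Generate one rule from a single positive example."""
    return Rule(*_context(ex.tokens, ex.target))


def learn_rules(positives, max_rules=10, decay=0.5):
    """Boosting-style loop: covered positives lose weight, the heaviest one seeds the next rule."""
    if not positives:
        return []
    n = len(positives)
    weights = [1.0 / n] * n
    rules, seed = [], 0
    for _ in range(max_rules):
        rule = rule_from(positives[seed])
        rules.append(rule)
        # Assumed update: multiply the weight of every covered positive by `decay`, then renormalise.
        weights = [w * decay if rule.covers(ex) else w for w, ex in zip(weights, positives)]
        total = sum(weights)
        weights = [w / total for w in weights]
        if all(any(r.covers(ex) for r in rules) for ex in positives):
            break
        # The positive example with the largest weight generates the next rule.
        seed = max(range(n), key=lambda i: weights[i])
    return rules


def extract(rules, tokens):
    """Return every token whose context matches at least one learned rule."""
    return [tokens[i] for i in range(len(tokens)) if any(r.matches(tokens, i) for r in rules)]


positives = [
    Example(("Price", ":", "100", "USD"), 2),
    Example(("Cost", ":", "250", "USD"), 2),
    Example(("Pay", "$", "30", "now"), 2),
]
rules = learn_rules(positives)
print(rules)
print(extract(rules, ["Fee", ":", "75", "USD"]))
```

In this toy data the first rule covers the first two examples, so the third example ends up with the largest weight and seeds the second rule. A real system would need negative examples and rule generalisation, which the sketch leaves out.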
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.