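Hierarchical information extraction from research papers based on conditional random fields*
Cite this article: ZHANG Yu-fang, MO Ling-lin, XIONG Zhong-yang, GENG Xiao-fei. Hierarchical information extraction from research papers based on conditional random fields*[J]. Application Research of Computers (计算机应用研究), 2009, 26(10): 3690-3693.
Authors: ZHANG Yu-fang (张玉芳), MO Ling-lin (莫凌琳), XIONG Zhong-yang (熊忠阳), GENG Xiao-fei (耿晓斐)
Affiliation: School of Computer Science, Chongqing University, Chongqing 400030, China
Funding: Natural Science Foundation Program of the Chongqing Science and Technology Commission (2007BB2372); China Postdoctoral Science Foundation (20070420711)
Abstract: When conditional random fields (CRFs) are used for information extraction, purely word-based or purely block-based methods cannot make full use of context information to segment and extract text at an appropriate granularity. This paper therefore proposes a hierarchical, CRF-based method for extracting information from research papers. The method uses format information such as separators, line breaks, and line-initial characters, combined with the CRF feature functions, to segment the text at the appropriate level: text lines, blocks, or individual words. The L-BFGS algorithm is then used to learn the model parameters, and the specific text fields are extracted. Experimental results show that the method's extraction performance is better than that of word-based or block-based CRF extraction methods.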

Keywords: information extraction; conditional random fields; hierarchical
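
The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the separator set, the function names `segment` and `line_features`, and the chosen features are all assumptions made for illustration, and the CRF training step (L-BFGS parameter learning) is not shown. The sketch only splits text into lines, blocks, and words using format cues, and builds per-line feature dictionaries of the kind a CRF toolkit could consume.

```python
import re

# Hypothetical separator set (ASCII and full-width punctuation);
# the abstract does not list the delimiters actually used.
BLOCK_SEPARATORS = re.compile(r"[,;:,;:]")


def segment(text):
    """Split text into non-empty lines, each line into blocks, each block into words."""
    hierarchy = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no content to label
        blocks = [b.strip() for b in BLOCK_SEPARATORS.split(line) if b.strip()]
        hierarchy.append({
            "line": line,
            "blocks": [{"block": b, "words": b.split()} for b in blocks],
        })
    return hierarchy


def line_features(line, index):
    """Illustrative format features for one non-empty line."""
    return {
        "line_index": index,
        "first_char": line[0],                       # line-initial character
        "starts_with_digit": line[0].isdigit(),
        "has_separator": bool(BLOCK_SEPARATORS.search(line)),
        "num_tokens": len(line.split()),
    }


sample = (
    "ZHANG Yu-fang, MO Ling-lin\n"
    "School of Computer Science, Chongqing University\n"
    "\n"
    "Abstract: Current methods segment text into blocks or words."
)
hierarchy = segment(sample)
features = [line_features(h["line"], i) for i, h in enumerate(hierarchy)]
```

In a full pipeline, feature dictionaries like these would be computed at each level (line, block, word) and passed to a CRF trainer; which level is used for a given field is the design decision the paper addresses.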

Hierarchical information extraction from research papers based on conditional random fields
ZHANG Yu-fang, MO Ling-lin, XIONG Zhong-yang, GENG Xiao-fei. Hierarchical information extraction from research papers based on conditional random fields[J]. Application Research of Computers, 2009, 26(10): 3690-3693.
Authors:ZHANG Yu-fang  MO Ling-lin  XIONG Zhong-yang  GENG Xiao-fei
Affiliation:(School of Computer Science, Chongqing University, Chongqing 400030, China)
Abstract: Existing CRF-based methods for extracting information from research papers segment text only into blocks or only into words, so they cannot fully use context information to segment and extract text at the proper granularity. This paper proposes a hierarchical, CRF-based method for information extraction from research papers. The algorithm uses format information such as list separators, newline characters, and line-initial characters, combined with the feature functions of CRFs, to segment the text hierarchically into lines, blocks, or words as appropriate. CRFs are then applied at each level to extract the specific fields. Experimental results show that the proposed method outperforms CRF-based methods that segment text only into blocks or only into words.
Keywords: information extraction; conditional random fields; hierarchical
This article is indexed in Wanfang Data (万方数据) and other databases.