
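Constrained Conditional Random Fields for Semantic Annotation of Web Data
Cite this article: Dong Yongquan, Li Qingzhong, Ding Yanhui, Peng Zhaohui. Constrained Conditional Random Fields for Semantic Annotation of Web Data[J]. Journal of Computer Research and Development, 2012, 49(2): 361-371.
Authors: Dong Yongquan  Li Qingzhong  Ding Yanhui  Peng Zhaohui
Affiliations: 1. School of Computer Science and Technology, Shandong University, Jinan 250100; School of Computer Science and Technology, Xuzhou Normal University, Xuzhou, Jiangsu 221116
2. School of Computer Science and Technology, Shandong University, Jinan 250100
Funding: National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Natural Science Foundation of the Jiangsu Higher Education Institutions; Natural Science Foundation of Shandong Province
Abstract: Semantic annotation of Web data is a key step in Web information extraction. Conditional random fields (CRFs) are a classic method that uses sequence features to solve sequence labeling problems. However, existing CRF models cannot jointly exploit information from existing Web databases and the logical relationships among Web data elements, which results in low accuracy for semantic annotation of Web data. This paper therefore proposes a constrained conditional random field (CCRF) model. By introducing confidence constraints and logical constraints, the model makes effective use of existing Web database information and of the logical relationships among Web data elements. Because the Viterbi inference used by existing CRF models cannot incorporate both kinds of constraints together, the model adopts an integer linear programming inference method that brings both kinds of constraints into the inference process simultaneously. Experiments on real data sets from multiple domains show that the proposed model significantly improves the performance of semantic annotation of Web data and lays a good foundation for Web information extraction.

Keywords: semantic annotation; Web information extraction; conditional random field; integer linear programming; Web data integration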

Constrained Conditional Random Fields for Semantic Annotation of Web Data
Dong Yongquan, Li Qingzhong, Ding Yanhui, Peng Zhaohui. Constrained Conditional Random Fields for Semantic Annotation of Web Data[J]. Journal of Computer Research and Development, 2012, 49(2): 361-371.
Authors: Dong Yongquan  Li Qingzhong  Ding Yanhui  Peng Zhaohui
Affiliation: 1. School of Computer Science and Technology, Shandong University, Jinan 250100; 2. School of Computer Science and Technology, Xuzhou Normal University, Xuzhou, Jiangsu 221116
Abstract: Semantic annotation of Web data is a key step in Web information extraction. Its goal is to assign meaningful semantic labels to the data elements of an extracted Web object, and it has attracted growing research attention in recent years. Conditional random fields (CRFs) are a state-of-the-art approach that exploits sequence characteristics to improve labeling. However, traditional CRFs cannot simultaneously use existing Web databases and the logical relationships among Web data elements, which leads to low precision in semantic annotation of Web data. To address this problem, this paper presents a constrained conditional random fields (CCRF) model for annotating Web data. The model incorporates confidence constraints and logical constraints so that it can make efficient use of existing Web databases and of the logical relationships among Web data elements. Because the Viterbi inference of the traditional CRF model cannot use the two kinds of constraints simultaneously, the model adopts a novel inference procedure based on integer linear programming, extending CRFs to support both kinds of constraints naturally and efficiently. Experimental results on a large collection of real-world data from diverse domains show that the proposed approach significantly improves the accuracy of semantic annotation of Web data and lays a solid foundation for Web information extraction.
Keywords: semantic annotation; Web information extraction; conditional random field; integer linear programming; Web data integration
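
To make the inference idea in the abstract concrete, the following is a minimal Python sketch of constrained decoding over a linear-chain scoring model. It is not the authors' implementation. The label set, the scores, and the two constraint forms are assumptions made purely for illustration: a label pinned by a trusted database match stands in for a confidence constraint, and an at-most-once rule stands in for a logical constraint. Exhaustive search over label sequences is used in place of an integer linear programming solver, so the sketch is only practical for tiny inputs.

```python
from itertools import product

# Hypothetical label set and scores, invented for illustration only.
LABELS = ["title", "author", "price"]

# EMISSION[t][y]: score of giving label y to the t-th data element.
EMISSION = [
    {"title": 2.0, "author": 1.0, "price": 0.0},
    {"title": 2.5, "author": 1.0, "price": 0.0},
    {"title": 0.0, "author": 0.0, "price": 1.0},
]

# TRANSITION[(a, b)]: score of label a being followed by label b.
# Pairs that are not listed score 0.
TRANSITION = {("title", "author"): 0.5, ("author", "price"): 0.5}


def sequence_score(labels, emission, transition):
    """Linear-chain score: per-element scores plus adjacent-pair scores."""
    score = sum(emission[t][y] for t, y in enumerate(labels))
    score += sum(transition.get(pair, 0.0) for pair in zip(labels, labels[1:]))
    return score


def satisfies(labels, fixed, unique):
    """Check the two assumed constraint forms.

    fixed:  {position: label}, positions pinned by a trusted database match.
    unique: labels allowed to appear at most once in the sequence.
    """
    if any(labels[t] != y for t, y in fixed.items()):
        return False
    return all(labels.count(y) <= 1 for y in unique)


def constrained_decode(emission, transition, fixed=None, unique=()):
    """Return the best-scoring label sequence that meets all constraints.

    Brute-force enumeration stands in for the ILP solver; it returns
    (None, -inf) when no sequence satisfies the constraints.
    """
    fixed = fixed or {}
    best, best_score = None, float("-inf")
    for candidate in product(LABELS, repeat=len(emission)):
        if not satisfies(candidate, fixed, unique):
            continue
        score = sequence_score(candidate, emission, transition)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    print(constrained_decode(EMISSION, TRANSITION))
    print(constrained_decode(EMISSION, TRANSITION, unique={"title"}))
    print(constrained_decode(EMISSION, TRANSITION,
                             fixed={0: "author"}, unique={"title"}))
```

On this toy input the unconstrained optimum labels two elements as title. Adding the at-most-once rule changes the result, which illustrates the kind of sequence-wide condition that the abstract says Viterbi inference cannot take into account.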
This article is indexed in CNKI, Wanfang Data, and other databases.