Forest products trading Web messages extraction algorithm based on semantic
Cite this article: LI Jia, XU Qian, WANG Zi, CHEN Zhao. Forest products trading Web messages extraction algorithm based on semantic[J]. Computer Engineering and Applications, 2014(19): 199-204.
Authors: LI Jia  XU Qian  WANG Zi  CHEN Zhao
Affiliation: School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
Funding: Fundamental Research Funds for the Central Universities (No. BLYX200928).
Abstract: Existing Web information extraction techniques suffer from low accuracy, limited automation, and weak generality. To address these shortcomings, and to meet the need for structured storage of information sources in forest products trade Web information push, this paper proposes a new semantics-based extraction algorithm for forest products trade Web information. The algorithm analyzes and exploits the characteristics of forest products trade Web information and, following the basic principles of semantic recognition, builds a forest products trade semantic dictionary. It then combines the layout features of the target information within a Web page with information entropy theory to obtain a method, based on semantic information entropy, that automatically locates and extracts the target information; the extracted information is stored in a database in structured form. Experiments on real forest products trade Web pages show that the algorithm reduces manual intervention and has good practical value for processing information sources in forest products trade information push.

Keywords: Web information extraction  forest products trade semantic dictionary  semantic information entropy  template  target information location
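
The abstract names two components, a semantic dictionary and a location step based on semantic information entropy, but this record does not give their definitions. The Python sketch below shows one possible reading and is not the authors' algorithm: the dictionary entries, the category names, and the scoring rule (Shannon entropy over the semantic categories matched in a text block, with the highest-scoring block taken as the target region) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical semantic dictionary: term -> semantic category.
# The entries and categories are illustrative, not taken from the paper.
SEMANTIC_DICT = {
    "plywood": "product",
    "timber": "product",
    "price": "price",
    "usd": "price",
    "quantity": "quantity",
    "m3": "quantity",
    "supplier": "party",
}


def category_counts(text):
    """Count dictionary hits per semantic category (simple substring match)."""
    lowered = text.lower()
    counts = Counter()
    for term, category in SEMANTIC_DICT.items():
        counts[category] += lowered.count(term)
    return +counts  # unary plus drops categories with zero hits


def semantic_entropy(text):
    """Shannon entropy (in bits) of the category distribution in a block.

    Assumed scoring rule: a block that mixes several semantic categories
    scores higher than one with no hits or a single repeated category.
    """
    counts = category_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def locate_target_block(blocks):
    """Return the text block with the highest assumed semantic entropy."""
    return max(blocks, key=semantic_entropy)


# Usage: three text blocks as they might be pulled from a page's layout.
blocks = [
    "Home | About us | Contact",
    "Plywood price: 320 USD, quantity 50 m3, supplier Acme",
    "Timber timber timber",
]
target = locate_target_block(blocks)
print(target)
```

In this sketch the navigation block has no dictionary hits and the third block hits only one category, so both score zero, while the trade listing mixes four categories and is selected. A real system would also need the layout features and templates mentioned in the abstract, which are not modeled here.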
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.