
Design and Implementation of Web Information Extraction and Visualization System
Original title: Web信息抽取和展现系统的设计与实现
Cite this article: PENG Xiang-li, ZHU Xiao-jun, ZHA Zhi-yong. Design and Implementation of Web Information Extraction and Visualization System[J]. Electric Power Information Technology (电力信息化), 2012(2):23-26.
Authors: PENG Xiang-li (彭祥礼), ZHU Xiao-jun (朱小军), ZHA Zhi-yong (查志勇)
Affiliation: Information Communication Center of Hubei Electric Power Company (湖北省电力公司信息通信中心), Wuhan 430077, China
Abstract: With the rapid development of computer network technology, recognizing and acquiring Web information efficiently and accurately has become critically important. This paper describes a complete Web information extraction and visualization system whose overall architecture consists of four parts: a set of Web sites, an extraction rule repository, a content customization module, and a content display module. The system lets users define information extraction rules through a visual, interactive interface and stores each user's personalized rules. To locate data items, it combines a DOM-tree-based method with hierarchical region partitioning and validates the data against parent and child node information. This allows the system to reach the target extraction region quickly while preserving extraction accuracy.

Keywords: Web information extraction; extraction rules; HTML; DOM tree
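
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it names: locate a target region in the DOM tree first, then check each candidate node against its parent and child nodes before extracting. The rule format (`region_id`, `tag`, `parent_tag`, `child_tag`), all function and class names, and the sample page are hypothetical and are not taken from the paper. The tiny DOM builder assumes well-formed HTML in which every start tag has a matching end tag.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM node: tag, attributes, parent link, children, text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a small DOM tree (assumes every start tag is closed)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def find_region(node, region_id):
    """Depth-first search for the element whose id marks the target region."""
    if node.attrs.get("id") == region_id:
        return node
    for child in node.children:
        found = find_region(child, region_id)
        if found is not None:
            return found
    return None


def descendants(node):
    """Yield all descendants of a node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def extract_items(html, rule):
    """Apply one (hypothetical) extraction rule and return the matched texts."""
    builder = DomBuilder()
    builder.feed(html)
    region = find_region(builder.root, rule["region_id"])
    if region is None:
        return []
    items = []
    for node in descendants(region):
        if node.tag != rule["tag"]:
            continue
        # Validate the candidate against its parent and its children.
        parent_ok = node.parent.tag == rule["parent_tag"]
        matches = [c for c in node.children if c.tag == rule["child_tag"]]
        if parent_ok and matches:
            items.append(matches[0].text)
    return items


# Hypothetical sample page and rule, for illustration only.
SAMPLE_HTML = """<html><body>
<div id="ads"><ul><li><a href="/x">Ad</a></li></ul></div>
<div id="news"><ul><li><a href="/a">Title A</a></li><li>more</li><li><a href="/b">Title B</a></li></ul></div>
</body></html>"""

RULE = {"region_id": "news", "tag": "li", "parent_tag": "ul", "child_tag": "a"}

print(extract_items(SAMPLE_HTML, RULE))  # ['Title A', 'Title B']
```

Restricting the search to the `news` region keeps the structurally identical `ads` list out of the result, and the child check drops the `<li>more</li>` entry that has no link. A stored rule of this kind could be saved per user, in the spirit of the rule repository mentioned in the abstract.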
This article is indexed in CNKI, VIP (维普), and other databases.