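Data Extracting Based on the Page Comparison and Analysis
Cite this article: Zhang Juhong, Shan Lan. Data Extracting Based on the Page Comparison and Analysis[J]. Computer and Digital Engineering, 2006, 34(1): 49-52
Authors: Zhang Juhong  Shan Lan
Affiliation: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029
Abstract: For Web pages that serve large-scale data queries, this paper proposes a Web data extraction method based on comparative analysis of pages within a single site. Each page is first parsed into a tree and partitioned into blocks, and the blocks are compared to locate the page's data blocks. Data are then extracted by comparing multiple pages that share the same structure and by checking the format of the candidate values. Finally, the extracted data are stored in the database. The method has been applied successfully in several information extraction systems, where it extracted data efficiently and accurately.

Keywords: data extraction  page structure  semi-structured
Revised: 2005-04-15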

Data Extracting Based on the Page Comparison and Analysis
Zhang Juhong,Shan Lan. Data Extracting Based on the Page Comparison and Analysis[J]. Computer and Digital Engineering, 2006, 34(1): 49-52
Authors:Zhang Juhong  Shan Lan
Abstract: Web-based data services are expanding quickly with the rapid growth of the Internet. This paper proposes a Web data extraction method based on page comparison and structure analysis. First, it parses the semi-structured HTML documents and partitions them into blocks. Next, a comparison of similar pages identifies the meaningful regions, and an analysis of the table structure extracts the data from those regions. Finally, the data are integrated into a database. The approach has been applied efficiently and accurately in many retrieval systems.
Keywords:data extracting  Web page structure  semi-structured
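
The abstract gives only an outline of the method, so the following is a minimal Python sketch of the general idea of page comparison, not the authors' implementation. It assumes two pages generated from the same template, flattens each into a list of (tag path, text) pairs, and treats text that differs at the same position and path as data, while identical text is treated as template. The names `TextPathParser`, `text_nodes`, `extract_data`, and the sample pages are hypothetical; the block partitioning, format checking, and database storage steps described in the abstract are not covered.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextPathParser(HTMLParser):
    """Collect a (tag path, text) pair for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    return parser.nodes


def extract_data(page_a, page_b):
    """Return the text of page_a that differs from page_b at the same
    position and tag path. Assumes both pages share one template."""
    data = []
    for (path_a, text_a), (path_b, text_b) in zip(text_nodes(page_a),
                                                  text_nodes(page_b)):
        if path_a == path_b and text_a != text_b:
            data.append((path_a, text_a))
    return data


# Hypothetical sample pages that share a template and differ only in data.
PAGE_A = """<html><body><h1>Product query</h1>
<table><tr><td>Name</td><td>Ethanol</td></tr>
<tr><td>Price</td><td>12.50</td></tr></table></body></html>"""

PAGE_B = """<html><body><h1>Product query</h1>
<table><tr><td>Name</td><td>Acetone</td></tr>
<tr><td>Price</td><td>9.80</td></tr></table></body></html>"""

print(extract_data(PAGE_A, PAGE_B))
# [('html/body/table/tr/td', 'Ethanol'), ('html/body/table/tr/td', '12.50')]
```

The positional comparison only works when both pages contain the same number of text nodes in the same order. Pages with a varying number of result rows would need a tree or sequence alignment step instead, which is presumably what the block partitioning in the paper addresses.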
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.