An Efficient Web Information Extraction Algorithm Based on Standard XML

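Cite this article:	WANG Ben. An Efficient Web Information Extraction Algorithm Based on Standard XML [J]. Journal of Hubei University of Technology, 2010, 25(2): 63-67.

Author:	WANG Ben

Affiliation:	School of Computer Science, Hubei University of Technology, Wuhan 430068, Hubei, China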

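Abstract:	This paper discusses an XML-based method for extracting information from the Web. Ideally, data extraction would only need to analyze a website database made up of HTML pages. A comprehensive extraction process, however, faces many obstacles. Correct data extraction also requires reliable data validation and error recovery services to cope with unavoidable extraction failures. A software framework named NIES is proposed, which can greatly improve the efficiency and accuracy of Web information extraction and ensure its quality. The key part of NIES is the use of XML technology for data extraction; it includes XHTML and XSLT and supports connecting to the "deep Web".
Keywords:	NIES; crawler; deep Web; Web data extraction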
On High-efficiency Web-information Extraction Algorithm Based on XML

WANG Ben. On High-efficiency Web-information Extraction Algorithm Based on XML[J]. Journal of Hubei University of Technology, 2010, 25(2): 63-67.

Authors:	WANG Ben

Affiliation:	WANG Ben (School of Computer Science, Hubei Univ. of Technology, Wuhan 430068, China)

Abstract:	A methodology for extracting Web information based on XML is described. Ideally, Web data extraction would only need to analyze a database consisting of HTML pages. In practice, however, the complete extraction process runs into many obstacles. Data extraction also needs data validation and error recovery to handle the failures that may occur during Web data extraction. In this paper, a software framework named NIES is presented, which is able to improve the efficiency of Web information extraction and to guarantee the quality of the extracted results. The key part of NIES is extracting information with XML technology, which includes XHTML and XSLT and supports connecting to the "deep Web".

Keywords:	NIES; crawling; deep Web; Web data extraction
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
