
Design and Implementation of Web Information Extraction Based on HtmlParser

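Cite this article:	HUANG Ying, HUANG Zhi-ping. Design and Implementation of Web Information Extraction Based on HtmlParser [J]. Journal of Southern Institute of Metallurgy, 2007, 28(6): 26-28, 35.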

Authors:	HUANG Ying, HUANG Zhi-ping

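Affiliations:	[1] Faculty of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China; [2] Gannan Teachers College, Ganzhou, Jiangxi 341000, China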

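Abstract:	The surge in the volume of information on the Internet creates an urgent need for automated tools that help people quickly locate the information they actually need, such as titles, links, email addresses, and images, within massive information sources. Web pages expressed in HTML, once parsed by a browser, are suited only to browsing, not to machine processing as a means of data exchange. This paper describes in detail how to use HtmlParser to extract hyperlink information from web pages, clean it, and store it in an SQL database for use in subsequent work.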
Keywords:	HtmlParser; information extraction; web page parsing
Article ID:	1007-1229(2007)06-0026-03
Revision received:	2007-04-28
Design and Implementation of Web Information Extraction Based on HtmlParser

HUANG Ying, HUANG Zhi-ping. Design and Implementation of Web Information Extraction Based on HtmlParser [J]. Journal of Southern Institute of Metallurgy, 2007, 28(6): 26-28, 35.

Authors:	HUANG Ying, HUANG Zhi-ping

Affiliation:	1. Faculty of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; 2. Gannan Teachers College, Ganzhou 341000, China

Abstract:	The rapid growth of Web content has increased the need for automated tools that help people quickly find the information they actually need, such as titles, links, email addresses, and images, among massive information sources. Web pages written in HTML, once parsed by a browser, are suitable only for browsing and not for machine processing as a means of data exchange. This paper explains in detail how to use HtmlParser to extract hyperlink information from web pages, clean it, and store it in an SQL database for later use.

Keywords:	HtmlParser; information extraction; web analysis
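
The abstract outlines a three-step pipeline: extract hyperlinks from a page, clean them, and store them in an SQL database. The paper's own code, which uses the Java HtmlParser library, is not reproduced in this record, so the following is only a rough sketch of the same idea with swapped-in tools: Python's standard-library `html.parser` stands in for HtmlParser and an in-memory SQLite database stands in for the SQL database. The names `LinkCollector`, `extract_links`, `store_links`, the `links` table, and the specific cleaning rules (resolving relative URLs, dropping fragments, skipping `javascript:` and `mailto:` links, removing duplicates) are all assumptions made for illustration, not the authors' method.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(html, base_url):
    """Extract hyperlinks and apply the assumed cleaning rules."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    cleaned, seen = [], set()
    for href, text in parser.links:
        href = href.strip()
        # Assumed rule: skip script, mail, and same-page links.
        if href.lower().startswith(("javascript:", "mailto:", "#")):
            continue
        # Resolve relative URLs against the page URL and drop the fragment.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url in seen:
            continue  # assumed rule: keep only the first occurrence
        seen.add(url)
        cleaned.append((url, " ".join(text.split())))  # collapse whitespace
    return cleaned


def store_links(conn, links):
    """Store cleaned links; the table layout is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, text TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO links (url, text) VALUES (?, ?)", links
    )
    conn.commit()
```

A caller would pass the page source and its URL to `extract_links`, then hand the result to `store_links` together with a connection such as `sqlite3.connect(":memory:")`. Making the URL the primary key and using `INSERT OR IGNORE` means that re-running the pipeline on the same page does not create duplicate rows.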
This article is indexed in databases including VIP and Wanfang Data.
