
Web Information Extraction Based on DOM Model Extension (基于DOM模型扩展的Web信息提取)

Cite this article:	Gu Yunhua, Tian Wei. Web Information Extraction Based on DOM Model Extension [J]. Computer Science (计算机科学), 2009, 36(11): 235-237.

Authors:	Gu Yunhua (顾韵华), Tian Wei (田伟)

Affiliation:	School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044

Funding:	Jiangsu Province Industrial Technology Research and Development Fund project

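Abstract:	A Web information extraction method based on an extension of the DOM model is proposed. A Web page is represented as a DOM tree; the nodes of the tree are semantically extended and an influence degree factor is computed for each node; the tree is then pruned according to these factors, and the informational content of the page is extracted. The method requires no prior knowledge of the page structure, so it is automatic and general. Besides being used directly for Web browsing, the extraction results can also serve applications such as Internet data mining and topic-based search engines.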
Keywords:	Document Object Model; Web information extraction; influence degree factor; DOM tree extension
Received:	1/3/2009
Revised:	2009/9/24
Extraction of Information from Web Pages Based on Extended DOM Tree

GU Yun-hua, TIAN Wei. Extraction of Information from Web Pages Based on Extended DOM Tree [J]. Computer Science, 2009, 36(11): 235-237.

Authors:	GU Yun-hua, TIAN Wei

Affiliation:	School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China

Abstract:	A method for extracting information from Web pages based on an extended DOM tree is presented. A Web page is first transformed into a DOM tree; the tree is then extended by adding semantic information to each node, and an influence degree is calculated for each node. The DOM tree is pruned according to the influence degree of its nodes, so that the relevant content of the page is extracted automatically. The approach is general and does not require prior knowledge of the structure of the Web page. The extraction results can be used not only for browsing but also for further Web information processing, such as Internet data mining and topic-based search engines.

Keywords:	DOM; Extraction of information from Web pages; Influence degree; Extended DOM tree
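
The pipeline outlined in the abstract (build a DOM tree, attach extra information to each node, score each node, prune by score) can be illustrated with a minimal sketch. The abstract does not give the formula for the influence degree, so the score used here (the share of a subtree's text that lies outside `<a>` links) is only a stand-in assumption, not the authors' definition. The names `Node`, `TreeBuilder`, `annotate`, `prune`, `extract` and the `threshold` parameter are likewise hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "wbr"}
SKIP_TAGS = {"script", "style"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""        # text placed directly inside this element
        # Extension fields (assumed for illustration, not from the paper)
        self.text_len = 0     # characters of text in the whole subtree
        self.link_len = 0     # characters of that text inside <a> elements
        self.influence = 0.0  # assumed score in [0, 1]


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree with the standard library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and self.current.tag not in SKIP_TAGS:
            self.current.text += stripped + " "


def annotate(node, in_link=False):
    """Fill in the extension fields bottom-up and compute the assumed score."""
    in_link = in_link or node.tag == "a"
    node.text_len = len(node.text)
    node.link_len = node.text_len if in_link else 0
    for child in node.children:
        annotate(child, in_link)
        node.text_len += child.text_len
        node.link_len += child.link_len
    if node.text_len:
        node.influence = (node.text_len - node.link_len) / node.text_len


def prune(node, threshold=0.5):
    """Drop every subtree whose score falls below the threshold."""
    node.children = [c for c in node.children if c.influence >= threshold]
    for child in node.children:
        prune(child, threshold)


def collect(node):
    parts = [node.text.strip()] if node.text.strip() else []
    for child in node.children:
        parts.extend(collect(child))
    return parts


def extract(html, threshold=0.5):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    annotate(builder.root)
    prune(builder.root, threshold)
    return " ".join(collect(builder.root))
```

Calling `extract(page_html)` on a page returns the text of the subtrees that survive pruning. In this sketch a navigation block made up only of links scores 0 and is removed, while a paragraph of plain text scores 1 and is kept; a real implementation would substitute the paper's own influence degree factor for the link-share heuristic.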
This article is indexed in Wanfang Data and other databases.