
Research on Web Page Information Extraction Methods

Cite this article:	XU Tie, GENG Jia-ning. Research on Web Page Information Extraction Methods[J]. Information Technology, 2009(4).

Authors:	徐铁 (XU Tie), 耿佳宁 (GENG Jia-ning)

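Affiliations:	1. Heilongjiang Province Electronic Information Products Supervision & Inspection Institute, Harbin 150090; 2. China University of Political Science and Law, Beijing 102249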

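Abstract:	Information extraction is a branch of artificial intelligence. It can be used to pull the information people need out of Web pages in a user-friendly way. The technique proposed in this paper is a new method for inducing Web page templates, based on the DOM and on Web page templates. It handles template induction well for pages built from a variety of layout elements, and a C++ implementation of the core algorithm is given.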
Keywords:	Web page extraction; Web page template; Web page similarity; Web page clustering
Research on Information extraction algorithm based on Web

XU Tie, GENG Jia-ning. Research on Information extraction algorithm based on Web[J]. Information Technology, 2009(4).

Authors:	XU Tie, GENG Jia-ning

Affiliation:	1. Heilongjiang Province Electronic Information Products Supervision & Inspection Institute, Harbin 150090, China; 2. China University of Political Science and Law, Beijing 102249, China

Abstract:	Information extraction is a branch of artificial intelligence, and it can be used to extract information of interest from Web pages. The algorithm proposed here is based on the DOM and Web page templates; it is a new way of inducing templates for Web pages that are laid out with table elements. A C++ implementation of the proposed algorithm is given at the end of the paper.

Keywords:	Web extraction; Web template; Web similarity; Web clustering
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
