Topic Information Extraction from Web Pages Based on Tree Comparison

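Cite this article:	Zhu Menglin, Li Guangyao, Zhou Yimin. Topic information extraction from Web pages based on tree comparison[J]. Microcomputer & its Applications, 2011, 30(19): 67-69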

Authors:	Zhu Menglin, Li Guangyao, Zhou Yimin

Affiliation:	Department of Electronics and Information Engineering, Tongji University, Shanghai 201804

Funding:	International Cooperation Project of the Science and Technology Commission of Shanghai Municipality

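Abstract:	To automatically extract information from Web pages on the Internet, which holds massive amounts of information, this paper proposes a method for extracting the topic information of Web pages based on tree comparison. By comparing the tree built from the target page with the trees built from its similar pages, the method simplifies the target page, then generates extraction rules on that basis and uses them to extract the page's topic information. Extraction tests on pages from several major Chinese websites show that the method extracts the topic information of Web pages accurately and effectively.
Keywords:	information extraction; similar pages; tree comparison; extraction rules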
Topic information extraction from Web pages based on tree comparison

Zhu Menglin, Li Guangyao, Zhou Yimin. Topic information extraction from Web pages based on tree comparison[J]. Microcomputer & its Applications, 2011, 30(19): 67-69

Authors:	Zhu Menglin, Li Guangyao, Zhou Yimin

Affiliation:	Zhu Menglin, Li Guangyao, Zhou Yimin (Department of Electronics and Information Engineering, Tongji University, Shanghai 201804, China)

Abstract:	To automatically extract information from Web pages on the Internet, which contains massive amounts of information, this paper presents an approach based on tree comparison. The approach compares the tree built from the target page with those built from its similar pages in order to simplify the target page. Extraction rules are generated on this basis and then used to extract topic information from the target Web page. Experimental results on pages from several major Chinese websites show that this extraction method is precise and effective.

Keywords:	information extraction; similar pages; tree comparison; extraction rules
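
The abstract does not spell out the tree-comparison algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a node whose tag path and text also occur in a similar page (same site template) is noise such as navigation or footers, and that whatever remains in the target page is topic content. The names `Node`, `TreeBuilder`, `parse`, `signatures`, and `extract_topic` are hypothetical and introduced here for illustration; the surviving tag paths could serve as a crude stand-in for extraction rules.

```python
from html.parser import HTMLParser


class Node:
    """One element of a simplified page tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using only the standard library."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.text = (self.current.text + " " + text).strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def signatures(node, path=""):
    """Yield (tag path, text) for every node that carries text."""
    here = f"{path}/{node.tag}"
    if node.text:
        yield (here, node.text)
    for child in node.children:
        yield from signatures(child, here)


def extract_topic(target_html, similar_htmls):
    """Keep target nodes whose (path, text) appears in no similar page."""
    shared = set()
    for html in similar_htmls:
        shared |= set(signatures(parse(html)))
    return [text for path, text in signatures(parse(target_html))
            if (path, text) not in shared]


if __name__ == "__main__":
    target = ("<html><body><div>Site Nav</div>"
              "<div><p>Article about trees</p></div>"
              "<div>Copyright</div></body></html>")
    similar = ("<html><body><div>Site Nav</div>"
               "<div><p>Another story</p></div>"
               "<div>Copyright</div></body></html>")
    print(extract_topic(target, [similar]))
```

Matching on exact path and text is the simplest possible comparison; real pages would likely need tolerance for small template differences (dates, counters, rotating ads), which this sketch does not attempt.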
This article is indexed in CNKI, Wanfang Data, and other databases.
