A Tag-Tree-Based Method for Partitioning and Searching Web Page Regions

Cite this article:	Hu Fei. A Tag-Tree-Based Method for Partitioning and Searching Web Page Regions[J]. Computer Science (计算机科学), 2005, 32(8): 182-185

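Author:	Hu Fei (胡飞)

Affiliations:	Chongqing Education College, Chongqing 400067; Department of Computer Science and Technology, Nanjing University, Nanjing 210093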

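Funding:	Acknowledgment: this work was completed under the careful guidance of Chen Shifu, doctoral supervisor in the Department of Computer Science and Technology, Nanjing University, to whom the author expresses sincere thanks.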

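Abstract:	The layout of a Web page can be divided into main content, organization identity, navigation information, interactive information, and the copyright notice. When people handle such pages they usually care only about the main content and can locate it quickly from its meaning, but this is very difficult for a software system. This paper proposes a tag-tree-based method for partitioning and searching Web page regions, so that a software system can ignore the other regions and locate the main content quickly. When large numbers of Web pages are processed, the method reduces both time and space costs, and the more pages there are, the more pronounced the effect.
Keywords:	Web page layout; page structure; page region; tag tree; tag tree model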
How to Get the Main Part of Web Pages

Hu Fei. How to Get the Main Part of Web Pages[J]. Computer Science, 2005, 32(8): 182-185

Authors:	Hu Fei

Abstract:	A Web page can be divided into several parts: the main part, the department logo, the navigation bar, the hyperlinks, and the copyright notice. Getting the main part of a Web page is easy for a human but hard for a computer. In this paper we tackle the problem by exploring a tag tree, which can suitably express the structure and layout of Web pages. We propose a method to build the tag tree and, in addition, develop a single-path tag tree, named the tag tree model, which describes only the main part of Web pages.

Keywords:	Web page layout; Web page structure; Web page area; Tag tree; Tag tree model
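
The abstract names two artifacts: a tag tree built from the page markup, and a single-path tag tree (the tag tree model) that leads to the main part. The abstract does not give the construction rules, so the following is only a minimal sketch under stated assumptions. It uses Python's standard `html.parser`; the names `Node`, `TagTreeBuilder`, `build_tag_tree`, and `main_part_path` are hypothetical, and choosing the main part as the element with the most directly contained text is an assumed heuristic, not the rule from the paper.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not stay open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is code or styling, not page content.
NON_CONTENT_TAGS = {"script", "style"}


class Node:
    """One element of the tag tree (hypothetical structure)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text_len = 0  # characters of text directly inside this element


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in NON_CONTENT_TAGS:
            self.current.text_len += len(data.strip())


def build_tag_tree(html):
    """Parse an HTML string and return the root of its tag tree."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def main_part_path(root):
    """Return the single tag path from the root to the assumed main part.

    Assumption: the main part is the element with the most direct text.
    """
    best = root
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text_len > best.text_len:
            best = node
        stack.extend(node.children)
    path = []
    while best is not None:
        path.append(best.tag)
        best = best.parent
    return path[::-1]


if __name__ == "__main__":
    page = ('<html><body><div><a href="/">Home</a></div>'
            '<div><p>This is the long main text of the page.</p></div>'
            '<div>Copyright</div></body></html>')
    print(main_part_path(build_tag_tree(page)))
```

In this sketch the returned path plays the role of the single-path tag tree: once it is known for one page of a site, other pages with the same layout could be searched along that path only, skipping the navigation and copyright regions.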
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.