
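Research and Design of a Web Page Information Parser Based on Page Segmentation
Cite this article: YU Man-quan, CHEN Tie-rei, XU Hong-bo. Research and design of HTML parser based on page segmentation [J]. Journal of Computer Applications, 2005, 25(4): 974-976.
Authors: YU Man-quan  CHEN Tie-rei  XU Hong-bo
Affiliations: 1. Institute of Computing Technology, Chinese Academy of Sciences; 2. Graduate School of the Chinese Academy of Sciences
Abstract: This paper describes in detail the basic techniques for parsing web page information. After weighing their strengths and weaknesses, it proposes a segmentation algorithm that is comparatively effective for the structurally complex pages of news websites. Based on practical project requirements, a web page information parser, TVPS, was designed and implemented. Experimental results show that the parser performs well and meets practical needs.

Keywords: Web mining    HTML tags    visual features    page segmentation
Article ID: 1001-9081(2005)04-0974-03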

Research and design of HTML parser based on page segmentation
YU Man-quan, CHEN Tie-rei, XU Hong-bo. Research and design of HTML parser based on page segmentation [J]. Journal of Computer Applications, 2005, 25(4): 974-976.
Authors:YU Man-quan  CHEN Tie-rei  XU Hong-bo
Affiliation: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; 2. Graduate School, Chinese Academy of Sciences, Beijing 100039, China
Abstract: The basic techniques for parsing Web pages are introduced. After weighing the strengths and weaknesses of the existing methods, a more effective method for segmenting the HTML pages of news Web sites is proposed. An HTML parser named TVPS was then designed and implemented based on the requirements of actual projects. The experimental results show that the parser performs well and meets practical needs.
Keywords:Web mining  HTML tag  visual cues  page segmentation  
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.