Web page classification based on extracting hierarchy from Web site
Cite this article: DENG Jian-shuang, ZHENG Qi-lun, PENG Hong. Web page classification based on extracting hierarchy from Web site[J]. Journal of Computer Applications, 2006, 26(5): 1134-1136.
Authors: DENG Jian-shuang (邓健爽), ZHENG Qi-lun (郑启伦), PENG Hong (彭宏)
Affiliation: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China
Funding: Guangdong Provincial Key Science and Technology Program; Guangzhou Municipal Key Science and Technology Program (Guangdong Province)
Article number: 1001-9081(2006)05-1134-03
Received: 2005-11-02
Revised: 2005-11-02 to 2006-01-07

Abstract: Automatic Web page classification is currently a hot research topic in Internet search. Existing classifiers work mainly either on the text content of pages or on the hyperlink structure between pages. However, both kinds of method use only information from the pages themselves and ignore the information provided by the Web site that contains them. This paper proposes a new algorithm that simplifies the internal topology of a Web site, extracts the site's implicit hierarchy, and builds a hierarchy tree from it, which makes multi-level classification of the pages within the site possible. The method has been successfully applied in an intelligent search and mining system for e-commerce.

Keywords: Web page classification; Web site hierarchy; URL clustering
This article is indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.
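The abstract only outlines the idea of simplifying a site's link topology into a hierarchy tree. As an illustration, and not as the authors' algorithm (which is given only in the full text), the Python sketch below shows one plausible way to do this: a breadth-first search from the home page keeps each page at its shortest click distance, and when several pages on the same level link to a new page, the parent with the longest shared URL path prefix wins (a simple form of URL clustering). The function names `extract_hierarchy` and `category_path`, the input format (a dict from each page to the internal pages it links to), the tie-breaking rule, and the example URLs are all hypothetical.

```python
# Illustrative sketch only; not the algorithm from the paper.
from typing import Dict, List, Optional
from urllib.parse import urlparse


def path_segments(url: str) -> List[str]:
    """Split the path component of a URL into its non-empty segments."""
    return [seg for seg in urlparse(url).path.split("/") if seg]


def shared_prefix(a: List[str], b: List[str]) -> int:
    """Count the leading path segments two URLs have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def extract_hierarchy(home: str, links: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Reduce a site's internal link graph to a tree rooted at the home page.

    Hypothetical heuristic: a page's parent is chosen among the pages that
    link to it from the shallowest BFS level; ties are broken by the longest
    shared URL path prefix.
    """
    parent: Dict[str, Optional[str]] = {home: None}
    frontier = [home]
    while frontier:
        # For each newly reached page, collect every linking page on this level.
        candidates: Dict[str, List[str]] = {}
        for page in frontier:
            for target in links.get(page, []):
                if target not in parent:
                    candidates.setdefault(target, []).append(page)
        # Keep exactly one parent per page, so the result is a tree.
        for target, sources in candidates.items():
            segs = path_segments(target)
            parent[target] = max(sources, key=lambda s: shared_prefix(path_segments(s), segs))
        frontier = list(candidates)
    return parent


def category_path(page: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return a page's ancestors from the root down, i.e. its multi-level category."""
    chain: List[str] = []
    node = parent.get(page)
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain[::-1]


if __name__ == "__main__":
    root = "http://shop.example.com/"
    site = {
        root: [root + "phones/", root + "books/"],
        root + "phones/": [root + "phones/n1.html", root + "books/"],
        root + "books/": [root + "books/b1.html", root + "phones/n1.html"],
    }
    tree = extract_hierarchy(root, site)
    for page in tree:
        print(page, "->", category_path(page, tree))
```

In the example, `phones/n1.html` is linked from both `phones/` and `books/` on the same level, and the URL tie-break files it under `phones/`. Cross-links that would create cycles (such as `phones/` linking back to `books/`) are simply dropped, which is what turns the link graph into a tree.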