Study on Topic-Based Web Document Clustering (基于主题的Web文档聚类研究)

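Cite this article:	SUN Xue-gang, CHEN Qun-xiu, MA Liang. 基于主题的Web文档聚类研究 [Study on Topic-Based Web Clustering][J]. Journal of Chinese Information Processing (中文信息学报), 2003, 17(3): 22-27.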

Authors:	SUN Xue-gang (孙学刚), CHEN Qun-xiu (陈群秀), MA Liang (马亮)

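Affiliation:	State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University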

Funding:	National 863 Program of China (Grant No. 2001AA114040)

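Abstract:	The continual growth of online resources and the rapid turnover of information have made traditional manual sorting inadequate for managing massive volumes of electronic data. Web document clustering can group documents automatically and quickly, and can also uncover new information resources. To address the complexity of Web document data, this paper proposes a method of two-pass feature extraction and clustering that automatically clusters Web documents by topic. The method extracts topic features effectively while achieving relatively high-quality Web document clustering.
Keywords:	computer applications; Chinese information processing; Web document clustering; OPTICS algorithm; feature extraction; K-nearest-neighbor criterion; two-pass feature extraction and clustering method
Article ID:	1003-0077(2003)03-0021-06
Revised:	January 20, 2003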
Study on Topic-Based Web Clustering

SUN Xue-gang, CHEN Qun-xiu, MA Liang. Study on Topic-Based Web Clustering[J]. Journal of Chinese Information Processing, 2003, 17(3): 22-27.

Authors:	SUN Xue-gang, CHEN Qun-xiu, MA Liang

Affiliation:	State Key Laboratory of Intelligent Technology and Systems, Dept. of Computer Science & Technology, Tsinghua University

Abstract:	With the continual growth of resources and the rapid turnover of information on the Web, managing vast amounts of electronic data by traditional manual methods has become difficult. Web clustering can automatically group documents and help discover new information. Considering the complexity of Web documents, we propose a method of feature re-selection and document re-clustering, and achieve good Web clustering results.

Keywords:	computer application; Chinese information processing; Web clustering; OPTICS algorithm; feature selection; K-NN; method of feature re-selection and re-clustering
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.