
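An Effective Unsupervised Feature Selection Method for Text Clustering

Cite this article:	Liu Tao, Wu Gongyi, Chen Zheng. An Effective Unsupervised Feature Selection Method for Text Clustering[J]. Journal of Computer Research and Development, 2005, 42(3).

Authors:	Liu Tao, Wu Gongyi, Chen Zheng

Affiliations:	College of Information Technical Science, Nankai University, Tianjin 300071; Microsoft Research Asia, Beijing 100080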

Funding:	This research was published at the 20th International Conference on Machine Learning (ICML'03).

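Abstract:	Although feature selection has been applied very successfully to text categorization, it is rarely used for text clustering. This is because the most effective feature selection methods are usually supervised: they require class information and therefore cannot be applied directly to text clustering. To make these methods usable for text clustering, a new unsupervised feature selection algorithm is proposed: K-Means based feature selection (KFS). By applying supervised feature selection to different K-Means clustering results, the algorithm successfully selects a small subset of the most important features and improves text clustering performance by nearly 15%.
Keywords:	feature selection; text clustering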
An Effective Unsupervised Feature Selection Method for Text Clustering

Liu Tao, Wu Gongyi, Chen Zheng. An Effective Unsupervised Feature Selection Method for Text Clustering[J]. Journal of Computer Research and Development, 2005, 42(3).

Authors:	Liu Tao, Wu Gongyi, Chen Zheng

Abstract:	Feature selection has been applied successfully to text categorization but rarely to text clustering, because the most effective feature selection methods are supervised and cannot be used when class label information is unavailable. This paper proposes a new feature selection method, K-Means based feature selection (KFS), which addresses the lack of label information by performing supervised feature selection on different K-Means clustering results. Experimental results show that (1) KFS successfully selects a small subset of the best features and significantly improves clustering performance; and (2) compared with other feature selection methods, KFS comes very close to the ideal supervised feature selection methods and performs much better than any unsupervised method.

Keywords:	feature selection; text clustering
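
The abstract describes KFS only at a high level: run K-Means several times, treat each clustering as if it were a labeling, and apply a supervised feature selection method to it. The Python sketch below is one possible reading of that idea, not the authors' implementation. Several details are assumptions that the abstract does not state: the chi-square statistic as the supervised score, summing scores across runs, the particular values of K and the number of runs, and dense term-count vectors as input. All function names (`kmeans`, `chi2_scores`, `kfs_sketch`) are hypothetical.

```python
import random


def kmeans(docs, k, rng, iters=20):
    """Plain K-Means on dense term vectors; returns one cluster id per document."""
    centers = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(d, centers[c])))
            for d in docs
        ]
        # Update step: mean of each cluster's members (an empty cluster keeps its center).
        for c in range(k):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi2_scores(docs, labels, k):
    """Chi-square score of each term against the cluster labels (max over clusters).

    Assumption: chi-square stands in for the supervised method; the abstract
    does not say which supervised criterion KFS uses.
    """
    n = len(docs)
    scores = []
    for t in range(len(docs[0])):
        best = 0.0
        for cl in range(k):
            a = sum(1 for d, l in zip(docs, labels) if l == cl and d[t] > 0)
            b = sum(1 for d, l in zip(docs, labels) if l != cl and d[t] > 0)
            c = sum(1 for d, l in zip(docs, labels) if l == cl and d[t] == 0)
            dd = n - a - b - c  # documents outside the cluster that lack the term
            denom = (a + c) * (b + dd) * (a + b) * (c + dd)
            if denom:
                best = max(best, n * (a * dd - c * b) ** 2 / denom)
        scores.append(best)
    return scores


def kfs_sketch(docs, n_keep, ks=(2, 3), runs_per_k=3, seed=0):
    """Sum supervised scores over several K-Means runs and keep the top terms.

    Assumption: summing across runs and the defaults for ks / runs_per_k are
    illustrative choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(docs[0])
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans(docs, k, rng)
            for t, s in enumerate(chi2_scores(docs, labels, k)):
                totals[t] += s
    ranked = sorted(range(len(totals)), key=lambda t: totals[t], reverse=True)
    return sorted(ranked[:n_keep])


# Toy corpus: terms 0 and 1 separate two topics; terms 2 and 3 occur everywhere.
toy_docs = [
    [5, 0, 1, 1], [4, 0, 1, 1], [3, 0, 1, 1],
    [0, 5, 1, 1], [0, 4, 1, 1], [0, 3, 1, 1],
]
selected = kfs_sketch(toy_docs, n_keep=2)
```

On the toy corpus, terms 2 and 3 appear in every document, so their chi-square score is zero for any clustering, and `selected` contains the indices of the two topic-specific terms. A real experiment would cluster again on the reduced term set and compare the result against clustering on all terms.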
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.