
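A Feature Selection Method for Web Text Clustering
Cite this article:Wang Weiling, Liu Peiyu, Liu Kefei. A Feature Selection Method for Web Text Clustering[J]. Computer Applications and Software, 2007, 24(1): 154-156
Authors:Wang Weiling  Liu Peiyu  Liu Kefei
Affiliation:School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong 250014 (all three authors)
Abstract:Feature selection is widely used in text classification and text clustering. Compared with unsupervised feature selection methods, supervised methods are more effective at tasks such as filtering out noise. However, because class labels are unavailable, supervised selection is hard to apply to text clustering. This paper proposes a new feature selection algorithm for Web text clustering: a k-means-based multi-feature co-selection algorithm (MFCC). MFCC makes full use of the intermediate clustering results from one feature space to help feature selection in another feature space. Experiments show that MFCC effectively improves clustering quality.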

Keywords:Web mining  Clustering  Vector space model
Revision date:2006-04-17

A FEATURE SELECTION ALGORITHM FOR WEB DOCUMENTS CLUSTERING
Wang Weiling,Liu Peiyu,Liu Kefei. A FEATURE SELECTION ALGORITHM FOR WEB DOCUMENTS CLUSTERING[J]. Computer Applications and Software, 2007, 24(1): 154-156
Authors:Wang Weiling  Liu Peiyu  Liu Kefei
Affiliation:School of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China
Abstract:Feature selection has been widely applied in text categorization and clustering. Compared with unsupervised selection, supervised feature selection is more successful at filtering out noise in most cases. However, because label information is missing, clustering can hardly exploit supervised selection. In this paper, we propose a novel feature coselection method for Web document clustering, called Multitype Features Coselection for Clustering (MFCC). MFCC uses intermediate clustering results in one type of feature space to help the selection in other types of feature spaces. Our experiments show that, for most selection criteria, MFCC effectively reduces the noise introduced by pseudo-classes and further improves clustering performance.
Keywords:Web mining  Clustering  VSM
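
The co-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the selection criterion, the initialization, or the number of rounds, so the chi-square score, the farthest-point k-means initialization, the fixed round count, and all function names (`kmeans`, `chi_square_scores`, `top_features`, `coselect`) are assumptions made here for illustration. The two feature spaces could be, for example, body-text terms and anchor-text terms of the same documents, which is also an assumption.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=10):
    """Plain k-means; returns one cluster id per point.

    Assumption: deterministic farthest-point initialization, chosen
    only so that the sketch is reproducible.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi_square_scores(docs, labels, k):
    """Max chi-square statistic of each feature over the pseudo-classes.

    Assumption: chi-square on feature presence stands in for whichever
    supervised criterion is used; cluster ids act as pseudo-class labels.
    """
    n = len(docs)
    scores = []
    for f in range(len(docs[0])):
        present = sum(1 for d in docs if d[f] > 0)
        best = 0.0
        for cls in range(k):
            in_cls = sum(1 for l in labels if l == cls)
            a = sum(1 for d, l in zip(docs, labels) if d[f] > 0 and l == cls)
            b = present - a          # feature present, other clusters
            c = in_cls - a           # feature absent, this cluster
            e = n - present - c      # feature absent, other clusters
            denom = (a + b) * (c + e) * (a + c) * (b + e)
            if denom:
                best = max(best, n * (a * e - b * c) ** 2 / denom)
        scores.append(best)
    return scores


def top_features(docs, labels, k, top_n):
    """Indices of the top_n highest-scoring features, in ascending order."""
    scores = chi_square_scores(docs, labels, k)
    ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
    return sorted(ranked[:top_n])


def project(docs, keep):
    """Restrict every document vector to the kept feature indices."""
    return [[d[f] for f in keep] for d in docs]


def coselect(space_a, space_b, k, top_n, rounds=2):
    """Alternate: clusters found in one space select features in the other."""
    keep_a = list(range(len(space_a[0])))
    keep_b = list(range(len(space_b[0])))
    labels_b = [0] * len(space_b)
    for _ in range(rounds):
        labels_a = kmeans(project(space_a, keep_a), k)
        keep_b = top_features(space_b, labels_a, k, top_n)
        labels_b = kmeans(project(space_b, keep_b), k)
        keep_a = top_features(space_a, labels_b, k, top_n)
    return keep_a, keep_b, labels_b


# Toy data (hypothetical): six documents described in two feature spaces.
space_a = [[3, 0], [2, 0], [3, 1], [0, 3], [0, 2], [1, 3]]
space_b = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1],
           [0, 1, 1, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
print(coselect(space_a, space_b, k=2, top_n=2))
```

In the toy data, the last two features of `space_b` are uninformative by construction (one occurs in every document, the other is split evenly across the two groups), so pseudo-labels taken from `space_a` give them a score of zero and they are dropped.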
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.