
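TGFCM: A New Method for Chinese Text Mining Based on Fuzzy Clustering
Cite this article: GENG Xinqing, WANG Zheng'ou. TGFCM: A New Method for Chinese Text Mining Based on Fuzzy Clustering[J]. Computer Engineering, 2006, 32(5): 7-9.
Authors: GENG Xinqing  WANG Zheng'ou
Affiliation: Institute of Systems Engineering, Tianjin University, Tianjin 300072
Abstract: This paper proposes a new dynamic fuzzy clustering method. Traditional fuzzy clustering requires the number of clusters to be fixed in advance; to address this, a dynamic self-organizing map neural network is used to determine the number of clusters. Text feature vectors are obtained with the vector space model and the TF-IDF method. The number of clusters found by the dynamic self-organizing map network is then passed to the fuzzy C-means (FCM) algorithm, which produces the clustering result. Compared with running the dynamic self-organizing map network alone, the algorithm yields more accurate clustering results, and fuzzy clustering is better suited to handling semantic diversity and the fuzziness of document membership. Experiments verify the effectiveness of the algorithm.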

Keywords: Self-organizing map network  Text feature vector  Fuzzy clustering  Number of clusters
Article ID: 1000-3428(2006)05-0007-03
Received: 2005-02-26
Revised: 2005-02-26

TGFCM: A Novel Approach of Chinese Text Mining Based on Fuzzy Clustering
GENG Xinqing, WANG Zheng'ou. TGFCM: A Novel Approach of Chinese Text Mining Based on Fuzzy Clustering[J]. Computer Engineering, 2006, 32(5): 7-9.
Authors:GENG Xinqing  WANG Zheng'ou
Affiliation:Institute of Systems Engineering, Tianjin University, Tianjin 300072
Abstract: A novel dynamic fuzzy clustering approach is presented. The main defect of traditional fuzzy clustering methods is that the number of clusters must be known in advance. This paper applies the dynamic self-organizing map algorithm to determine the number of clusters. The text eigenvector is obtained with the vector space model (VSM) and the TF-IDF method. The number of clusters found by the dynamic self-organizing map is then passed to the fuzzy C-means (FCM) algorithm, which produces the clustering result. Compared with the dynamic self-organizing map algorithm alone, the proposed algorithm achieves higher precision. Fuzzy clustering is well suited to handling semantic variety and the fuzziness of text membership. Experiments demonstrate the effectiveness of the proposed algorithm.
Keywords:Self-organizing maps  Text eigenvector  Fuzzy clustering  Number of clustering
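
The last two stages of the pipeline described in the abstract (TF-IDF feature vectors, then FCM with a given number of clusters) can be illustrated with a minimal sketch. It is not the authors' implementation. The dynamic self-organizing map step is not reproduced, so the cluster count `c` is simply assumed to be known. The function names `tfidf_vectors` and `fcm`, the TF-IDF variant (relative term frequency times log(N/df)), the fuzzifier `m=2.0`, and the toy English tokens are all assumptions made for illustration; real Chinese text would first need word segmentation.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized, non-empty documents into TF-IDF vectors."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab
        ])
    return vocab, vectors


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-means sketch: returns (centers, memberships) for c clusters."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ])
        # Memberships follow from relative distances to the centers.
        new_u = []
        for p in points:
            dist = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            new_u.append([
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ])
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy corpus (hypothetical): two documents per topic.
docs = [
    ["fuzzy", "clustering", "fuzzy", "membership"],
    ["fuzzy", "clustering", "clustering", "membership"],
    ["neural", "network", "neural", "map"],
    ["neural", "network", "network", "map"],
]
vocab, vectors = tfidf_vectors(docs)
centers, u = fcm(vectors, c=2)  # c would come from the dynamic SOM step
labels = [row.index(max(row)) for row in u]
```

Each row of `u` holds one document's degrees of membership in the `c` clusters and sums to 1, which is what lets a document belong partly to several topics; `labels` takes the strongest membership only when a hard assignment is needed.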
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.