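An Improved OPTICS Algorithm and Its Application to Text Clustering
Cite this article: ZENG Yi-ling, XU Hong-bo, BAI Shuo. An Improved OPTICS Algorithm and Its Application to Text Clustering[J]. Journal of Chinese Information Processing, 2008, 22(1): 51-55, 60.
Authors: ZENG Yi-ling, XU Hong-bo, BAI Shuo
Affiliations: 1. Research Center of Information Intelligence and Information Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate University, Chinese Academy of Sciences, Beijing 100080
Funding: National Basic Research Program of China (973 Program), grant 2004CB318109
Abstract: The density-based OPTICS clustering algorithm presents the structure of a corpus intuitively through its visual output. However, because its result-organization strategy handles sparse points poorly, the algorithm's actual performance is not fully realized. To address this shortcoming, this paper proposes an effective result-reorganization strategy that helps relocate sparse points, and it changes the distance measure to suit the characteristics of text data, yielding the OPTICS-Plus text clustering algorithm. Experiments on a real text classification corpus show that our result-reorganization strategy helps the algorithm produce reachability plots that reflect the corpus structure more clearly, and a comparison with the K-means algorithm shows that OPTICS-Plus achieves fairly good clustering performance.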
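The abstract does not spell out the OPTICS ordering itself or which distance measure OPTICS-Plus adopts for text. As a reference point only, here is a minimal sketch of the standard OPTICS ordering loop, assuming cosine distance on dense term-weight vectors (a common choice for text, but an assumption here, not confirmed by the abstract). The function names, the toy documents, and the `eps`/`min_pts` values are hypothetical, and the authors' result-reorganization strategy for sparse points is not reproduced.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two dense, non-zero term-weight vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(0.0, 1.0 - dot / norm)  # clamp tiny negative rounding errors


def optics(points, eps, min_pts, dist=cosine_distance):
    """Standard OPTICS ordering. Returns (order, reachability listed in that order)."""
    n = len(points)
    processed = [False] * n
    reach = [math.inf] * n
    order = []

    def neighbors(i):
        # eps-neighborhood of i, including i itself
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def core_distance(i, neigh):
        # distance to the min_pts-th nearest point (counting i), or None if i is not a core point
        if len(neigh) < min_pts:
            return None
        return sorted(dist(points[i], points[j]) for j in neigh)[min_pts - 1]

    def update(i, neigh, core_d, seeds):
        # lower the reachability of unprocessed neighbors and push them onto the seed heap
        for j in neigh:
            if processed[j]:
                continue
            new_r = max(core_d, dist(points[i], points[j]))
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))

    for p in range(n):
        if processed[p]:
            continue
        processed[p] = True
        order.append(p)
        neigh = neighbors(p)
        core_d = core_distance(p, neigh)
        if core_d is None:
            continue  # not a core point: it stays in the plot with reachability inf
        seeds = []
        update(p, neigh, core_d, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue  # stale heap entry left behind by an earlier update
            processed[q] = True
            order.append(q)
            q_neigh = neighbors(q)
            q_core = core_distance(q, q_neigh)
            if q_core is not None:
                update(q, q_neigh, q_core, seeds)
    return order, [reach[i] for i in order]


if __name__ == "__main__":
    # Two toy "topics": documents 0-2 use terms 0-1, documents 3-5 use terms 2-3.
    docs = [[3, 1, 0, 0], [2, 1, 0, 0], [4, 2, 0, 0],
            [0, 0, 2, 3], [0, 0, 1, 2], [0, 0, 3, 3]]
    order, reach = optics(docs, eps=0.5, min_pts=2)
    for idx, r in zip(order, reach):
        print(idx, r)
```

On the toy data the ordering visits one topic and then the other, and the reachability jumps back to infinity where the second group starts; plotted in order, that jump is the peak separating two valleys in a reachability plot. Because the cross-topic vectors share no terms, their cosine distance is 1 and they never fall inside each other's `eps`-neighborhood.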

Keywords: computer application; Chinese information processing; OPTICS algorithm; density-based clustering; text mining
Article ID: 1003-0077(2008)01-0051-05
Received: 2007-05-02
Revised: 2007-12-03

OPTICS-Plus for Text Clustering
ZENG Yi-ling, XU Hong-bo, BAI Shuo. OPTICS-Plus for Text Clustering[J]. Journal of Chinese Information Processing, 2008, 22(1): 51-55, 60.
Authors: ZENG Yi-ling, XU Hong-bo, BAI Shuo
Affiliation: 1. Research Center of Information Intelligence and Information Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China;
2. Graduate University, Chinese Academy of Sciences, Beijing 100080, China
Abstract: As a density-based clustering algorithm, OPTICS can reveal the intrinsic structure of a corpus in a visual plot. However, because of an improper strategy for organizing points in sparse regions, the algorithm does not reach its best performance. To solve this problem, we propose an effective result-reorganization strategy for reordering these sparse points. Building on this strategy and adapting the algorithm to the characteristics of text mining, we propose a new text clustering algorithm named OPTICS-Plus. Experiments on the FuDan text classification corpus show that our result-reorganization strategy helps produce reachability plots that give clearer views of the corpus structure. Furthermore, a comparison with K-means shows that OPTICS-Plus achieves satisfactory clustering performance.
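A reachability plot is read as a sequence of valleys (dense groups) separated by peaks. The abstract does not describe how OPTICS-Plus turns its plot into flat clusters or how the reorganization strategy repositions sparse points, so the following is only a minimal sketch of a simple threshold cut on a reachability sequence. The function `cut_reachability`, its `threshold`/`min_size` parameters, and the toy plot values are hypothetical illustrations, not the authors' method.

```python
def cut_reachability(reach, threshold, min_size=2):
    """Cut an OPTICS reachability plot into flat clusters (hypothetical helper).

    Simplified rule, assumed for illustration: every position whose reachability
    exceeds `threshold` starts a new segment, and segments with fewer than
    `min_size` positions are labelled noise (-1).
    Returns one label per position of the OPTICS ordering.
    """
    segments = []
    for pos, r in enumerate(reach):
        if not segments or r > threshold:
            segments.append([pos])  # a peak in the plot opens a new segment
        else:
            segments[-1].append(pos)  # still inside the current valley
    labels = [-1] * len(reach)
    next_id = 0
    for seg in segments:
        if len(seg) >= min_size:
            for pos in seg:
                labels[pos] = next_id
            next_id += 1
    return labels


if __name__ == "__main__":
    inf = float("inf")
    # Two valleys separated by an isolated high-reachability point at position 3.
    plot = [inf, 0.05, 0.04, 0.9, inf, 0.02, 0.03, 0.01]
    print(cut_reachability(plot, threshold=0.3))  # [0, 0, 0, -1, 1, 1, 1, 1]
```

In the toy plot, position 3 is an isolated high-reachability point wedged between two valleys; points like this are the kind of sparse points whose placement, according to the abstract, the original OPTICS result organization handles poorly. Lowering `threshold` splits groups more aggressively, and raising `min_size` sends more small segments to noise.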
Keywords: computer application; Chinese information processing; OPTICS; density-based clustering; text mining
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.