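Text Clustering Algorithm Based on Concept and Semantic Similarity
Cite this article: JIAO Fenfen. Text clustering algorithm based on concept and semantic similarity [J]. Computer Engineering and Applications, 2012, 48(18): 136-141.
Author: JIAO Fenfen
Affiliation: China Airborne Missile Academy, Luoyang, Henan 471009, China
Abstract: This paper proposes TCBCSS (Text Clustering Based on Concept and Semantic Similarity), a clustering algorithm based on concepts and semantic similarity. TCBCSS uses WordNet to extract and merge the concepts in a document into a semantic network, analyzes that network using small-world theory and the network's geometric properties, and builds a concept list to represent the document. This representation both resolves the "expression difference" problem and simplifies the computation of document similarity. TCBCSS takes the semantic similarity of two concept lists as the measure of how close two documents are and performs the clustering on a graph, which avoids the restrictions that some clustering algorithms place on cluster shape. Experiments show that TCBCSS improves clustering quality.

Keywords: text clustering; concept; text representation; small-world theory; semantic similarity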

Clustering method based on concept and semantic similarity
JIAO Fenfen. Clustering method based on concept and semantic similarity [J]. Computer Engineering and Applications, 2012, 48(18): 136-141.
Authors: JIAO Fenfen
Affiliation: AVIC China Airborne Missile Academy, Luoyang, Henan 471009, China
Abstract: This paper introduces a new document clustering method based on concepts and semantic similarity, Text Clustering Based on Concept and Semantic Similarity (TCBCSS). Key concepts, rather than keywords, are extracted and merged to form a semantic network. The semantic network is analyzed using small-world theory (Six Degrees of Separation) and its geometric characteristics to build concept lists that represent each document. This representation not only resolves the problem of the same concept being expressed in different words, but also makes similarity computation more convenient. TCBCSS uses the semantic similarity of two concept lists as the measure of similarity between two documents and clusters the documents on a graph, which avoids the limits that some clustering algorithms place on cluster shape. Experimental results show that TCBCSS improves clustering quality.
Keywords:text clustering  concept  text representation  Six Degrees of Separation  semantic similarity
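
The abstract names the pipeline (concept lists, list-to-list semantic similarity, graph-based clustering) but gives no formulas, so the following is only a minimal sketch under stated assumptions and not the author's TCBCSS implementation. The toy `HYPERNYM` map stands in for WordNet, `concept_sim` is an assumed path-based measure, `list_sim` is an assumed symmetric best-match average, and `cluster` uses connected components of a thresholded similarity graph as a stand-in for the paper's graph-based step. The small-world analysis that selects key concepts is omitted; the concept lists are taken as given.

```python
from itertools import combinations

# Toy hypernym map standing in for WordNet (hypothetical data, child -> parent).
HYPERNYM = {
    "car": "vehicle", "truck": "vehicle", "vehicle": "entity",
    "apple": "fruit", "banana": "fruit", "fruit": "entity",
}

def ancestors(concept):
    """Return the chain from a concept up to its root, concept first."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def concept_sim(a, b):
    """Assumed measure: 1 / (1 + edges on the path from a to b); 0 if unrelated."""
    up_a, up_b = ancestors(a), ancestors(b)
    for i, node in enumerate(up_a):
        if node in up_b:
            return 1.0 / (1 + i + up_b.index(node))
    return 0.0

def list_sim(xs, ys):
    """Assumed measure: symmetric average of each concept's best match."""
    if not xs or not ys:
        return 0.0
    fwd = sum(max(concept_sim(x, y) for y in ys) for x in xs) / len(xs)
    bwd = sum(max(concept_sim(y, x) for x in xs) for y in ys) / len(ys)
    return (fwd + bwd) / 2

def cluster(docs, threshold=0.4):
    """Link documents whose similarity reaches the threshold, then return
    the connected components of that graph (union-find)."""
    parent = {name: name for name in docs}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(docs, 2):
        if list_sim(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical concept lists, one per document.
docs = {
    "d1": ["car", "truck"],
    "d2": ["vehicle", "car"],
    "d3": ["apple", "banana"],
    "d4": ["fruit"],
}
clusters = cluster(docs)
print(clusters)
```

Because clusters here are just the connected pieces of the similarity graph, they can take any shape, which is the property the abstract attributes to graph-based clustering. The threshold value of 0.4 is an arbitrary choice for this toy data.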
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.