
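Feature Weighted CLSVSM
Cite this article: NIU Feng-gao, YAN Tao. Feature Weighted CLSVSM[J]. Computer and Modernization, 2021, 0(5): 59-65.
Authors: NIU Feng-gao, YAN Tao
Affiliation: School of Mathematical Sciences, Shanxi University, Taiyuan 030006, Shanxi
Funding: Shanxi Province Applied Basic Research Program (Outstanding Youth Fund) (201801D211002); Outstanding Achievement Cultivation Project of Higher Education Institutions in Shanxi Province (2019KJ004); Science and Technology Project of Xiaodian District, Taiyuan (2019)
Abstract: Representing text information reasonably and effectively with vectors strongly affects the results of text clustering and retrieval. The co-occurrence latent semantic vector space model (CLSVSM) mines the latent semantic information in the co-occurrence of text feature words and improves text clustering performance. Building on CLSVSM, this paper first introduces feature-word frequency information and then assigns the introduced frequencies as weights to the co-occurrence strengths of CLSVSM, finally constructing the feature weighted CLSVSM. Its clustering results on Chinese data are as follows: the F value is nearly 2.4% and 5.2% higher than that of the CLSVSM and Word2vec text models, respectively; the entropy is nearly 3.1% and 9.0% lower than that of the 90%CLSVSM_K and Word2vec text models, respectively; and clustering improves over both the word frequency CLSVSM and the TF-IDF model. On English data, clustering performance is comparable to that of the other models. The stability of the feature weighted CLSVSM remains to be improved and is limited by how completely keyword frequency information is expressed.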

Keywords: CLSVSM; feature weighting; TF-IDF; clustering
Received: 2021-06-03

Feature Weighted CLSVSM
NIU Feng-gao, YAN Tao. Feature Weighted CLSVSM[J]. Computer and Modernization, 2021, 0(5): 59-65.
Authors: NIU Feng-gao, YAN Tao
Abstract: How reasonably and effectively document information is represented as vectors has a considerable impact on text clustering and retrieval results. The Co-occurrence Latent Semantic Vector Space Model (CLSVSM) mines the latent semantic information carried by co-occurrences of document feature words and improves document clustering performance. Building on CLSVSM, this paper first introduces word frequency information, then applies the word frequencies as weights on the co-occurrence strengths in CLSVSM, yielding the feature weighted CLSVSM. On Chinese data, the feature weighted CLSVSM improves the F value by nearly 2.4% and 5.2% over the CLSVSM and Word2vec text models, respectively, and reduces the entropy by nearly 3.1% and 9.0% relative to the 90%CLSVSM_K and Word2vec text models, respectively; it also clusters better than the word frequency CLSVSM and TF-IDF models. On English data, its clustering performance is comparable to that of the other models. The stability of the feature weighted CLSVSM still needs improvement, as it is limited by how completely the keyword frequency information is expressed.
Keywords: CLSVSM; feature weighted; TF-IDF; clustering
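
The abstract compares models by F value and entropy. As a rough illustration only, the sketch below computes both under their commonly used definitions for clustering evaluation; this record does not state the exact formulas used in the paper, so the definitions, the natural logarithm, and the function names `clustering_f_value` and `clustering_entropy` are assumptions made here.

```python
import math
from collections import Counter


def clustering_f_value(labels, clusters):
    """Class-weighted F value: each true class is matched to its best cluster.

    Assumed definition: F = sum_i (n_i / n) * max_j F(i, j).
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))  # (class, cluster) -> shared documents
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_shared = overlap[(cls, clu)]
            if n_shared == 0:
                continue
            precision = n_shared / n_clu
            recall = n_shared / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_cls / n * best
    return total


def clustering_entropy(labels, clusters):
    """Size-weighted average of the class entropy inside each cluster.

    Assumed definition: E = sum_j (n_j / n) * (-sum_i p_ij * ln p_ij).
    """
    n = len(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))  # only pairs that actually occur
    total = 0.0
    for (cls, clu), n_shared in overlap.items():
        p = n_shared / cluster_sizes[clu]
        total -= cluster_sizes[clu] / n * p * math.log(p)
    return total


if __name__ == "__main__":
    # Toy data: four documents, two true classes, two predicted clusters.
    true_labels = ["a", "a", "b", "b"]
    predicted = [0, 0, 0, 1]
    print(clustering_f_value(true_labels, predicted))
    print(clustering_entropy(true_labels, predicted))
```

Under these definitions a higher F value and a lower entropy both indicate clusters that agree more closely with the true classes, which matches the direction of the improvements reported in the abstract.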
This article is indexed in Wanfang Data and other databases.