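An improved vector space model method based on word meaning and word frequency
Cite this article: Deng Xiaoheng, Yang Zirong, Guan Peiyuan. An improved vector space model method based on word meaning and word frequency[J]. 计算机应用研究 (Application Research of Computers), 2019, 36(5).
Authors: Deng Xiaoheng  Yang Zirong  Guan Peiyuan
Affiliation: School of Software, Central South University, Changsha 410075 (all three authors)
Funding: Central South University Graduate Innovation Fund project (2017zzts732)
Abstract: The vector space model (VSM) represents text as feature vectors and is widely used in text classification, pattern recognition, and related fields. When a text is long, however, conventional VSM modeling can suffer from dimension explosion, which lowers efficiency and makes classification quality hard to guarantee. To address the high dimensionality of VSM, this paper proposes a method that uses word meaning and word frequency to reduce the dimension of the text model, with the aim of improving efficiency and accuracy. It proposes a synonym clustering method optimized by polysemous-word discrimination: after the sense of each polysemous word is determined from its context, feature terms are weighted by the similarity of their meanings, and feature terms with similar meanings are merged. The new method greatly reduces the dimension of the feature vector, and the polysemous-word discrimination improves the accuracy of feature extraction. In comparisons with other text feature extraction and text classification methods, the results show that the algorithm clearly improves both efficiency and accuracy.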

Keywords: text classification  feature selection  chi-square distribution  vector space model
Received: 2017-12-01
Revised: 2018-04-19

Method based on word meaning and word frequency to improve vector space model
Yang Zirong,Deng Xiaoheng and Guan Peiyuan.Method based on word meaning and word frequency to improve vector space model[J].Application Research of Computers,2019,36(5).
Authors: Yang Zirong, Deng Xiaoheng and Guan Peiyuan
Affiliation: School of Software, Central South University
Abstract: The vector space model (VSM) is a method of modeling text with feature vectors and is widely used in text categorization and pattern recognition. When a text is long, however, the traditional VSM may suffer from dimension explosion: efficiency drops and classification quality becomes hard to guarantee. To address this high dimensionality, this paper proposes a method that reduces the dimension of the text model by using word meaning and word frequency, in order to improve efficiency and accuracy. Specifically, it proposes a synonym clustering method optimized by polysemy discrimination: the sense of each polysemous word is determined from its context, feature items are weighted by the similarity of their meanings, and feature items with similar meanings are merged. The new method greatly reduces the dimension of the feature vector, and polysemy discrimination improves the accuracy of feature extraction. In comparisons with other text feature extraction and text categorization methods, the results show that the algorithm significantly improves efficiency and accuracy.
Keywords: text categorization  feature selection  chi-square  vector space model
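
The dimension-reduction idea in the abstract (merging feature items with similar meanings before building the vector) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `SYNONYM_CLUSTER` table, the function names, and the toy documents are all hypothetical, the clusters are hard-coded rather than derived from word-sense similarity, polysemy discrimination is omitted, and counts are simply summed instead of being weighted by similarity.

```python
from collections import Counter

# Hypothetical synonym clusters (assumption): each word maps to a cluster label.
# In the paper these would come from word-sense similarity computed after
# polysemy discrimination; here they are hard-coded for illustration.
SYNONYM_CLUSTER = {
    "car": "vehicle",
    "automobile": "vehicle",
    "quick": "fast",
    "rapid": "fast",
}


def term_frequencies(tokens):
    """Count how often each term occurs in one tokenized document."""
    return Counter(tokens)


def merge_synonyms(tf, cluster_map):
    """Sum the counts of terms that share a cluster label.

    Terms without an entry in cluster_map keep their own dimension.
    """
    merged = Counter()
    for term, count in tf.items():
        merged[cluster_map.get(term, term)] += count
    return merged


def to_vector(tf, vocabulary):
    """Project term counts onto a fixed, ordered vocabulary."""
    return [tf.get(term, 0) for term in vocabulary]


# Toy corpus (assumption): two documents that differ only in synonyms.
docs = [
    "the car is quick".split(),
    "the automobile is rapid".split(),
]

raw = [term_frequencies(d) for d in docs]
merged = [merge_synonyms(tf, SYNONYM_CLUSTER) for tf in raw]

# One dimension per distinct term before and after merging.
raw_vocab = sorted(set().union(*raw))
merged_vocab = sorted(set().union(*merged))

print(len(raw_vocab), "dimensions before merging")
print(len(merged_vocab), "dimensions after merging")
print([to_vector(tf, merged_vocab) for tf in merged])
```

In this toy corpus the two documents share only two of six raw dimensions, but they map to the same vector once synonyms are merged, which is the effect the abstract attributes to synonym clustering.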
This article is indexed in Wanfang Data (万方数据) and other databases.