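A Modified VSM and its Application to Automatic Text Categorization
Cite this article: ZHANG Ting-hui, GENG Huan-tong, CAI Qing-sheng. A Modified VSM and its Application to Automatic Text Categorization[J]. Microelectronics & Computer, 2005, 22(12): 24-27
Authors: ZHANG Ting-hui  GENG Huan-tong  CAI Qing-sheng
Affiliation: Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027
Funding: Chinese Academy of Sciences funded project; Wantai Development Fund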
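Abstract: Most current automatic text categorization systems use the vector space model (VSM) to represent documents. Because the conventional VSM document representation cannot reflect concepts, this paper improves the VSM. Building on the VSM, high-frequency words that appear in the same window unit are selected, and the Apriori algorithm is used to mine the maximal frequent term co-occurrence sets from these words; the VSM is then extended with these sets and used to represent documents. Experiments show that, compared with representing documents by the plain VSM, this method significantly improves the performance of an automatic text categorization system.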
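The mining step described in the abstract can be illustrated with a minimal Python sketch. It treats each window as a transaction of terms, runs a level-wise Apriori search, and keeps only the maximal frequent sets. The function names, the sample windows, and the `min_support` value are assumptions made for illustration; the abstract does not specify how a window unit is defined or which support threshold the authors used.

```python
from itertools import combinations


def frequent_itemsets(windows, min_support):
    """Level-wise Apriori over windows; returns {frozenset: support count}."""
    transactions = [frozenset(w) for w in windows]

    # Level 1: count single terms and keep the frequent ones.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-sets into k-candidates, pruning any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def maximal_sets(frequent):
    """Keep only frequent sets that have no frequent proper superset."""
    sets = list(frequent)
    return [s for s in sets if not any(s < other for other in sets)]


# Hypothetical windows (for example, one per sentence) of high-frequency terms.
windows = [
    ["vector", "space", "model"],
    ["vector", "space", "model", "text"],
    ["text", "classification"],
    ["text", "classification", "vector"],
]
maximal = maximal_sets(frequent_itemsets(windows, min_support=2))
print(sorted(sorted(s) for s in maximal))
```

With these sample windows, the set {vector, space, model} survives as one maximal co-occurrence set, while its frequent subsets are discarded because they carry no extra information.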

Keywords: automatic text categorization; vector space model; Apriori algorithm; term co-occurrence
Article ID: 1000-7180(2005)12-024-04
Received: 2005-03-21
Revised: 2005-03-21

A Modified VSM and its Application to Automatic Text Categorization
ZHANG Ting-hui,GENG Huan-tong,CAI Qing-sheng. A Modified VSM and its Application to Automatic Text Categorization[J]. Microelectronics & Computer, 2005, 22(12): 24-27
Authors:ZHANG Ting-hui  GENG Huan-tong  CAI Qing-sheng
Affiliation: Department of Computer Science and Technology, USTC, Hefei 230027, China
Abstract: Most automatic text categorization systems use the vector space model (VSM) to represent documents. The conventional VSM representation usually cannot capture the concepts of a document. This paper presents an improvement to the model. Building on the VSM, the high-frequency words in the same window are selected, and the Apriori algorithm is then used to mine the maximal frequent term co-occurrence sets, which are used to extend the VSM for representing documents. Experiments show that the improved model enhances the performance of an automatic text categorization system compared with the traditional VSM.
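As a rough illustration of what extending the VSM with co-occurrence sets could look like, the Python sketch below appends one dimension per co-occurrence set to an ordinary term-count vector. The function name `extended_vector`, the sample data, and the use of raw counts are all assumptions; the abstract does not state how the authors weight either the term dimensions or the added co-occurrence dimensions.

```python
from collections import Counter


def extended_vector(windows, vocabulary, cooccurrence_sets):
    """Term counts followed by one count per co-occurrence set.

    A co-occurrence set is counted once for every window of the document
    that contains all of its terms (an assumed weighting, not the paper's).
    """
    term_counts = Counter(term for w in windows for term in w)
    term_part = [term_counts[t] for t in vocabulary]
    set_part = [
        sum(1 for w in windows if cs <= set(w)) for cs in cooccurrence_sets
    ]
    return term_part + set_part


# Hypothetical document split into windows, with previously mined sets.
doc_windows = [
    ["vector", "space", "model"],
    ["text", "classification"],
    ["vector", "model"],
]
vocabulary = ["vector", "space", "model", "text", "classification"]
cooccurrence_sets = [
    frozenset({"vector", "space", "model"}),
    frozenset({"text", "classification"}),
]

vec = extended_vector(doc_windows, vocabulary, cooccurrence_sets)
print(vec)  # [2, 1, 2, 1, 1, 1, 1]
```

The first five entries are the usual term dimensions; the last two record how often each co-occurrence set appears together within a single window, which is the kind of concept-level signal a plain term vector lacks.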
Keywords:Automatic text categorization   Vector space model   Apriori algorithm   Term co-occurrence
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.