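Text categorization model combining WCBVSM and SACA
Cite this article: ZHANG Yanping, LIU Chao, QU Yonghua. Text categorization model combining WCBVSM and SACA[J]. Computer Engineering and Applications, 2012, 48(11): 137-142.
Authors: ZHANG Yanping  LIU Chao  QU Yonghua
Affiliations: 1. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230039; 2. School of Computer Science and Technology, Anhui University, Hefei 230039; 3. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046
Funding: National Natural Science Foundation of China (No.60675031, No.61073117); National Key Basic Research Program of China (973 Program) (No.2004CB318108, No.2007CB311003); Ministry of Education Social Science Research Fund, Youth Project (No.07JC870006)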
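Abstract: This paper presents a new text categorization model that combines a vector space model improved with word co-occurrence (Word Co-Occurrence Model Based on VSM, WCBVSM) with a cross cover algorithm based on simulated annealing (Cross Cover Algorithm Based on Simulated Annealing Algorithm, SACA). The traditional vector space model (VSM) uses terms as the semantic carriers of a document and does not consider the implicit semantic information between words in the surrounding text. Inspired by the word co-occurrence model, WCBVSM collects word co-occurrence statistics from the text and adds them to the VSM to capture the implicit semantic information of the document. To address the trade-off between recognition accuracy and generalization ability in the cross cover algorithm, SACA draws on the idea of simulated annealing: it improves on the random selection of the initial point of each cover in traditional cross covering, and it reduces the number of covers by increasing the number of sample points each cover contains, which strengthens the generalization ability of the covers. Experimental results show that the proposed model improves classification accuracy while also speeding up recognition.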

Keywords: text categorization  vector space model  word co-occurrence model  simulated annealing  cross cover algorithm

Text categorization model based on WCBVSM and SACA
ZHANG Yanping, LIU Chao, QU Yonghua. Text categorization model based on WCBVSM and SACA[J]. Computer Engineering and Applications, 2012, 48(11): 137-142.
Authors:ZHANG Yanping  LIU Chao  QU Yonghua
Affiliation: 1. Key Lab of Intelligent Computing & Signal Processing, MoE, Anhui University, Hefei 230039, China; 2. School of Computer Science and Technology, Anhui University, Hefei 230039, China; 3. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046, China
Abstract: A new text categorization model that combines WCBVSM with SACA is proposed. The traditional vector space model (VSM) uses terms as the semantic carriers of a document and ignores the semantic information implied between words in the text. Drawing on the word co-occurrence model, the Word Co-Occurrence Model Based on VSM (WCBVSM) is presented. It counts word co-occurrence information in the texts and adds this information to the VSM in order to capture the implicit semantic information of a document. In addition, to address the conflict between recognition accuracy and generalization ability in the cross cover algorithm, this paper presents a Cross Cover Algorithm based on the Simulated Annealing algorithm (SACA). SACA improves on the random selection of each cover's initial point and reduces the number of covers by increasing the number of samples in each cover, which enhances the generalization ability of the cover-based classifier. The experimental results show that the proposed model speeds up recognition and improves classification accuracy.
Keywords:text categorization  vector space model  term co-occurrence model  simulated annealing algorithm  cross cover algorithm
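The abstract states that WCBVSM adds word co-occurrence statistics to the VSM but gives no construction details. As a rough illustration only, the Python sketch below extends a plain term-count vector with counts of word pairs that occur near each other. The function names, the sliding-window definition of co-occurrence, the `window` parameter, and the use of raw counts as weights are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other.

    The sliding-window definition is an assumption; the abstract does not
    say how co-occurrence is measured.
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look only ahead, so each pair of positions is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def extended_vsm_vector(tokens, window=2):
    """Sparse feature vector: single-term counts plus co-occurring pair counts.

    Keys are strings for single terms and 2-tuples for word pairs
    (a hypothetical representation chosen for readability).
    """
    features = Counter(tokens)  # plain VSM term counts
    features.update(cooccurrence_counts(tokens, window))  # add pair features
    return features


if __name__ == "__main__":
    doc = ["text", "classification", "model", "text", "classification"]
    for feature, count in sorted(extended_vsm_vector(doc, window=1).items(), key=str):
        print(feature, count)
```

In a real pipeline the raw counts would likely be replaced by a weighting such as TF-IDF and the pair features pruned by frequency; the abstract does not say which choices the paper makes.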
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.