
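WordNet-based Concept Vector Space Model for Text Classification
Cite this article: Zhang Jian, Li Chunping. WordNet-based Concept Vector Space Model for Text Classification[J]. Computer Engineering and Applications, 2006, 42(4): 174-178.
Authors: Zhang Jian  Li Chunping
Affiliation: School of Software, Tsinghua University, Beijing 100084
Abstract: This paper proposes a text feature extraction method based on the WordNet lexical ontology. It replaces terms with synonym-set (synset) concepts and takes the hypernym-hyponym relations between synsets into account, building a concept vector space model of the text as its feature vector, so that high-level information representing each category can be extracted during training. Experimental results show that when the training set is very small, the method considerably improves text classification accuracy.

Keywords: automatic text classification  WordNet  concept vector  vector space model
Article ID: 1002-8331-(2006)04-0174-05
Received: 2005-07
Revised: 2005-07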

WordNet-based Concept Vector Space Model for Text Classification
Zhang Jian, Li Chunping. WordNet-based Concept Vector Space Model for Text Classification[J]. Computer Engineering and Applications, 2006, 42(4): 174-178.
Authors: Zhang Jian  Li Chunping
Affiliation: School of Software, Tsinghua University, Beijing 100084
Abstract: In this paper, we design and implement an automatic text classification system aimed at improving classification accuracy. Existing automatic text classification systems describe the content of a text with an N-dimensional feature vector model, and the approach used to build that model has a great influence on classification accuracy. The Vector Space Model (VSM), one of the most effective approaches, describes a document as a vector of orthogonal terms. VSM thus assumes that terms are independent and ignores the semantic relations between them. In real text, however, such relations usually exist, for example synonymy and hypernymy-hyponymy. Here we introduce a novel approach, based on WordNet, that describes a text by building a concept vector space model. By replacing terms with WordNet synonym sets (synsets) and taking into account the hypernymy-hyponymy relations between synsets, the approach can extract high-level information about categories during training. We carry out a series of experiments comparing our approach with the term-based VSM approach. The results show that our approach can improve the accuracy of text classification, especially when the training set is small.
Keywords:WordNet
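
The idea in the abstract, replacing terms with synsets and letting hypernyms contribute to the vector, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy `TERM_TO_SYNSET` and `HYPERNYM` tables stand in for WordNet, and the `levels` and `decay` parameters are a hypothetical weighting scheme, not the one used in the paper (which the abstract does not specify).

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: term -> synset id.
TERM_TO_SYNSET = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "truck": "truck.n.01",
    "lorry": "truck.n.01",
}

# Hypothetical hypernym links: synset id -> parent synset id.
HYPERNYM = {
    "car.n.01": "motor_vehicle.n.01",
    "truck.n.01": "motor_vehicle.n.01",
}


def term_vector(tokens):
    """Baseline term-based VSM: one dimension per surface term."""
    return Counter(tokens)


def concept_vector(tokens, levels=1, decay=0.5):
    """Concept-based vector: map each term to its synset, then add
    weight to up to `levels` hypernyms, shrinking by `decay` per level.
    The decay weighting is an illustrative assumption."""
    vec = Counter()
    for tok in tokens:
        synset = TERM_TO_SYNSET.get(tok)
        if synset is None:
            vec[tok] += 1.0  # terms outside the lexicon stay as themselves
            continue
        weight = 1.0
        vec[synset] += weight
        node = synset
        for _ in range(levels):
            node = HYPERNYM.get(node)
            if node is None:
                break
            weight *= decay
            vec[node] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


doc_a = ["car", "truck"]
doc_b = ["automobile", "lorry"]
print(cosine(term_vector(doc_a), term_vector(doc_b)))        # no shared terms
print(cosine(concept_vector(doc_a), concept_vector(doc_b)))  # same concepts
```

In this sketch the two documents share no surface terms, so the term-based similarity is zero, while their concept vectors coincide because synonyms collapse to the same synset. Documents that mention only `car` and only `truck` also gain a nonzero similarity through the shared hypernym, which is the kind of higher-level category information the abstract refers to.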
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.