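Hierarchical Method for Chinese Document Classification (中文文献的层次分类方法)
Cite as: Zhan Xuegang (战学刚), Lin Hongfei (林鸿飞). Hierarchical Method for Chinese Document Classification [J]. Journal of Chinese Information Processing (中文信息学报), 1999, 13(6): 21-26.
Authors: Zhan Xuegang (战学刚), Lin Hongfei (林鸿飞)
Affiliation: Department of Computer Science and Engineering, Northeastern University (东北大学)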
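Abstract (from the Chinese): Existing classification systems usually ignore the hierarchical structure of the category scheme, so when classifying documents they often struggle to decide which of several similar categories a document belongs to. Building on the vector space model, this paper proposes classifying top-down, level by level, following the hierarchical structure of the category scheme, with the aim of improving classification precision. Using a concept dictionary, synonyms and subordinate concepts are mapped to a single concept word; these concept words form a very small feature set, which reduces the dimensionality of the feature vector space and thereby the computational cost of the classification system. In addition, analysis of the category hierarchy is used to compress the feature vectors, reducing the computational cost from another direction.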

Keywords: document classification; vector space model; category hierarchy
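
The concept-dictionary step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the dictionary contents or the term weighting, so the `CONCEPT_DICT` entries, the `concept_vector` helper, and the use of raw counts as weights are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical concept dictionary: surface term -> concept word.
# Synonyms and subordinate concepts share one concept word.
# These entries are invented for illustration only.
CONCEPT_DICT = {
    "computer": "computer",
    "PC": "computer",
    "laptop": "computer",
    "automobile": "vehicle",
    "car": "vehicle",
    "truck": "vehicle",
}

def concept_vector(tokens, concept_dict=CONCEPT_DICT):
    """Count concept words; tokens missing from the dictionary are dropped."""
    return Counter(concept_dict[t] for t in tokens if t in concept_dict)

# Six surface tokens collapse into a two-dimensional feature vector.
tokens = ["PC", "laptop", "car", "the", "truck", "computer"]
vec = concept_vector(tokens)
print(dict(vec))  # {'computer': 3, 'vehicle': 2}
```

The feature space has one dimension per concept word rather than one per surface term, which is the source of the dimensionality reduction the abstract mentions.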

Hierarchical Method for Chinese Document Classification
Zhan Xuegang, Lin Hongfei, Yao Tianshun. Hierarchical Method for Chinese Document Classification [J]. Journal of Chinese Information Processing, 1999, 13(6): 21-26.
Authors: Zhan Xuegang, Lin Hongfei, Yao Tianshun
Affiliation: Department of Computer Science, Northeastern University
Abstract: Existing statistical document classification systems often ignore the hierarchical structure of the predefined topics. This makes it difficult to identify which category a document belongs to when the candidate categories are similar. In this article, we propose a top-down classification method that follows the hierarchical structure of the topics. The purpose is to improve precision and reduce the computation required by classification systems. Using a concept dictionary (thesaurus), we map the synonyms and lower-level concepts in a document to a small set of concept words that serve as terms. This further reduces the computational cost by reducing the dimensionality of the vector space.
Keywords:Document classification  Vector space model  Topic category hierarchy  
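
As a rough illustration of top-down, level-by-level classification in a vector space model, the sketch below descends a topic tree and at each level picks the child whose prototype vector is most similar to the document. The abstract does not specify the similarity measure, the weights, or the topic tree, so cosine similarity, the `TREE` structure, its prototype vectors, and the `classify_top_down` helper are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical topic hierarchy; prototype vectors are invented.
TREE = {
    "science": {
        "proto": {"experiment": 1.0, "theory": 1.0, "computer": 0.5, "cell": 0.5},
        "children": {
            "computing": {"proto": {"computer": 1.0, "algorithm": 1.0}, "children": {}},
            "biology": {"proto": {"cell": 1.0, "gene": 1.0}, "children": {}},
        },
    },
    "sports": {
        "proto": {"match": 1.0, "team": 1.0},
        "children": {},
    },
}

def classify_top_down(doc, nodes):
    """Descend the tree, choosing the most similar child at each level."""
    path = []
    while nodes:
        best = max(nodes, key=lambda name: cosine(doc, nodes[name]["proto"]))
        path.append(best)
        nodes = nodes[best]["children"]
    return path

doc = {"computer": 2.0, "algorithm": 1.0, "theory": 1.0}
print(classify_top_down(doc, TREE))  # ['science', 'computing']
```

At each level the document is compared only with the siblings under the chosen parent, so similar leaf categories are separated by a local decision and the number of comparisons is smaller than in a flat comparison against every leaf.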
This article is indexed in CNKI, VIP (维普), and other databases.