
Research and Implementation of a Real-Time Text Categorization System Based on a Suffix Tree Model

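Cite this article:	GUO Li, ZHANG Ji, TAN Jian-long. Research and Implementation of a Real-Time Text Categorization System Based on a Suffix Tree Model[J]. Journal of Chinese Information Processing, 2005, 19(5): 18-25.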

Authors:	GUO Li, ZHANG Ji, TAN Jian-long

Affiliation:	Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080

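Abstract:	With network content analysis as the target application, this paper proposes a text vector space model (VSM) based on a suffix tree and implements a text categorization system on top of it. Compared with the word-based VSM, the model exploits the fast matching of the suffix tree to obtain the vector representation of a text in real time, without costly computations such as word segmentation or feature extraction. The model also guarantees that changes to texts in the training set affect classification results in real time. Experimental results and algorithm analysis show that the time complexity of text preprocessing in our system is O(N), far better than that of a word segmentation system. Moreover, because no word segmentation or feature extraction is needed, the classification process does not depend on any specific language, making it a language-independent classification method.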
Keywords:	computer application; Chinese information processing; real-time text categorization; vector space model; suffix tree
Article ID:	1003-0077(2005)05-0016-08
Received:	2004-07-21
Revised:	2005-01-19
Research and Implementation of On-line Text Categorization System Based on Suffix Tree

GUO Li, ZHANG Ji, TAN Jian-long. Research and Implementation of On-line Text Categorization System Based on Suffix Tree[J]. Journal of Chinese Information Processing, 2005, 19(5): 18-25.

Authors:	GUO Li, ZHANG Ji, TAN Jian-long

Affiliation:	Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China

Abstract:	We propose a text vector space model (VSM) based on a suffix tree and implement a text categorization system on top of it. Using the fast matching that the suffix tree supports, the model obtains the vector representation of a text while avoiding complex computations such as word segmentation and feature extraction. In addition, the model guarantees that alterations to the training set affect classification results in real time. Experiments and analysis of the algorithm show that the time complexity of text preprocessing in our system is O(N), which is much better than that of word segmentation methods. Moreover, because word segmentation and feature extraction are avoided, the categorization process does not depend on any particular language, making this a language-independent method.

Keywords:	computer application; Chinese information processing; online text categorization; vector space model; suffix tree
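
The idea summarized in the abstract, matching raw text against a suffix structure to obtain a vector without word segmentation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the class name `SuffixTrie`, the `max_depth` limit, the use of a naive depth-limited suffix trie in place of a true suffix tree, and cosine similarity as the scoring rule. With a fixed `max_depth`, vectorizing a text of length N takes time proportional to N, and each call to `add` updates the counts in place, so a newly added training text influences the next classification immediately.

```python
import math
from collections import defaultdict


class SuffixTrie:
    """Hypothetical depth-limited suffix trie; each node is one substring feature."""

    def __init__(self, max_depth=4):
        self.max_depth = max_depth
        self.children = [{}]                     # node id -> {char: child id}
        self.class_counts = [defaultdict(int)]   # node id -> {label: count}

    def add(self, text, label):
        """Insert every suffix of `text`, truncated to max_depth, under `label`."""
        for start in range(len(text)):
            node = 0
            for ch in text[start:start + self.max_depth]:
                nxt = self.children[node].get(ch)
                if nxt is None:
                    nxt = len(self.children)
                    self.children[node][ch] = nxt
                    self.children.append({})
                    self.class_counts.append(defaultdict(int))
                node = nxt
                self.class_counts[node][label] += 1

    def vectorize(self, text):
        """Return a sparse vector {node id: count} by matching at each position."""
        vec = defaultdict(int)
        for start in range(len(text)):
            node = 0
            for ch in text[start:start + self.max_depth]:
                node = self.children[node].get(ch)
                if node is None:   # no training text contains this substring
                    break
                vec[node] += 1
        return vec

    def classify(self, text):
        """Pick the label whose count vector has the highest cosine similarity."""
        vec = self.vectorize(text)
        labels = {lab for counts in self.class_counts for lab in counts}
        doc_norm = math.sqrt(sum(v * v for v in vec.values()))
        best, best_score = None, -1.0
        for label in sorted(labels):
            dot = sum(v * self.class_counts[n].get(label, 0) for n, v in vec.items())
            cls_norm = math.sqrt(sum(c.get(label, 0) ** 2 for c in self.class_counts))
            score = dot / (doc_norm * cls_norm) if doc_norm and cls_norm else 0.0
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    trie = SuffixTrie(max_depth=3)
    trie.add("足球比赛进球", "sports")
    trie.add("股票市场上涨", "finance")
    print(trie.classify("足球进球"))
```

Because features are raw character substrings, the same code applies unchanged to text in any language, which mirrors the language independence claimed in the abstract. A production system would likely replace the naive trie with a linear-time suffix tree construction.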
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
