
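Research on Text Hierarchical Classification System (文本层次分类系统的研究)
Cite this article: Gao Bo, Zhao Zheng. Research on Text Hierarchical Classification System[J]. Computer Engineering and Applications (计算机工程与应用), 2006, 42(11): 176-178
Authors: Gao Bo (高波), Zhao Zheng (赵政)
Affiliation: Department of Computer Science, School of Information, Tianjin University, Tianjin 300072 (both authors)
Abstract: This paper proposes a hierarchical classification model that organizes categories into a tree according to their similarity. A document is classified by comparing it with categories level by level, which greatly reduces the number of document-to-category comparisons. Because the features of broad categories differ more clearly from one another, the approach also improves classification precision to some extent. Since the title and the body of a document contribute differently to determining its category, the two are processed separately. In addition, feature selection combines TFIDF with MI (mutual information), which is a further novel aspect of the paper. Experimental results show that hierarchical classification is about 15% faster than conventional classification, with some improvement in precision.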

Keywords: text classification, vector space, precision, hierarchical classification
Article ID: 1002-8331-(2006)11-0176-03
Received: 2005-07-01
Revised: 2005-07-01

Research on Text Hierarchical Classification System
Gao Bo, Zhao Zheng. Research on Text Hierarchical Classification System[J]. Computer Engineering and Applications, 2006, 42(11): 176-178
Authors: Gao Bo, Zhao Zheng
Affiliation: Institute of Electronic and Information Engineering, Tianjin University, Tianjin 300072
Abstract: We propose a hierarchical classification model that groups similar classes into a tree structure. To classify a text, comparisons proceed level by level, which greatly reduces the number of comparisons. Because broad categories are more clearly distinguishable from one another, this also improves classification precision to some degree. Since the title and the body of an article play different roles in determining its class, we treat them separately when computing similarity. For feature selection we combine TFIDF and MI. These are the novel contributions of this paper. Experimental results show that the hierarchical algorithm is about 15% faster than the general (non-hierarchical) algorithm and also improves classification precision to some degree.
Keywords: text classification, vector space, precision, hierarchical classification
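
The level-by-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the cosine similarity over raw term counts, the `title_weight` value of 0.6, the `Node` class, and the toy category tree are all assumptions made for illustration. Each internal category is represented by the merged term profile of its children, and the title and body are scored separately before being combined.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (stand-in for the paper's feature weights)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


class Node:
    """Hypothetical category node; internal nodes merge their children's profiles."""

    def __init__(self, name, profile="", children=()):
        self.name = name
        self.children = list(children)
        self.profile = vectorize(profile)
        for child in self.children:
            self.profile.update(child.profile)  # Counter.update adds counts


def score(node, title_vec, body_vec, title_weight):
    """Combine separately computed title and body similarities (weight is assumed)."""
    return (title_weight * cosine(title_vec, node.profile)
            + (1 - title_weight) * cosine(body_vec, node.profile))


def classify(root, title, body, title_weight=0.6):
    """Descend the tree one level at a time, keeping the best-scoring child."""
    title_vec, body_vec = vectorize(title), vectorize(body)
    node, comparisons = root, 0
    while node.children:
        comparisons += len(node.children)
        node = max(node.children,
                   key=lambda c: score(c, title_vec, body_vec, title_weight))
    return node.name, comparisons


def leaves(node):
    """All leaf categories; a flat classifier would compare against each one."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


# Toy category tree: 3 broad categories with 3 leaf categories each.
root = Node("root", children=[
    Node("sports", children=[
        Node("football", "football goal league striker"),
        Node("basketball", "basketball court dunk rebound"),
        Node("tennis", "tennis serve racket set"),
    ]),
    Node("science", children=[
        Node("physics", "physics quantum particle energy"),
        Node("biology", "biology cell gene protein"),
        Node("chemistry", "chemistry molecule reaction acid"),
    ]),
    Node("finance", children=[
        Node("stocks", "stocks shares market index"),
        Node("banking", "banking loan deposit interest"),
        Node("insurance", "insurance policy premium claim"),
    ]),
])

label, comparisons = classify(
    root,
    title="Quantum particle energy",
    body="the experiment measured particle energy in a quantum system",
)
print(label, comparisons, len(leaves(root)))
```

In this toy tree the document reaches a leaf after 6 comparisons (3 per level), whereas comparing against every leaf directly would take 9. The gap widens as the tree grows, which is the source of the speed-up the abstract reports.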
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.