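Large-scale hierarchical classification algorithm based on full-path similarity
Cite this article: ZHU Jian-lin, CHEN Zhong-yang, ZHANG Yong-jun, SUN Cun-yi. Large-scale hierarchical classification algorithm based on full-path similarity[J]. Computer Engineering and Design, 2019, 40(5): 1300-1304, 1333.
Authors: ZHU Jian-lin  CHEN Zhong-yang  ZHANG Yong-jun  SUN Cun-yi
Affiliations: School of Finance, Renmin University of China, Beijing 100872; School of Information, Renmin University of China, Beijing 100872; Guanghua School of Management, Peking University, Beijing 100871
Funding: National Natural Science Foundation of China; Beijing Natural Science Foundation
Abstract: To solve large-scale hierarchical classification problems quickly and accurately, the concept of word-class discrimination is proposed and used as the basis for computing class vectors. Based on the class vectors, an improved Rocchio algorithm computes the similarity between the text to be classified and each target class, and the N most likely target classes are selected as candidates. Following the hierarchical topology of the target classes, the full-path similarity between the text and each of the N candidate classes is then computed to determine the final class. Experimental results show that the method classifies better than traditional algorithms, and that the strategy based on full-path text-class similarity clearly improves on a classification algorithm based on word-class discrimination alone.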

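Keywords: word-class discrimination  full-path similarity  large-scale hierarchical classification  text classification  simplification strategy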

Large scale hierarchical classification algorithm based on full-path similarity
ZHU Jian-lin, CHEN Zhong-yang, ZHANG Yong-jun, SUN Cun-yi. Large scale hierarchical classification algorithm based on full-path similarity[J]. Computer Engineering and Design, 2019, 40(5): 1300-1304, 1333.
Authors:ZHU Jian-lin  CHEN Zhong-yang  ZHANG Yong-jun  SUN Cun-yi
Affiliation: (School of Finance, Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China; Guanghua School of Management, Peking University, Beijing 100871, China)
Abstract: A large-scale hierarchical classification algorithm based on full-path similarity was proposed. The concept of word-class discrimination was introduced and used as the basis of the class vectors. An improved Rocchio algorithm was used to calculate text-class similarity, and the N most likely classes were selected as candidates. The full-path similarities between the text and the candidate classes were then calculated according to the hierarchical structure of the classes, and the classification result was determined from them. The time complexity of the algorithm is linear in the number of classes. Experimental results show that the algorithm classifies better than traditional algorithms.
Keywords: word-class discrimination  full-path similarity  large-scale hierarchical classification  text classification  simplification strategy
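
The two-stage procedure described in the abstract can be illustrated with a small Python sketch. The abstract does not give the formulas for word-class discrimination, the improved Rocchio similarity, or full-path similarity, so everything below is an assumption made for illustration: class vectors are hand-written term weights standing in for word-class discrimination scores, text-class similarity is plain cosine similarity, and the full-path score is taken to be the mean similarity over a candidate class and all of its ancestors. The names `CLASS_VECTORS`, `PARENT`, and `classify` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical toy class vectors: term -> weight. In the paper these would be
# derived from word-class discrimination; here they are made up by hand.
CLASS_VECTORS = {
    "science":  {"energy": 1.0, "cell": 1.0, "quantum": 1.0},
    "physics":  {"quantum": 2.0, "energy": 1.0},
    "biology":  {"cell": 2.0, "gene": 1.0},
    "sports":   {"goal": 1.0, "match": 1.0},
    "football": {"goal": 2.0, "match": 1.0},
}

# Hypothetical class hierarchy: child -> parent (None marks a top-level class).
PARENT = {
    "science": None, "physics": "science", "biology": "science",
    "sports": None, "football": "sports",
}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def path_to_root(cls, parent):
    """Return the class followed by all of its ancestors up to the top level."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = parent.get(cls)
    return path


def classify(tokens, class_vectors, parent, n_candidates=2):
    doc = Counter(tokens)
    # Stage 1: score every leaf class once (linear in the number of classes)
    # and keep the N most similar ones as candidates.
    inner = set(parent.values())
    leaves = [c for c in class_vectors if c not in inner]
    flat = {c: cosine(doc, class_vectors[c]) for c in leaves}
    candidates = sorted(flat, key=flat.get, reverse=True)[:n_candidates]

    # Stage 2 (assumed form): average the similarity along the full path
    # from each candidate up to its top-level ancestor.
    def full_path_score(c):
        path = path_to_root(c, parent)
        return sum(cosine(doc, class_vectors[p]) for p in path) / len(path)

    return max(candidates, key=full_path_score)


print(classify(["quantum", "energy", "energy"], CLASS_VECTORS, PARENT))
```

Because only the N candidates are re-scored along their paths, the second stage adds little work on top of the single pass over the classes, which is in keeping with the linear time complexity stated in the abstract.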
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).