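Text Classification Using Diffusion Kernel on Statistical Manifold
Cite this article: LI Kan, ZHOU Shi-Bin, LIU Yu-Shu. Text Classification Using Diffusion Kernel on Statistical Manifold [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2012, 25(2): 339-345.
Authors: LI Kan (李侃), ZHOU Shi-Bin (周世斌), LIU Yu-Shu (刘玉树)
Affiliations: 1. School of Computer Science, Beijing Institute of Technology, Beijing 100081
2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116
Funding: Supported by the National Natural Science Foundation of China (No.60903071) and the Beijing Key Discipline Fund (No.xk100070427)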
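Abstract: The Dirichlet compound multinomial (DCM) manifold is proposed. Because the DCM manifold is homeomorphic and isometric to the positive sphere manifold, the geodesic distance on the positive sphere manifold is mapped to the geodesic distance on the DCM manifold through a pullback mapping, which establishes a distance metric on the DCM manifold. On this basis, the Dirichlet compound multinomial diffusion kernel and the Dirichlet compound multinomial inverse document frequency (DCMIDF) diffusion kernel are constructed on the statistical manifold. Experiments on the WebKB Top4 and 20 Newsgroups corpora indicate that the DCM manifold describes text more accurately than Euclidean space. Compared with support vector machines using the polynomial kernel and the negative geodesic distance kernel, the support vector machines based on the DCM diffusion kernel and the DCMIDF diffusion kernel achieve good text classification results.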

Keywords: Statistical manifold; Diffusion kernel; Dirichlet distribution; Text classification

Text Classification Using Diffusion Kernel on Statistical Manifold
LI Kan, ZHOU Shi-Bin, LIU Yu-Shu. Text Classification Using Diffusion Kernel on Statistical Manifold [J]. Pattern Recognition and Artificial Intelligence, 2012, 25(2): 339-345.
Authors:LI Kan  ZHOU Shi-Bin  LIU Yu-Shu
Affiliation: 1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081; 2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116
Abstract: The Dirichlet compound multinomial (DCM) manifold is proposed. The DCM manifold is homeomorphic and isometric to the positive sphere manifold, so the geodesic distance on the positive sphere manifold can be mapped to the geodesic distance on the DCM manifold through a pullback mapping. A distance metric is thereby established on the DCM manifold, and the DCM diffusion kernel and the DCMIDF diffusion kernel are constructed on it. The proposed algorithms are tested for text classification on the WebKB Top 4 and 20 Newsgroups corpora, and the experimental results show that the DCM manifold models the texts in these corpora better than Euclidean space does. Compared with support vector machines based on the polynomial kernel and on the negative geodesic distance (NGD) kernel, the proposed support vector machines based on the DCM diffusion kernel and the DCMIDF diffusion kernel show better accuracy for text classification.
Keywords:Statistical Manifold  Diffusion Kernel  Dirichlet Distribution  Text Classification
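
The pullback idea in the abstract can be illustrated on the simpler multinomial manifold, where the construction is well known. The sketch below is not the paper's DCM or DCMIDF kernel, whose mapping is not given in the abstract. It assumes documents are plain term-count vectors, maps them onto the positive part of a sphere through the square-root map, and uses the first-order heat-kernel approximation exp(-d^2 / (4t)) with the normalizing constant dropped. The function names and the default t = 0.5 are illustrative assumptions.

```python
import math


def normalize(counts):
    """Turn raw term counts into a point on the probability simplex."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def geodesic_distance(p, q):
    """Geodesic distance between two multinomial parameter vectors.

    The map theta -> 2 * sqrt(theta) sends the simplex isometrically onto
    the positive part of a sphere of radius 2, so the distance pulled back
    from the sphere is twice the angle between sqrt(p) and sqrt(q).
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    inner = sum(math.sqrt(a * b) for a, b in zip(p, q))
    inner = min(1.0, max(0.0, inner))  # guard against rounding error
    return 2.0 * math.acos(inner)


def diffusion_kernel(p, q, t=0.5):
    """Approximate heat kernel exp(-d^2 / (4t)); constant factor omitted."""
    d = geodesic_distance(p, q)
    return math.exp(-d * d / (4.0 * t))


if __name__ == "__main__":
    doc_a = normalize([3, 1, 0, 2])
    doc_b = normalize([2, 2, 1, 1])
    print(geodesic_distance(doc_a, doc_b))
    print(diffusion_kernel(doc_a, doc_b))
```

A Gram matrix built from `diffusion_kernel` over all document pairs could then be handed to any support vector machine implementation that accepts a precomputed kernel; the diffusion time t would normally be tuned on held-out data.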
This article is indexed in databases including CNKI and Wanfang Data.