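Text classification algorithm based on a multi-instance learning framework
Cite this article: XU Jian-guo, XIAO Hai-feng, ZHAO Hua. Text classification algorithm based on multi-instance learning framework[J]. Computer Engineering and Design, 2020, 41(4): 1017-1023.
Authors: XU Jian-guo  XIAO Hai-feng  ZHAO Hua
Affiliation: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, Shandong, China (all three authors)
Funding: Humanities and Social Sciences Research Project of the Ministry of Education; National Social Science Fund of China
Abstract: Traditional text classification algorithms no longer meet the needs of texts with special structure, so a text classification algorithm based on the multi-instance learning framework is proposed. Each text is treated as a bag, and its title and body are treated as the two instances in that bag. A multi-class support vector machine algorithm built on one-class classification maps the bags into a high-dimensional feature space, and a Gaussian kernel function is introduced to train the classifier, which then predicts the classes of unlabeled texts. Experimental results show that the algorithm achieves higher classification accuracy than traditional machine learning classification algorithms and offers a new perspective for text mining research on texts with special structure.

Keywords: text classification  multi-instance learning  support vector machine  multi-class classification method  Gaussian kernel function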

Text classification algorithm based on multi-instance learning framework
XU Jian-guo, XIAO Hai-feng, ZHAO Hua. Text classification algorithm based on multi-instance learning framework[J]. Computer Engineering and Design, 2020, 41(4): 1017-1023.
Authors:XU Jian-guo  XIAO Hai-feng  ZHAO Hua
Affiliation: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract: Traditional text classification algorithms cannot handle text with special structure well. A text classification algorithm based on multi-instance learning is presented. Each text is treated as a bag, and its title and body are treated as the two instances of that bag. Bags are mapped into a high-dimensional feature space using a multi-class support vector machine algorithm built on one-class classification. A Gaussian kernel function is introduced to train the classifier, which then predicts the classes of unlabeled texts. Experimental results show that the proposed algorithm achieves higher classification accuracy than traditional machine learning classification algorithms, and it provides a new perspective for text mining research on texts with special structure.
Keywords:text classification  multi-instance learning  SVM  multi-class classification method  Gaussian kernel function
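
The abstract gives no formulas, so the following is only a minimal sketch of the bag idea, not the authors' method. It assumes each text becomes a bag of two instances (title and body) and that two bags are compared by averaging a Gaussian kernel over matching instances. In place of a one-class SVM per class, it uses a much simpler stand-in: the mean kernel similarity to each class's training bags. All function names, the `gamma` value, and the toy data are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """L2-normalised term-frequency vector stored as a dict (assumed feature choice)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def sq_dist(u, v):
    """Squared Euclidean distance between two sparse vectors."""
    keys = set(u) | set(v)
    return sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys)


def make_bag(title, body):
    """One text = one bag with two instances: (title vector, body vector)."""
    return (bow(title), bow(body))


def bag_kernel(a, b, gamma=1.0):
    """Gaussian kernel per instance pair (title/title, body/body), averaged."""
    pairs = list(zip(a, b))
    return sum(math.exp(-gamma * sq_dist(x, y)) for x, y in pairs) / len(pairs)


class OneModelPerClass:
    """Stand-in for one-class-based multi-class classification.

    Each class is scored independently from its own training bags only,
    and the class with the highest score wins. The score here is a mean
    kernel similarity, NOT a trained one-class SVM.
    """

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.by_label = {}

    def fit(self, bags, labels):
        for bag, label in zip(bags, labels):
            self.by_label.setdefault(label, []).append(bag)
        return self

    def score(self, bag, label):
        members = self.by_label[label]
        return sum(bag_kernel(bag, m, self.gamma) for m in members) / len(members)

    def predict(self, bag):
        return max(self.by_label, key=lambda label: self.score(bag, label))


# Toy training data, invented purely for illustration.
train = [
    ("stock market rally", "shares rose as investors bought bank stocks", "finance"),
    ("bank profit report", "the bank reported higher profit and the market reacted", "finance"),
    ("football cup final", "the team scored two goals in the match", "sports"),
    ("tennis match result", "the player won the match in straight sets", "sports"),
]
model = OneModelPerClass(gamma=1.0).fit(
    [make_bag(title, body) for title, body, _ in train],
    [label for _, _, label in train],
)

query = make_bag("bank shares fall", "investors sold stocks after the market dropped")
print(model.predict(query))
```

Swapping the mean-similarity score for a trained one-class SVM per class would bring the sketch closer to what the abstract describes, at the cost of an external dependency.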
This article is indexed in VIP, Wanfang Data, and other databases.