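The nearest neighbor text classification method based on support vectors
Cite this article: GULNAZ Alimjan1,2,3, HURXIDA Jumahun1, SUN Tieli2, LIANG Yi1. The nearest neighbor text classification method based on support vector[J]. CAAI Transactions on Intelligent Systems, 2018, 13(5): 799-807. DOI: 10.11992/tis.201711007
Authors: GULNAZ Alimjan1,2,3, HURXIDA Jumahun1, SUN Tieli2, LIANG Yi1
Affiliations: 1. School of Electronics and Information Engineering, Yili Normal University, Yining 835000, Xinjiang, China; 2. School of Computer Science and Technology, Northeast Normal University, Changchun 130117, Jilin, China; 3. School of Geographical Sciences, Northeast Normal University, Changchun 130024, Jilin, China
Abstract: Text classification automatically assigns a document a set of predefined categories or topics. In text classification, the way documents are represented strongly affects the performance of the learning machine. With the goal of classifying Kazakh text, stemming for Kazakh text is designed and implemented according to Kazakh grammar rules, which completes the preprocessing of Kazakh text. A sample distance formula based on the nearest support vectors is proposed, which avoids having to select the parameter k, and Kazakh text is classified with SV-NN, a special combination of the SVM and KNN classification algorithms. Text classification simulation experiments were run on a Kazakh text corpus built by the authors; the numerical experiments demonstrate the effectiveness of the proposed algorithm and confirm the theoretical results.

Keywords: stemming; preprocessing; support vector machine; text classification; classification accuracy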

The nearest neighbor text classification method based on support vector
GULNAZ Alimjan1,2,3, HURXIDA Jumahun1, SUN Tieli2, LIANG Yi1. The nearest neighbor text classification method based on support vector[J]. CAAI Transactions on Intelligent Systems, 2018, 13(5): 799-807. DOI: 10.11992/tis.201711007
Authors: GULNAZ Alimjan1,2,3, HURXIDA Jumahun1, SUN Tieli2, LIANG Yi1
Affiliation:1. Department of Electronics and Information Engineering, Yili Normal University, Yining 835000, China;2. School of Information Science and Technology, Northeast Normal University, Changchun 130117, China;3. Department of Geographical Science, Nor
Abstract: Text categorization automatically assigns a document a set of predefined categories or topics. In text classification, the way documents are represented strongly affects the performance of the learning machine. With the aim of classifying Kazakh text, stemming for Kazakh text is designed and implemented according to Kazakh grammar rules, which completes the preprocessing of Kazakh text. A sample distance formula based on the nearest support vectors is proposed, which avoids having to select the parameter k. Kazakh texts are then classified with SV-NN, a special combination of the SVM and KNN classification algorithms. Text categorization simulation experiments were conducted on a Kazakh text corpus built by the authors. The numerical experiments demonstrate the effectiveness of the proposed algorithm and confirm the theoretical results.
Keywords: stemming; preprocessing; support vector machines; text categorization; classification accuracy
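
The abstract describes SV-NN only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a binary problem with labels +1/-1, a linear SVM trained by plain subgradient descent, support vectors approximated as the training points closest to the decision boundary in each class, and ordinary Euclidean distance in place of the paper's distance formula, which the abstract does not give. All function names and the toy feature vectors are hypothetical.

```python
import math

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Batch subgradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # This sample violates the margin, so it contributes to the subgradient.
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def support_vectors(X, y, w, b, tol=0.1):
    """Assumption: approximate each class's support vectors as the training
    points whose margin is within tol of that class's smallest margin."""
    margins = [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
               for xi, yi in zip(X, y)]
    svs = []
    for label in sorted(set(y)):
        smallest = min(m for m, yi in zip(margins, y) if yi == label)
        svs += [(xi, yi) for xi, yi, m in zip(X, y, margins)
                if yi == label and m <= smallest + tol]
    return svs

def sv_nn_predict(svs, x):
    """Return the label of the support vector nearest to x (Euclidean distance)."""
    return min(svs, key=lambda sv: math.dist(sv[0], x))[1]

# Toy 2-D "document vectors" (hypothetical term weights), two classes.
X = [[3, 0], [4, 1], [3, 1], [0, 3], [1, 4], [1, 3]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
svs = support_vectors(X, y, w, b)
print(sv_nn_predict(svs, [5, 0]), sv_nn_predict(svs, [0, 5]))
```

Because a test sample is compared only against the retained support vectors, there is no neighbor count k to tune, and far fewer distances are computed than in a KNN search over the full training set.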