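An improved KNN short text classification algorithm based on category features
Cite this article: HUANG Xian-ying, XIONG Li-yuan, LIU Ying-tao, LI Qin-dong. An improved KNN short text classification algorithm based on category features[J]. Computer Engineering & Science, 2018, 40(1): 148-154.
Authors: HUANG Xian-ying  XIONG Li-yuan  LIU Ying-tao  LI Qin-dong
Affiliation: (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054)
Funding: National Natural Science Foundation of China (11547148); Ministry of Education Humanities and Social Sciences Research Youth Fund (15YJC790061); Chongqing Municipal Education Commission Science and Technology Research Project (16SKGH133)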
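Abstract: KNN short text classification algorithms raise classification accuracy by expanding the content of short texts, but the expansion lowers classification efficiency. To address this, the chi-square statistic is used to extract the category features of each category in the training space. Based on how similar the samples of each category are to that category's features, the existing training space is split and refined, so that every category is divided into several training subsets, each containing part of the samples. For a given test text, the samples of the training subsets whose category features are highly similar to the test text are then extracted from the refined training space to reconstruct the training set for that text. This reduces the number of text pairs the KNN short text classification algorithm must compare and thereby improves its efficiency. Experiments show that, compared with the HowNet-semantics-based KNN short text classification algorithm, the proposed algorithm improves efficiency by nearly 50%, and classification accuracy also improves to some extent.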

Keywords: short text classification; KNN classification; category features; HowNet; efficiency
Received: 2016-05-04
Revised: 2018-01-25
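
The abstract names the chi-square (CHI) statistic as the way category features are extracted. The following is a minimal sketch of that step, not the authors' implementation. It assumes tokenized documents and uses the standard 2x2 contingency form of the statistic; the function names and the `top_k` parameter are hypothetical, and only terms that occur at least once in a category are scored for it.

```python
from collections import Counter, defaultdict

def chi_square(a, b, c, d):
    """CHI score of a term for a category from a 2x2 contingency table.
    a: docs in the category containing the term
    b: docs outside the category containing the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def category_features(docs, labels, top_k=2):
    """Return the top_k highest-CHI terms per category (illustrative helper).
    docs: list of token lists; labels: parallel list of category names."""
    n_docs = len(docs)
    cat_size = Counter(labels)
    df = Counter()                     # document frequency of each term
    df_in_cat = defaultdict(Counter)   # document frequency within each category
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_cat[label][term] += 1
    features = {}
    for cat, size in cat_size.items():
        scores = {}
        for term, a in df_in_cat[cat].items():
            b = df[term] - a
            c = size - a
            d = n_docs - size - b
            scores[term] = chi_square(a, b, c, d)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        features[cat] = ranked[:top_k]
    return features
```

A term that appears in every document of one category and in no other document gets the maximum score, which is why such terms surface as category features.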

An improved KNN short text classification algorithm based on category feature words
HUANG Xian-ying, XIONG Li-yuan, LIU Ying-tao, LI Qin-dong. An improved KNN short text classification algorithm based on category feature words[J]. Computer Engineering & Science, 2018, 40(1): 148-154.
Authors: HUANG Xian-ying  XIONG Li-yuan  LIU Ying-tao  LI Qin-dong
Affiliation: (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Abstract: KNN short text classification algorithms improve accuracy by expanding the content of short texts, but the expansion lowers classification efficiency. To address this problem, we use the chi-square (CHI) statistic to extract category feature words for each category in the training set. Based on the similarity between the samples of each category and that category's feature words, the existing training set is split and refined, so that every category is divided into several training subsets, each containing part of the samples. For a given test text, the samples of the training subsets whose category feature words are most similar to the test text are extracted to reconstruct the training set for that text. This reduces the number of text pairs the KNN short text classification algorithm must compare and thereby improves its efficiency. Experimental results show that, compared with the HowNet-based KNN short text classification algorithm, the proposed algorithm improves classification efficiency by nearly 50%, and classification accuracy also improves to some extent.
Keywords: short text classification; KNN classification; category features; HowNet; efficiency
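
The efficiency gain described in the abstract comes from comparing a test text against a reduced training set instead of the whole one. The sketch below illustrates that idea under loud simplifying assumptions and is not the paper's method: a sample is assigned to a subset when it literally contains a category feature word, a subset is selected when the test text contains its feature word, and Jaccard token overlap stands in for the HowNet-based semantic similarity. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for the HowNet-based measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_subsets(docs, labels, features):
    """Split each category into subsets keyed by (category, feature word).
    features: dict mapping category -> list of category feature words."""
    subsets = defaultdict(list)
    for idx, (tokens, label) in enumerate(zip(docs, labels)):
        hits = [w for w in features.get(label, []) if w in tokens]
        for w in hits or [None]:   # None collects samples matching no feature word
            subsets[(label, w)].append(idx)
    return subsets

def candidate_indices(test_tokens, subsets, n_docs):
    """Training indices from subsets whose feature word occurs in the test text."""
    picked = set()
    for (_, word), members in subsets.items():
        if word is not None and word in test_tokens:
            picked.update(members)
    # Fall back to the full training set when no subset matches.
    return sorted(picked) if picked else list(range(n_docs))

def knn_classify(test_tokens, docs, labels, subsets, k=3):
    """Majority vote among the k most similar samples of the reduced set.
    Returns the predicted label and the number of comparisons made."""
    cands = candidate_indices(test_tokens, subsets, len(docs))
    ranked = sorted(cands, key=lambda i: (-jaccard(test_tokens, docs[i]), i))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0], len(cands)
```

Returning the candidate count alongside the label makes the saving visible: the second value is the number of text pairs compared, which equals the full training set size only when the fallback is taken.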