Application of an Improved Categorization Algorithm in Malicious Information Filtering

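Cite this article:	LIU Zhigang, DU Juan, YI Zhian. Application of an Improved Categorization Algorithm in Malicious Information Filtering[J]. Microcomputer Applications, 2011, 32(2): 9-14.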

Author names:	LIU Zhigang, DU Juan, YI Zhian

Author affiliation:	Computer and Information Technology College, Northeast Petroleum University, Daqing 163318

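Abstract:	When the KNN (K Nearest Neighbor) classification algorithm is used to filter malicious text information, samples containing malicious information are hard to obtain, so the classifier's predictions are heavily biased toward the majority class. To improve filtering of the minority class, the traditional KNN algorithm is improved at the data level: the minority-class samples are first clustered into groups, new samples are then generated by genetic crossover within each cluster and checked for validity, and the result is a training set with roughly balanced class sizes, on which the KNN classifier is trained. Experimental results show that the proposed method can effectively identify malicious text. The method also applies to other imbalanced data set classification problems in which the accuracy on the minority class matters.
Keywords:	imbalanced data sets; sample generation; classification; malicious text; information filtering; clustering; genetic crossover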
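The pipeline outlined in the abstract (cluster the minority class, generate new samples by crossover inside each cluster, validate them, then train KNN on the balanced set) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the clustering algorithm, the crossover operator, or the validity check, so plain k-means, uniform crossover, and a "nearest original sample must be minority" test are all assumptions, as are the function names `kmeans`, `crossover`, `balance`, and `knn_predict`.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=20):
    """Plain k-means (assumed clustering step); returns non-empty clusters."""
    centers = rng.sample(points, k)
    groups = [points]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old one if empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def crossover(p1, p2, rng):
    """Uniform crossover (assumed operator): each feature comes from one parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]


def knn_predict(train, x, k=3):
    """Majority vote among the k nearest (vector, label) training pairs."""
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def balance(majority, minority, n_clusters=2, seed=0, max_tries=10000):
    """Oversample the minority class (label 1) until the classes are about equal."""
    rng = random.Random(seed)
    original = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    # Only clusters with at least two members can supply two parents.
    clusters = [c for c in kmeans(minority, n_clusters, rng) if len(c) >= 2]
    synthetic, tries = [], 0
    while clusters and len(minority) + len(synthetic) < len(majority) and tries < max_tries:
        tries += 1
        p1, p2 = rng.sample(rng.choice(clusters), 2)
        child = crossover(p1, p2, rng)
        # Assumed validity check: the child's nearest original sample is minority.
        if knn_predict(original, child, k=1) == 1:
            synthetic.append(child)
    return original + [(p, 1) for p in synthetic]


# Toy usage: 10 "normal" vectors versus 4 "malicious" ones (made-up data).
normal = [[i * 0.1, 0.0] for i in range(10)]
bad = [[10.0, 10.0], [10.5, 10.0], [10.0, 10.5], [11.0, 11.0]]
train = balance(normal, bad)
print(knn_predict(train, [10.2, 10.2], k=3))
```

Restricting crossover to parents from the same cluster keeps each synthetic sample close to a group of real minority samples, which is presumably why the abstract clusters before generating. In a real text filter the vectors would be document features such as term weights, which the abstract does not detail.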
Application of an Improved Categorization Algorithm in Malicious Information Filtering

LIU Zhigang, DU Juan, YI Zhian. Application of an Improved Categorization Algorithm in Malicious Information Filtering[J]. Microcomputer Applications, 2011, 32(2): 9-14.

Authors:	LIU Zhigang, DU Juan, YI Zhian

Affiliation:	LIU Zhigang, DU Juan, YI Zhian (Computer and Information Technology College, Northeast Petroleum University, Daqing 163318, China)

Abstract:	When the KNN (K Nearest Neighbor) categorization algorithm is used to filter malicious information, the classifier's predictions are biased towards the class with more samples, because samples containing malicious information are difficult to obtain. To improve filtering of the class with fewer samples, the traditional KNN algorithm was improved at the data level: the class with fewer samples was grouped using a clustering algorithm, then the genetic crossover operator was ...

Keywords:	imbalanced data sets; sample generation; classification; malicious text; information filtering; clustering; genetic crossover
This article is indexed in CNKI, Wanfang Data, and other databases.