Research on Text Classification Techniques Based on Imbalanced Datasets

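Cite this article:	BAI Fengfeng (白凤凤). Research on Text Classification Techniques Based on Imbalanced Datasets[J]. Computer Programming Skills & Maintenance (电脑编程技巧与维护), 2010(6): 21-22, 29.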

Author:	BAI Fengfeng (白凤凤)

Affiliation:	Department of Computer Science, High College of Shanxi Lvliang (山西省吕梁高等专科学校), Lishi 033000

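Abstract:	Automatic text classification is a core technology of data mining and information retrieval, and an active research topic. In practical applications, the volume of text data is often very large while the information useful to people makes up only a small part of it; data in which the number of samples of one class is clearly smaller than that of the other classes is called an imbalanced dataset. The classes of an imbalanced dataset can be divided into minority classes and majority classes. Traditional methods have a relatively low recognition rate on the minority class, so how to effectively improve minority-class classification performance has become a problem that pattern recognition and machine learning must solve. To improve the classification performance on minority-class texts in imbalanced datasets, this paper resamples the data, treating the problem at the data level, and uses random sampling to improve the generalization performance of classifiers on imbalanced datasets.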
Keywords:	automatic text classification; imbalanced dataset; minority class
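
The abstract mentions data-level resampling by random sampling but does not say whether the minority class is oversampled or the majority class is undersampled. The following is a minimal sketch of both variants, not the author's implementation; the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the choice to balance every class to the largest (or smallest) class size are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def _group_by_class(texts, labels):
    """Collect the texts belonging to each class label."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of the
    smaller classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw without replacement; the discarded samples are lost.
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.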
Unbalanced Data sets Based on the Text Classification Technology Research

BAI Fengfeng. Unbalanced Data sets Based on the Text Classification Technology Research[J]. Computer Programming Skills & Maintenance, 2010(6): 21-22, 29.

Authors:	BAI Fengfeng

Affiliation:	Department of Computer Science, High College of Shanxi Lvliang, Lishi 033000

Abstract:	Automatic text classification is a core technology in data mining and information retrieval, and also a research focus. In practical applications, the volume of text data is often large while the information useful to people makes up only a small part of it; data in which one class has clearly fewer samples than the other classes is called an unbalanced data set. The classes of an unbalanced data set can be divided into minority classes and majority classes. The recognition rate ...

Keywords:	automatic text categorization; unbalanced data set; minority class
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
