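An improved algorithm for automatic text classification based on rough sets

Cite this article:	ZHANG Baofu, SHI Huaji. An improved algorithm for automatic text classification based on rough sets[J]. Computer Engineering and Applications (计算机工程与应用), 2011, 47(24): 129-131.

Authors:	ZHANG Baofu (张保富), SHI Huaji (施化吉)

Affiliation:	School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013

Funding:	National Natural Science Foundation of China; National Torch Program project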

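Abstract:	The effectiveness of automatic text classification depends heavily on feature selection. Traditional feature selection based on frequency-threshold filtering loses useful information and lowers classification accuracy. To address this shortcoming, a rough-set-based automatic text classification algorithm is proposed. The method discretizes the weighted feature attributes and builds a decision table; screens the condition attributes of the decision table according to an attribute significance measure based on dependency degree; and reduces the text feature attributes with a heuristic algorithm based on conditional information entropy. Experimental results show that the method removes a large number of redundant feature attributes and improves the running efficiency of text classification without lowering classification accuracy.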
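The first two steps in the abstract (discretizing weighted features into a decision table, then scoring condition attributes by dependency degree) can be illustrated with a minimal Python sketch. The abstract does not give the paper's discretization scheme or data, so the equal-width binning, the function names, and the toy term weights below are all assumptions made for illustration only.

```python
from collections import defaultdict


def discretize(values, bins=3):
    """Equal-width binning of one weighted feature column (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def dependency(rows, labels, attrs):
    """Dependency degree: share of objects whose equivalence class
    (under the attributes in attrs) carries a single decision label."""
    seen_labels = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        counts[key] += 1
    positive = sum(n for key, n in counts.items() if len(seen_labels[key]) == 1)
    return positive / len(rows)


def significance(rows, labels, attrs, a):
    """Drop in dependency degree when attribute a is removed from attrs."""
    rest = [b for b in attrs if b != a]
    return dependency(rows, labels, attrs) - dependency(rows, labels, rest)


# Hypothetical term weights for four documents and two terms.
weights = [
    [0.0, 0.1, 0.8, 0.9],  # term 0: varies with the class
    [0.5, 0.5, 0.5, 0.5],  # term 1: constant, carries no information
]
labels = ["sports", "sports", "finance", "finance"]

# Decision table: one row per document, one discretized column per term.
rows = list(zip(*(discretize(col, bins=2) for col in weights)))

for a in (0, 1):
    print(a, significance(rows, labels, [0, 1], a))
```

In this toy table, removing term 0 destroys the ability to separate the classes while removing term 1 changes nothing, which is the kind of signal a dependency-based screening step would use to discard attributes.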
Keywords:	rough set; attribute reduction; text classification
Improved algorithm of automatic classification based on rough set

ZHANG Baofu, SHI Huaji. Improved algorithm of automatic classification based on rough set[J]. Computer Engineering and Applications, 2011, 47(24): 129-131.

Authors:	ZHANG Baofu, SHI Huaji

Affiliation:	Department of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China

Abstract:	The effectiveness of automatic text categorization relies heavily on the selection of attribute features. Traditional feature selection methods that filter features by a frequency threshold lose information and reduce classification precision. To address this problem, a novel automatic text categorization method based on rough sets is proposed. In the proposed method, the weighted attribute features are discretized to form a decision table; the conditional attributes of the decision table are selected according to an attribute significance measure based on dependency degree; and the text attribute features are reduced by a heuristic algorithm based on conditional information entropy. Experimental results show that the proposed method removes a large number of redundant attribute features and improves the running efficiency of text categorization without reducing classification precision.
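The entropy-driven reduction step can be sketched as follows. The abstract does not spell out the heuristic, so this is only one plausible reading: a greedy forward selection that keeps adding the attribute giving the lowest conditional entropy H(D | B) until it matches the entropy obtained with all attributes. The function names, the tolerance `tol`, and the toy decision table are assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(rows, labels, attrs):
    """H(D | B) in bits, where B is the list of attribute indices in attrs."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)][label] += 1
    n = len(rows)
    h = 0.0
    for counter in groups.values():
        size = sum(counter.values())
        for c in counter.values():
            h -= (size / n) * (c / size) * math.log2(c / size)
    return h


def greedy_reduct(rows, labels, tol=1e-12):
    """Assumed heuristic: add the attribute that lowers H(D | B) the most
    until B is as informative as the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = conditional_entropy(rows, labels, all_attrs)
    chosen = []
    remaining = set(all_attrs)
    while remaining and conditional_entropy(rows, labels, chosen) > target + tol:
        best = min(
            sorted(remaining),
            key=lambda a: conditional_entropy(rows, labels, chosen + [a]),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical discretized decision table: attributes 0 and 2 each determine
# the class, attribute 1 is irrelevant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]

print(greedy_reduct(rows, labels))
```

On this toy table a single attribute already brings the conditional entropy to that of the full set, so the other two are dropped as redundant. Ties are broken by the lowest attribute index, which is an arbitrary choice in this sketch.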

Keywords:	rough set; attribute reduction; text classification
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.