Research on Text Categorization Methods Based on Rough Sets


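Cite this article:	LU Jiao-li (卢娇丽), ZHENG Jia-heng (郑家恒). Research on Text Categorization Methods Based on Rough Sets [J]. Journal of Chinese Information Processing (中文信息学报), 2005, 19(2): 67-71.

Authors:	LU Jiao-li (卢娇丽), ZHENG Jia-heng (郑家恒)

Affiliation:	School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China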

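Abstract:	This paper uses the strong reduction theory of rough sets to classify text. The main tasks completed are as follows: the texts were preprocessed; the Okapi weighting formula was improved, and the resulting weights were discretized; attribute reduction and rule extraction were implemented, first compressing the dimensionality of the feature vectors with a discernibility matrix, then compressing it further by computing relative reducts, and generating decision rules; a rule-merging strategy was adopted to produce the final decision rules; and an algorithm for matching texts against rules was designed so that the matching process is as simple and orderly as possible. Experimental results show that the method is effective.
Keywords:	artificial intelligence; natural language processing; text categorization; rough set; decision rule
Article ID:	1003-0077(2005)02-0066-05
Revision received:	June 20, 2004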
The Research of Text Categorization Based on Rough Set

LU Jiao-li, ZHENG Jia-heng. The Research of Text Categorization Based on Rough Set [J]. Journal of Chinese Information Processing, 2005, 19(2): 67-71.

Authors:	LU Jiao-li, ZHENG Jia-heng

Affiliation:	School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China

Abstract:	This paper applies the attribute reduction theory of rough sets to text categorization. The work comprises the following steps. The documents were preprocessed. The Okapi term weighting formula was improved, and the resulting weights were discretized. Attribute reduction and rule extraction were then carried out: the dimensionality of the feature vectors was first reduced using a discernibility matrix and then reduced again by computing relative reducts, after which decision rules were generated. A rule combination strategy was applied to produce the final decision rules. Finally, an algorithm for matching documents to rules was designed so that the matching process is as simple and orderly as possible. The experimental results indicate that the approach is effective.

Keywords:	artificial intelligence; natural language processing; text categorization; rough set; decision rule
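
The abstract names a pipeline of discernibility-matrix reduction, rule generation, and document-to-rule matching but gives no algorithmic detail. The following is a minimal sketch of the general technique on a toy decision table, not the authors' method: the greedy attribute selection stands in for their (unspecified) relative reduct computation, the rule-merging step is omitted, and the function names, data, and category labels are all hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions):
    """For every pair of objects with different decisions, collect the
    set of condition-attribute indices on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
            )
            if diff:  # an empty set would mean the table is inconsistent
                entries.append(diff)
    return entries


def greedy_reduct(entries):
    """Greedy hitting set (an assumption, not the paper's procedure):
    repeatedly pick the attribute that occurs in the most uncovered
    entries. The result discerns every pair but need not be minimal."""
    remaining = list(entries)
    chosen = []
    while remaining:
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Ties are broken in favour of the lowest attribute index.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(chosen)


def extract_rules(rows, decisions, attrs):
    """Map each observed value combination on `attrs` to its decisions."""
    rules = {}
    for row, decision in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        rules.setdefault(key, set()).add(decision)
    return rules


def classify(doc, attrs, rules):
    """Match a document to a rule by exact lookup; None if no rule fires."""
    return rules.get(tuple(doc[a] for a in attrs))


# Toy table: each row holds discretized weights of three hypothetical terms.
rows = [(1, 0, 2), (1, 1, 0), (0, 0, 2), (0, 1, 1)]
decisions = ["sports", "finance", "sports", "finance"]

reduct = greedy_reduct(discernibility_matrix(rows, decisions))
rules = extract_rules(rows, decisions, reduct)
label = classify((1, 1, 2), reduct, rules)
```

On this toy table the second attribute alone separates the two categories, so the reduct shrinks three features to one and each rule tests a single discretized weight.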
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.