
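The Research of Text Categorization Based on Rough Set (基于粗糙集的文本分类方法研究)
Cite this article: LU Jiao-li, ZHENG Jia-heng. The Research of Text Categorization Based on Rough Set [J]. Journal of Chinese Information Processing (中文信息学报), 2005, 19(2): 67-71.
Authors: LU Jiao-li  ZHENG Jia-heng
Affiliation: School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
Abstract (translated from the Chinese): This paper uses the attribute reduction theory of rough sets to classify text. The work comprises the following tasks: the texts were preprocessed; the Okapi weighting formula was improved and the resulting weights were discretized; attribute reduction and rule extraction were implemented, first compressing the dimensionality of the feature vectors with a discernibility matrix, then compressing it further by computing relative reducts, and generating decision rules; a rule-merging strategy was adopted to produce the final decision rules; and an algorithm for matching texts against rules was designed to keep the matching process as simple and orderly as possible. Experimental results show that the method is effective.

Keywords: artificial intelligence  natural language processing  text categorization  rough set  decision rule
Article ID: 1003-0077(2005)02-0066-05
Revised manuscript received: June 20, 2004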

The Research of Text Categorization Based on Rough Set
LU Jiao-li, ZHENG Jia-heng. The Research of Text Categorization Based on Rough Set [J]. Journal of Chinese Information Processing, 2005, 19(2): 67-71.
Authors:LU Jiao-li  ZHENG Jia-heng
Affiliation:School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
Abstract:This paper applies the attribute reduction theory of rough sets to text categorization. The work comprises several tasks. The documents are preprocessed. The Okapi term weighting formula is improved, and the resulting term weights are discretized. Attribute reduction and rule extraction are then carried out: the dimensionality of the feature vectors is first reduced using a discernibility matrix and then reduced again by computing relative reducts, after which decision rules are generated. A rule combination strategy produces the final decision rules. Finally, an algorithm for matching documents to rules is designed so that the matching process is as simple and orderly as possible. Experimental results indicate that the approach is effective.
Keywords:artificial intelligence  natural language processing  text categorization  rough set  decision rule
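
The attribute reduction and rule extraction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the improved Okapi formula nor the exact reduction procedure, so the sketch assumes a toy decision table of already-discretized term weights and uses a common greedy heuristic over the discernibility matrix. The function names (`discernibility_matrix`, `greedy_reduct`, `extract_rules`), the toy data, and the category labels are all hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions):
    """Collect, for every pair of objects with different decisions,
    the set of attribute indices on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]
            )
            if diff:  # an empty set would mean an inconsistent table
                entries.append(diff)
    return entries


def greedy_reduct(entries):
    """Assumed heuristic: repeatedly pick the attribute that occurs in the
    most still-uncovered matrix entries (ties go to the lowest index)."""
    remaining = list(entries)
    chosen = []
    while remaining:
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        best = max(sorted(counts), key=lambda a: counts[a])
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(chosen)


def extract_rules(rows, decisions, reduct):
    """Map each combination of reduct-attribute values to its decisions."""
    rules = {}
    for row, decision in zip(rows, decisions):
        key = tuple(row[a] for a in reduct)
        rules.setdefault(key, set()).add(decision)
    return rules


# Toy decision table: each row is a document, each column a discretized
# term weight (hypothetical values), and DECISIONS holds the category.
ROWS = [(1, 0, 1, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1)]
DECISIONS = ["sports", "finance", "finance", "finance"]

reduct = greedy_reduct(discernibility_matrix(ROWS, DECISIONS))
rules = extract_rules(ROWS, DECISIONS, reduct)
print(reduct)
for condition, categories in sorted(rules.items()):
    print(condition, "->", sorted(categories))
```

On this toy table the sketch keeps attributes 0 and 1 and drops the other two, since attribute 2 is constant and attribute 3 adds no discriminating power once attribute 0 is kept. A greedy choice yields a small attribute subset that still separates the categories, but it is not guaranteed to be a minimal reduct.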
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.