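A Rough Set Text Classification Rule Extraction Method Based on CHI-Value Feature Selection
Cite this article: WANG Ming-chun, WANG Zheng-ou, ZHANG Kai, HAO Xi-long. A rough set text classification rule extraction method based on CHI-value feature selection [J]. Journal of Computer Applications (计算机应用), 2005, 25(5): 1026-1028, 1033.
Authors: WANG Ming-chun  WANG Zheng-ou  ZHANG Kai  HAO Xi-long
Affiliations: Institute of Systems Engineering, Tianjin University, Tianjin 300072; Department of Mathematics and Physics, Tianjin University of Technology and Education (天津工程师范学院), Tianjin 300222; Institute of Systems Engineering, Tianjin University, Tianjin 300072; Department of Mathematics and Physics, Tianjin University of Technology and Education (天津工程师范学院), Tianjin 300222; Tianjin Hailiang Software Company (天津海量软件公司), Tianjin 300384
Funding: Supported by the National Natural Science Foundation of China (60275020)
Abstract: Drawing on the characteristics of text classification rule extraction, a definition of approximate rules is given. The method first uses CHI values to select features and to supply feature-importance information for the next selection step. It then uses rough sets to continue feature selection on the discrete decision table, and finally uses rough sets to extract precise or approximate rules. By fully combining CHI-value feature selection with rough set theory, the method avoids rough-set feature reduction on a large-scale decision table and also avoids discretizing the decision table. It improves the efficiency of text rule extraction and makes it more practical. Experimental results show the effectiveness and practicality of the method.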

Keywords: CHI value; feature selection; rough set; text classification rule
Article ID: 1001-9081(2005)05-1026-03
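
The first step of the method, CHI-value feature selection, can be illustrated with a minimal sketch. The abstract does not state the formula, so this assumes the standard chi-square statistic for a (term, category) pair computed from a 2x2 document-count table, and it assumes each term is scored by its maximum CHI value over all categories. The function names `chi_value` and `rank_terms_by_chi` and the toy corpus are hypothetical and not taken from the paper.

```python
from collections import Counter

def chi_value(a, b, c, d):
    """Chi-square statistic for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

def rank_terms_by_chi(docs, labels):
    """Return (term, score) pairs sorted by descending score.

    Assumption: a term's score is its maximum CHI value over categories.
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    cat_sizes = Counter(labels)
    term_df = Counter(t for s in doc_sets for t in s)
    term_cat_df = Counter((t, y) for s, y in zip(doc_sets, labels) for t in s)
    scores = {}
    for term, df in term_df.items():
        best = 0.0
        for cat, size in cat_sizes.items():
            a = term_cat_df[(term, cat)]
            b = df - a
            c = size - a
            d = n - size - b
            best = max(best, chi_value(a, b, c, d))
        scores[term] = best
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy corpus (illustrative only): each document is a list of terms.
docs = [
    ["ball", "team", "win"],
    ["team", "score"],
    ["stock", "market"],
    ["market", "price", "win"],
]
labels = ["sports", "sports", "finance", "finance"]
ranking = rank_terms_by_chi(docs, labels)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In this toy corpus, "team" and "market" each occur in every document of exactly one category and rank highest, while "win" occurs equally often in both categories and scores zero. The resulting order is the kind of feature-importance information that the abstract says is passed to the later rough-set step.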

Rough set text classification rule extraction based on CHI value
WANG Ming-chun, WANG Zheng-ou, ZHANG Kai, HAO Xi-long. Rough set text classification rule extraction based on CHI value [J]. Journal of Computer Applications, 2005, 25(5): 1026-1028, 1033.
Authors: WANG Ming-chun  WANG Zheng-ou  ZHANG Kai  HAO Xi-long
Abstract: A definition of approximate rules is proposed based on the characteristics of text classification rule extraction. The method first selects features of the text set by their CHI values, which also supply feature-significance information for the subsequent selection step. Rough set theory is then used to further select attributes on the discrete decision table, and finally to extract precise or approximate rules. By combining CHI-value feature selection with rough set theory, the method avoids both attribute reduction on a large-scale decision table and discretization of the decision table. This improves the efficiency and practicality of text rule extraction. Experimental results demonstrate the effectiveness of the method.
Keywords: CHI value; feature selection; rough set; text classification rule
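
The rough-set steps summarized in the abstract (further attribute selection on a discrete decision table, then extraction of precise or approximate rules) can be sketched as follows. The paper's exact definitions are not given here, so this is only an illustration under stated assumptions: the decision table uses binary term presence, attributes are added in a given importance order until the dependency degree of the full attribute set is reached, and an "approximate" rule is taken to be one whose confidence is at least a threshold `min_conf` but below 1. All names, the threshold, and the toy table are hypothetical.

```python
from collections import Counter, defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, (cond, _decision) in enumerate(rows):
        blocks[tuple(cond[a] for a in attrs)].append(i)
    return blocks

def dependency(rows, attrs):
    """Fraction of rows lying in equivalence classes with a single decision."""
    consistent = 0
    for members in partition(rows, attrs).values():
        if len({rows[i][1] for i in members}) == 1:
            consistent += len(members)
    return consistent / len(rows)

def select_attributes(rows, ranked_attrs):
    """Add attributes in importance order until the dependency degree
    of the full attribute set is reached (a greedy heuristic)."""
    target = dependency(rows, ranked_attrs)
    chosen = []
    for attr in ranked_attrs:
        if dependency(rows, chosen) >= target:
            break
        chosen.append(attr)
    return chosen

def extract_rules(rows, attrs, min_conf=0.8):
    """Emit one rule per equivalence class whose majority decision
    reaches min_conf. Assumption: confidence 1.0 means 'precise',
    anything lower (but >= min_conf) means 'approximate'."""
    rules = []
    for key, members in sorted(partition(rows, attrs).items()):
        counts = Counter(rows[i][1] for i in members)
        decision, hits = counts.most_common(1)[0]
        conf = hits / len(members)
        if conf >= min_conf:
            kind = "precise" if conf == 1.0 else "approximate"
            rules.append((dict(zip(attrs, key)), decision, conf, kind))
    return rules

# Toy binary decision table: (term-presence dict, category).
rows = (
    [({"team": 1, "market": 0}, "sports")] * 4
    + [({"team": 1, "market": 0}, "finance")]      # one conflicting document
    + [({"team": 0, "market": 1}, "finance")] * 2
)
selected = select_attributes(rows, ["team", "market"])  # order assumed to come from CHI ranking
rules = extract_rules(rows, selected)
for condition, decision, conf, kind in rules:
    print(condition, "=>", decision, f"(confidence {conf:.2f}, {kind})")
```

In this table "market" carries no information beyond "team", so only "team" is kept. The class with team = 0 yields a precise rule for "finance", while the class with team = 1 contains one conflicting document and yields an approximate rule for "sports" with confidence 0.8. Because the table holds 0/1 presence values, no discretization step is needed in the sketch.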
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).