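Feature Selection Method Based on TFIDF
Cite as: WANG Mei-fang, LIU Pei-yu, ZHU Zhen-fang. Feature selection method based on TFIDF[J]. Computer Engineering and Design, 2007, 28(23): 5795-5796,5799
Authors: WANG Mei-fang, LIU Pei-yu, ZHU Zhen-fang
Affiliation (all three authors): School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong 250014
Abstract: In text categorization systems, feature selection is an effective dimensionality-reduction method. After analyzing several commonly used evaluation functions for feature selection, this paper applies a term-weighting function to feature selection and, building on an improved TFIDF method, proposes a new evaluation function. The function introduces category information, which plain TFIDF does not use, into the feature terms and extracts the terms that are relevant to each category, remedying this shortcoming of TFIDF. Experiments show that the method is simple and feasible and helps improve the effectiveness of the selected feature subset.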

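Keywords: feature selection; term frequency; inverse document frequency; text categorization; evaluation function
Article number: 1000-7024(2007)23-5795-02
Received: 2007-03-18
Revised: 2007-03-18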

Feature selection method based on TFIDF
WANG Mei-fang,LIU Pei-yu,ZHU Zhen-fang. Feature selection method based on TFIDF[J]. Computer Engineering and Design, 2007, 28(23): 5795-5796,5799
Authors: WANG Mei-fang, LIU Pei-yu, ZHU Zhen-fang
Abstract: Feature selection is an effective method for reducing vector dimensionality in text categorization systems. After analyzing several common evaluation functions for feature selection, the paper applies a term-weighting function to feature selection and presents a new evaluation function based on an improved TFIDF method. The new function introduces category information into the feature terms, so that category-relevant terms are selected, making up for a shortcoming of TFIDF. Experiments show that the method is simple and feasible and helps improve the effectiveness of the selected feature subset.
Keywords: feature selection; term frequency; inverse document frequency; text categorization; evaluation function
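The abstract does not give the formula of the improved evaluation function, so the following is only a minimal Python sketch of the general idea it describes: score terms with TFIDF, then fold in how concentrated each term is in a single category. The weighting `tf(t, c) * idf(t) * n_{t,c} / n_t`, the toy corpus, and the function names `idf`, `class_aware_scores`, and `select_features` are hypothetical choices made for illustration; they are not the authors' method.

```python
import math
from collections import Counter, defaultdict

# Each document is a (tokens, label) pair. Toy data only, for illustration.
Doc = tuple[list[str], str]


def idf(docs: list[Doc]) -> dict[str, float]:
    """Classic inverse document frequency: log(N / n_t)."""
    n = len(docs)
    df = Counter(t for tokens, _ in docs for t in set(tokens))
    return {t: math.log(n / c) for t, c in df.items()}


def class_aware_scores(docs: list[Doc]) -> dict[str, float]:
    """Hypothetical category-aware TFIDF score (NOT the paper's formula).

    For each term t and class c:
        score(t, c) = tf(t, c) * idf(t) * (n_{t,c} / n_t)
    where tf(t, c) is the total count of t in documents of class c,
    n_t is the number of documents containing t, and n_{t,c} is how many
    of those documents belong to c. A term's score is its max over classes.
    """
    term_idf = idf(docs)
    tf_c = defaultdict(Counter)  # class -> term -> raw count
    df_c = defaultdict(Counter)  # class -> term -> document count
    df = Counter()               # term -> document count
    for tokens, label in docs:
        tf_c[label].update(tokens)
        unique = set(tokens)
        df_c[label].update(unique)
        df.update(unique)
    return {
        t: max(tf_c[c][t] * term_idf[t] * df_c[c][t] / df[t] for c in tf_c)
        for t in df
    }


def select_features(docs: list[Doc], k: int) -> list[str]:
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = class_aware_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    toy_docs = [
        (["ball", "goal", "team", "match"], "sports"),
        (["goal", "team", "score"], "sports"),
        (["stock", "market", "match"], "finance"),
        (["stock", "price", "score"], "finance"),
    ]
    print(select_features(toy_docs, 6))
```

On the toy corpus, `goal` and `match` each occur in two of the four documents, so plain IDF gives them the same weight. The class-aware score ranks `goal` (always in sports documents) above `match` (split evenly between the two classes), which illustrates the kind of category-insensitivity the abstract attributes to plain TFIDF.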
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.