
Text Classification Based on Weight of Feature Words
Cite this article: YANG Li, WAN Chang-xuan, LEI Gang, YU Tao, KONG Bao-xin. Text Classification Based on Weight of Feature Words[J]. Computer and Modernization, 2012(10): 8-13.
Authors: YANG Li  WAN Chang-xuan  LEI Gang  YU Tao  KONG Bao-xin
Affiliation: 1. School of Information and Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China; 2. Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China
Funding: National Natural Science Foundation of China (61173146); National Social Science Foundation of China (12CTQ042); Natural Science Foundation of Jiangxi Province (2010GZS0067); Key Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ09650)
Abstract: In text classification, only a few researchers have used the weights of feature words to represent texts as vectors, and the feature selection algorithms they used do not take into account the sign (positive or negative) or the range of those weights. On the basis of the CHI statistic, this paper therefore proposes a new method for computing the class relevance of feature words (the correlation-score between feature words and classification), and selects among different methods for computing the class relevance of a text according to the number of feature words in each class's feature set. When determining the category of a text, the method uses only the number of feature words the text contains and their class relevance, so texts that contain few feature words can also be classified well. Experiments show that the method is effective and feasible.

Keywords: text classification  feature selection  correlation-score between feature words and classification  correlation-score between text and classification
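
The abstract names the CHI statistic as the starting point but does not give the paper's formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the standard chi-square statistic over a 2x2 term/class contingency table, attaches the sign of `a*d - b*c` so that a word can count for or against a class, and sums these signed scores over the words of a text to pick a category. The function names `signed_chi_square`, `term_class_scores`, and `classify`, and the summation rule itself, are hypothetical choices made for illustration.

```python
from collections import Counter


def signed_chi_square(a, b, c, d):
    """Signed chi-square score for one (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    The sign convention is an assumption, not taken from the paper.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    diff = a * d - b * c
    chi = n * diff * diff / denom
    return chi if diff >= 0 else -chi


def term_class_scores(docs, labels):
    """Compute signed scores for every (term, class) pair.

    docs: list of sets of words; labels: list of class names.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    term_df = Counter()        # documents containing each term
    term_class_df = Counter()  # documents of a class containing each term
    for words, label in zip(docs, labels):
        for w in words:
            term_df[w] += 1
            term_class_df[(w, label)] += 1
    scores = {}
    for w in term_df:
        for cls, size in class_sizes.items():
            a = term_class_df[(w, cls)]
            b = term_df[w] - a
            c = size - a
            d = n_docs - size - b
            scores[(w, cls)] = signed_chi_square(a, b, c, d)
    return scores


def classify(words, scores, classes):
    """Pick the class with the largest summed signed score (hypothetical rule)."""
    totals = {
        cls: sum(scores.get((w, cls), 0.0) for w in words) for cls in classes
    }
    return max(totals, key=totals.get)


# Tiny made-up corpus, for illustration only.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "bank"}]
labels = ["sports", "sports", "finance", "finance"]
scores = term_class_scores(docs, labels)
print(classify({"goal"}, scores, ["sports", "finance"]))
```

Because the scores are signed, a word that is typical of another class lowers a text's total for the current class instead of being ignored, which is one plausible reason for considering the sign of feature-word weights at all.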
This article is indexed in CNKI, VIP (维普), and other databases.