
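An Approach for Text Feature Weighting Computation Based on Concept Hierarchy
Cite this article:MAO Lin, YANG Xue-bing. An Approach for Text Feature Weighting Computation Based on Concept Hierarchy[J]. Journal of Anhui University of Technology, 2008, 25(3): 329-333.
Authors:MAO Lin  YANG Xue-bing
Affiliation:[1] School of Computer Science, Anhui University of Technology, Ma'anshan, Anhui 243002 [2] Department of Computer Science and Technology, Nanjing University, Nanjing 210093
Funding:Key Project of the Natural Science Foundation of the Anhui Provincial Department of Education
Abstract:Feature weight computation is central to text representation, and the quality of the weighting method directly affects the accuracy of text classification and clustering. Feature weighting methods based on word form and term-frequency statistics are too approximate and coarse: they cannot effectively highlight important features with strong category-discriminating power, and they have difficulty separating important features from unimportant ones. This leads to high-dimensional, sparse representations and unsatisfactory classification performance, and is the main obstacle in feature weight computation. A feature weighting method based on concept hierarchy is proposed. It maps the word space to a concept space and, at the concept level, introduces two parameters, feature support and categorical intensity, to adjust feature weights. Experiments show that the new method achieves better classification performance and also clearly improves dimensionality reduction and computational efficiency.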

Keywords:concept space  feature weighting  concept hierarchy  feature support  categorical intensity
Article ID:1671-7872(2008)03-0329-05
Revised:18 October 2007

An Approach for Text Feature Weighting Computation Based on Concept Hierarchy
MAO Lin, YANG Xue-bing. An Approach for Text Feature Weighting Computation Based on Concept Hierarchy[J]. Journal of Anhui University of Technology, 2008, 25(3): 329-333.
Authors:MAO Lin  YANG Xue-bing
Affiliation:MAO Lin, YANG Xue-bing (1. School of Computer Science, Anhui University of Technology, Ma'anshan 243002, China; 2. Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China)
Abstract:Feature weighting is a key problem in text representation, and the quality of the weighting method directly affects the precision of text classification and clustering. Feature weighting approaches based on word form and term-frequency statistics are approximate and coarse, and fail to give prominence to important features with strong category-discriminating ability. They also have difficulty distinguishing important features from unimportant ones. These issues lead to high-dimensional, sparse representations and to poor performance in text classification and clustering. A new feature weighting approach based on concept hierarchy is proposed, which introduces two parameters, feature support and categorical intensity, to adjust feature weights. Experimental results indicate that the new method outperforms the traditional one in precision, vector space dimensionality and computational efficiency.
Keywords:concept space  feature weighting  concept hierarchy  feature support  categorical intensity
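
The abstract names the ingredients of the method (a word-to-concept mapping plus two adjustment parameters) but does not give their formulas. The sketch below only illustrates the general idea and is not the authors' algorithm. Everything in it is an assumption made for illustration: the toy `WORD_TO_CONCEPT` dictionary, the definition of support as the number of distinct words in a document that map to a concept, the definition of categorical intensity as the largest share of a concept's occurrences that falls in a single category, and the way the factors are multiplied together.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word-to-concept map; a real system would draw on a thesaurus or ontology.
WORD_TO_CONCEPT = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "apple": "fruit", "banana": "fruit",
}


def concept_weights(docs, labels):
    """Return one {concept: weight} dict per document.

    Assumed formula (not taken from the paper):
        weight = tf * idf * (1 + ln(support)) * intensity
    """
    n = len(docs)
    concept_tf = []     # per document: concept -> occurrence count
    concept_words = []  # per document: concept -> distinct source words
    for tokens in docs:
        tf = Counter()
        words = defaultdict(set)
        for token in tokens:
            concept = WORD_TO_CONCEPT.get(token, token)  # unmapped words stay as-is
            tf[concept] += 1
            words[concept].add(token)
        concept_tf.append(tf)
        concept_words.append(words)

    df = Counter()                    # concept -> number of documents containing it
    per_class = defaultdict(Counter)  # category -> concept -> occurrence count
    for tf, label in zip(concept_tf, labels):
        df.update(tf.keys())
        per_class[label].update(tf)
    total = Counter()                 # concept -> occurrences over all categories
    for counts in per_class.values():
        total.update(counts)

    weights = []
    for tf, words in zip(concept_tf, concept_words):
        doc_weights = {}
        for concept, freq in tf.items():
            idf = math.log((n + 1) / (df[concept] + 1)) + 1.0
            support = len(words[concept])
            intensity = max(c[concept] for c in per_class.values()) / total[concept]
            doc_weights[concept] = freq * idf * (1.0 + math.log(support)) * intensity
        weights.append(doc_weights)
    return weights
```

Calling `concept_weights([["car", "truck", "apple"], ["bus", "car"], ["apple", "banana"]], ["auto", "auto", "food"])` collapses five distinct words into two concepts, which is the kind of dimensionality reduction the abstract refers to. Under these assumed definitions, a concept that is concentrated in one category and backed by several distinct words receives a larger weight.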
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).