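Analysis and Construction of Word Weight Functions in the Vector Space Method
Cite this article: LU Yu-Chang, LU Ming-Yu, LI Fan, ZHOU Li-Zhu. Analysis and Construction of Word Weight Functions in the Vector Space Method [J]. Journal of Computer Research and Development, 2002, 39(10): 1205-1210
Authors: LU Yu-Chang, LU Ming-Yu, LI Fan, ZHOU Li-Zhu
Affiliation: Department of Computer Science and Technology, Tsinghua University, Beijing 100084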
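Funding: Supported by the National Key Basic Research Development Program (G1998030414), the National Natural Science Foundation of China (79990580), and the Basic Innovation Research Fund of the School of Information, Tsinghua University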
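Abstract: Text classification is the foundation and core of text mining and has been a research focus in data mining and web mining in recent years. It plays an important role in traditional information retrieval, in building website index hierarchies, and in Web information retrieval. This paper analyzes in depth the essence of a simple and widely used classical text classification model, the vector space model (VSM), and identifies the cause of its low classification accuracy. It proposes a weight adjustment method that replaces the IDF function with an evaluation function taken from feature selection, and compares, through theoretical analysis and experiments, the performance of weight adjustment under various evaluation functions. It also proposes a novel method for constructing new high-performance evaluation functions.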

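Keywords: vector space method; word weight function; analysis; construction; vector space model; weight adjustment; text classification; data mining; database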

ANALYSIS AND CONSTRUCTION OF WORD WEIGHING FUNCTION IN VSM
LU Yu-Chang,LU Ming-Yu,LI Fan,and ZHOU Li-Zhu. ANALYSIS AND CONSTRUCTION OF WORD WEIGHING FUNCTION IN VSM[J]. Journal of Computer Research and Development, 2002, 39(10): 1205-1210
Authors: LU Yu-Chang, LU Ming-Yu, LI Fan, ZHOU Li-Zhu
Abstract: Text classification is the basis and core of text mining, and plays an important role in traditional information retrieval, in the construction of website index architectures, and in Web information retrieval. It has become a hot research topic in recent years. In this paper, the essence of the vector space model (VSM), a simple and frequently used classical text classification model, is analyzed to find the reason for its low classification precision, and a weight adjustment method is put forward in which the IDF function is replaced by an evaluation function used in feature selection. The performance of weight adjustment with various evaluation functions is analyzed theoretically and compared experimentally. A novel approach to constructing new high-performance evaluation functions is also presented.
Keywords:VSM   weight adjustment   text categorization
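
The weight adjustment idea described in the abstract can be illustrated with a minimal sketch. It compares classic IDF scores with a class-aware feature selection score used in the same role. The chi-square statistic is chosen here only as a well-known example of such an evaluation function; the abstract does not say which functions the authors used or constructed. The toy corpus and the names `idf_scores`, `chi2_scores`, and `weight_vector` are hypothetical.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data): (tokens, class label).
docs = [
    (["ball", "goal", "the"], "sport"),
    (["ball", "team", "the"], "sport"),
    (["stock", "market", "the"], "finance"),
    (["stock", "bank", "the"], "finance"),
]

def idf_scores(docs):
    """Classic IDF, log(N / df); it ignores class labels entirely."""
    n = len(docs)
    df = Counter(t for tokens, _ in docs for t in set(tokens))
    return {t: math.log(n / d) for t, d in df.items()}

def chi2_scores(docs):
    """Chi-square score per term, taking the maximum over classes.

    Assumption: chi-square stands in for the 'evaluation function';
    the source does not confirm which function the paper uses.
    """
    n = len(docs)
    classes = {label for _, label in docs}
    vocab = {t for tokens, _ in docs for t in tokens}
    scores = {}
    for t in vocab:
        best = 0.0
        for c in classes:
            # 2x2 contingency table of term presence vs. class membership.
            a = sum(1 for toks, lab in docs if t in toks and lab == c)
            b = sum(1 for toks, lab in docs if t in toks and lab != c)
            c_ = sum(1 for toks, lab in docs if t not in toks and lab == c)
            d = n - a - b - c_
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            if denom:  # a zero margin means the term carries no signal
                best = max(best, n * (a * d - c_ * b) ** 2 / denom)
        scores[t] = best
    return scores

def weight_vector(tokens, term_scores):
    """TF times the chosen term score, normalized to unit length."""
    tf = Counter(tokens)
    vec = {t: f * term_scores.get(t, 0.0) for t, f in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

idf = idf_scores(docs)
chi2 = chi2_scores(docs)
vec = weight_vector(docs[0][0], chi2)
```

In this toy corpus, IDF scores the rarer word "goal" above "ball", although "ball" appears in every sport document and in no finance document. The chi-square score reverses that ordering because it uses the class labels, which is the kind of effect a class-aware replacement for IDF is meant to capture.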
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.