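TEF-WA: A Weight Adjustment Technique Combined with Evaluation Functions in Text Categorization
Cite this article:Tang Huanling, Sun Jiantao, Lu Yuchang. TEF-WA: A Weight Adjustment Technique Combined with Evaluation Functions in Text Categorization[J]. Journal of Computer Research and Development, 2005, 42(1): 47-53.
Authors:Tang Huanling  Sun Jiantao  Lu Yuchang
Affiliation:1(Department of Computer and Information Engineering, Yantai Vocational Institute, Yantai 264025) 2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084) (thl01@163.com)
Funding:Major Project of the National Natural Science Foundation of China (79990584); National "973" Key Basic Research Development Program (G1998030414)
Abstract:One of the difficulties in automatic text categorization is how to select, from a high-dimensional feature space, the features that are effective for categorization, so as to suit the categorization algorithm and improve precision. To address this problem, after analyzing and comparing how feature selection and weight adjustment affect categorization precision and efficiency, the authors propose TEF-WA, a weight adjustment technique that incorporates an evaluation function. A new weight function is designed that embeds the feature evaluation function in the weight function, so that each feature's contribution in the classifier is adjusted according to its power to discriminate between categories. Experimental results show that the TEF-WA weight adjustment technique is effective both in improving categorization precision and in reducing the time complexity of the algorithm.

Keywords:vector space model (VSM)  feature selection  weight adjustment  feature evaluation function  text categorization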

A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization
Tang Huanling, Sun Jiantao, Lu Yuchang. A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization[J]. Journal of Computer Research and Development, 2005, 42(1): 47-53.
Authors:Tang Huanling  Sun Jiantao  Lu Yuchang
Affiliation:1(Department of Computer and Information Engineering, Yantai Vocational Institute, Yantai 264025) 2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
Abstract:Text categorization (TC) is an important research direction in text mining. It aims to assign one or more predefined category labels to a text document, and it provides efficient methods for document management and information search. A major problem in automatic text categorization is how to select the best feature subset from the original high-dimensional feature space so that the categorization algorithm works efficiently and precision improves. In this paper, feature selection and weight adjustment techniques are discussed and analyzed, and their influence on categorization precision and efficiency is pointed out. The TEF-WA (term evaluation function-weight adjustment) technique is then introduced. It uses a new weight function that incorporates a feature evaluation function and adjusts the effect of each feature term in the classifier according to that term's discriminative strength. To evaluate TEF-WA, experiments are carried out using training document collections of several different scales and various term evaluation functions, such as document frequency, information gain, expected cross entropy, CHI, the weight of evidence for text, a term frequency formula, and a document frequency formula. The experimental results show that the TEF-WA technique is effective in improving classification precision and reducing computational complexity.
Keywords:vector space model  feature selection  weight adjustment techniques  feature evaluation function  text categorization
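
The abstract does not state the TEF-WA weight function itself, so the following is only an illustrative sketch of the general idea it describes: a term's vector space model weight is scaled by a term evaluation score so that more discriminative terms contribute more. The choice of CHI as the evaluation function, the TF-IDF base weight, the max-over-categories aggregation, and all function names (`chi_square`, `term_scores`, `adjusted_weights`) are assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter


def chi_square(term, docs, labels, category):
    """CHI statistic of `term` for `category` from a 2x2 contingency table."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != category)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == category)
    d_neg = n - a - b - c  # documents without the term, outside the category
    denom = (a + c) * (b + d_neg) * (a + b) * (c + d_neg)
    return n * (a * d_neg - c * b) ** 2 / denom if denom else 0.0


def term_scores(docs, labels):
    """Evaluation score per term: max CHI over categories (assumed aggregation)."""
    vocab = {t for d in docs for t in d}
    cats = set(labels)
    return {t: max(chi_square(t, docs, labels, c) for c in cats) for t in vocab}


def adjusted_weights(doc, docs, scores):
    """TF-IDF weight multiplied by the evaluation score, then L2-normalized.

    The multiplicative combination is an assumption; the paper's actual
    weight function is not given in the abstract.
    """
    n = len(docs)
    raw = {}
    for term, freq in Counter(doc).items():
        df = sum(1 for d in docs if term in d)
        idf = math.log(1 + n / df) if df else 0.0
        raw[term] = freq * idf * scores.get(term, 0.0)
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm else raw
```

Each document is represented here as a list of tokens. On a toy collection in which a term appears only in one category, that term receives a larger adjusted weight than a term spread across categories, which is the behavior the abstract attributes to weighting by discriminative power.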
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.