Using the absolute difference of term occurrence probabilities in binary text categorization
Authors: Hakan Altınçay, Zafer Erenel
Affiliation: (1) The Electronic and Information Engineering School, Xi’an Jiaotong University, Xi’an, China; (2) Internet Education School, Xi’an Jiaotong University, Xi’an, China; (3) MOE KLINNS Lab and SPKLSTN Lab, Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
Abstract: This study examines the differences among widely used term weighting schemes by ordering terms according to their discriminative ability, using a recently developed framework that expresses term weights in terms of the ratio and the absolute difference of term occurrence probabilities. The ordering of terms is observed to depend on the weighting scheme, which can be explained by the way each scheme uses term occurrence differences when generating term weights. It is then proposed that relevance frequency, which has been shown to provide the best scores on several datasets, can be improved by taking into account the way absolute difference values are used in other widely used schemes. Experimental results on two different datasets show that improved F1 scores can be achieved.
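As a rough illustration of the quantities named in the abstract, the sketch below computes the term occurrence probabilities in the positive and negative classes, their ratio and absolute difference, and relevance frequency for two hypothetical terms. The function name `term_stats`, the document counts, and the class sizes are assumptions made for this example. Relevance frequency is written in its commonly cited form, log2(2 + a / max(1, c)); the improved scheme proposed in the paper is not reproduced here.

```python
import math

def term_stats(a, c, n_pos, n_neg):
    """Summarize one term in a binary categorization task.

    a: number of positive-class documents containing the term
    c: number of negative-class documents containing the term
    n_pos, n_neg: total documents in each class (assumed > 0)
    """
    p_pos = a / n_pos  # occurrence probability in the positive class
    p_neg = c / n_neg  # occurrence probability in the negative class
    return {
        "p_pos": p_pos,
        "p_neg": p_neg,
        "abs_diff": abs(p_pos - p_neg),
        "ratio": p_pos / p_neg if p_neg > 0 else math.inf,
        # Relevance frequency in its commonly cited form (an assumption here).
        "rf": math.log2(2 + a / max(1, c)),
    }

# Two hypothetical terms with the same ratio but very different coverage.
common = term_stats(a=80, c=40, n_pos=100, n_neg=100)
rare = term_stats(a=2, c=1, n_pos=100, n_neg=100)

for name, stats in (("common", common), ("rare", rare)):
    print(name, stats)
```

In this toy setting both terms have the same ratio and the same relevance frequency, while their absolute differences are far apart. A scheme that depends only on the ratio cannot rank one above the other, which is the kind of gap the abstract suggests absolute difference values can help close.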
Keywords:
This article is indexed in SpringerLink and other databases.