Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?
Authors: Edda Leopold, Jörg Kindermann
Affiliation: (1) GMD German National Research Center for Information Technology, Institute for Autonomous Intelligent Systems, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany
Abstract: The choice of kernel function is crucial to most applications of support vector machines (SVMs). In this paper, however, we show that in text classification, term-frequency transformations have a larger impact on SVM performance than the kernel itself. We discuss the role of importance weights (e.g. document frequency and redundancy), which is not yet fully understood in light of model complexity and computational cost, and we show that time-consuming lemmatization or stemming can be avoided even when classifying a highly inflectional language such as German.
Keywords: support vector machines, text classification, lemmatization, stemming, kernel functions
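
As a rough illustration of what the abstract means by a term-frequency transformation combined with an importance weight, the following minimal sketch maps tokenized documents to sparse vectors. The specific choices here (a log(1 + f) transformation, an inverse-document-frequency weight log(N / df), L2 normalization, and the function name `log_tf_idf_vectors`) are assumptions made for illustration and are not taken from the abstract; the paper's own transformations and its redundancy weight may be defined differently. Raw word forms are used without stemming or lemmatization.

```python
import math
from collections import Counter


def log_tf_idf_vectors(docs):
    """Map tokenized documents to L2-normalized sparse vectors (dicts).

    Hypothetical helper: the formulas below are illustrative assumptions,
    not the definitions used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Logarithmic term-frequency transformation times an idf-style
        # importance weight.
        vec = {
            term: math.log(1 + freq) * math.log(n_docs / df[term])
            for term, freq in counts.items()
        }
        # Normalize to unit Euclidean length (skipped for all-zero vectors).
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {term: v / norm for term, v in vec.items()}
        vectors.append(vec)
    return vectors


# Toy German documents, tokenized on whitespace with no stemming.
docs = [
    "der hund bellt".split(),
    "die hunde bellen laut".split(),
    "der kurs der aktie steigt".split(),
]
vectors = log_tf_idf_vectors(docs)
```

Vectors of this kind would then serve as the input-space representation passed to an SVM; the classifier itself is omitted from this sketch.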
This article is indexed in SpringerLink and other databases.