Neighbor-weighted K-nearest neighbor for unbalanced text corpus
Authors: Songbo Tan
Affiliation:

(a) Software Department, Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704, Beijing 100080, People's Republic of China

(b) Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China

Abstract: Text categorization, or classification, is the automated assignment of text documents to pre-defined classes based on their content. Many classification algorithms assume that training examples are evenly distributed among the classes. However, unbalanced data sets are common in practical applications. To deal with uneven text sets, we propose the neighbor-weighted K-nearest neighbor algorithm (NWKNN). Experimental results indicate that NWKNN achieves a significant improvement in classification performance on imbalanced corpora.
Keywords: Text classification; K-nearest neighbor (KNN); Information retrieval; Data mining
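
The abstract names the idea of weighting neighbors by class but does not state the weighting formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each class receives a weight that shrinks as the class grows relative to the smallest class, controlled by a hypothetical `exponent` parameter, and that documents are sparse term-weight dictionaries compared by cosine similarity. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_weights(labels, exponent=2.0):
    """Assumed scheme: weight = 1 / (class_size / smallest_size) ** (1 / exponent).

    The smallest class gets weight 1.0; larger classes get smaller weights.
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: 1.0 / (n / smallest) ** (1.0 / exponent) for c, n in counts.items()}


def nwknn_predict(query, docs, labels, k=3, exponent=2.0):
    """Score each class by the class-weighted similarity of its members among the k nearest."""
    weights = class_weights(labels, exponent)
    neighbors = sorted(
        ((cosine(query, d), y) for d, y in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += weights[label] * sim
    return max(scores, key=scores.get)


# Toy unbalanced corpus (made-up data): four "sports" documents, one "finance" document.
docs = [{"x": 1.0}, {"x": 1.0}, {"x": 1.0}, {"x": 1.0}, {"x": 1.0, "y": 1.0}]
labels = ["sports", "sports", "sports", "sports", "finance"]
query = {"x": 1.0, "y": 1.0}

prediction = nwknn_predict(query, docs, labels, k=3)
```

In this toy corpus, two moderately similar majority-class neighbors would outvote the single exact minority-class match under an unweighted similarity sum; down-weighting the larger class lets the minority match win. A very large `exponent` pushes all weights toward 1 and approaches the unweighted behavior.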
This article is indexed in ScienceDirect and other databases.