Text classification using a few labeled examples
Affiliation: 1. GEOMAR Helmholtz Centre for Ocean Research Kiel, Wischhofstr. 1-3, D-24 148 Kiel, Germany; 2. Geoscience Centre, University of Göttingen, Goldschmidtstr. 3, D-37077 Göttingen, Germany; 1. Aerospace Research Institute, Ministry of Science, Research and Technology, Tehran 14665-834, Iran; 2. Audio & Speech Processing Lab, Computer Engineering Department, Iran University of Science & Technology, Tehran, Iran; 3. Computer Engineering Department, K.N. Toosi University of Technology, Tehran, Iran
Abstract: Supervised text classifiers need many labeled examples to achieve high accuracy. In practice, however, sufficient labeled examples are not always available because human labeling is enormously time-consuming. For this reason, there has been recent interest in methods that can achieve high accuracy when the training set is small. In this paper we introduce a new single-label text classification method that performs better than baseline methods when the number of labeled examples is small. Unlike most existing methods, which use a feature vector composed of weighted words, the proposed approach uses a structured feature vector composed of weighted pairs of words. The proposed feature vector is learned automatically from a set of documents, using a global term-extraction method based on Latent Dirichlet Allocation, implemented as the Probabilistic Topic Model. Experiments performed using a small percentage of the original training set (about 1%) confirmed our expectations.
Keywords: Text mining; Text classification; Term extraction; Probabilistic topic model; Data mining
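
The abstract describes a feature vector built from weighted pairs of words derived from a topic model, but it does not give the weighting formula. The following is only an illustrative sketch under stated assumptions, not the authors' method: it assumes the topic model has already produced word-given-topic probabilities (the toy values in `TOPIC_WORD` and `TOPIC_PRIOR` are invented for the example), and it assumes a pair is weighted by the sum over topics of P(z) * P(w1|z) * P(w2|z). The function names `pair_weights` and `document_features` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy output of a topic model: P(word | topic) for two topics.
# In practice these values would come from an LDA implementation; they are
# hard-coded assumptions here so the example runs without dependencies.
TOPIC_WORD = {
    "sports": {"match": 0.5, "team": 0.4, "market": 0.1},
    "finance": {"market": 0.6, "stock": 0.3, "team": 0.1},
}
TOPIC_PRIOR = {"sports": 0.5, "finance": 0.5}


def pair_weights(topic_word, topic_prior):
    """Weight each unordered word pair by sum_z P(z) * P(w1|z) * P(w2|z).

    This weighting is an assumption made for illustration only.
    """
    vocab = sorted({w for dist in topic_word.values() for w in dist})
    weights = {}
    for w1, w2 in combinations(vocab, 2):
        weights[(w1, w2)] = sum(
            topic_prior[z] * dist.get(w1, 0.0) * dist.get(w2, 0.0)
            for z, dist in topic_word.items()
        )
    return weights


def document_features(tokens, weights):
    """Keep the positively weighted pairs whose two words both occur in the document."""
    present = set(tokens)
    return {
        pair: w
        for pair, w in weights.items()
        if pair[0] in present and pair[1] in present and w > 0
    }


weights = pair_weights(TOPIC_WORD, TOPIC_PRIOR)
features = document_features(["the", "team", "won", "the", "match"], weights)
print(features)
```

In this sketch a document is represented by the weighted pairs it contains rather than by individual weighted words; the resulting dictionary could then be passed to any standard classifier as a sparse feature vector.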
This article is indexed in ScienceDirect and other databases.