On Text-based Mining with Active Learning and Background Knowledge Using SVM |
| |
Authors: | Catarina Silva, Bernardete Ribeiro |
| |
Affiliation: | (1) Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium |
| |
Abstract: | Text mining, intelligent text analysis, text data mining and knowledge discovery in text are commonly used names for the
process of extracting relevant and non-trivial information from text. Some crucial issues arise when trying to solve this
problem, such as document representation and the scarcity of labeled data. This paper addresses these problems by introducing information
from unlabeled documents into the training set, using the support vector machine (SVM) separating margin as the differentiating
factor. Besides studying the influence of several pre-processing methods and drawing conclusions about their relative significance, we
also evaluate the benefits of introducing background knowledge into an SVM text classifier. We further evaluate the possibility
of active learning and propose a method for successfully combining background knowledge and active learning. Experimental
results show that the proposed techniques, whether used alone or combined, yield a considerable improvement in classification
performance, even when only small labeled training sets are available. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|
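The abstract says that unlabeled documents are brought into the training set with the SVM separating margin as the differentiating factor, and that this is combined with active learning. The following is a minimal sketch of one way such a margin-based split could look. It is not the authors' implementation. The function name `split_by_margin`, the `confident_margin` threshold of 1.0, and the `n_queries` budget are all hypothetical choices made for illustration, and the decision values are assumed to come from an already trained SVM.

```python
from typing import List, Sequence, Tuple


def split_by_margin(
    decision_values: Sequence[float],
    confident_margin: float = 1.0,  # assumed threshold, not from the paper
    n_queries: int = 2,             # assumed labeling budget, not from the paper
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Partition unlabeled documents by their SVM decision value f(x).

    Returns (pseudo_labeled, to_query):
      pseudo_labeled: (index, predicted_label) pairs with |f(x)| >= confident_margin,
                      candidates for use as background knowledge
      to_query:       indices of up to n_queries documents closest to the
                      separating hyperplane, candidates for active learning
    """
    # Documents on or beyond the margin keep the label the classifier predicts.
    pseudo_labeled = [
        (i, 1 if f > 0 else -1)
        for i, f in enumerate(decision_values)
        if abs(f) >= confident_margin
    ]
    # Documents inside the margin are uncertain; the most uncertain come first.
    uncertain = [i for i, f in enumerate(decision_values) if abs(f) < confident_margin]
    uncertain.sort(key=lambda i: abs(decision_values[i]))
    return pseudo_labeled, uncertain[:n_queries]


# Illustrative decision values for six unlabeled documents (made-up numbers).
scores = [1.8, -0.1, 0.4, -2.3, 0.05, -1.0]
pseudo_labeled, to_query = split_by_margin(scores)
print(pseudo_labeled)
print(to_query)
```

In this sketch the confidently classified documents would be added to the training set with their predicted labels, while the documents nearest the hyperplane would be passed to a human annotator before retraining.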