Multiclass classification with bandit feedback using adaptive regularization

Authors:	Koby Crammer, Claudio Gentile

Affiliation:	1. Department of Electrical Engineering, The Technion, Haifa, 32000, Israel; 2. DICOM, Università dell'Insubria, Via Mazzini 5, 21100, Varese, Italy

Abstract:	We present a new multiclass algorithm in the bandit framework, where, after making a prediction, the learning algorithm receives only partial feedback: a single bit indicating whether the predicted label is correct, rather than the true label itself. Our algorithm is based on the second-order Perceptron and uses upper-confidence bounds to trade off exploration and exploitation, instead of the random sampling performed by most current algorithms. We analyze this algorithm in a partially adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model that is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$, which improves over the current best bounds of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
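The idea summarized in the abstract (a second-order linear predictor per class, with an upper-confidence bonus deciding which label to try) can be illustrated with a small sketch. This is not the authors' published algorithm: the class name `UCBBanditClassifier`, the exploration weight `alpha`, the regularizer `reg`, the per-class ridge-style estimator, and the toy data stream are all assumptions made for illustration only.

```python
import numpy as np


class UCBBanditClassifier:
    """Hypothetical sketch: one second-order (ridge-style) estimator per class."""

    def __init__(self, n_classes, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha  # assumed weight of the confidence bonus
        # Inverse correlation matrix per class, initialised to (reg * I)^-1.
        self.A_inv = np.stack([np.eye(dim) / reg for _ in range(n_classes)])
        # Per-class sum of instances, signed by the one-bit feedback.
        self.b = np.zeros((n_classes, dim))

    def predict(self, x):
        scores = np.empty(len(self.b))
        for k in range(len(self.b)):
            Ax = self.A_inv[k] @ x
            margin = self.b[k] @ Ax               # w_k . x with w_k = A_k^-1 b_k
            bonus = self.alpha * np.sqrt(x @ Ax)  # upper-confidence term
            scores[k] = margin + bonus
        return int(np.argmax(scores))

    def update(self, x, predicted, correct):
        # Only the predicted class is updated: the single feedback bit
        # says nothing direct about the other classes.
        target = 1.0 if correct else -1.0
        Ax = self.A_inv[predicted] @ x
        # Sherman-Morrison rank-one update of the inverse of (A + x x^T).
        self.A_inv[predicted] -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b[predicted] += target * x


def run_toy_stream(rounds=60, n_classes=3):
    """Toy stream (an assumption): instance e_y always carries label y."""
    clf = UCBBanditClassifier(n_classes, dim=n_classes)
    mistakes = []
    for t in range(rounds):
        y = t % n_classes
        x = np.eye(n_classes)[y]
        y_hat = clf.predict(x)
        correct = (y_hat == y)
        clf.update(x, y_hat, correct)  # the learner sees only `correct`
        mistakes.append(not correct)
    return mistakes
```

Calling `run_toy_stream()` returns a per-round list of mistake flags. On this noiseless toy stream the confidence bonus drives the learner to try each unexplored label at most once per instance before settling on the correct one; the regret bound quoted in the abstract concerns the authors' own algorithm and setting, not this sketch.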

This article is indexed in SpringerLink and other databases.