Active learning through density clustering |
| |
Affiliation: | 1. School of Electrical Engineering, Southwest Petroleum University, Chengdu 610500, China; 2. Department of Computer Science, Southwest Petroleum University, Chengdu 610500, China |
| |
Abstract: | Active learning is used for classification when labeling data is costly; the main challenge is to identify the critical instances that should be labeled. Clustering-based approaches exploit the structure of the data to select representative instances. In this paper, we developed the active learning through density peak clustering (ALEC) algorithm with three new features. First, a master tree was built to express the relationships among the nodes and to assist the growth of the cluster tree. Second, a deterministic instance selection strategy was designed using a new importance measure. Third, tri-partitioning was employed to determine the action to be taken on each instance during iterative clustering, labeling, and classifying. Experiments were performed on 14 datasets to compare the algorithm against state-of-the-art active learning algorithms. The results demonstrated that the new algorithm achieved higher classification accuracy with the same number of labeled instances. |
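The master tree and the importance-based selection can be illustrated with a minimal sketch built on the standard density-peaks quantities. This is not the authors' implementation: the abstract does not define the importance measure, so the sketch assumes it is the product of local density and distance to the nearest denser point, and it assumes a Gaussian kernel with a hypothetical `cutoff` parameter. The function names `density_peaks` and `rank_by_importance` are likewise illustrative.

```python
import numpy as np


def density_peaks(points, cutoff):
    """Return (rho, delta, master) for every point.

    rho    : local density under a Gaussian kernel (assumed kernel)
    delta  : distance to the nearest point with higher density
    master : index of that nearest denser point, -1 for the root
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Subtract 1.0 to drop each point's contribution to its own density.
    rho = np.exp(-(dist / cutoff) ** 2).sum(axis=1) - 1.0

    order = np.argsort(-rho, kind="stable")  # densest first
    delta = np.zeros(n)
    master = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point is the root of the master tree.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        master[i] = j
    return rho, delta, master


def rank_by_importance(rho, delta):
    """Order instances by rho * delta (assumed importance measure)."""
    return np.argsort(-(rho * delta), kind="stable")


# Two small, well-separated groups centred at (0, 0) and (5, 5).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0),
]
rho, delta, master = density_peaks(points, cutoff=0.5)
ranking = rank_by_importance(rho, delta)
print(ranking[:2])  # candidates to label first
print(master)       # parent of each point in the master tree
```

Under these assumptions the selection is deterministic: the same data and `cutoff` always yield the same ranking, and each point's `master` entry links it to a denser neighbor, so the links form a tree rooted at the densest point. The tri-partitioning step is not sketched here because the abstract does not describe its rules.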
| |
This article is indexed in ScienceDirect and other databases. |
|