Weakly supervised temporal action localization with proxy metric modeling |
| |
Authors: | Hongsheng XU, Zihan CHEN, Yu ZHANG, Xin GENG, Siya MI, Zhihong YANG |
| |
Affiliation: | 1. NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China; 2. School of Computer Science and Engineering, and the Key Lab of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing 211189, China; 3. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; 4. Purple Mountain Laboratories, Nanjing 211111, China |
| |
Abstract: | Temporal localization is crucial for action video recognition. Because manually annotating videos is expensive and time-consuming, temporal localization from weak video-level labels is challenging but indispensable. In this paper, we propose a weakly supervised temporal action localization approach for untrimmed videos. To address this problem, we train the model using proxies of each action class. The proxies are used to measure the distances between action segments and different original action features. We use a proxy-based metric to cluster segments of the same action together and to separate actions from background. Compared with state-of-the-art methods, our method achieves competitive results on the THUMOS14 and ActivityNet1.2 datasets. |
| |
Keywords: | temporal action localization; weakly supervised; videos; proxy metric |
|
Source: | Frontiers of Computer Science |
|
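The abstract describes measuring distances between segment features and class proxies so that segments of the same action cluster together and separate from background. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the toy 2-D features, the use of a dedicated background proxy, and the softmax over negative squared distances are all assumptions made for illustration. In particular, the `target` index stands in for whatever segment-to-class assignment a weakly supervised method derives from video-level labels, which the abstract does not specify.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proxy_probabilities(segment, proxies):
    """Softmax over negative squared distances from one segment to every proxy."""
    seg = l2_normalize(segment)
    logits = [-sq_dist(seg, l2_normalize(p)) for p in proxies]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_softmax_loss(segment, proxies, target):
    """Negative log-probability of the target proxy (an assumed, softmax-style proxy loss)."""
    return -math.log(proxy_probabilities(segment, proxies)[target])


# Hypothetical toy data: two action proxies plus one background proxy (index 2).
proxies = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
segments = [[0.9, 0.1], [0.1, 0.8], [-0.7, -0.6]]

# Assign each segment to its most probable (that is, nearest) proxy.
labels = []
for s in segments:
    probs = proxy_probabilities(s, proxies)
    labels.append(max(range(len(proxies)), key=lambda c: probs[c]))
```

Minimizing such a loss pulls a segment toward its target proxy and pushes it away from the others, which is one common way a proxy-based metric can group same-class segments and isolate background; in a real model the proxies would be learnable parameters and the features would come from a video backbone.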