Improved binaural sound localization and tracking for unknown time-varying number of speakers
Authors: Ui-Hyun Kim; Hiroshi G. Okuno
Affiliation: 1. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan (euihyun@kuis.kyoto-u.ac.jp); 3. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Abstract: A method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT) has been developed for binaural sound source localization (SSL) and tracking of multiple sound sources. Accurate binaural audition is important for giving robots and systems inexpensive and widely applicable auditory capabilities. Conventional SSL based on the GCC-PHAT method suffers from three problems: the low resolution of the time difference of arrival (TDOA) estimate, the interference created when sound waves reach a microphone from two directions around the robot head, and degraded performance when there are multiple speakers. The low-resolution problem is solved by using a maximum-likelihood-based SSL method in the frequency domain. The multipath interference problem is avoided by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. Performance with multiple speakers is improved by a multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering. The standard K-means clustering algorithm is extended to track an unknown, time-varying number of speakers by adding two steps: one that increases the number of clusters automatically and one that eliminates clusters containing incorrect direction estimates. Experiments conducted on the SIG-2 humanoid robot show that this method outperforms the conventional SSL method: it reduces localization errors by 18.1° on average and by over 37° in the side directions. It also tracks multiple speakers in real time with tracking errors below 4.35°.
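For orientation, the conventional GCC-PHAT baseline that the abstract improves on can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gcc_phat_tdoa` and `tdoa_to_azimuth`, the 0.18 m microphone spacing, and the free-field far-field angle model are all hypothetical choices made here. In particular, the sketch omits the spherical-head time delay factor, the maximum-likelihood frequency-domain search, and the extended K-means tracking described in the abstract.

```python
import numpy as np


def gcc_phat_tdoa(left, right, fs, max_tau=None):
    """Estimate the delay (in seconds) of `right` relative to `left`.

    A positive result means the sound reached the right microphone later.
    """
    n = len(left) + len(right)            # zero-pad to avoid circular wrap-around
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cross = R * np.conj(L)                # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags.
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index i corresponds to lag (i - max_shift).
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def tdoa_to_azimuth(tau, mic_distance=0.18, speed_of_sound=343.0):
    """Map a TDOA to an azimuth in degrees with a free-field far-field model.

    Assumption for illustration only: two microphones in open air,
    `mic_distance` metres apart, with no head between them.
    """
    ratio = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

Because the lag is found on the sample grid, the TDOA resolution of this baseline is limited to 1/fs seconds, which is the low-resolution problem the abstract refers to.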
Keywords: human–robot interaction; binaural sound localization; multisource sound tracking