Improved open-vocabulary spoken content retrieval with word and subword lattices using acoustic feature similarity
Affiliation: 1. Graduate Institute of Communication Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan; 2. Department of Electrical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan; 1. School of Information Technologies, The University of Sydney, NSW 2006, Australia; 2. School of Computer Science and Software Engineering, University of Wollongong, Wollongong, NSW 2522, Australia; 1. Department of Mathematics and Computer Science, University of Udine, Udine, Italy; 2. Department of Computer Science, University of Verona, Verona, Italy; 1. Department of Electrical Engineering, University of California Los Angeles, 63-134 Engr IV, Los Angeles, CA 90095-1594, United States; 2. Department of Head and Neck Surgery, University of California Los Angeles, School of Medicine, 31-24 Rehab Center, Los Angeles, CA 90095-1794, United States; 3. Department of Electrical Engineering, University of California Los Angeles, 66-147G Engr IV, Los Angeles, CA 90095-1594, United States; 1. Cambridge Research Laboratory, Toshiba Research Europe Limited, 208 Cambridge Science Park, Milton Road, Cambridge CB4 0GZ, UK; 2. Corporate Research and Development Center, Toshiba Corporation 1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan
Abstract: Spoken content retrieval will be very important for retrieving and browsing multimedia content over the Internet, and spoken term detection (STD) is one of its key technologies. In this paper, we show that acoustic feature similarity between spoken segments, used with pseudo-relevance feedback and graph-based re-ranking, can improve STD performance. The underlying concept is that spoken segments whose acoustic feature vector sequences are similar to those of segments with higher (or lower) relevance scores should themselves have higher (or lower) scores, while graph-based re-ranking further uses a graph to consider the similarity structure among all the segments retrieved in the first pass. Both approaches are formulated on word and subword lattices, and a complete framework for using them in open-vocabulary retrieval of spoken content is presented. In preliminary experiments, both approaches yielded significant improvements for in-vocabulary and out-of-vocabulary queries.
Keywords: Spoken content retrieval; Spoken term detection; Pseudo-relevance feedback; Random walk
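
The graph-based re-ranking idea in the abstract can be illustrated with a small random walk over a similarity graph. The following is a minimal sketch and not necessarily the paper's exact formulation. The function name `random_walk_rerank`, the interpolation weight `alpha`, the iteration count, and the toy scores are all assumptions made for illustration, and the pairwise acoustic similarities are assumed to be precomputed by some other means.

```python
def random_walk_rerank(first_pass, similarity, alpha=0.9, iterations=100):
    """Illustrative re-ranking: blend first-pass scores with scores
    propagated between acoustically similar segments.

    first_pass: non-negative first-pass relevance scores (sum must be > 0).
    similarity: square matrix of pairwise acoustic similarities (assumed given).
    alpha:      hypothetical weight on the propagated scores.
    """
    n = len(first_pass)
    total = sum(first_pass)
    prior = [s / total for s in first_pass]  # normalised first-pass scores

    # Column-normalise the similarities (ignoring self-similarity) so that
    # each segment splits its score among its neighbours.
    trans = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(similarity[i][j] for i in range(n) if i != j)
        if col_sum > 0:
            for i in range(n):
                if i != j:
                    trans[i][j] = similarity[i][j] / col_sum

    # Fixed-point iteration of: scores = (1 - alpha) * prior + alpha * trans @ scores
    scores = prior[:]
    for _ in range(iterations):
        scores = [
            (1 - alpha) * prior[i]
            + alpha * sum(trans[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


# Toy data (made up): segment 0 scores high in the first pass; segments 1 and 2
# tie, but only segment 1 is acoustically similar to segment 0.
first_pass = [0.9, 0.5, 0.5]
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
scores = random_walk_rerank(first_pass, similarity)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy case the two segments that tied in the first pass are separated after re-ranking: the one that is acoustically close to the high-scoring segment ends up ranked above the one that is not, which is the behaviour the abstract describes.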
This article is indexed in ScienceDirect and other databases.