An algorithm for similar utterance section extraction for managing spoken documents |
| |
Authors: | Yoshiaki Itoh, Kazuyo Tanaka, Shi-Wook Lee |
| |
Affiliation: | (1) Faculty of Software and Information Science, Iwate Prefectural University, Sugo, Takizawa, Iwate 020-0193, Japan; (2) Institute of Library and Information Science, University of Tsukuba, 1-2 Kasuga, Tsukuba 305-8550, Japan; (3) National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba-shi, Ibaraki 305-8568, Japan |
| |
Abstract: | This paper proposes a new, efficient algorithm for extracting similar sections between two time sequence data sets. The algorithm,
called Relay Continuous Dynamic Programming (Relay CDP), realizes fast matching between arbitrary sections in the reference
pattern and the input pattern and enables the extraction of similar sections in a frame synchronous manner. In addition, Relay
CDP is extended to two types of applications that handle spoken documents. The first application is the extraction of repeated
utterances in a presentation or a news speech because repeated utterances are assumed to be important parts of the speech.
These repeated utterances can be regarded as labels for information retrieval. The second application is flexible spoken document
retrieval. A phonetic model is introduced to cope with the speech of different speakers. The new algorithm allows a user to
query by natural utterance and searches spoken documents for any partial matches to the query utterance. We present herein
a detailed explanation of Relay CDP and the experimental results for the extraction of similar sections and report results
for two applications using Relay CDP.
Yoshiaki Itoh has been an associate professor in the Faculty of Software and Information Science at Iwate Prefectural University, Iwate,
Japan, since 2001. He received the B.E. degree, M.E. degree, and Dr. Eng. from Tokyo University, Tokyo, in 1987, 1989, and
1999, respectively. From 1989 to 2001 he was a researcher and a staff member of Kawasaki Steel Corporation, Tokyo and Okayama.
From 1992 to 1994 he was transferred as a researcher to the Real World Computing Partnership, Tsukuba, Japan. Dr. Itoh's research
interests include spoken document processing without recognition, audio and video retrieval, and real-time human communication
systems. He is a member of ISCA, Acoustical Society of Japan, Institute of Electronics, Information and Communication Engineers,
Information Processing Society of Japan, and Japan Society of Artificial Intelligence.
Kazuyo Tanaka has been a professor at the University of Tsukuba, Tsukuba, Japan, since 2002. He received the B.E. degree from Yokohama
National University, Yokohama, Japan, in 1970, and the Dr. Eng. degree from Tohoku University, Sendai, Japan, in 1984. From
1971 to 2002 he was a research officer at the Electrotechnical Laboratory (ETL), Tsukuba, Japan, and the National Institute of Advanced
Industrial Science and Technology (AIST), Tsukuba, Japan, where he worked on speech analysis, synthesis, recognition, and understanding,
and also served as chief of the speech processing section. His current interests include digital signal processing, spoken
document processing, and human information processing. He is a member of IEEE, ISCA, Acoustical Society of Japan, Institute
of Electronics, Information and Communication Engineers, and Japan Society of Artificial Intelligence.
Shi-Wook Lee received the B.E. and M.E. degrees from Yeungnam University, Korea, and the Ph.D. degree from the University of Tokyo in
1995, 1997, and 2001, respectively. Since 2001 he has been working in the Research Group of Speech and Auditory Signal Processing,
the National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan, as a postdoctoral fellow. His research interests
include spoken document processing, speech recognition, and understanding. |
| |
Keywords: | Speech labeling; Spoken documents retrieval; Repeated utterance; Similar section extraction; Time sequence data |
This article is indexed in SpringerLink and other databases. |
|