Automatic multimedia indexing: combining audio, speech, and visual information to index broadcast news

Abstract: This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The system includes acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. In the AS module, new spectral-based features and a smoothing method improved speech detection in the audio stream of the input news content. In the ASR module, automatic selection of acoustic models achieved both a low word error rate (WER), as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method, which uses word concept vectors, achieved more accurate results than the conventional method based on local word frequency vectors. The information integration module combines the results from the AS, TS, and SC modules. Combining the TS results with the AS and SC results improved story boundary detection accuracy over the TS results alone.
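
To make the "conventional method using local word frequency vectors" concrete, the following is a minimal sketch of one common way such a baseline is built: word counts in a window before and after each sentence gap are compared by cosine similarity, and gaps with low similarity are taken as candidate topic boundaries. This is an illustration under stated assumptions, not the paper's implementation. The function names, the window size, and the threshold are all hypothetical, and the word concept vectors used by the proposed TS method are not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def boundary_scores(sentences, window=2):
    """Score every gap between adjacent sentences.

    sentences: list of token lists (e.g. ASR output split into sentences).
    window:    assumed number of sentences on each side of the gap.
    Returns a list of (gap_index, similarity); gap_index g lies between
    sentences[g - 1] and sentences[g].
    """
    scores = []
    for gap in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, gap - window):gap] for w in s)
        right = Counter(w for s in sentences[gap:gap + window] for w in s)
        scores.append((gap, cosine(left, right)))
    return scores


def pick_boundaries(scores, threshold=0.1):
    """Keep gaps whose similarity falls below an assumed threshold."""
    return [gap for gap, sim in scores if sim < threshold]


# Toy input: two sentences about an election, then two about the weather.
sentences = [
    ["election", "vote", "party"],
    ["vote", "candidate", "election"],
    ["rain", "storm", "forecast"],
    ["storm", "wind", "forecast"],
]
print(pick_boundaries(boundary_scores(sentences)))  # prints [2]
```

In this toy input the only gap with no shared vocabulary on either side is the one before the third sentence, so it is the single boundary reported. Raw word counts only match identical words, which is the limitation that a concept-level representation is meant to address.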
