Analyzing and identifying multiword expressions in spoken language期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Analyzing and identifying multiword expressions in spoken language

Authors:	Helmer Strik Micha Hulsbosch Catia Cucchiarini

Affiliation:	1. Department of Linguistics (Section Language and Speech), Radboud University, P.O. Box 9103, 6500 HD, Nijmegen, The Netherlands 2. Erasmus Building, room 8.14, Erasmusplein 1, 6525 HT, Nijmegen, The Netherlands

Abstract:	The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure, nor frequency of co-occurrence alone are suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring “islands of pronunciation reduction” that contain (potential) MWEs can be identified in a large speech corpus.

Keywords:
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏