Automatic identification of discourse markers in dialogues: An in-depth study of like and well
Authors: Andrei Popescu-Belis, Sandrine Zufferey
Affiliation: 1. Idiap Research Institute, PO Box 592, 1920 Martigny, Switzerland; 2. Department of Linguistics, University of Geneva, 1211 Geneva 4, Switzerland
Abstract: The lexical items like and well can serve as discourse markers (DMs), but can also play numerous other roles, such as verb or adverb. Identifying the occurrences that function as DMs is an important step for language understanding by computers. In this study, automatic classifiers using lexical, prosodic/positional and sociolinguistic features are trained over transcribed dialogues, manually annotated with DM information. The resulting classifiers improve state-of-the-art performance of DM identification, at about 90% recall and 79% precision for like (84.5% accuracy, κ = 0.69), and 99% recall and 98% precision for well (97.5% accuracy, κ = 0.88). Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for the identification of DM like and not useful for well. The differentiated processing of each type of DM improves classification accuracy, suggesting that these types should be treated individually.
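The abstract reports four evaluation figures per marker: recall, precision, accuracy, and κ. As a rough illustration of how such figures relate to one another, the sketch below derives all four from a binary confusion matrix (DM vs. non-DM). It assumes that κ denotes Cohen's kappa between the classifier output and the reference annotation, which the abstract does not state explicitly. The function name `binary_metrics` and the counts used are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    precision: float
    recall: float
    accuracy: float
    kappa: float


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Score a DM / non-DM classifier from confusion-matrix counts.

    tp: tokens correctly labelled DM      fp: non-DM tokens labelled DM
    fn: DM tokens labelled non-DM         tn: tokens correctly labelled non-DM
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total

    # Agreement expected by chance, from the marginal label frequencies
    # of the classifier (predicted) and the annotation (actual).
    p_both_dm = ((tp + fp) / total) * ((tp + fn) / total)
    p_both_non_dm = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_both_dm + p_both_non_dm

    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return BinaryMetrics(precision, recall, accuracy, kappa)


# Hypothetical counts for illustration only, NOT results from the paper.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Because κ discounts the agreement expected by chance, it is lower than raw accuracy whenever chance agreement is above zero, which is consistent with the pattern of the reported figures (for example, 84.5% accuracy alongside κ = 0.69 for like).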
This article is indexed in ScienceDirect and other databases.