

SORTAL ANAPHORA RESOLUTION IN MEDLINE ABSTRACTS
Authors: Manabu Torii, K. Vijay-Shanker
Affiliation: Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC 20057, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
Abstract: This paper reports our investigation of machine learning methods applied to anaphora resolution in biology texts, particularly paper abstracts. Our primary concern is the investigation of features and feature combinations for effective anaphora resolution. We focus on the resolution of demonstrative phrases and definite determiner phrases, the two most prevalent forms of anaphoric expressions that we find in biology research articles. We develop separate resolution models for demonstrative phrases and definite determiner phrases, and our work shows that the models may be optimized differently for each phrase type. Because a significant number of definite determiner phrases are not anaphoric, we also induce a model to detect anaphoricity, i.e., a model that classifies phrases as either anaphoric or nonanaphoric. We propose several novel features, which we call highlighting features, and consider their utility, particularly for processing paper abstracts. The system using the highlighting features achieved accuracies of 78% and 71% for demonstrative phrases and definite determiner phrases, respectively. The use of the highlighting features reduced the error rate by about 10%.
Keywords: anaphora resolution; bioinformatics; machine learning; natural language processing
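The abstract describes a two-stage setup: a filter that decides whether a definite phrase is anaphoric at all, followed by separate antecedent-selection models for demonstrative and definite phrases. The sketch below only illustrates that general shape, under stated assumptions. It is not the authors' system. The features (`head_match`, `sent_distance`, `in_first_sentence`), the hand-set `WEIGHTS`, and the head-match anaphoricity rule are all hypothetical stand-ins for learned models. In particular, `in_first_sentence` is only a guess at what a "highlighting"-style cue might look like, not the paper's definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    text: str   # surface form of the noun phrase
    head: str   # head noun, lowercased
    sent: int   # sentence index within the abstract (0 = first sentence)


DEMONSTRATIVES = ("this ", "these ", "that ", "those ")


def phrase_type(m: Mention) -> Optional[str]:
    """Classify a phrase as demonstrative, definite, or neither."""
    t = m.text.lower()
    if t.startswith(DEMONSTRATIVES):
        return "demonstrative"
    if t.startswith("the "):
        return "definite"
    return None


def features(ana: Mention, cand: Mention) -> Dict[str, float]:
    """Hypothetical pair features for an (anaphor, candidate) pair."""
    return {
        "head_match": 1.0 if ana.head == cand.head else 0.0,
        "sent_distance": float(ana.sent - cand.sent),
        # Hypothetical "highlighting"-style cue: the candidate appears
        # in the first sentence of the abstract.
        "in_first_sentence": 1.0 if cand.sent == 0 else 0.0,
    }


# Illustrative hand-set weights, one model per phrase type (not learned).
WEIGHTS = {
    "demonstrative": {"head_match": 2.0, "sent_distance": -0.5, "in_first_sentence": 0.5},
    "definite": {"head_match": 3.0, "sent_distance": -0.2, "in_first_sentence": 1.0},
}


def is_anaphoric(ana: Mention, preceding: List[Mention]) -> bool:
    """Toy anaphoricity filter standing in for a learned classifier:
    demonstratives always pass; a definite phrase passes only if some
    earlier mention shares its head noun."""
    if phrase_type(ana) == "demonstrative":
        return True
    return any(c.head == ana.head for c in preceding)


def resolve(ana: Mention, preceding: List[Mention]) -> Optional[Mention]:
    """Return the highest-scoring preceding mention, or None if the phrase
    is not a target type, has no candidates, or is judged nonanaphoric."""
    ptype = phrase_type(ana)
    if ptype is None or not preceding or not is_anaphoric(ana, preceding):
        return None
    w = WEIGHTS[ptype]

    def score(c: Mention) -> float:
        return sum(w[name] * value for name, value in features(ana, c).items())

    return max(preceding, key=score)
```

For example, with the preceding mentions "the p53 protein" (sentence 0) and "apoptosis" (sentence 1), the demonstrative "this protein" in sentence 2 resolves to "the p53 protein". The definite phrase "the cell cycle" is filtered out as nonanaphoric. Keeping one weight table per phrase type mirrors the abstract's point that the two phrase types may call for differently optimized models.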