SORTAL ANAPHORA RESOLUTION IN MEDLINE ABSTRACTS

Authors:	Manabu Torii, K. Vijay-Shanker

Affiliation:	Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC 20057, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA

Abstract:	This paper reports our investigation of machine learning methods applied to anaphora resolution for biology texts, particularly paper abstracts. Our primary concern is the investigation of features and their combinations for effective anaphora resolution. We focus on the resolution of demonstrative phrases and definite determiner phrases, the two most prevalent forms of anaphoric expressions that we find in biology research articles. Different resolution models are developed for demonstrative and definite determiner phrases, and our work shows that the models may be optimized differently for each phrase type. Because a significant number of definite determiner phrases are not anaphoric, we also induce a model to detect anaphoricity, i.e., a model that classifies phrases as either anaphoric or nonanaphoric. We propose several novel features, which we call highlighting features, and consider their utility particularly for processing paper abstracts. The system using the highlighting features achieved accuracies of 78% and 71% for demonstrative phrases and definite determiner phrases, respectively. The use of the highlighting features reduced the error rate by about 10%.

Keywords:	anaphora resolution; bioinformatics; machine learning; natural language processing
