Eras: Improving the quality control in the annotation process for Natural Language Processing tasks
Abstract: The growing volume of valuable but unstructured textual information poses a major challenge for extracting value from those texts. NLP (Natural Language Processing) techniques address this need, but most rely on a large, manually annotated corpus for their development and evaluation. Creating such a corpus is laborious and requires suitable computational support. Many annotation tools are available, but their main weaknesses are the absence of data-management features for quality control and the need for a commercial license. Because the quality of the data used to train an NLP model directly affects the quality of its results, quality control of the annotations is essential. In this paper, we introduce ERAS, a novel web-based text annotation tool developed to facilitate and manage the process of text annotation. ERAS includes not only the key features of current mainstream annotation systems but also features needed to improve the curation process, such as inter-annotator agreement, self-agreement, and annotation-log visualization for annotation quality control. ERAS also implements a series of features to customize the user's annotation workflow, such as random document selection, re-annotation stages, and warm-up annotations. We conducted two empirical studies to evaluate the tool's support for text annotation; the results suggest that the tool not only meets the basic needs of the annotation task but also has important advantages over the other tools evaluated in the studies. ERAS is freely available at https://github.com/grosmanjs/eras.
Keywords: Annotation tool, Ontology-based annotation, Entities and relations annotation, Corpus curation, Natural Language Processing
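The abstract highlights inter-annotator agreement as a core quality-control metric. As a minimal, illustrative sketch (not ERAS's actual implementation), Cohen's kappa compares the observed agreement between two annotators against the agreement expected by chance; the label lists below are hypothetical named-entity tags:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    who labeled the same sequence of items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # degenerate case: both always use one label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical entity labels from two annotators on the same six spans.
a = ["PER", "ORG", "PER", "LOC", "PER", "ORG"]
b = ["PER", "ORG", "LOC", "LOC", "PER", "PER"]
print(round(cohens_kappa(a, b), 3))  # → 0.478
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, signalling annotations that may need re-annotation or curation.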
This article is indexed in ScienceDirect and other databases.