An Automatic Grammatical Error Correction Method for ESL Essays Based on LSTM and N-gram (Grammatical Error Correction Using LSTM and N-gram)


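Cite this article:	谭咏梅 (TAN Yongmei), 杨一枭 (YANG Yixiao), 杨林 (YANG Lin), 刘姝雯 (LIU Shuwen). An Automatic Grammatical Error Correction Method for ESL Essays Based on LSTM and N-gram[J]. 中文信息学报 (Journal of Chinese Information Processing), 2018, 32(6): 19-27.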

Authors:	谭咏梅 (TAN Yongmei), 杨一枭 (YANG Yixiao), 杨林 (YANG Lin), 刘姝雯 (LIU Shuwen)

Affiliation:	School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876

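Abstract:	For article and preposition errors in grammatical error correction (GEC) of English essays, this paper proposes a sequence-labeling GEC method based on LSTM (Long Short-Term Memory). For noun number errors, verb form errors, and subject-verb agreement errors, whose confusion sets are open, this paper proposes a GEC method based on an N-gram voting strategy over ESL (English as a Second Language) and news corpora. On the CoNLL-2013 GEC data, the method achieves an overall F₁ of 33.87%, exceeding the 31.20% of the top-ranked UIUC system. For article errors the F₁ is 38.05%, exceeding UIUC's 33.40%; for preposition errors the F₁ is 28.89%, exceeding UIUC's 7.22%.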
Keywords:	grammatical error correction; LSTM; N-gram voting strategy; ESL corpus
Grammatical Error Correction Using LSTM and N-gram

TAN Yongmei, YANG Yixiao, YANG Lin, LIU Shuwen. Grammatical Error Correction Using LSTM and N-gram[J]. Journal of Chinese Information Processing, 2018, 32(6): 19-27.

Authors:	TAN Yongmei, YANG Yixiao, YANG Lin, LIU Shuwen

Affiliation:	School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract:	For article and preposition errors in grammatical error correction (GEC), this paper proposes an LSTM-based sequence labeling method. For noun form, verb form, and subject-verb agreement errors, it proposes an N-gram voting strategy based on corpora collected from ESL (English as a Second Language) essays and news text. On the CoNLL-2013 GEC data, the method achieves an overall F1 score of 33.87%, outperforming the top-ranked UIUC system (31.20%). It achieves F1 scores of 38.05% on article errors and 28.89% on preposition errors, both exceeding UIUC's results (33.40% and 7.22%, respectively).

Keywords:	grammatical error correction; LSTM; N-gram voting strategy; ESL corpus
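
The abstract names an N-gram voting strategy but does not spell out its details. As a rough illustration only, the sketch below shows one plausible reading: each n-gram window covering the suspect word casts a vote for the candidate form that is most frequent in a reference corpus, and the original word is kept unless another candidate wins strictly more votes. The function names (`build_ngram_counts`, `vote`), the window sizes, the tie handling, and the toy corpus are all assumptions made for this example, not the authors' published algorithm.

```python
from collections import Counter


def build_ngram_counts(sentences, max_n=3):
    """Count every n-gram with 2 <= n <= max_n in a tokenized corpus."""
    counts = Counter()
    for tokens in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def vote(tokens, pos, candidates, counts, max_n=3):
    """Pick a replacement for tokens[pos] by n-gram voting (hypothetical scheme).

    Each window of length n that covers `pos` gives one vote to the candidate
    with the strictly highest corpus count; tied or unseen windows abstain.
    """
    votes = Counter()
    for n in range(2, max_n + 1):
        first = max(0, pos - n + 1)
        last = min(pos, len(tokens) - n)
        for start in range(first, last + 1):
            scores = {}
            for cand in candidates:
                window = tokens[start:pos] + [cand] + tokens[pos + 1:start + n]
                scores[cand] = counts[tuple(window)]
            best = max(scores.values())
            winners = [c for c, s in scores.items() if s == best]
            if best > 0 and len(winners) == 1:
                votes[winners[0]] += 1
    original = tokens[pos]
    if not votes:
        return original
    top, top_votes = votes.most_common(1)[0]
    # Conservative choice: only change the word if it is strictly outvoted.
    return top if top_votes > votes[original] else original


# Toy reference corpus standing in for the ESL and news corpora.
corpus = [
    ["the", "cat", "sits", "on", "the", "mat"],
    ["a", "cat", "sits", "here"],
    ["the", "cats", "sit", "on", "the", "mat"],
]
counts = build_ngram_counts(corpus)

sentence = ["the", "cat", "sit", "on", "the", "mat"]
print(vote(sentence, 2, ["sit", "sits"], counts))  # prints "sits"
```

Keeping the original word on ties is a deliberately conservative choice for this sketch, since a wrong edit to a correct word lowers precision.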
