
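Extractive Essay Generation for College Entrance Examination
Cite this article: FENG Xiao-Cheng, GONG Heng, LENG Hai-Tao, QIN Bing, SUN Cheng-Jie, LIU Ting. Extractive Essay Generation for College Entrance Examination[J]. Chinese Journal of Computers, 2020, 43(2): 315-325
Authors: FENG Xiao-Cheng  GONG Heng  LENG Hai-Tao  QIN Bing  SUN Cheng-Jie  LIU Ting
Affiliation (all six authors): Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Summary: Automatic writing by machines is an important research direction in artificial intelligence and natural language processing. However, traditional automatic writing methods mainly target short, paragraph-level texts such as sports news and weather forecasts, and have not modeled the automatic generation of discourse-level text in depth. To address this problem, we focus on discourse-level text generation for college entrance examination essays. Specifically, we propose an extraction-based essay generation model that first extracts sentences and then uses deep learning ranking methods to combine them into the text of each paragraph. In evaluation by human experts, the essays we generate reach the average score of level-two papers in the Beijing college entrance examination, which indicates some practical application value.

Keywords: text generation; text extraction; sentence ordering; essay generation; natural language processing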

Extractive Essay Generation for College Entrance Examination
FENG Xiao-Cheng, GONG Heng, LENG Hai-Tao, QIN Bing, SUN Cheng-Jie, LIU Ting. Extractive Essay Generation for College Entrance Examination[J]. Chinese Journal of Computers, 2020, 43(2): 315-325
Authors: FENG Xiao-Cheng  GONG Heng  LENG Hai-Tao  QIN Bing  SUN Cheng-Jie  LIU Ting
Affiliation: Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: Automatic writing is an important research direction in artificial intelligence and natural language processing. However, traditional automatic writing methods have mainly focused on generating short texts, such as sports news and weather forecasts, and lack deep modeling of discourse-level text generation. In this paper, we focus on discourse-level text generation for essays in the College Entrance Examination. In particular, we present an extractive essay generation model for the College Entrance Examination. We formulate the task as "essay generation from mind": the input is a set of topic words that the writer has in mind, and the output is an organized article (a document) with several paragraphs on the theme of the topic. The task is challenging because it requires the generator to deeply understand how human beings write articles. In addition, once the meaning of a topic word is understood, the next challenge is how to generate a topic-focused article, e.g., how to collect topic-specific "fuel" (e.g., sentences) and how to organize it into a well-structured article. This is of great importance because an article is not a chaotic set of sentences. Natural language is structured, and the coherence and discourse relationships between sentences are crucial to improving the readability of a document and to guaranteeing its structure in terms of lexicalization and semantics. We hope that solving this problem contributes to progress towards Artificial Intelligence. To address these issues, our proposed model consists of two major modules: a sentence extraction module and a paragraph generation module. First, to generate a high-quality essay, the extractive essay generation model needs to determine the focus of each paragraph. Therefore, we first expand the given topic with more related topic words and then cluster them into multiple sets, each of which represents the focus of one paragraph. Second, the model needs to find candidate sentences that are related to each paragraph's topic. Therefore, it first finds sentences that include the given topic words. Then, we propose two methods to expand the candidate sentence set with more diverse sentences. After obtaining the candidate sentence set for each paragraph, the model needs to choose and arrange sentences into coherent paragraphs. In this paper, we explore three methods to achieve this. In our experiments, paragraph generation via a pointer network and paragraph generation via a pair-wise LSTM network both outperform paragraph generation via learning to rank, and paragraph generation via a pointer network achieves the best result among the three methods. In expert evaluation, the essays produced by our proposed model reach the average score of level two of the Beijing college entrance examination, which indicates that our proposed methods have some practical application value. For each component in the framework, we explore several strategies and compare them empirically through qualitative or quantitative analysis. We also analyze the pros and cons of each approach. Although we run experiments on a Chinese corpus, the method is language independent and can be easily adapted to other languages.
Keywords: text generation; text extraction; sentence ordering; essay generation; natural language processing
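
The extract-then-order pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `extract_candidates`, `overlap_score`, and `order_greedily` are hypothetical, the word-overlap score is only a toy stand-in for the learned ordering models (learning to rank, pair-wise LSTM, pointer network), and greedy chaining is not claimed to be the authors' decoding procedure.

```python
from typing import Callable, Dict, List, Set


def extract_candidates(
    corpus: List[str], topic_clusters: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """For each paragraph focus, keep the corpus sentences that mention
    at least one topic word of that focus (simple substring match)."""
    candidates: Dict[str, List[str]] = {}
    for focus, words in topic_clusters.items():
        candidates[focus] = [
            sentence
            for sentence in corpus
            if any(word in sentence.lower() for word in words)
        ]
    return candidates


def overlap_score(first: str, second: str) -> float:
    """Toy stand-in for a learned pairwise coherence model (assumption):
    Jaccard overlap between the whitespace-separated tokens of two sentences."""
    a = set(first.lower().split())
    b = set(second.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def order_greedily(
    sentences: List[str],
    score: Callable[[str, str], float] = overlap_score,
    max_len: int = 3,
) -> List[str]:
    """Build a paragraph by starting from the first candidate and repeatedly
    appending the remaining sentence that best follows the last chosen one."""
    if not sentences:
        return []
    paragraph = [sentences[0]]
    remaining = list(sentences[1:])
    while remaining and len(paragraph) < max_len:
        best = max(remaining, key=lambda s: score(paragraph[-1], s))
        paragraph.append(best)
        remaining.remove(best)
    return paragraph
```

In this sketch, each key of `topic_clusters` plays the role of one paragraph's focus, and a trained model could be swapped in through the `score` parameter without changing the rest of the code.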