Question-Answer Pair Generation Based on Key Phrase Extraction and Answer Filtering
Cite this article: GUO Zheng-Rong, GUO Gong-De, WANG Hui. Question-answer Pair Generation Based on Key Phrase Extraction and Answer Filtering[J]. Computer Systems & Applications, 2023, 32(6): 293-300
Authors: GUO Zheng-Rong  GUO Gong-De  WANG Hui
Affiliations: College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117; School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT9 5BN
Funding: National Natural Science Foundation of China (61976053, 62171131); Natural Science Foundation of Fujian Province (2022J01398)
Received: 2022-12-06
Revised: 2023-01-19

Abstract: High-quality question-answer pairs help readers acquire knowledge from articles, improve the performance of question-answering systems, and advance machine reading comprehension, so they play an important role both in human activities and in artificial intelligence. Current mainstream question-answer pair generation methods rely on candidate answers taken from the article and generate a specific question for each answer. However, some candidate answers yield questions that cannot be answered from the article, or questions whose answers are no longer the candidate answer. In both cases the question and answer are poorly correlated, which lowers the quality of the pair. To address this problem, this study proposes a method for generating question-answer pairs based on key phrase extraction and filtering. The method automatically extracts from the input text the key phrases that are suitable for question generation and uses them as candidate answers. It then passes the candidate answers through a question generator and an answer generator to produce question-answer pairs, compares each candidate answer with the corresponding generated answer, and filters out the pairs whose similarity is low, so that only pairs of assured quality are output. The method is evaluated experimentally on the SQuAD 1.1 and NewsQA datasets, and the quality of the generated question-answer pairs is also checked manually. The results show that the method effectively improves the quality of the generated question-answer pairs.

Keywords: question-answer pair  candidate answer  key phrase extraction  T5 model  similarity filtering
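
The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which similarity measure or threshold is used, so the token-overlap F1 score, the `threshold` value of 0.5, and the choice to keep the generated answer in the output are all assumptions made for illustration. The names `token_f1`, `generate_filtered_pairs`, `gen_question`, and `gen_answer` are hypothetical; in the paper's setting the two generators would presumably be T5-based models, but here they are plain callables supplied by the caller.

```python
from collections import Counter
from typing import Callable, List, Tuple


def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two strings.

    Assumed similarity measure; the abstract does not name the one used.
    """
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(tb)
    recall = overlap / len(ta)
    return 2 * precision * recall / (precision + recall)


def generate_filtered_pairs(
    context: str,
    candidates: List[str],
    gen_question: Callable[[str, str], str],  # hypothetical: (context, answer) -> question
    gen_answer: Callable[[str, str], str],    # hypothetical: (context, question) -> answer
    threshold: float = 0.5,                   # assumed cut-off, not from the paper
) -> List[Tuple[str, str]]:
    """Keep only pairs whose generated answer resembles the candidate answer."""
    pairs = []
    for candidate in candidates:
        question = gen_question(context, candidate)
        answer = gen_answer(context, question)
        # Discard the pair when the answer generator disagrees with the candidate.
        if token_f1(candidate, answer) >= threshold:
            pairs.append((question, answer))
    return pairs
```

With stub generators, a candidate whose generated question leads back to the same answer is kept, while a candidate whose question produces an unrelated answer is dropped. Any other string similarity (exact match, edit distance, embedding cosine) could be substituted for `token_f1` without changing the structure.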