Funding: National Natural Science Foundation of China (62102095); China Postdoctoral Science Foundation (2020M681173, 2021T140124); Shanghai Science and Technology Innovation Action Plan (19511120400); Open Fund of the Science and Technology on Information Systems Engineering Laboratory (05202002).

Received: 2022-04-26
Revised: 2022-05-26

NL2SQL Model Based on Self-Pruning Heterogeneous Graph
HUANG Junyang, WANG Zhenyu, LIANG Jiaqing, XIAO Yanghua. NL2SQL Model Based on Self-Pruning Heterogeneous Graph[J]. Computer Engineering, 2022, 48(9): 71.
Authors: HUANG Junyang, WANG Zhenyu, LIANG Jiaqing, XIAO Yanghua
Affiliation: 1. School of Software, Fudan University, Shanghai 200433, China; 2. Science and Technology on Information Systems Engineering Laboratory, Nanjing 210007, China
Abstract: Natural Language to Structured Query Language (NL2SQL) is a critical task in semantic parsing. Its core lies in the joint learning of the natural language query and the database schema. Existing approaches construct a heterogeneous graph to jointly encode the entire database schema and the natural language query. However, a graph constructed this way introduces a large amount of useless information and ignores the varying importance of different information in the schema. To improve the logical and execution accuracy of NL2SQL, a novel NL2SQL model based on a self-pruning heterogeneous graph and a relative position attention mechanism, called SPRELA, is proposed. The model adopts a sequence-to-sequence architecture and uses the pre-trained language model ELECTRA as its backbone. An initial heterogeneous graph is constructed for the database schema and the natural language query by introducing expert knowledge. This graph is then self-pruned according to the natural language query, and a multi-head relative position attention mechanism encodes the self-pruned database schema together with the query. The target SQL statement is generated by a tree-structured decoder using predefined SQL syntax. Experiments on the Spider dataset show that SPRELA achieves an execution accuracy of 71.1%, outperforming the Relation Aware Semi-autoregressive Semantic Parsing (RaSaP) model of the same parameter scale by 1.1 percentage points. The results demonstrate that SPRELA better aligns the database schema with natural language questions and thus better understands the semantic information in natural language queries.
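The self-pruning step described in the abstract can be illustrated with a minimal sketch: schema items (tables and columns) whose names share no tokens with the question are dropped before graph encoding. The token-overlap heuristic, function names, and toy schema below are assumptions for illustration only, not the paper's actual implementation, which operates on a heterogeneous graph with expert-knowledge edges.

```python
# Illustrative sketch of self-pruning a schema against a question.
# The overlap heuristic and schema format are hypothetical simplifications.

def tokenize(name: str) -> set:
    """Split an identifier like 'singer_id' into lowercase tokens."""
    return set(name.lower().replace("_", " ").split())

def self_prune_schema(question: str, schema: dict) -> dict:
    """Keep only schema items with token overlap with the question.
    A table survives if its own name matches or any of its columns do."""
    q_tokens = tokenize(question)
    pruned = {}
    for table, columns in schema.items():
        kept = [c for c in columns if tokenize(c) & q_tokens]
        if kept or (tokenize(table) & q_tokens):
            pruned[table] = kept
    return pruned

# Toy Spider-style schema: irrelevant tables are pruned away.
schema = {
    "singer": ["singer_id", "name", "age", "country"],
    "concert": ["concert_id", "theme", "year"],
    "stadium": ["stadium_id", "capacity", "location"],
}
pruned = self_prune_schema("What is the average age of singers?", schema)
# Only the 'singer' table survives, via its matching 'age' column.
```

In the actual model the pruning decision is learned jointly with the encoder rather than made by exact string overlap, but the effect is the same: the heterogeneous graph passed to the relative-position attention layers contains far fewer question-irrelevant nodes.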
Keywords:Natural Language to Structured Query Language(NL2SQL)  heterogeneous graph  self-pruning mechanism  semantic parsing  pre-trained language model  