
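Research on Converting Complex Natural Language Queries to SQL Based on a Tree Model
Cite this article: ZHAO Meng, CHEN Ke, SHOU Li-Dan, WU Sai, CHEN Gang. Research on Converting Complex Natural Language Queries to SQL Based on a Tree Model[J]. Journal of Software (软件学报), 2022, 33(12): 4727-4745
Authors: ZHAO Meng  CHEN Ke  SHOU Li-Dan  WU Sai  CHEN Gang
Affiliations: School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China; Key Laboratory of Big Data Intelligent Computing of Zhejiang Province (Zhejiang University), Hangzhou 310027, Zhejiang, China; School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China; Key Laboratory of Big Data Intelligent Computing of Zhejiang Province (Zhejiang University), Hangzhou 310027, Zhejiang, China; State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310027, Zhejiang, China
Funding: Key Research and Development Program of Zhejiang Province (2021C01009); National Natural Science Foundation of China (62050099); Fundamental Research Funds for Universities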
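Abstract: Natural language query to SQL (NL2SQL) is the technique of automatically converting a query expressed in natural language into a structured query language (SQL) expression that a database system can understand and execute. NL2SQL gives ordinary users a natural interface for querying databases, enabling natural-language question answering over databases. NL2SQL for complex queries is a current research focus in the database community, and mainstream methods model the problem with sequence-to-sequence (Seq2seq) encoding and decoding. However, most existing work targets English; in practical Chinese applications, the colloquial expressions particular to Chinese make complex queries hard to convert. In addition, existing work struggles to correctly output query clauses that contain complex calculation expressions. To address these problems, a tree model is proposed in place of the sequence representation: a complex query is decomposed top-down into a multi-way tree whose nodes represent the components of the SQL statement, and depth-first search is used to predict and generate the SQL statement. On the two official test sets of the DuSQL Chinese NL2SQL competition, the method ranked first and second respectively, which verifies its effectiveness.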

Keywords: natural language query to SQL  semantic parsing  natural language processing
Received: 2021-01-27
Revised: 2021-12-15

Converting Complex Natural Language Query to SQL Based on Tree Representation Model
ZHAO Meng, CHEN Ke, SHOU Li-Dan, WU Sai, CHEN Gang. Converting Complex Natural Language Query to SQL Based on Tree Representation Model[J]. Journal of Software, 2022, 33(12): 4727-4745
Authors: ZHAO Meng  CHEN Ke  SHOU Li-Dan  WU Sai  CHEN Gang
Affiliation: School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; Key Laboratory of Big Data Intelligent Computing of Zhejiang Province (Zhejiang University), Hangzhou 310027, China; School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; Key Laboratory of Big Data Intelligent Computing of Zhejiang Province (Zhejiang University), Hangzhou 310027, China; State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310027, China
Abstract: NL2SQL is a technology that automatically converts queries expressed in natural language into structured SQL expressions that a DBMS can parse and execute. It gives ordinary users a natural interactive interface for database access and thereby enables question answering on top of database systems. NL2SQL for complex queries is currently a research hotspot in the database community. The prevailing approach uses a sequence-to-sequence (Seq2seq) encoder-decoder to convert complex natural language to SQL. However, most existing work focuses on English and is not equipped to handle the colloquial expressions peculiar to Chinese queries. In addition, existing work has difficulty correctly producing query clauses that contain complex calculation expressions. To solve these problems, this study proposes a tree model in place of the sequence representation. The approach decomposes a complex query top-down into a multi-way tree whose nodes represent the elements of the SQL statement, and it uses depth-first search to predict and generate the SQL statement. The approach placed first and second on the two official test sets of the DuSQL Chinese NL2SQL competition, which confirms its effectiveness.
Keywords: NL2SQL  semantic parsing  natural language processing
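
The tree representation described in the abstract can be illustrated with a small sketch. The abstract does not specify the node types or the prediction model, so everything below is an assumption made for illustration: the `Node` class, the `kind` labels, and the `render` function are hypothetical names, and the tree is written by hand rather than predicted by a neural model. The sketch shows only how a multi-way tree of SQL elements, including a calculation expression such as `price - cost`, can be turned into a SQL string by a depth-first walk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One SQL element in a multi-way tree (hypothetical structure)."""
    kind: str                      # e.g. "query", "select", "binop", "column"
    value: str = ""                # operator symbol, column name, literal, ...
    children: List["Node"] = field(default_factory=list)


def render(node: Node) -> str:
    """Walk the tree depth-first and assemble the SQL text for this subtree."""
    # Visit every child subtree before combining the results at this node.
    parts = [render(child) for child in node.children]
    if node.kind == "query":
        return " ".join(parts)
    if node.kind == "select":
        return "SELECT " + ", ".join(parts)
    if node.kind == "from":
        return "FROM " + ", ".join(parts)
    if node.kind == "where":
        return "WHERE " + " AND ".join(parts)
    if node.kind == "binop":
        left, right = parts
        return f"{left} {node.value} {right}"
    if node.kind in ("column", "table", "literal"):
        return node.value
    raise ValueError(f"unknown node kind: {node.kind}")


# Hand-built tree for a query whose SELECT clause contains a calculation.
example = Node("query", children=[
    Node("select", children=[
        Node("column", "name"),
        Node("binop", "-", [Node("column", "price"), Node("column", "cost")]),
    ]),
    Node("from", children=[Node("table", "product")]),
    Node("where", children=[
        Node("binop", ">", [Node("column", "price"), Node("literal", "100")]),
    ]),
])

print(render(example))
# SELECT name, price - cost FROM product WHERE price > 100
```

Because the calculation is a subtree rather than a flat token run, nesting it deeper (for example, a `binop` whose child is another `binop`) needs no change to the traversal. In the approach the abstract describes, each node would presumably be predicted by the model during the depth-first expansion instead of being supplied by hand.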