
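Chinese natural language interface based on paraphrasing
Cite this article: ZHANG Junchi, HU Jie, LIU Mengchi. Chinese natural language interface based on paraphrasing[J]. Journal of Computer Applications (计算机应用), 2016, 36(5): 1290-1295.
Authors: ZHANG Junchi  HU Jie  LIU Mengchi
Affiliation: 1. School of Computer and Information Engineering, Hubei University, Wuhan 430062; 2. State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072
Funding: National Natural Science Foundation of China (61202100).
Abstract: Traditional natural language interfaces to databases (NLIDB) that rely mainly on syntactic parsing recognize user semantics with low accuracy and require large amounts of manually annotated training corpora. To address these problems, a paraphrase-based method for implementing a Chinese NLIDB is proposed. First, the words in the user's sentence that denote database entities are extracted, and a set of candidate trees with their corresponding formalized natural language expressions is built. Next, a paraphrase classifier trained on a web question-answering corpus selects the expression that is semantically closest. Finally, the corresponding candidate tree is converted into a Structured Query Language (SQL) statement. Experiments show that the method achieves F1 scores of 83.4% and 90% on the US geography question corpus (GeoQueries880) and the restaurant question corpus (RestQueries250), respectively, both higher than the syntactic-parsing method. Comparison of the experimental results indicates that a paraphrase-based NLIDB handles the semantic gap between users and databases better.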

Keywords: natural language interface to database; word vector; paraphrase; natural language expression; machine learning
Received: 2015-10-15
Revised: 2015-12-08

Chinese natural language interface based on paraphrasing
ZHANG Junchi, HU Jie, LIU Mengchi. Chinese natural language interface based on paraphrasing[J]. Journal of Computer Applications, 2016, 36(5): 1290-1295.
Authors:ZHANG Junchi  HU Jie  LIU Mengchi
Affiliation:1. School of Computer and Information Engineering, Hubei University, Wuhan Hubei 430062, China;2. State Key Laboratory of Software Engineering(Wuhan University), Wuhan Hubei 430072, China
Abstract: Traditional Natural Language Interfaces to Databases (NLIDB) based on syntactic parsing recognize user semantics with low accuracy and need a large manually labeled training corpus. To solve these problems, a novel method for a Chinese NLIDB based on Chinese paraphrasing was proposed. First, the words in the user statement that denote database entities were extracted, and a set of candidate trees together with their formal natural language expressions was generated. Then, the semantically closest expression was selected by a paraphrase classifier trained on an Internet Q&A corpus. Finally, the corresponding candidate tree was translated into Structured Query Language (SQL). With the proposed method, the F1 scores on the Chinese US geography question dataset (GeoQueries880) and the restaurant question dataset (RestQueries250) were 83.4% and 90%, respectively, both better than the syntactic-parsing-based method. The experimental results demonstrate that a paraphrase-based NLIDB handles the semantic gap between users and databases better.
Keywords: Natural Language Interface of DataBase (NLIDB); word vector; paraphrase; natural language expression; machine learning
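
The three steps named in the abstract (extract entity words, build candidate expressions, pick the closest one and emit its SQL) can be illustrated with a toy sketch. This is not the authors' implementation: the schema, the lexicon, the expression template and all function names are hypothetical, the example uses English questions to avoid Chinese word segmentation, and a bag-of-words cosine similarity stands in for the trained paraphrase classifier described in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical schema and value lexicon, invented for illustration only.
SCHEMA = {"state": ["capital", "population", "area"]}
STATE_NAMES = {"texas", "ohio"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_candidates(question):
    """Steps 1-2: find entity words, then enumerate (expression, SQL) pairs."""
    values = sorted(set(tokenize(question)) & STATE_NAMES)
    candidates = []
    for value in values:
        for column in SCHEMA["state"]:
            expression = f"what is the {column} of state {value}"
            # value and column come from the fixed lexicon above, never from
            # raw user input, so plain string formatting is acceptable here.
            sql = f"SELECT {column} FROM state WHERE name = '{value}'"
            candidates.append((expression, sql))
    return candidates


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def to_sql(question):
    """Step 3: return the SQL of the most similar candidate, or None."""
    candidates = generate_candidates(question)
    if not candidates:
        return None
    q = Counter(tokenize(question))
    best = max(candidates, key=lambda c: cosine(q, Counter(tokenize(c[0]))))
    return best[1]


print(to_sql("what is the population of texas"))
```

Word overlap alone cannot match a question such as "how many people live in texas" to the population expression, since the two share only the word "texas". That kind of gap is what a classifier trained on paraphrase data is meant to close.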