A Hybrid Augmentation Method of Complaint Texts against Tour Guides Based on EDA and Back Translation
Citation:YU Jia-yu,LI Xiang,ZHAN Jin-yu,JIANG Wei,CAO Yang,YANG Rui.A Hybrid Augmentation Method of Complaint Texts against Tour Guides Based on EDA and Back Translation[J].Computer Technology and Development,2021(3).
Authors:YU Jia-yu  LI Xiang  ZHAN Jin-yu  JIANG Wei  CAO Yang  YANG Rui
Affiliation:(School of Information and Software Engineering,University of Electronic Science and Technology of China,Chengdu 610054,China;CETC Big Data Research Institute Co.,Ltd.,Guiyang 550022,China;Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory,Guiyang 550022,China)
Funding:Open Fund of the Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory (W-2019007); Sichuan Science and Technology Program (2018CC0136); Open Project of the State Key Laboratory of Computer Architecture, Chinese Academy of Sciences (CARCH201811); Fundamental Research Funds for the Central Universities (ZYGX2018J077, ZYGX2019J078).
Abstract:In recent years, using machine learning to identify tour guide violations from complaint texts has become an inevitable trend: it assists tourism supervisors in their work and provides a basis for tourism supervision. However, complaint texts against tour guides are homogeneous and hard to obtain, so augmenting them to meet the needs of violation detection is an urgent problem. To address it, we propose a hybrid augmentation method for complaint texts against tour guides based on EDA (easy data augmentation) and back translation. The complaint texts are augmented separately by EDA and by back translation, and the augmented corpora returned by the two methods are mixed to obtain the final augmented texts. The method is applied and validated in a practical tour guide violation detection system. Extensive experiments compare it with the traditional EDA augmentation method and the traditional back translation augmentation method. The results show that the hybrid method achieves higher accuracy and a better augmentation effect than either of the two traditional methods. In the practical detection system it reaches an accuracy of 87.54%, which is 7.4% higher than that obtained with the original data set.

Keywords:illegal tour guide behavior detection  text data augmentation  EDA  back translation  hybrid augmentation
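
The hybrid pipeline summarized in the abstract (augment by EDA, augment by back translation, then mix the two outputs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which EDA operations or translation service were used, so the sketch uses the four operations of the original EDA technique (synonym replacement, random insertion, random swap, random deletion), a caller-supplied synonym dictionary, whitespace tokenization, and two hypothetical translation callables `to_pivot` and `from_pivot`. All function and parameter names are invented for this example.

```python
import random
from typing import Callable, Dict, List


def synonym_replacement(tokens: List[str], synonyms: Dict[str, List[str]],
                        rng: random.Random) -> List[str]:
    """Replace one token that has a known synonym."""
    candidates = [i for i, t in enumerate(tokens) if synonyms.get(t)]
    if not candidates:
        return list(tokens)
    out = list(tokens)
    i = rng.choice(candidates)
    out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(tokens: List[str], synonyms: Dict[str, List[str]],
                     rng: random.Random) -> List[str]:
    """Insert a synonym of some token at a random position."""
    candidates = [t for t in tokens if synonyms.get(t)]
    if not candidates:
        return list(tokens)
    out = list(tokens)
    word = rng.choice(synonyms[rng.choice(candidates)])
    out.insert(rng.randrange(len(out) + 1), word)
    return out


def random_swap(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions."""
    if len(tokens) < 2:
        return list(tokens)
    out = list(tokens)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], rng: random.Random,
                    p: float = 0.1) -> List[str]:
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def hybrid_augment(text: str,
                   synonyms: Dict[str, List[str]],
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str],
                   n_eda: int = 4,
                   seed: int = 0) -> List[str]:
    """Mix EDA variants and one back-translated variant of `text`.

    `to_pivot` and `from_pivot` are hypothetical translation functions
    (source -> pivot language, pivot -> source language) supplied by the caller.
    """
    rng = random.Random(seed)
    tokens = text.split()  # assumption: whitespace tokenization
    ops = [
        lambda t: synonym_replacement(t, synonyms, rng),
        lambda t: random_insertion(t, synonyms, rng),
        lambda t: random_swap(t, rng),
        lambda t: random_deletion(t, rng),
    ]
    eda_out = [" ".join(rng.choice(ops)(tokens)) for _ in range(n_eda)]
    bt_out = [from_pivot(to_pivot(text))]

    # Mix both sources, dropping duplicates and unchanged copies of the input.
    mixed: List[str] = []
    for candidate in eda_out + bt_out:
        if candidate != text and candidate not in mixed:
            mixed.append(candidate)
    return mixed
```

For Chinese complaint texts, `text.split()` would need to be replaced by a word segmenter, and the two translation callables would wrap whatever machine translation service is available; both choices are left open here because the abstract does not specify them.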
This article is indexed in databases including VIP (维普).