
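Correlation Analysis Between News and Cases Based on Case Element Guidance and Deep Clustering
Cite this article: MAO Cunli, WU Xia, ZHU Junguo, YU Zhengtao, LI Yunlong, WANG Zhenhan. Correlation Analysis Between News and Cases Based on Case Element Guidance and Deep Clustering[J]. Journal of Chinese Information Processing, 2021, 34(11): 60-69.
Authors: MAO Cunli  WU Xia  ZHU Junguo  YU Zhengtao  LI Yunlong  WANG Zhenhan
Affiliation: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500;
2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500
Funding: National Natural Science Foundation of China (61732005, 61662041, 61761026, 61866019, 61972186); Key Project of the Yunnan Applied Basic Research Program (2019FA023); Yunnan Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2019HB006)
Abstract: Analyzing the correlation between news and cases is the basis of news and public-opinion analysis in the legal case domain, and it can be cast as a text clustering problem. Because effective supervision is lacking, traditional clustering methods tend to produce divergent clusters, which lowers the accuracy of the results. Based on the characteristics of case and news texts, this paper proposes a method for news-case correlation analysis guided by case elements and built on deep clustering. The method first extracts the important sentences to represent each text. It then represents each case by its case elements and uses these representations to initialize the cluster centers, guiding the search process of the clustering. Finally, a convolutional autoencoder produces the text representations, and the network is trained jointly with a reconstruction loss and a clustering loss so that the text representations move closer to the cases. Text representation and clustering are unified in a single framework, and the autoencoder parameters and the clustering model parameters are updated alternately to perform the text clustering. Experiments show that the proposed method improves accuracy by 4.61% over the baseline methods.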

Keywords: correlation analysis  deep clustering  text representation  case elements
Received: 2020-03-09
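
The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the clustering loss, so this sketch assumes a DEC-style objective (Student's t soft assignment, sharpened target distribution, KL divergence); that choice, the weight `gamma`, and the toy 2-d vectors are all assumptions made for illustration, not details taken from the paper. The autoencoder is omitted: `news` stands in for its encoded outputs and `case_centers` for the case-element representations used to initialize the cluster centers.

```python
import math

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment of one embedding z to every cluster center."""
    weights = []
    for c in centers:
        d2 = sum((a - b) ** 2 for a, b in zip(z, c))
        weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        raw = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        s = sum(raw)
        p.append([r / s for r in raw])
    return p

def kl_clustering_loss(p, q):
    """KL(P || Q) summed over all texts and clusters."""
    return sum(
        pi * math.log(pi / qi)
        for p_row, q_row in zip(p, q)
        for pi, qi in zip(p_row, q_row)
        if pi > 0
    )

def joint_loss(recon_loss, p, q, gamma=0.1):
    """Reconstruction loss plus weighted clustering loss (gamma is hypothetical)."""
    return recon_loss + gamma * kl_clustering_loss(p, q)

# Hypothetical case-element vectors, used as the initial cluster centers.
case_centers = [[0.0, 0.0], [4.0, 4.0]]
# Hypothetical encoded news texts (stand-ins for autoencoder outputs).
news = [[0.2, -0.1], [3.8, 4.1], [0.5, 0.4]]

q = [soft_assign(z, case_centers) for z in news]
p = target_distribution(q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
loss = joint_loss(recon_loss=0.25, p=p, q=q)
```

In a full training loop, the encoder and the centers would be updated to reduce `loss`, with `p` recomputed periodically; here only the assignment and loss computation are shown.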

Chinese-Burmese Parallel Sentence Pair Extraction Based on CNN-CorrNet
MAO Cunli,WU Xia,ZHU Junguo,YU Zhengtao,LI Yunlong,WANG Zhenhan.Chinese-Burmese Parallel Sentence Pair Extraction Based on CNN-CorrNet[J].Journal of Chinese Information Processing,2021,34(11):60-69.
Authors:MAO Cunli  WU Xia  ZHU Junguo  YU Zhengtao  LI Yunlong  WANG Zhenhan
Affiliation:1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;2.Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract: Bilingual parallel corpora are a key resource for improving the quality of machine translation. We propose a Chinese-Burmese parallel sentence pair extraction method based on a CNN-CorrNet network. Specifically, we first use BERT to obtain vector representations of Chinese and Burmese words, and use a convolutional neural network to represent Chinese and Burmese sentences so as to capture their important features. Then, to maximize the correlation between the cross-language representations of the two languages, existing Chinese-Burmese parallel sentence pairs are used as constraints, and CorrNet (Correlational Neural Networks) is applied to map the Chinese and Burmese sentence representations into a common semantic space. Finally, the distance between Chinese and Burmese sentences in the common semantic space is calculated to identify true bilingual sentence pairs. The experimental results show that, compared with the maximum entropy model and the Siamese network model, the proposed method improves the F1 value by 13.3% and 5.1%, respectively.
Keywords:Chinese-Burmese bilingual  parallel sentence pair  CNN  correlational neural networks  common semantic space  
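
The final step of the abstract above, comparing sentences in the common semantic space, can be sketched as follows. The abstract does not name the distance measure or a decision rule, so cosine similarity, the `threshold` value, and the toy vectors below are assumptions for illustration only; the vectors stand in for sentence representations already mapped into the common space.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def extract_pairs(zh_vecs, my_vecs, threshold=0.9):
    """For each Chinese vector, keep its best Burmese match if similar enough."""
    pairs = []
    for i, zh in enumerate(zh_vecs):
        best_j, best_sim = max(
            ((j, cosine_similarity(zh, my)) for j, my in enumerate(my_vecs)),
            key=lambda item: item[1],
        )
        if best_sim >= threshold:
            pairs.append((i, best_j, best_sim))
    return pairs

# Hypothetical sentence vectors in the common semantic space.
zh_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
my_vecs = [[0.9, 0.1], [0.1, 0.9]]

pairs = extract_pairs(zh_vecs, my_vecs)
pair_indices = [(i, j) for i, j, _ in pairs]
```

The third Chinese vector is equally far from both Burmese vectors and falls below the threshold, so it is left unpaired.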