Twin-tower cross-modal retrieval of chest X-ray images and diagnostic reports
Cite this article: Zhang Jiacheng, Ou Weihua, Chen Yingjie, Zhang Wenchuan, Xiong Jiahao. Twin-tower cross-modal retrieval of chest X-ray images and diagnostic reports [J]. Application Research of Computers, 2023, 40(8).
Authors: Zhang Jiacheng, Ou Weihua, Chen Yingjie, Zhang Wenchuan, Xiong Jiahao
Affiliations: Zhang Jiacheng, School of Mathematical Sciences, Guizhou Normal University; Ou Weihua, a. School of Mathematical Sciences, b. School of Big Data and Computer Science, Guizhou Normal University; Chen Yingjie, Zhang Wenchuan, and Xiong Jiahao, School of Big Data and Computer Science, Guizhou Normal University
Funding: National Natural Science Foundation of China (62262005, 61962010)
Abstract: Existing cross-modal retrieval methods for chest X-ray images and diagnostic reports focus on global information alignment and overlook the fine-grained semantic associations between images and reports, which leads to low retrieval accuracy and poor matching. To address this, this paper proposes CDTCR, a twin-tower cross-modal retrieval method for chest X-ray images and diagnostic reports that jointly aligns global and local information. Specifically, for fine-grained semantic representation, an image encoder built on a residual network learns fine-grained image features, while a Transformer-based BERT model learns fine-grained semantic features of the diagnostic report. For fine-grained semantic association, two cross-modal alignment strategies at different granularities, image-to-sentence and region-to-phrase, remedy the lack of fine-grained semantic correspondence between the two modalities. Experimental results on the large-scale medical dataset MIMIC-CXR show that CDTCR achieves higher retrieval accuracy and stronger interpretability than existing cross-modal retrieval methods.
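To make the twin-tower design described above concrete, the following is a minimal sketch, not the authors' code: it assumes PyTorch, a torchvision ResNet-50 backbone for region-level image features, and a Hugging Face BERT for token-level report features; the class names ImageTower and ReportTower and the 256-dimensional shared embedding space are illustrative assumptions.

```python
# Minimal twin-tower encoder sketch (illustrative, not the authors' implementation).
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import BertModel

class ImageTower(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        backbone = resnet50(weights=None)                           # pretrained weights optional
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep the 7x7 feature grid
        self.proj = nn.Linear(2048, embed_dim)

    def forward(self, images):                        # images: (B, 3, 224, 224)
        fmap = self.cnn(images)                       # (B, 2048, 7, 7)
        regions = fmap.flatten(2).transpose(1, 2)     # (B, 49, 2048) region-level features
        regions = self.proj(regions)                  # (B, 49, D) fine-grained image features
        global_img = regions.mean(dim=1)              # (B, D) global image feature
        return global_img, regions

class ReportTower(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.proj = nn.Linear(self.bert.config.hidden_size, embed_dim)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        tokens = self.proj(out.last_hidden_state)     # (B, L, D) token/phrase-level features
        global_txt = tokens[:, 0]                     # [CLS] token as the sentence-level feature
        return global_txt, tokens
```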

Keywords: chest X-ray image; twin-tower cross-modal retrieval; fine-grained; Transformer; BERT
Received: 2022-12-04
Revised: 2023-07-06

Twin-tower cross-modal retrieval of chest X-ray images and diagnostic reports
Zhang Jiacheng, Ou Weihua, Chen Yingjie, Zhang Wenchuan, Xiong Jiahao. Twin-tower cross-modal retrieval of chest X-ray images and diagnostic reports [J]. Application Research of Computers, 2023, 40(8).
Authors: Zhang Jiacheng, Ou Weihua, Chen Yingjie, Zhang Wenchuan, Xiong Jiahao
Affiliation: a. School of Mathematical Sciences; b. School of Big Data and Computer Science, Guizhou Normal University
Abstract: To address the problem that existing cross-modal methods for chest X-ray images and diagnostic reports focus on global information alignment and ignore the fine-grained semantic association between the images and the reports, which results in low retrieval accuracy and poor matching, this paper proposed CDTCR, a twin-tower cross-modal retrieval method for chest X-ray images and diagnostic reports with joint global and local alignment. Specifically, for fine-grained semantic representation, it proposed an image encoder composed of a residual network to learn fine-grained image features, and a BERT model composed of Transformer blocks to learn fine-grained semantic features of the diagnostic report. For fine-grained semantic association, it designed two cross-modal alignment strategies at different granularities, global image-to-sentence and local region-to-phrase, which resolved the insufficient fine-grained semantic association between the two modalities. Experimental results on the large-scale medical dataset MIMIC-CXR show that CDTCR achieves higher retrieval accuracy and better interpretability than existing cross-modal retrieval methods.
Keywords: chest X-ray image; twin-tower cross-modal retrieval; fine-grained; Transformer; BERT
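As a companion to the encoder sketch above, the two alignment objectives mentioned in the abstract, global image-to-sentence and local region-to-phrase, could take roughly the following form. The symmetric InfoNCE-style contrastive loss and the max-over-tokens pooling for region-phrase matching are common choices in fine-grained retrieval, not necessarily the paper's exact formulation.

```python
# Sketch of global and local alignment losses (illustrative assumptions, see lead-in).
import torch
import torch.nn.functional as F

def global_alignment_loss(img_g, txt_g, temperature=0.07):
    # img_g, txt_g: (B, D) global image and sentence features
    img_g = F.normalize(img_g, dim=-1)
    txt_g = F.normalize(txt_g, dim=-1)
    logits = img_g @ txt_g.t() / temperature                    # (B, B) similarity matrix
    labels = torch.arange(img_g.size(0), device=img_g.device)   # matched pairs on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

def local_alignment_loss(regions, tokens, token_mask, temperature=0.07):
    # regions: (B, R, D); tokens: (B, L, D); token_mask: (B, L), 1 for real tokens
    regions = F.normalize(regions, dim=-1)
    tokens = F.normalize(tokens, dim=-1)
    # score image i against report j: each region picks its best-matching token,
    # then scores are averaged over regions
    sim = torch.einsum("ird,jld->ijrl", regions, tokens)            # (B, B, R, L)
    sim = sim.masked_fill(token_mask[None, :, None, :] == 0, -1e4)  # ignore padding tokens
    scores = sim.max(dim=-1).values.mean(dim=-1) / temperature      # (B, B) fine-grained scores
    labels = torch.arange(regions.size(0), device=regions.device)
    return (F.cross_entropy(scores, labels) + F.cross_entropy(scores.t(), labels)) / 2

# Usage (shapes only), assuming the ImageTower/ReportTower sketch above:
# img_g, regions = image_tower(images)
# txt_g, tokens  = report_tower(input_ids, attention_mask)
# loss = global_alignment_loss(img_g, txt_g) + local_alignment_loss(regions, tokens, attention_mask)
```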