Similar Documents
 20 similar documents found (search time: 406 ms)
1.
Multispectral imaging (MSI) is often used to capture images of the fundus by illuminating it with different wavelengths of light. However, these images are taken at different points in time, so eyeball movements can cause misalignment between consecutive images. The multispectral image sequence reveals important information in the form of retinal and choroidal blood vessel maps, which can help ophthalmologists analyze the morphology of these blood vessels in detail. This in turn can lead to high diagnostic accuracy for several diseases. In this paper, we propose a novel semi-supervised end-to-end deep learning framework called “Adversarial Segmentation and Registration Nets” (ASRNet) for the simultaneous estimation of blood vessel segmentation and the registration of multispectral images via an adversarial learning process. ASRNet consists of two subnetworks: (i) a segmentation module S that fulfills the blood vessel segmentation task, and (ii) a registration module R that estimates the spatial correspondence of an image pair. Based on the segmentation-driven registration network, we train the segmentation network using a semi-supervised adversarial learning strategy. Our experimental results show that the proposed ASRNet can achieve state-of-the-art accuracy in segmentation and registration tasks performed with real MSI datasets.
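As a rough illustration of the registration half of such a framework, the sketch below (not the authors' code; all names are ours) shows how a module like R typically applies a predicted displacement field to warp the moving image before a similarity loss is computed:

```python
# Minimal sketch, assuming a dense per-pixel displacement field in pixel units.
import torch
import torch.nn.functional as F

def warp(moving: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """moving: (B, C, H, W) image; flow: (B, 2, H, W) x/y displacements in pixels."""
    b, _, h, w = moving.shape
    # Base sampling grid, then shift it by the predicted displacements.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float()            # (H, W, 2), (x, y) order
    grid = grid.unsqueeze(0).expand(b, -1, -1, -1).clone()
    grid[..., 0] += flow[:, 0]                              # x-displacement
    grid[..., 1] += flow[:, 1]                              # y-displacement
    # Normalize to [-1, 1] as grid_sample expects.
    grid[..., 0] = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(moving, grid, align_corners=True)
```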

2.
Multimodal representation learning has gained increasing importance in various real-world multimedia applications. Most previous approaches explored inter-modal correlation by learning a common or intermediate space in a conventional way, e.g., Canonical Correlation Analysis (CCA), and neglected the fusion of multiple modalities at a higher semantic level. In this paper, inspired by the success of deep networks in multimedia computing, we propose a novel unified deep neural framework for multimodal representation learning. To capture the high-level semantic correlations across modalities, we adopt deep learning features as the image representation and topic features as the text representation. For joint model learning, a 5-layer neural network is designed and enforced with supervised pre-training in the first 3 layers for intra-modal regularization. Extensive experiments on the benchmark Wikipedia and MIR Flickr 25K datasets show that our approach achieves state-of-the-art results compared to both shallow and deep models in multimodal and cross-modal retrieval.
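For reference, the CCA baseline this abstract contrasts with can be sketched in a few lines with scikit-learn; the feature matrices here are random stand-ins for CNN image features and topic-model text features:

```python
# CCA baseline sketch: project both modalities into a shared correlated space.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(500, 128))   # stand-in CNN image features
txt_feats = rng.normal(size=(500, 64))    # stand-in topic-model text features

cca = CCA(n_components=16)
cca.fit(img_feats, txt_feats)
img_c, txt_c = cca.transform(img_feats, txt_feats)  # aligned 16-d projections
# Cross-modal retrieval then ranks txt_c rows by cosine similarity to an img_c query.
```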

3.
Deep neural networks (DNNs) used in many critical deep learning systems, such as face recognition and intelligent driving, have been shown to be vulnerable to adversarial example attacks, yet the detection of multiple classes of adversarial examples still suffers from incomplete coverage and low efficiency. To address this, a difference-based detection method for adversarial examples targeting deep learning models is proposed. First, a residual neural network commonly used in industrial production is built as the model for the adversarial example generation and detection system; then, multiple adversarial attacks are applied to the deep learning model to produce groups of adversarial examples; finally, a sample-difference detection system is constructed, comprising three sub-systems (confidence detection, perceptibility detection, and anti-interference detection) with seven detection methods in total. Experimental results on the MNIST and CIFAR-10 datasets show that adversarial examples belonging to different attacks differ markedly in confidence, perceptibility, anti-interference, and the other measured properties; for instance, adversarial examples with excellent perceptibility scores perform clearly worse than other classes in confidence and anti-interference detection. The consistency of these differences across the two datasets is also demonstrated. Applying this detection method effectively improves the comprehensiveness and diversity of a model's adversarial example detection.
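As a hedged illustration of what one such sub-detector might look like (the paper's seven specific methods are not reproduced here), the sketch below pairs FGSM attack generation with a simple confidence check; the 0.9 threshold is an arbitrary assumption:

```python
# Sketch: generate FGSM adversarial examples, then flag low-confidence inputs.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: one gradient-sign step on the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def confidence_detect(model, x, threshold=0.9):
    """Flag samples whose top softmax probability is suspiciously low."""
    with torch.no_grad():
        conf = F.softmax(model(x), dim=1).max(dim=1).values
    return conf < threshold  # True = likely adversarial
```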

4.
To address the overfitting that convolutional neural networks (CNNs) are prone to when the (training) dataset is small, a deep convolutional generative adversarial network that introduces the SELU activation function combined with a Dropout method with parametric normalization is proposed and implemented for image augmentation: it generates images to enlarge the dataset, mitigating the weak model expressiveness and overfitting caused by insufficient image data in deep learning image classification research. Traditional augmentation methods that enlarge an image set through cropping, rotation, interpolation, distortion transforms, and the like can usually only produce images of a single style, or even of low signal-to-noise ratio. Unlike such traditional augmentation, the images produced by the generative adversarial network differ clearly from the originals, yielding not only more numerous and content-rich high-quality images but also more efficient dataset expansion. Simulation experiments show that the generative adversarial network produces images of relatively high quality and effectively enlarges the dataset.
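A minimal sketch of a DCGAN-style generator with SELU activations is given below; the layer sizes are illustrative, and the mapping of the abstract's "Dropout with parametric normalization" to PyTorch's nn.AlphaDropout (the dropout variant designed to preserve SELU's self-normalization) is our assumption:

```python
# DCGAN-style generator sketch with SELU instead of the usual ReLU.
import torch.nn as nn

def generator(z_dim=100, ch=64):
    # Input: latent z of shape (B, z_dim, 1, 1); output: (B, 3, 32, 32) in [-1, 1].
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, ch * 4, 4, 1, 0), nn.SELU(inplace=True),   # -> 4x4
        nn.AlphaDropout(p=0.05),  # SELU-compatible dropout (our reading of the abstract)
        nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.SELU(inplace=True),  # -> 8x8
        nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.SELU(inplace=True),      # -> 16x16
        nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),                       # -> 32x32
    )
```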

5.
Continuous emotion recognition from multimodal physiological data is useful in many fields, but owing to the scarcity of subject data and the subjectivity of emotion, training emotion recognition models still requires more physiological modality data and depends on same-source subject data. This paper proposes several continuous emotion recognition methods based on facial images and EEG. For the facial image modality, to alleviate the overfitting caused by small facial image datasets, a multi-task convolutional neural network trained with transfer learning is proposed. For the EEG modality, two recognition models are proposed: the first is a subject-dependent model based on support vector machines, which attains high accuracy when test and training data come from the same source; the second is a cross-subject model designed to reduce the impact of the individual variability and non-stationarity of EEG signals on emotion recognition, based on a long short-term memory (LSTM) network, which maintains stable recognition performance even when test and training data come from different sources. To improve recognition accuracy on same-source data, two methods for fusing emotion information at the multimodal decision level are proposed: an enumerated-weight method and an adaptive boosting method. Experiments show that when test and training data are from the same source, the bimodal model achieves, in the best case, average accuracies of 74.23% on the arousal dimension and 80.30% on the valence dimension; when test and training data are from different sources, the cross-subject LSTM model achieves 58.65% and 51.70% on arousal and valence, respectively.
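The enumerated-weight fusion can be sketched as a one-dimensional grid search over a mixing weight on a validation set; the step size and the availability of per-modality probability outputs are assumptions:

```python
# Enumerated-weight decision fusion sketch: pick the mixing weight that
# maximizes validation accuracy.
import numpy as np

def fuse_enumerated(p_face, p_eeg, y_val, step=0.01):
    """p_face, p_eeg: (N, n_classes) probabilities; y_val: (N,) labels."""
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + step, step):
        fused = w * p_face + (1.0 - w) * p_eeg
        acc = (fused.argmax(axis=1) == y_val).mean()
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```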

6.
7.
Remote sensing visual question answering (RSVQA) aims to extract scientific knowledge from remote sensing images. In recent years, many methods have emerged to bridge the semantic gap between remote sensing visual information and natural language. However, existing methods consider only the alignment and fusion of multimodal information, neglecting both the deep mining of multi-scale features and their spatial position information in remote sensing image objects and the modeling of and reasoning over scale features, which leads to incomplete and inaccurate answer prediction. To address these problems, this paper proposes a multi-scale guided fusion inference network (MGFIN) designed to strengthen the visual-spatial reasoning ability of RSVQA systems. First, a Swin Transformer-based multi-scale visual representation module encodes multi-scale visual features embedded with spatial position information. Second, guided by language cues, a multi-scale relational reasoning module uses scale space as a cue to learn higher-order intra-group object relations across multiple scales and performs spatially hierarchical reasoning. Finally, a reasoning-based fusion module bridges the multimodal semantic gap: on top of cross-attention, training objectives such as self-supervised paradigms, contrastive learning, and image-text matching adaptively align and fuse the multimodal features and assist in predicting the final answer. Experimental results show that the proposed model has clear advantages on two public RSVQA datasets.
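A hedged sketch of the scale-guidance idea (not MGFIN itself): let a question embedding attend over visual features pooled at several scales and weight them before fusion. All module names and dimensions below are ours:

```python
# Language-guided attention over multi-scale visual features (illustrative).
import torch
import torch.nn as nn

class ScaleGuidedPool(nn.Module):
    def __init__(self, vis_dim=256, q_dim=256):
        super().__init__()
        self.score = nn.Linear(q_dim + vis_dim, 1)

    def forward(self, scales, q):
        """scales: list of (B, vis_dim) pooled features, one per scale; q: (B, q_dim)."""
        feats = torch.stack(scales, dim=1)                      # (B, S, vis_dim)
        qs = q.unsqueeze(1).expand(-1, feats.size(1), -1)       # (B, S, q_dim)
        w = torch.softmax(self.score(torch.cat([feats, qs], -1)), dim=1)
        return (w * feats).sum(dim=1)                           # scale-weighted visual code
```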

8.
With the advent of generative adversarial networks, synthesizing images from text descriptions has recently become an active research area. However, current text descriptions are usually in English, and the generated objects are mostly faces, flowers, birds, and the like; studies devoted specifically to Chinese text and Chinese paintings are scarce. Moreover, text-to-image tasks usually require large numbers of annotated image-text pairs, making dataset construction expensive. Advances in multimodal pre-training make it possible to guide a GAN's generation process in an optimization-based manner, greatly reducing the demand for datasets and computing resources. This paper proposes a multi-domain VQGAN model that simultaneously generates Chinese paintings in multiple domains, and uses the multimodal pre-trained model WenLan to compute a distance loss between the generated image and the text description; by optimizing the latent variables fed into the multi-domain VQGAN, semantic consistency between image and text is achieved. Ablation experiments compare the FID and R-precision metrics of multi-domain VQGAN variants with different structures in detail, and a user study was conducted. The results show that the full multi-domain VQGAN model surpasses the original VQGAN's generated results in both image quality and text-image semantic consistency.
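The optimization loop this describes can be sketched generically as follows; `vqgan_decode` and `wenlan_distance` are placeholders standing in for the frozen generator and the WenLan image-text distance, not real APIs:

```python
# Latent-optimization sketch: both the generator and the image-text model stay
# frozen; only the latent code is updated.
import torch

def optimize_latent(vqgan_decode, wenlan_distance, text, z_init, steps=200, lr=0.1):
    z = z_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = vqgan_decode(z)              # latent -> image, generator frozen
        loss = wenlan_distance(img, text)  # image-text distance in the joint space
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```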

9.
In multimodal machine learning, manually annotated data crafted for a specific task are expensive, and transfer between tasks is difficult, requiring substantial retraining; this makes training multiple tasks inefficient and wasteful of resources. Pre-trained models, trained on large-scale data in a self-supervised fashion, extract and fuse the information of the different modalities in a dataset to learn the general-purpose knowledge representations contained in it, thereby serving a broad range of downstream vision-language multimodal tasks; this approach has gradually become the mainstream method across artificial intelligence. Relying on large-scale image-text pairs and video data obtained from the Internet, together with progress in pre-training methods exemplified by self-supervised learning, vision-language multimodal pre-trained models have to a large extent broken down the barriers between different vision-language tasks, improving multi-task training efficiency and boosting performance on specific tasks. This paper summarizes progress in vision-language multimodal pre-training: it first compiles common pre-training datasets and methods, then systematically reviews both the latest and the classical approaches, dividing them by input source into image-text pre-trained models and video-text multimodal models, describing the commonalities and differences among the methods, and collecting the experimental results of each model on specific downstream tasks. Finally, the challenges facing vision-language pre-training and future trends are summarized.

10.
Deep learning is now widely applied in computer vision, robotics, natural language processing, and other fields. However, studies have shown that deep neural networks are fragile in the face of adversarial examples: a carefully crafted adversarial example can make a deep learning model err. Most existing research misleads classifiers through adversarial attacks that generate small Lp-norm perturbations, with unsatisfactory results. This paper proposes a new adversarial attack method, the image colorization attack: the input sample is converted to a grayscale image, a colorization method is designed to guide the coloring of the grayscale image, and the colored image is finally used to fool the classifier, realizing an unrestricted attack. Experiments show that adversarial examples produced this way perform well at fooling several state-of-the-art deep neural network image classifiers and pass a human perception study.
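A generic stand-in for such an unrestricted attack (not the paper's guided colorization) is to re-color an image freely while softly constraining its luminance to match the original grayscale content:

```python
# Unrestricted re-coloring attack sketch: maximize classification loss while
# a soft penalty keeps the luminance close to the original grayscale image.
import torch
import torch.nn.functional as F

def colorization_attack(model, x, y, steps=100, lr=0.01, luma_weight=10.0):
    luma = torch.tensor([0.299, 0.587, 0.114], device=x.device).view(1, 3, 1, 1)
    gray = (x * luma).sum(dim=1, keepdim=True)      # luminance to preserve
    adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([adv], lr=lr)
    for _ in range(steps):
        xc = adv.clamp(0, 1)
        loss = -F.cross_entropy(model(xc), y) \
               + luma_weight * F.mse_loss((xc * luma).sum(1, keepdim=True), gray)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adv.clamp(0, 1).detach()
```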

11.
Peng Xu, Xintong Bao. Multimedia Tools and Applications, 2022, 81(10): 13799-13822

News plays an indispensable role in the development of human society. With the emergence of new media, fake news that includes multi-modal content such as text and images does greater social harm, so identifying multi-modal fake news has become a challenge. Traditional multi-modal fake news detection methods simply fuse the information of the different modalities, e.g., by concatenation or element-wise product, without considering the modalities' different impacts, which leads to low detection accuracy. To address this issue, we design a new multi-modal attention adversarial fusion method built on the pre-trained language model BERT, which consists of two important components: an attention mechanism and an adversarial mechanism. The attention mechanism captures the differences among modalities; the adversarial mechanism captures the correlation between them. Experiments on a Chinese public fake news dataset indicate that our proposed method achieves a 5% higher F1 score.
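A minimal sketch of the attention half of such a fusion (the adversarial branch, e.g. a modality discriminator, is omitted; dimensions are illustrative):

```python
# Attention-weighted fusion sketch: learn a per-modality weight so text and
# image evidence contribute unequally to the fake/real decision.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, text_dim=768, img_dim=512, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)  # e.g. a BERT [CLS] vector
        self.img_proj = nn.Linear(img_dim, hidden)
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, 2)        # fake / real

    def forward(self, text_feat, img_feat):
        h = torch.stack([self.text_proj(text_feat),
                         self.img_proj(img_feat)], dim=1)   # (B, 2, hidden)
        w = torch.softmax(self.attn(torch.tanh(h)), dim=1)  # per-modality weight
        fused = (w * h).sum(dim=1)
        return self.classifier(fused)
```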


12.
Multimodal deep learning systems that employ multiple modalities like text, image, audio, video, etc., show better performance than individual-modality (i.e., unimodal) systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. The current state of multimodal machine learning assumes that all modalities are present, aligned, and noiseless during training and testing. However, in real-world tasks it is typically observed that one or more modalities are missing, noisy, lacking annotated data, have unreliable labels, or are scarce during training, testing, or both. This challenge is addressed by a learning paradigm called multimodal co-learning: the modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality through the transfer of knowledge between modalities, including their representations and predictive models. Co-learning being an emerging area, there are no dedicated reviews explicitly focusing on all the challenges it addresses. To that end, in this work we provide a comprehensive survey of the emerging area of multimodal co-learning, which has not yet been explored in its entirety. We review implementations that overcome one or more co-learning challenges without explicitly treating them as such. We present a comprehensive taxonomy of multimodal co-learning based on the challenges addressed and the associated implementations. The various techniques, including the latest ones, are reviewed along with some applications and datasets. Additionally, we review techniques that appear similar to multimodal co-learning but are used primarily in unimodal or multi-view learning, and document the distinction between them. Our final goal is to discuss challenges and perspectives, along with the important ideas and directions for future work that we hope will benefit the entire research community focusing on this exciting domain.

13.
Deep learning models can automatically learn the texture features and shape features of raw data, enabling them to far surpass hand-crafted feature methods in security verification, recognition and classification, speech and face recognition, and other fields. Although deep learning has achieved good results in image classification, object detection, and related directions, adversarial examples formed by adding imperceptible small perturbations to the input expose deep learning models to serious risk in practical use. Improving the robustness of an individual model is therefore an important research direction. Prior work on the robustness of time-series classification models has paid little attention to the interpretability of adversarial examples. The most common defense at present is adversarial training, but its training cost is very high. Taking time-series classification models as the object of study, this paper defines the texture features and shape features of time-series data and shows, through theoretical proof and visualization of feature layers, that texture features are the key factor under attack. A feature-constraint-based method for improving model robustness is then proposed: combined with multi-task learning, a smoothness constraint on the features is added to the loss function, guiding the model to learn the shape features of the raw data as much as possible while classifying. While preserving classification accuracy, this shrinks the space in which adversarial examples can exist and yields a more robust model. Extensive experiments on classical classification models and multiple time-series datasets demonstrate the effectiveness of the method: under a variety of adversarial attacks, it clearly improves the robustness of an individual model.
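The feature-constraint idea can be sketched as an extra smoothness penalty in the loss; the paper's exact constraint term is not given here, and penalizing first differences along the time axis is our assumption:

```python
# Multi-task loss sketch: cross-entropy plus a smoothness penalty that nudges
# the model toward shape (low-frequency) rather than texture features.
import torch
import torch.nn.functional as F

def constrained_loss(logits, labels, features, lam=0.1):
    """features: (B, C, T) intermediate feature maps of a time-series classifier."""
    ce = F.cross_entropy(logits, labels)
    smooth = (features[..., 1:] - features[..., :-1]).pow(2).mean()
    return ce + lam * smooth
```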

14.
At this early stage in the development of multimodal deep learning, this paper surveys the current state of the field, identifies the problems shared across implementations of multimodal deep learning under different modality combinations and learning objectives, classifies those shared problems, and describes methods for solving each class. Specifically, drawing on multimodal learning involving natural language, vision, and audio — including language translation, event detection, information description, emotion recognition, speech recognition and synthesis, and multimedia retrieval — the shared problems in implementing multimodal deep learning are divided into four classes: modality representation, modality translation, modality fusion, and modality alignment. Each class is sub-classified and discussed, and the neural network models developed to solve each class of problem are enumerated. Finally, practical multimodal systems, the datasets and evaluation criteria commonly used in multimodal deep learning research, and the field's development trends are discussed.

15.
Magnetic resonance imaging (MRI), a typical non-invasive imaging technique, produces high-quality brain images free of damage and skull artifacts, providing comprehensive information for the diagnosis and treatment of brain tumors, and is the principal technical means in brain tumor care. Automatic MRI brain tumor segmentation uses computational techniques to segment and label, from multimodal brain images, the tumor regions (necrotic, edematous, non-enhancing tumor, and enhancing tumor) and normal tissue regions, and plays an important role in assisting diagnosis and treatment. This paper summarizes and analyzes deep learning methods for MRI brain tumor image segmentation, presenting the basic ideas, network architectures, representative improvements, and strengths and weaknesses of each class of methods, together with the performance and analysis of representative methods on the BraTS (multimodal brain tumor segmentation) dataset. By surveying the research methods in this area, the paper organizes existing deep-learning-based MRI brain tumor segmentation research; as the new direction in the field, deep learning methods have clearly outperformed traditional methods, have become the mainstream approach, and continue to show good prospects, helping to advance the clinical application of MRI brain tumor segmentation.

16.

Diseases of the eye require manual segmentation and examination of the optic disc by ophthalmologists. Although image segmentation using deep learning techniques achieves remarkable results, it relies on large-scale labeled datasets, which are challenging to acquire in the field of medical imaging. Hence, this article proposes a novel deep learning model that automatically segments the optic disc in retinal fundus images by using the concepts of semi-supervised learning and transfer learning. Initially, a convolutional autoencoder (CAE) is trained to automatically learn features from a large number of unlabeled fundus images available from Kaggle's diabetic retinopathy (DR) dataset. The autoencoder (AE) learns features from the unlabeled images by reconstructing the input images and becomes a pre-trained network (model). After this, the pre-trained autoencoder network is converted into a segmentation network. Then, using transfer learning, the segmentation network is trained with retinal fundus images along with their corresponding optic disc ground-truth images from the DRISHTI GS1 and RIM-ONE datasets, and tested on retinal fundus images from the test sets of those datasets. The experimental results show that the proposed method performs on par with state-of-the-art methods, achieving Dice coefficients of 0.967 and 0.902 on the test sets of the DRISHTI GS1 and RIM-ONE datasets, respectively. The proposed method also shows that transfer learning and semi-supervised learning overcome the barrier imposed by the need for large labeled datasets. The proposed segmentation model can be used in automatic retinal image processing systems for diagnosing diseases of the eye.
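The pre-train-then-transfer pipeline can be sketched as follows; layer counts and shapes are illustrative, not the article's architecture:

```python
# Sketch: pre-train a convolutional autoencoder on unlabeled fundus images,
# then reuse its encoder under a segmentation head and fine-tune on labels.
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # (B, 64, H/4, W/4)
)
decoder = nn.Sequential(  # reconstruction head, used only during pre-training
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
autoencoder = nn.Sequential(encoder, decoder)   # train with MSE on unlabeled images

seg_head = nn.Sequential(  # swapped in after pre-training
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # disc mask
)
segmenter = nn.Sequential(encoder, seg_head)    # encoder weights transfer over
```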


17.
Objective: In intelligent surveillance video analysis, person re-identification is the fundamental problem of matching pedestrians across cameras with non-overlapping views. For single-modality matching on visible-light images, existing methods already achieve excellent performance on public benchmark datasets. However, when re-identifying persons across normal-illumination and low-illumination scenes, cross-modal matching between visible-light and infrared images remains unsatisfactory. There are two main difficulties: 1) the pronounced visual discrepancy between visible-light and infrared images, formed in different spectral ranges, makes the modality gap hard to eliminate; 2) humans find it difficult to judge pedestrian identity across modal images, so annotated data are scarce. To address these two problems, this paper studies how to mine single-modality self-supervised information from readily available annotated visible-light auxiliary data, providing prior knowledge to guide the learning of a cross-modal matching model. Method: A random single-channel mask data augmentation method is proposed: a mask applied to the three channels of the input visible-light image randomly retains the information of a single channel, making the model focus on extracting features insensitive to spectral range. A pre-training and fine-tuning method based on mutual learning between a three-channel model and a single-channel model is also proposed, mining and transferring robust cross-spectral self-supervised information through the relation between three-channel and single-channel data, improving the matching ability of the cross-modal model. Results: Cross-modal person re-identification experiments were conducted on the visible-infrared multimodal person datasets SYSU-MM01 (Sun Yat-Sen University Multiple Modality 01), RGBNT201 (RGB, near infrared, thermal infrared, 201), and RegDB. The method achieves leading performance on all three datasets; compared with the best competing result, mean average precision (mAP) on RGBNT201 improves by up to nearly 5%. Conclusion: The proposed single-modality cross-spectral self-supervised mining method uses annotated visible-light auxiliary data to mine self-supervised information insensitive to spectral-range changes, guiding single-modality pre-training and multimodal supervised fine-tuning and improving cross-modal person re-identification performance.
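The augmentation itself is a few lines; whether the masked channels are zeroed or replaced by the kept channel is not specified in the abstract, so zeroing is an assumption here:

```python
# Random single-channel mask sketch: per sample, keep one RGB channel and
# zero the other two.
import torch

def random_single_channel(x: torch.Tensor) -> torch.Tensor:
    """x: (B, 3, H, W) visible-light batch; one channel survives per sample."""
    b = x.size(0)
    keep = torch.randint(0, 3, (b,), device=x.device)   # channel index per sample
    mask = torch.zeros(b, 3, 1, 1, device=x.device)
    mask[torch.arange(b), keep] = 1.0
    return x * mask
```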

18.
Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multiple factors. To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces. Furthermore, we demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods via extensive experiments with real and synthetic datasets in three different domains. We also show that our method enables a wide range of applications in image synthesis and editing tasks.
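A hedged sketch of the class-wise latent idea: one latent vector per semantic class, broadcast over that class's mask region to build a spatial style map (names and shapes are ours, not the paper's):

```python
# Class-wise style map sketch: each class's latent code fills its mask region.
import torch

def classwise_style_map(masks: torch.Tensor, z_per_class: torch.Tensor) -> torch.Tensor:
    """masks: (B, K, H, W) one-hot semantic masks; z_per_class: (B, K, D) latents."""
    # (B, K, H, W, 1) * (B, K, 1, 1, D), summed over classes K -> (B, H, W, D)
    style = (masks.unsqueeze(-1) * z_per_class[:, :, None, None, :]).sum(dim=1)
    return style.permute(0, 3, 1, 2)  # (B, D, H, W) style map fed to the decoder
```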

19.
Because real-life scenes differ greatly, the emotions humans display in different scenes also differ, so the label distributions of collected emotion datasets are imbalanced. Meanwhile, traditional methods mostly rely on model pre-training and feature engineering to strengthen the representation of expression-related features, without considering the complementarity between different feature representations, which limits model generalization and robustness. To address these problems, an end-to-end deep learning framework, EE-GAN, containing the network-ensemble model Ens-Net is proposed…

20.
The fusion and combination of images from multiple modalities is important in many applications. Typically, this process consists of the alignment of the images and the combination of their complementary information. In this work, we focus on the former and propose a multimodal image distance measure based on the commutativity of graph Laplacians. The eigenvectors of the image graph Laplacian, and thus the graph Laplacian itself, capture the intrinsic structure of the image's modality. Using Laplacian commutativity as a criterion of image structure preservation, we adapt the problem of finding the closest commuting operators to multimodal image registration. Hence, by using the relation between simultaneous diagonalization and commutativity of matrices, we compare multimodal image structures by means of the commutativity of their graph Laplacians. In this way, we avoid spectrum reordering schemes or additional manifold alignment steps, which are otherwise necessary to ensure the comparability of eigenspaces across modalities. We show on synthetic and real datasets that this approach is applicable to dense rigid and non-rigid image registration. Results demonstrate that the proposed measure is able to deal with very challenging multimodal datasets and compares favorably to normalized mutual information, a de facto similarity measure for multimodal image registration.
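On small patches (the dense Laplacian is n×n, so e.g. 16×16 images), the measure can be sketched directly; the intensity-weighted 4-neighbour graph construction is an assumption:

```python
# Commutativity-based distance sketch: build a grid-graph Laplacian per image
# and score alignment by the Frobenius norm of the commutator.
import numpy as np

def grid_laplacian(img: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    h, w = img.shape
    W = np.zeros((h * w, h * w))
    for i in range(h):
        for j in range(w):
            p = i * w + j
            for di, dj in ((0, 1), (1, 0)):           # 4-neighbour grid edges
                ni, nj = i + di, j + dj
                if ni < h and nj < w:
                    q = ni * w + nj
                    wgt = np.exp(-((img[i, j] - img[ni, nj]) ** 2) / sigma ** 2)
                    W[p, q] = W[q, p] = wgt
    return np.diag(W.sum(axis=1)) - W                 # L = D - W

def commutator_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    La, Lb = grid_laplacian(img_a), grid_laplacian(img_b)
    return np.linalg.norm(La @ Lb - Lb @ La, "fro")
```

A value of zero means the two Laplacians commute, i.e., share an eigenbasis, so lower scores indicate better-aligned image structure.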

