Similar Literature (20 results)
1.

Using neural networks for semantic labeling has become a dominant technique for layout analysis of historical document images. However, training or fine-tuning appropriate models requires large labeled datasets. This paper addresses the case when only limited labeled data are available and promotes a novel approach that uses so-called controlled data to pre-train the networks. Two strategies are proposed: the first addresses the real labeling task using artificial data; the second uses real data to pre-train the networks on a pretext task. To assess these strategies, a large set of experiments was carried out on a text line detection and classification task using different variants of U-Net. The observations, obtained on two different datasets, show that the approach globally reduces training time while offering similar or better performance, and that the effect is larger for lightweight network architectures.

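A minimal sketch of the second strategy described above: pre-train on real, unlabeled pages with a reconstruction pretext task, then swap the head and fine-tune on the few labeled pages. The tiny U-Net, the choice of reconstruction as the pretext, the stand-in data, and all hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: pretext pre-training then fine-tuning (assumed setup, not the paper's own).
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, n_out):
        super().__init__()
        self.enc1, self.enc2 = block(1, 16), block(16, 32)
        self.pool, self.up = nn.MaxPool2d(2), nn.Upsample(scale_factor=2)
        self.dec = block(32 + 16, 16)
        self.head = nn.Conv2d(16, n_out, 1)
    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        return self.head(self.dec(torch.cat([self.up(e2), e1], dim=1)))

# Stand-in random data in place of real document pages and per-pixel labels.
unlabeled_pages = [torch.rand(1, 1, 64, 64) for _ in range(4)]
labeled_pages = [(torch.rand(1, 1, 64, 64), torch.randint(0, 4, (1, 64, 64)))
                 for _ in range(2)]

# 1) Pretext: reconstruct unlabeled page images (no layout labels needed).
net = TinyUNet(n_out=1)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for page in unlabeled_pages:
    loss = nn.functional.mse_loss(net(page), page)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Fine-tune: new head, train on the few labeled pages (4 classes assumed).
net.head = nn.Conv2d(16, 4, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for page, labels in labeled_pages:
    loss = nn.functional.cross_entropy(net(page), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```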

2.
Objective: Deep learning-based aircraft recognition methods have made great progress in remote sensing image interpretation, but their generalization ability depends on large-scale datasets. A conditional generative adversarial network (CGAN) can produce realistic samples to augment real datasets, but its ability to model complex remote sensing scenes is limited and the generated samples are of low quality. To address these problems, we propose an aircraft recognition framework that incorporates CGAN-based sample generation. Method: We improve the CGAN by introducing a perceptual loss to strengthen the generator's modeling of remote sensing imagery, and propose a mask-based structural similarity (SSIM) loss (masked-SSIM loss) to improve the image quality of the aircraft regions in generated samples; combined with the aircraft mask, this loss acts only on the aircraft regions of the image and does not affect the background. A residual-network-based recognition model is combined with the improved generative model to form the recognition framework; during training, generated samples replace real satellite images, reducing the required amount of real satellite data. Results: A recognition model trained on generated samples was evaluated on real samples against one trained on real samples; its accuracy was only 0.33% lower. For the generative model, adding the perceptual loss raised the peak signal-to-noise ratio (PSNR) of generated samples by 0.79 dB and SSIM by 0.094; adding the masked-SSIM loss raised PSNR by a further 0.09 dB and SSIM by 0.252. Conclusion: The proposed generation-based aircraft recognition framework produces higher-quality samples that can replace real samples for training the recognition model, effectively alleviating the sample shortage in aircraft recognition tasks.
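The masked-SSIM idea above can be sketched as follows: compute a standard windowed SSIM map between the generated and real images, then average it only over the aircraft mask so the background contributes nothing. The constants and window size follow common SSIM defaults; this is an assumed reconstruction of the loss, not the authors' code.

```python
import torch
import torch.nn.functional as F

def masked_ssim_loss(x, y, mask, C1=0.01**2, C2=0.03**2, win=11):
    """1 - mean SSIM over the masked (aircraft) region only.
    x, y: (N,C,H,W) images in [0,1]; mask: (N,1,H,W) binary aircraft mask."""
    pad = win // 2
    mu_x = F.avg_pool2d(x, win, 1, pad)
    mu_y = F.avg_pool2d(y, win, 1, pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
               ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    m = mask.expand_as(ssim_map)
    return 1.0 - (ssim_map * m).sum() / m.sum().clamp(min=1.0)

# Toy check: identical images inside the mask give a (near-)zero loss.
x = torch.rand(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64); mask[..., 16:48, 16:48] = 1
print(masked_ssim_loss(x, x.clone(), mask))  # ~0
```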

3.
Recent approaches for classifying data streams are mostly based on supervised learning algorithms, which can only be trained with labeled data. Manual labeling of data is both costly and time consuming. Therefore, in a real streaming environment where large volumes of data appear at a high speed, only a small fraction of the data can be labeled. Thus, only a limited number of instances will be available for training and updating the classification models, leading to poorly trained classifiers. We apply a novel technique to overcome this problem by utilizing both unlabeled and labeled instances to train and update the classification model. Each classification model is built as a collection of micro-clusters using semi-supervised clustering, and an ensemble of these models is used to classify unlabeled data. Empirical evaluation of both synthetic and real data reveals that our approach outperforms state-of-the-art stream classification algorithms that use ten times more labeled data than our approach.
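A toy version of the semi-supervised micro-cluster idea: cluster labeled and unlabeled instances together, give each micro-cluster the majority label of its labeled members, and let an ensemble of such models vote on new points. Plain k-means and a majority vote are simplifications of the paper's semi-supervised clustering; the class and data names are illustrative.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

class MicroClusterModel:
    """One ensemble member: k-means micro-clusters labeled by their labeled members."""
    def fit(self, X_lab, y_lab, X_unlab, k=10, seed=0):
        X = np.vstack([X_lab, X_unlab])
        self.km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        assign = self.km.predict(X_lab)
        # Majority label per cluster; unlabeled-only clusters fall back to the global majority.
        default = Counter(y_lab).most_common(1)[0][0]
        self.labels = np.array([
            Counter(y_lab[assign == c]).most_common(1)[0][0] if (assign == c).any() else default
            for c in range(k)])
        return self
    def predict(self, X):
        return self.labels[self.km.predict(X)]

# Ensemble vote over models built on successive chunks of a (toy) stream.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)); y = (X[:, 0] > 0).astype(int)
models = [MicroClusterModel().fit(X[i::3][:20], y[i::3][:20], X[i::3][20:], seed=i)
          for i in range(3)]
votes = np.stack([m.predict(X) for m in models])
pred = np.apply_along_axis(lambda v: Counter(v).most_common(1)[0][0], 0, votes)
print("ensemble accuracy:", (pred == y).mean())
```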

4.
We present a new scheme for estimating Markov random field line process parameters which uses geometric CAD models of the objects in the scene. The models are used to generate synthetic images of the objects from random viewpoints. The edge maps computed from the synthesized images are used as training samples to estimate the line process parameters by a least squares method. We show that this parameter estimation method is useful for detecting edges in range images as well as intensity images. The main contributions of the paper are: 1) use of CAD models to obtain true edge labels which are otherwise not available; and 2) use of a canonical Markov random field representation to reduce the number of parameters.

5.
We propose the use of Vapnik's vicinal risk minimization (VRM) for training decision trees to approximately maximize decision margins. We implement VRM by propagating uncertainties in the input attributes into the labeling decisions, thereby performing a global regularization over the decision tree structure. During the training phase, a decision tree is constructed to minimize the total probability of misclassifying the labeled training examples, a process which approximately maximizes the margins of the resulting classifier. We perform the necessary minimization using an appropriate meta-heuristic (genetic programming). Over a range of synthetic and benchmark real datasets, we demonstrate the statistical superiority of VRM training over conventional empirical risk minimization (ERM) and the well-known C4.5 algorithm, while finding no statistical difference between trees trained by ERM and by C4.5. Training with VRM is also shown to be more stable and repeatable than training by ERM.
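One way to read the VRM objective above: instead of counting misclassifications at the training points themselves (ERM), estimate each point's probability of misclassification under a vicinal distribution, e.g. Gaussian noise on the input attributes, and minimize the total. The Monte Carlo estimate below is a hedged illustration of that objective, scored here for fixed sklearn trees, whereas the paper searches over tree structures with genetic programming.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

def vicinal_risk(tree, X, y, sigma=0.15, n_samples=200, rng=None):
    """Monte Carlo estimate of the average probability of misclassifying each
    training example when its inputs are jittered by N(0, sigma^2)."""
    if rng is None:
        rng = np.random.default_rng(0)
    risk = 0.0
    for x, label in zip(X, y):
        jitter = x + sigma * rng.normal(size=(n_samples, x.size))
        risk += (tree.predict(jitter) != label).mean()
    return risk / len(X)

X, y = make_moons(200, noise=0.1, random_state=0)
deep = DecisionTreeClassifier(random_state=0).fit(X, y)            # ERM-style, zero train error
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The smoother (larger-margin) tree typically shows the lower vicinal risk.
print("deep:", vicinal_risk(deep, X, y), "shallow:", vicinal_risk(shallow, X, y))
```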

6.
In the area of computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive labeled data. However, collecting and annotating images from the real world demands substantial labor and money, and it is usually difficult to build real datasets with specific characteristics, such as small object areas and high occlusion levels. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training DPM (deformable parts model) and Faster R-CNN detectors, we show that model performance can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing the trained models on intentionally designed virtual datasets in order to uncover their flaws from specific aspects. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.
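Mixing a virtual set such as ParallelEye with a real-world set at training time can be as simple as concatenating the two datasets so every batch draws from both. The sketch below uses PyTorch's ConcatDataset under the assumption that both sets yield samples in the same format; the tensor datasets here are placeholders for real detection data.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder stand-ins for a virtual (ParallelEye-style) and a real dataset;
# in practice both must return (image, target) pairs in the same format.
virtual_ds = TensorDataset(torch.rand(800, 3, 64, 64), torch.randint(0, 5, (800,)))
real_ds = TensorDataset(torch.rand(200, 3, 64, 64), torch.randint(0, 5, (200,)))

mixed = ConcatDataset([virtual_ds, real_ds])
loader = DataLoader(mixed, batch_size=16, shuffle=True)  # batches mix virtual and real

images, targets = next(iter(loader))
print(images.shape, targets.shape)  # torch.Size([16, 3, 64, 64]) torch.Size([16])
```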

7.
The strong performance of deep learning in medical image analysis currently depends largely on high-quality annotated datasets, but because medical images are specialized and complex, annotating them is very costly. To address this problem, we design a semi-automatic annotation system based on deep active learning. The system uses an active learning algorithm to reduce the number of labeled samples needed to train a deep annotation model; once trained, the model can be used to annotate the remainder of the dataset. Built as a Web application, the system requires no installation, is accessible across platforms, and makes the annotation work convenient for users.

8.
Multi-class image classification based on active learning and semi-supervised learning
陈荣, 曹永锋, 孙洪. 《自动化学报》, 2011, 37(8): 954-962
Most image classification algorithms require large numbers of training samples to train the classifier model. In practice, labeling large numbers of samples is tedious and time-consuming, and for some special imagery, such as synthetic aperture radar (SAR) images, interpreting the content is very difficult, so only a very limited number of labeled samples can be obtained. This paper introduces best-vs-second-best (BvSB) active learning and constrained self-training (CST) into an image classification algorithm based on a support vector machine (SVM) classifier, yielding a new image classification method. BvSB active learning mines the samples most valuable to the current classifier model for manual labeling, while CST semi-supervised learning further exploits the many unlabeled samples in the dataset, so that good classification performance is obtained at a small labeling cost. The new method is compared with random sample selection, entropy-based uncertainty sampling active learning, and plain BvSB active learning. Experimental results on three optical image sets and one SAR image set show that the new method effectively reduces the number of manually labeled samples required for classifier training while achieving high accuracy and good robustness.
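The BvSB criterion above ranks unlabeled samples by the gap between the classifier's best and second-best class probabilities: a small gap means high ambiguity, hence high value for manual labeling. A hedged sketch of the query loop with an sklearn SVM follows; the query size is illustrative, the oracle is simulated by the known labels, and CST's constrained pseudo-labeling of high-confidence pool samples is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=400, centers=4, cluster_std=2.0, random_state=0)
labeled = list(range(20))                       # tiny initial labeled set
pool = list(range(20, 400))                     # unlabeled pool

for _ in range(5):
    clf = SVC(probability=True, random_state=0).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    top2 = np.sort(proba, axis=1)[:, -2:]
    bvsb = top2[:, 1] - top2[:, 0]              # small margin = most informative
    order = np.argsort(bvsb)[:10]               # 10 most ambiguous samples
    query = [pool[i] for i in order]            # -> sent to the human oracle
    labeled += query                            # oracle's labels are y[query] here
    pool = [p for p in pool if p not in set(query)]

print(f"accuracy with {len(labeled)} labels: {clf.score(X, y):.3f}")
```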

9.
Robotic advances and developments in sensors and acquisition systems facilitate the collection of survey data in remote and challenging scenarios. Semantic segmentation, which attempts to provide per-pixel semantic labels, is an essential task when processing such data. Recent advances in deep learning approaches have boosted this task's performance, but these methods need large amounts of labeled data, which is a challenge in many domains. In many environmental monitoring instances, such as the coral reef example studied here, data labeling demands expert knowledge and is costly. Therefore, many datasets often present scarce and sparse image annotations or remain untouched in image libraries. This study proposes and validates an effective approach for learning semantic segmentation models from sparsely labeled data. By augmenting sparse annotations with the proposed adaptive superpixel segmentation propagation, we obtain results similar to training with dense annotations, significantly reducing the labeling effort. We perform an in-depth analysis of our labeling augmentation method as well as of different neural network architectures and loss functions for semantic segmentation. We demonstrate the effectiveness of our approach on publicly available datasets from different real domains, with an emphasis on underwater scenarios, specifically coral reef semantic segmentation. We release new labeled data as well as an encoder trained on half a million coral reef images, which is shown to facilitate generalization to new coral scenarios.
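The label-augmentation step above can be approximated with off-the-shelf superpixels: segment the image, then copy each sparse point label to every pixel of the superpixel that contains it (the paper's adaptive superpixel scheme is more refined than this). A sketch with skimage's SLIC; the (row, col, class) point-label format is an assumption.

```python
import numpy as np
from skimage.data import astronaut
from skimage.segmentation import slic

img = astronaut()                       # stand-in for a coral-reef survey image
segments = slic(img, n_segments=300, compactness=10)

# Sparse annotations: (row, col, class_id) points, a hypothetical label format.
points = [(100, 120, 1), (400, 300, 2), (250, 480, 1)]

dense = np.zeros(img.shape[:2], dtype=np.int32)   # 0 = unlabeled
for r, c, cls in points:
    dense[segments == segments[r, c]] = cls       # propagate label to the superpixel

print("labeled pixels grew from", len(points), "to", int((dense > 0).sum()))
```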

10.
We present a novel and lightweight approach to capture and reconstruct structured 3D models of multi-room floor plans. Starting from a small set of registered panoramic images, we automatically generate a 3D layout of the rooms and of all the main objects inside. Such a 3D layout is directly suitable for use in a number of real-world applications, such as guidance, location, routing, or content creation for security and energy management. Our novel pipeline introduces several contributions to indoor reconstruction from purely visual data. In particular, we automatically partition panoramic images into a connectivity graph, according to the visual layout of the rooms, and exploit this graph to support object recovery and room boundary extraction. Moreover, we introduce a plane-sweeping approach to jointly reason about the content of multiple images and solve the problem of object inference in a top-down 2D domain. Finally, we combine these methods in a fully automated pipeline for creating a structured 3D model of a multi-room floor plan and of the location and extent of clutter objects. These contributions make our pipeline able to handle cluttered scenes with complex geometry that are challenging to existing techniques. The effectiveness and performance of our approach are evaluated on both real-world and synthetic models.

11.
Learning from synthetic data has many important applications in cases where sufficient amounts of labeled data are not available. Using synthetic data is challenging due to differences in feature distributions between synthetic and actual data, a phenomenon we term the synthetic gap. In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that enables bridging the synthetic gap and learning from synthetic data more efficiently. In particular, we show that our SMCAE can not only transform and use synthetic data on a challenging face-sketch recognition task, but can also help simulate real images which can be used for training recognition classifiers. Preliminary experiments validate the effectiveness of the proposed framework.

12.
An important aspect of robust automated assembly is an accurate and efficient method for inspecting finished assemblies. We present a novel algorithm that is trained on synthetic images generated from the CAD models of the assembly's components. Once trained on synthetic images, the algorithm can detect assembly errors by examining real images of the assembled product.

13.
The use of spatially varying reflectance models (SVBRDF) is the state of the art in physically based rendering, and the ultimate goal is to acquire them from real-world samples. Recently, several promising deep learning approaches have emerged that create such models from a few uncalibrated photos after being trained on synthetic SVBRDF datasets. While the achieved results are already very impressive, the reconstruction accuracy of these approaches is still far from that of specialized devices. On the other hand, fitting SVBRDF parameter maps to the gigabytes of calibrated HDR images per material acquired by state-of-the-art high-quality material scanners takes on the order of several hours for realistic spatial resolutions. In this paper, we present a first deep learning approach that is capable of producing SVBRDF parameter maps more than two orders of magnitude faster than state-of-the-art approaches, while still providing results of equal quality and generalizing to new materials unseen during training. This is made possible by training our network on a large-scale database of material scans that we have gathered with a commercially available SVBRDF scanner. In particular, we train a convolutional neural network to map calibrated input images to the 13 parameter maps of an anisotropic Ward BRDF, modified to account for Fresnel reflections, and evaluate the results by comparing the measured images against re-renderings from our SVBRDF predictions. The novel approach is extensively validated on real-world data taken from our material database, which we make publicly available at https://cg.cs.uni-bonn.de/svbrdfs/.
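For reference, the standard anisotropic Ward lobe that such per-texel parameter maps parameterize is given below; the paper's Fresnel modification is not specified in the abstract, so only the base model is shown. Here $\rho_d$ and $\rho_s$ are the diffuse and specular albedos, $\alpha_x,\alpha_y$ the anisotropic roughnesses, and $\theta_h,\varphi_h$ the half-vector angles.

```latex
f_r(\omega_i,\omega_o)
  = \frac{\rho_d}{\pi}
  + \frac{\rho_s}{4\pi\,\alpha_x\alpha_y\sqrt{\cos\theta_i\,\cos\theta_o}}
    \exp\!\left[-\tan^2\theta_h\!\left(
      \frac{\cos^2\varphi_h}{\alpha_x^2}+\frac{\sin^2\varphi_h}{\alpha_y^2}
    \right)\right]
```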

14.
安峰, 戴军, 韩振, 严仲兴. 《图学学报》, 2022, 43(5): 841-848
Optical flow estimation is a key module of many computer vision systems, widely used in action recognition, robot localization and navigation, and other fields. End-to-end optical flow estimation, however, is still limited by the shortage of data sources; optical flow data for real scenes is especially hard to obtain. Artificially synthesized optical flow data makes up the vast majority, and synthetic data cannot fully reflect real scenes (e.g., swaying leaves, pedestrian reflections), so over-fitting is hard to avoid. Unsupervised or self-supervised methods can be trained on massive video data, removing the dependence on labeled datasets, and are an effective way to address the data shortage. On this basis, we build a self-supervised optical flow network in which the "Teacher" and "Student" modules integrate a recent optical flow architecture, the sparse correlation volume network (SCV), reducing redundant computation; an attention model is also introduced as a node of the network to enrich the channel and spatial dimensions of image features. With SCV and the attention mechanism integrated into the self-supervised network, test results on the KITTI 2015 dataset match or exceed those of common supervised training networks.
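Self-supervision in such Teacher-Student flow networks usually rests on a photometric loss: warp the second frame back with the predicted flow and compare it to the first frame, so no ground-truth flow is needed. A minimal warping-plus-loss sketch in PyTorch follows; the census transform, occlusion masking, and the SCV/attention details are beyond this illustration.

```python
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Backward-warp img2 (N,C,H,W) with flow (N,2,H,W) given in pixels."""
    n, _, h, w = img2.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow  # target coords
    # Normalize to [-1, 1] for grid_sample (x against width, y against height).
    gx = 2 * grid[:, 0] / (w - 1) - 1
    gy = 2 * grid[:, 1] / (h - 1) - 1
    return F.grid_sample(img2, torch.stack([gx, gy], dim=-1), align_corners=True)

def photometric_loss(img1, img2, flow):
    return (img1 - warp(img2, flow)).abs().mean()   # plain L1; census is more robust

img1, img2 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64, requires_grad=True)
loss = photometric_loss(img1, img2, flow)
loss.backward()                                     # gradients reach the flow field
print(float(loss))
```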

15.
Model-based image segmentation has been extensively used in medical imaging to learn both the shape and appearance of anatomical structures from training datasets. The more training datasets are used, the more accurate the segmented model becomes, as more information about its variability is accounted for. However, training datasets of large size with a proper sampling of the population may not always be available. In this paper, we compare the performance of statistical models for lower limb bone segmentation in MR images when only a small number of datasets is available for training. For shape, both PCA-based priors and shape memory strategies are tested. For appearance, methods based on intensity profiles are tested, namely mean intensity profiles, multivariate Gaussian distributions of profiles, and multimodal profiles from EM clustering. Segmentation results show that local and simple methods perform best when a small number of datasets is available for training. Conversely, statistical methods yield the best segmentation results when the number of training datasets is increased.
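The mean-profile appearance model mentioned above reduces to: sample intensity profiles along the boundary normal at each landmark during training, store their mean (and optionally covariance), and at segmentation time move the landmark to the offset whose profile matches best. A hedged numpy sketch with a plain Mahalanobis distance; the simulated edge profiles stand in for real MR data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training: intensity profiles (n_samples x profile_len) sampled along the
# boundary normal at one landmark, simulated here as a noisy dark-to-bright edge.
true_edge = np.tanh(np.linspace(-3, 3, 15))
train = true_edge + 0.1 * rng.normal(size=(40, 15))
mean = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train.T) + 1e-6 * np.eye(15))  # regularized inverse

def mahalanobis(profile):
    d = profile - mean
    return float(d @ cov_inv @ d)

# Search: score candidate profiles at several offsets along the normal and
# keep the best-matching one.
candidates = {off: np.roll(true_edge, off) + 0.1 * rng.normal(size=15)
              for off in (-4, -2, 0, 2, 4)}
best = min(candidates, key=lambda off: mahalanobis(candidates[off]))
print("best offset:", best)  # expected: 0 (profile centered on the true edge)
```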

16.
Digital scans of analogue photographic film typically contain artefacts such as dust and scratches. Automated removal of these is an important part of preservation and dissemination of photographs of historical and cultural importance. While state-of-the-art deep learning models have shown impressive results in general image inpainting and denoising, film artefact removal is an understudied problem. It has particularly challenging requirements, due to the complex nature of analogue damage, the high resolution of film scans, and potential ambiguities in the restoration. There are no publicly available high-quality datasets of real-world analogue film damage for training and evaluation, making quantitative studies impossible. We address the lack of ground-truth data for evaluation by collecting a dataset of 4K damaged analogue film scans paired with manually-restored versions produced by a human expert, allowing quantitative evaluation of restoration performance. We have made the dataset available at https://doi.org/10.6084/m9.figshare.21803304. We construct a larger synthetic dataset of damaged images with paired clean versions using a statistical model of artefact shape and occurrence learnt from real, heavily-damaged images. We carefully validate the realism of the simulated damage via a human perceptual study, showing that even expert users find our synthetic damage indistinguishable from real. In addition, we demonstrate that training with our synthetically damaged dataset leads to improved artefact segmentation performance when compared to previously proposed synthetic analogue damage overlays. The synthetically damaged dataset can be found at https://doi.org/10.6084/m9.figshare.21815844, and the annotated authentic artefacts along with the resulting statistical damage model at https://github.com/daniela997/FilmDamageSimulator. Finally, we use these datasets to train and analyse the performance of eight state-of-the-art image restoration methods on high-resolution scans. We compare both methods which directly perform the restoration task on scans with artefacts, and methods which require a damage mask to be provided for the inpainting of artefacts. We modify the methods to process the inputs in a patch-wise fashion to operate on original high resolution film scans.

17.
In recent years, crowd counting has drawn increasing attention due to its widespread applications in computer vision. Most existing methods rely on datasets with scarce labeled images to train their networks and are therefore prone to over-fitting. Further, these datasets usually provide only manual annotations of head center positions, which carry limited information. In this paper, we propose to exploit virtual synthetic crowd scenes to improve the performance of counting networks in the real world. Since people masks are easy to obtain in a synthetic dataset, we first learn to distinguish people from the background via a segmentation network trained on the synthetic data. We then transfer the learned segmentation priors from synthetic to real-world data. Finally, we train a density estimation network on real-world data utilizing the obtained people masks. Our experiments on two crowd counting datasets demonstrate the effectiveness of the proposed method.
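Counting networks of this kind typically regress a density map built from exactly the head-center annotations the abstract mentions: place a unit impulse at each head point and blur with a Gaussian, so the map integrates to the crowd count. A standard construction follows, with a fixed kernel width assumed rather than the geometry-adaptive variants some papers use.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(shape, head_points, sigma=4.0):
    """Ground-truth density map: one blurred unit impulse per annotated head."""
    impulses = np.zeros(shape, dtype=np.float64)
    for r, c in head_points:
        impulses[r, c] += 1.0
    return gaussian_filter(impulses, sigma=sigma)

heads = [(30, 40), (32, 55), (80, 100)]
dmap = density_map((128, 128), heads)
print("integrates to the count:", round(dmap.sum(), 3))  # ~3.0
```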

18.
Objective: Segmenting the optic disc and optic cup from fundus images is important for intelligent diagnosis of eye diseases, and U-Net and its variants have been widely applied to this task. Because successive convolution and pooling operations lose spatial information, disc and cup segmentation tends to be inaccurate and inefficient. We propose RCPA-Net, a deep network that fuses residual context encoding and path augmentation, improving the accuracy and continuity of segmentation results. Method: Input images are processed with contrast-limited adaptive histogram equalization to enhance contrast and enrich image information. The feature encoding module uses ResNet34 (residual neural network) as its backbone; residual recursion and an attention mechanism make the model focus on regions of interest, a residual atrous convolution module captures deeper semantic feature information, and a path augmentation module obtains precise localization information from shallow features to strengthen the whole feature hierarchy. We also propose a new multi-label loss function to raise the pixel ratio of the disc and cup to the background and generate the final segmentation map. Results: Compared with multiple segmentation methods on four datasets, on the ORIGA (online retinal fundus image database for glaucoma analysis) dataset our method achieves a Jaccard (JC) index of 0.9391 for optic disc segmentation and an F-measure of ...
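The CLAHE preprocessing step described above is available directly in OpenCV; a common recipe for color fundus images is to equalize only the lightness channel in LAB space. The clip limit and tile grid below are typical defaults, not necessarily the paper's settings.

```python
import cv2
import numpy as np

def clahe_rgb(img_bgr, clip=2.0, grid=(8, 8)):
    """Contrast-limited adaptive histogram equalization on the L channel only."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=clip, tileGridSize=grid).apply(l)
    return cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)

# Toy input; in practice img = cv2.imread("fundus.png").
img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
out = clahe_rgb(img)
print(out.shape, out.dtype)
```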

19.
In this work we discuss the problem of automatically determining bounding box annotations for objects in images, where we assume only weak labeling in the form of global image labels. We are thus given a set of positive images, each containing at least one instance of a desired object, and a negative set of images representing background. Our goal is to determine the locations of the object instances within the positive images by bounding boxes. We describe and analyze a method for automatic bounding box annotation which consists of two major steps. First, we apply a statistical model to determine visual features that are likely to be indicative of the respective object class; based on these feature models, we infer preliminary bounding box estimates. Second, we use a CCCP training algorithm for latent structured SVM to improve the initial estimates, using them as initializations for latent variables modeling the optimal bounding box positions. We evaluate our approach on three publicly available datasets.

20.
Objective: Deep detection models trained on clear images are difficult to generalize directly to underwater scenes because imaging differences cause a domain shift. To effectively resolve this feature drift between clear and underwater images, we propose a plug-and-play feature enhancement module (feature de-drifting module Unet, FDM-Unet). Method: We first propose an underwater image synthesis method based on an imaging model: the color cast and luminance are estimated from real underwater images, scene depth is estimated from clear images, and clear images are synthesized into realistic underwater images according to an improved illumination-scattering model. Drawing on the U-Net structure, we then design a lightweight feature enhancement module, FDM-Unet. On pairs of clear images and corresponding synthetic underwater images, a common detector pre-trained on clear images extracts their shallow features; the degraded shallow features of the underwater image are fed into FDM-Unet for enhancement, and the mean-square error (MSE) loss between the enhanced features and the clear-image features supervises FDM-Unet's training. Finally, the trained FDM-Unet is inserted directly at the shallow layers of the pre-trained detector, which can then handle underwater object detection without retraining or fine-tuning. Results: On the PASCAL VOC 2007 (pattern analysis, statistical modeling and computational learning visual object classes 2007) synthetic underwater test set, FDM-Unet raises detection mAP (mean average precision) by 8.58% and 7.71% for pre-trained YOLO v3 (you only look once v3) and SSD (single shot multibox detector) detectors, respectively; on the real underwater dataset URPC19 (underwater robot professional contest 19), fine-tuning with different proportions of the data raises mAP by 4.4%-10.6% and 3.9%-10.7% over YOLO v3 and SSD, respectively. Conclusion: At the cost of very few extra parameters and little computation, the proposed FDM-Unet directly improves the detection accuracy of pre-trained detectors on synthetic underwater images and also improves accuracy after fine-tuning on real underwater images.
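The illumination-scattering model behind such synthesis is, in its standard form, I = J·e^(-βd) + A·(1 - e^(-βd)): the clean image J is attenuated by depth-dependent transmission and blended with the ambient (color-cast) light A. A numpy sketch of this base model follows; the paper's "improved" variant and its estimated parameters are not detailed in the abstract, and the coefficients below are illustrative.

```python
import numpy as np

def synthesize_underwater(clean, depth, A, beta):
    """clean: (H,W,3) RGB in [0,1]; depth: (H,W) metres; A: per-channel ambient
    light; beta: per-channel attenuation. I = J*t + A*(1-t), t = exp(-beta*d)."""
    t = np.exp(-depth[..., None] * np.asarray(beta))        # (H,W,3) transmission
    return clean * t + np.asarray(A) * (1.0 - t)

clean = np.random.rand(120, 160, 3)                         # stand-in clear image
depth = np.linspace(1.0, 8.0, 120 * 160).reshape(120, 160)  # stand-in scene depth
# Water attenuates red fastest -> greenish-blue cast (illustrative coefficients).
img = synthesize_underwater(clean, depth, A=(0.05, 0.35, 0.45),
                            beta=(0.50, 0.12, 0.08))
print(img.min(), img.max())  # convex blend, so values stay in [0, 1]
```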
