Similar Documents
Query returned 20 similar documents (search time: 15 ms)
1.
This paper presents an approach to categorize typical places in indoor environments using 3D scans provided by a laser range finder. Examples of such places are offices, laboratories, or kitchens. In our method, we combine the range and reflectance data from the laser scan for the final categorization of places. Range and reflectance images are transformed into histograms of local binary patterns and combined into a single feature vector. This vector is later classified using support vector machines. The results of the presented experiments demonstrate the capability of our technique to categorize indoor places with high accuracy. We also show that the combination of range and reflectance information improves the final categorization results in comparison with a single modality.
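The pipeline in this abstract, local binary pattern histograms computed per modality, concatenated into one vector, then classified, can be sketched in NumPy. The fixed 8-neighbour LBP variant and the random test images below are illustrative assumptions, not the paper's exact setup; a classifier such as an SVM would consume the resulting vectors.

```python
import numpy as np

def lbp_histogram(img, bins=256):
    """Minimal 8-neighbour local binary pattern histogram (illustrative only)."""
    c = img[1:-1, 1:-1]
    # neighbours in a fixed clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << bit)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()

# combine range and reflectance modalities into one feature vector
range_img = np.random.rand(32, 32)
refl_img = np.random.rand(32, 32)
feature = np.concatenate([lbp_histogram(range_img), lbp_histogram(refl_img)])
```

Each modality contributes a 256-bin normalized histogram, so the combined descriptor has 512 dimensions.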

2.
3.
Pan Meng, Zhang Huanrong, Wu Jiahao, Jin Zhi. Multimedia Tools and Applications, 2022, 81(25): 35899-35913

As one of the most crucial tasks in scene perception, Monocular Depth Estimation (MDE) has developed considerably in recent years. Current MDE research focuses on the precision and speed of estimation but pays less attention to generalization across scenes: MDE networks trained on outdoor scenes achieve impressive performance outdoors but poor performance indoors, and vice versa. To tackle this problem, we propose a self-distillation MDE framework that improves generalization across different scenes. Specifically, we design a student encoder that extracts features from two datasets, one of indoor and one of outdoor scenes. We then introduce a dissimilarity loss to pull apart the encoded features of different scenes in feature space. Finally, a decoder estimates the final depth from the encoded features. In this way, our self-distillation MDE framework can learn depth estimation on two different datasets. To the best of our knowledge, we are the first to tackle the generalization problem across datasets of different scenes in the MDE field. Experiments demonstrate that our method reduces the degradation an MDE network suffers when faced with datasets of complex data distribution. Note that evaluating two datasets with a single network is more challenging than evaluating them with two different networks.
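The dissimilarity loss above is named but not defined in this abstract; a plausible hinge-style stand-in, which is small once the two domains' mean features are pushed apart, might look like the following NumPy sketch (the margin form and batch-mean reduction are assumptions):

```python
import numpy as np

def dissimilarity_loss(feat_indoor, feat_outdoor, margin=1.0):
    """Hinge-style loss that decreases as the mean feature vectors of the
    two scene domains move apart; an illustrative stand-in for the paper's
    dissimilarity loss, whose exact form is not given here."""
    mu_a = feat_indoor.mean(axis=0)   # batch mean of indoor features
    mu_b = feat_outdoor.mean(axis=0)  # batch mean of outdoor features
    dist = np.linalg.norm(mu_a - mu_b)
    return max(0.0, margin - dist)    # zero once the domains are separated
```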


4.
To address the difficulty of estimating depth from indoor and outdoor images under haze, a single hazy-image depth estimation method incorporating a perceptual loss function is proposed. First, a dual-scale network model performs coarse extraction on the hazy image, which is then refined locally with low-level features. Next, a multi-kernel upsampling method is applied in the upsampling stage to obtain the predicted depth map of the hazy image. Finally, a new composite loss function combining the pixel-wise loss with a perceptual loss is constructed to train the network. On the indoor NYU...
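The composite loss, a pixel-wise term plus a weighted perceptual term, can be sketched as follows. The toy `feat_fn` stands in for the pretrained feature extractor a perceptual loss normally uses, and the weight `lam` is an assumed hyperparameter:

```python
import numpy as np

def composite_loss(pred, target, feat_fn, lam=0.1):
    """Pixel-wise L1 term plus a 'perceptual' term computed on features.
    feat_fn stands in for a pretrained feature extractor (hypothetical)."""
    pixel = np.abs(pred - target).mean()
    perceptual = np.abs(feat_fn(pred) - feat_fn(target)).mean()
    return pixel + lam * perceptual

# toy feature extractor: 2x2 local mean pooling as a placeholder
feat_fn = lambda x: x.reshape(2, 2, -1).mean(axis=-1)
```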

5.
Pulmonary nodule detection from high-dimensional lung computed tomography (CT) images is a highly challenging task. Among the many nodule detection algorithms, deep convolutional neural networks (CNNs) are the most prominent. Two-dimensional (2D) CNNs benefit from abundant pretrained models and high detection efficiency and are therefore widely used, but pulmonary nodules are inherently three-dimensional (3D) lesions, so 2D CNNs inevitably lose information and degrade detection accuracy. 3D CNNs can fully exploit the spatial information in CT images and effectively improve detection accuracy, but they suffer from large parameter counts, heavy computational cost, and a high risk of overfitting. To combine the strengths of both, a pulmonary nodule detection model based on a deep hybrid CNN is proposed: 3D CNN layers are deployed in the shallow part of the network and 2D CNN layers in the deep part, and a deconvolution module is added to fuse multi-level image features. This reduces model parameters, strengthens generalization, and improves detection efficiency without sacrificing detection accuracy. Experimental results on the LUNA16 dataset show that the proposed model achieves a sensitivity of 0.924 at an average of 8 false positives per scan, outperforming existing state-of-the-art models.
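One plausible reading of the shallow-3D/deep-2D handoff is that the depth axis of the 3D feature maps is folded into the channel axis before the 2D stages. The sketch below illustrates that reshaping only; the paper's actual transition layer may differ:

```python
import numpy as np

def depth_to_channels(volume):
    """Fold the depth axis of a 3D feature map (C, D, H, W) into the
    channel axis so that subsequent 2D convolutions can consume it."""
    c, d, h, w = volume.shape
    return volume.reshape(c * d, h, w)

x = np.random.rand(16, 4, 8, 8)   # e.g. output of a shallow 3D CNN stage
y = depth_to_channels(x)          # (64, 8, 8): ready for 2D layers
```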

6.
Pulmonary nodule detection from high-dimensional lung computed tomography (CT) images is a highly challenging task. Among the many nodule detection algorithms, deep convolutional neural networks (CNNs) are the most prominent. Two-dimensional (2D) CNNs benefit from abundant pretrained models and high detection efficiency and are therefore widely used, but pulmonary nodules are inherently three-dimensional (3D) lesions, so 2D CNNs inevitably lose information and degrade detection accuracy. 3D CNNs can fully exploit the spatial information in CT images and effectively improve detection accuracy, but they suffer from large parameter counts, heavy computational cost, and a high risk of overfitting. To combine the strengths of both, a pulmonary nodule detection model based on a deep hybrid CNN is proposed: 3D CNN layers are deployed in the shallow part of the network and 2D CNN layers in the deep part, and a deconvolution module is added to fuse multi-level image features. This reduces model parameters, strengthens generalization, and improves detection efficiency without sacrificing detection accuracy. Experimental results on the LUNA16 dataset show that the proposed model achieves a sensitivity of 0.924 at an average of 8 false positives per scan, outperforming existing state-of-the-art models.

7.
In indoor monocular visual navigation tasks, depth information about the scene is essential, but monocular depth estimation is an ill-posed problem with limited accuracy. Meanwhile, 2D lidar is inexpensive and widely used in indoor navigation tasks. This paper therefore proposes an indoor monocular depth estimation algorithm that fuses 2D lidar to improve depth estimation accuracy. A 2D lidar feature extraction branch is added to an encoder-decoder architecture, skip connections are used to restore detail in the monocular depth estimates, and a channel attention mechanism is proposed to fuse the 2D lidar features with RGB image features. The algorithm is validated on the public NYUDv2 dataset, and a depth dataset with 2D lidar data was also built to match the algorithm's target scenario. Experiments show that the proposed algorithm outperforms existing monocular depth estimation methods on both the public and the self-built datasets.
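The channel-attention fusion of lidar and RGB features can be sketched in squeeze-and-excitation style: concatenate the two feature maps along channels, global-average-pool to a channel descriptor, and gate each channel. The parameters `w` and `b` stand in for learned weights, and the paper's exact attention block may differ:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention_fuse(feat_rgb, feat_lidar, w, b):
    """Concatenate RGB and 2D-lidar feature maps along channels, then
    reweight channels with a squeeze-and-excitation style gate.
    w, b are hypothetical learned parameters, passed in directly here."""
    fused = np.concatenate([feat_rgb, feat_lidar], axis=0)   # (C, H, W)
    squeeze = fused.mean(axis=(1, 2))                        # global average pool
    gate = sigmoid(w @ squeeze + b)                          # per-channel weights
    return fused * gate[:, None, None]
```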

8.
Cui Shuai, Zhang Jun, Gao Jun (崔帅, 张骏, 高隽). 中国图象图形学报 (Journal of Image and Graphics), 2019, 24(12): 2111-2125
Objective: Color constancy refers to the human visual system's adaptive ability to perceive object colors correctly under arbitrary illumination, and it is an important prerequisite for high-level tasks such as recognition, segmentation, and 3D vision. Estimating the illuminant color of an image is one of the main routes to computational color constancy, but existing methods often suffer large errors caused by ambiguous colors in local regions of the scene. This paper therefore proposes an illuminant color estimation method based on deep residual learning. Method: The input image is divided into uniform patches, and the global illuminant color of the whole image is estimated from the local patches. The algorithm comprises two residual networks: an illuminant estimation network, whose deeper layers and residual structure improve estimation accuracy, and a patch selection network, which classifies patches by their illuminant estimation error and removes high-error patches, further improving the global estimate. In addition, the input image is preprocessed into log-chrominance space, which reduces the influence of image brightness on illuminant estimation and improves computational efficiency. Results: Experiments on the NUS-8 and reprocessed ColorChecker datasets show good accuracy and robustness; under identical conditions, log-chrominance images reduce estimation error by 10-15% relative to raw images, and the patch selection network further reduces the estimation network's error by about 5%. Conclusion: Experiments on two single-illuminant datasets show that the overall design is sound and the algorithm is accurate and robust; it can be applied in image processing and computer vision tasks that require color correction.
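The log-chrominance preprocessing mentioned above has a standard form: divide out one channel (commonly green) and take logarithms, which cancels overall brightness. A minimal sketch follows; the paper's exact variant is not specified here, and `eps` is an assumed stabilizer:

```python
import numpy as np

def log_chrominance(rgb, eps=1e-6):
    """Map RGB to log-chrominance (log R/G, log B/G). Scaling all three
    channels by the same brightness factor leaves the output unchanged."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    u = np.log((r + eps) / (g + eps))
    v = np.log((b + eps) / (g + eps))
    return np.stack([u, v], axis=-1)
```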

9.
Pulmonary nodule detection from chest CT images is an important means of early lung cancer screening, and false-positive reduction for candidate nodules is a key part of nodule detection. Traditional detection methods rely heavily on prior knowledge, involve cumbersome pipelines, and perform poorly. In deep learning, convolutional neural networks can extract image features through a generic learning process. Building on densely connected networks, this paper designs a 3D false-positive reduction model for nodules, the 3D convolutional neural network model (TDN-CNN)...

10.
To meet the slope-perception accuracy that smooth uphill control of a mobile robot demands in unknown environments, this paper proposes a transfer learning-based slope detection algorithm that works from a single image. A standard indoor image dataset is used to train a deep convolutional neural field-fully connected superpixel pooling ne...

11.
This paper addresses the problem of image-based event recognition by transferring deep representations learned from object and scene datasets. First we empirically investigate the correlation of the concepts of object, scene, and event, thus motivating our representation transfer methods. Based on this empirical study, we propose an iterative selection method to identify a subset of object and scene classes deemed most relevant for representation transfer. Afterwards, we develop three transfer techniques: (1) initialization-based transfer, (2) knowledge-based transfer, and (3) data-based transfer. These newly designed transfer techniques exploit multitask learning frameworks to incorporate extra knowledge from other networks or additional datasets into the fine-tuning procedure of event CNNs. These multitask learning frameworks turn out to be effective in reducing the effect of over-fitting and improving the generalization ability of the learned CNNs. We perform experiments on four event recognition benchmarks: the ChaLearn LAP Cultural Event Recognition dataset, the Web Image Dataset for Event Recognition, the UIUC Sports Event dataset, and the Photo Event Collection dataset. The experimental results show that our proposed algorithm successfully transfers object and scene representations towards the event dataset and achieves the current state-of-the-art performance on all considered datasets.

12.
Gait recognition identifies people by the way they walk. Most current gait recognition methods extract features with shallow neural networks; they perform well on indoor gait datasets but poorly on the outdoor gait datasets released in recent years. To meet the serious challenge posed by outdoor gait datasets, a deep gait recognition model based on a video residual network is proposed. In the feature extraction stage, a deep 3D convolutional neural network (3D CNN) built from the proposed video residual blocks extracts spatio-temporal dynamics from the whole gait sequence; temporal pooling and horizontal pyramid mapping then reduce the sampled feature resolution and extract local gait features; training is driven by a joint loss function, with BNNeck balancing the losses and adjusting the feature space. Experiments on three public gait datasets, one indoor (CASIA-B) and two outdoor (GREW, Gait3D), show that the model surpasses other models in accuracy and convergence speed on the outdoor gait datasets.
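Horizontal pyramid mapping is commonly implemented by slicing the feature map into horizontal strips at several scales and pooling each strip into a part-based feature. The NumPy sketch below shows that idea; the strip counts and max-pooling choice are assumptions, not the paper's exact configuration:

```python
import numpy as np

def horizontal_pyramid_pool(feat, scales=(1, 2, 4)):
    """Split a (C, H, W) feature map into horizontal strips at several
    scales and max-pool each strip, yielding part-based features."""
    parts = []
    for s in scales:
        for strip in np.array_split(feat, s, axis=1):  # split along height
            parts.append(strip.max(axis=(1, 2)))       # (C,) per strip
    return np.stack(parts)                             # (num_strips, C)

feat = np.random.rand(8, 16, 11)
out = horizontal_pyramid_pool(feat)  # 1 + 2 + 4 = 7 strips of 8 channels
```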

13.
In computer vision, 3D object recognition is one of the most important tasks for many real-world applications, and three-dimensional convolutional neural networks (CNNs) have demonstrated their advantages in it. In this paper, we propose using the principal curvature directions of 3D objects (from a CAD model) to represent geometric features as inputs to a 3D CNN. Our framework, CurveNet, learns perceptually relevant salient features and predicts object class labels. Curvature directions encode complex surface information of a 3D object, which helps our framework produce more precise and discriminative features for object recognition. Inspired by multitask learning's sharing of features between related tasks, we treat pose classification as an auxiliary task that enables CurveNet to generalize better on object label classification. Experimental results show that our framework performs better with curvature vectors than with voxels as input for 3D object classification. We further improved CurveNet's performance by combining two networks that take the curvature directions and the voxels of a 3D object as their respective inputs, adopting a Cross-Stitch module to learn effective shared features across the multiple representations. We evaluated our methods on three publicly available datasets and achieved competitive performance on the 3D object recognition task.
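A Cross-Stitch module linearly mixes the activations of two task networks with a small learned matrix. The sketch below shows the core operation; the 2x2 `alpha` is learned in the real model but supplied directly here:

```python
import numpy as np

def cross_stitch(xa, xb, alpha):
    """Cross-stitch unit: each task's activation becomes a linear
    combination of both tasks' activations, mixed by a 2x2 matrix."""
    ya = alpha[0, 0] * xa + alpha[0, 1] * xb
    yb = alpha[1, 0] * xa + alpha[1, 1] * xb
    return ya, yb
```

With `alpha` near the identity the tasks stay independent; off-diagonal weight lets them share features.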

14.
We introduce a new dataset, GeoPose3K, which contains over three thousand precise camera poses of mountain landscape images. In addition to camera location and orientation, we provide data for training and evaluating computer vision methods and applications in outdoor scenes: synthetic depth maps, normal maps, illumination simulation, and semantic labels. To illustrate the properties of the dataset, we compare results achieved by a state-of-the-art visual geo-localization method on GeoPose3K with its results on an existing visual geo-localization dataset. To foster research on computer vision algorithms for outdoor environments, we also propose several novel future use-cases of the GeoPose3K dataset.

15.
Driven by applications such as autonomous driving, robotics, digital cities, and virtual/mixed reality, 3D vision has attracted broad attention. Research in 3D vision centers on depth image acquisition, visual localization and mapping, 3D modeling, and 3D understanding, and this paper surveys and compares domestic and international progress on these tasks. First, for depth image acquisition, stereo matching is reviewed from three angles (non-end-to-end, end-to-end, and unsupervised stereo matching), and monocular depth estimation from two (depth regression networks and depth completion networks). Second, for visual localization and mapping, large-scale visual localization is reviewed in its end-to-end and non-end-to-end forms, and simultaneous localization and mapping (SLAM) is reviewed both as visual SLAM and as SLAM fused with other sensors. Third, for 3D modeling, geometric modeling is reviewed across deep 3D representation learning, deep 3D generative models, structured representation learning and generative models, and deep learning-based 3D reconstruction, while dynamic human body modeling is reviewed across multi-view RGB reconstruction, single- and multi-depth-camera methods, and single-view RGB methods. Finally, for 3D understanding, point cloud semantic segmentation and point cloud instance segmentation are reviewed. On this basis, future trends in 3D vision research are outlined as a reference for researchers in related fields.

16.
3D reconstruction is widely used in autonomous driving, robotics, UAVs, and augmented reality, and disparity estimation is a key step in it. With growing datasets and advances in hardware and network models, deep learning disparity estimation models have become widely used and effective. However, these methods are usually applied to objects in outdoor scenes and rarely to indoor-scene datasets. This paper reviews deep learning methods for binocular disparity estimation and selects five networks: PSMNet (pyramid stereo matching network), GA-Net (guided aggregation network), LEAStereo (hierarchical neural architecture search for deep stereo matching), DeepPruner (learning efficient stereo matching via differentiable patchmatch), and BGNet (bilateral grid learning for stereo matching networks), applying them to a real-world street-scene dataset (KITTI2015) and two indoor-scene datasets (Middlebury2014, Instereo2K...

17.
Place recognition is a core competency for any visual simultaneous localization and mapping system. Identifying previously visited places enables the creation of globally accurate maps, robust relocalization, and multi-user mapping. To match one place to another, most state-of-the-art approaches must decide a priori what constitutes a place, often in terms of how many consecutive views should overlap or how many consecutive images should be considered together. Unfortunately, such threshold dependencies limit their generality across different types of scenes. In this paper, we present a placeless place recognition algorithm using a novel match-density estimation technique that avoids heuristically discretizing the space. Instead, our approach treats place recognition as a problem of continuous matching between image streams, automatically discovering regions of high match density that represent overlapping trajectory segments. The algorithm uses well-studied statistical tests to identify the relevant matching regions, which are subsequently passed to an absolute pose algorithm to recover the geometric alignment. We demonstrate the efficiency and accuracy of our methodology on three outdoor sequences, including a comprehensive evaluation against ground truth from publicly available datasets that shows our approach outperforms several state-of-the-art place recognition algorithms. Furthermore, we compare our overall algorithm to the currently best performing system for global localization and show how we outperform that approach on challenging indoor and outdoor datasets.

18.
Jia Wei, Gao Jian, Xia Wei, Zhao Yang, Min Hai, Lu Jing-Ting. International Journal of Automation and Computing (国际自动化与计算杂志), 2021, 18(1): 18-44

Palmprint recognition and palm vein recognition are two emerging biometric technologies. Over the past two decades, many traditional methods have been proposed for palmprint recognition and palm vein recognition and have achieved impressive results, but research on deep learning-based palmprint and palm vein recognition is still very preliminary. To investigate deep learning-based 2D and 3D palmprint recognition and palm vein recognition in depth, we evaluate the performance of seventeen representative, classic convolutional neural networks (CNNs) on one 3D palmprint database, five 2D palmprint databases, and two palm vein databases. Extensive experiments were carried out with different network structures, different learning rates, and different numbers of network layers, in both separate-data and mixed-data modes. Experimental results show that these classic CNNs achieve promising recognition results, with the more recently proposed CNNs performing better; in particular, among them, EfficientNet achieves the best recognition accuracy. However, the recognition performance of classic CNNs is still slightly worse than that of some traditional recognition methods.


19.
Traditional algorithms that design hand-crafted features for action recognition were a hot research area in the last decade. Compared to RGB video, depth sequences are less sensitive to lighting changes and more discriminative, since they capture the geometric information of objects. Unlike many existing action recognition methods that depend on well-designed features, this paper studies deep learning-based action recognition using depth sequences and the corresponding skeleton joint information. First, we construct a 3D-based Deep Convolutional Neural Network (3D2CNN) to learn spatio-temporal features directly from raw depth sequences; we then compute a joint-based feature vector named JointVector for each sequence, taking into account simple position and angle information between skeleton joints. Finally, support vector machine (SVM) classification results from the 3D2CNN-learned features and the JointVector are fused to perform action recognition. Experimental results demonstrate that our method learns feature representations from depth sequences that are time-invariant and viewpoint-invariant. The proposed method achieves results comparable to the state-of-the-art methods on the UTKinect-Action3D dataset and superior performance relative to baseline methods on the MSR-Action3D dataset. We further investigate the generalization of the trained model by transferring the learned features from one dataset (MSR-Action3D) to another (UTKinect-Action3D) without retraining, obtaining very promising classification accuracy.
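The JointVector's angle features and the final score fusion can be sketched as follows. The specific angle definition and the 50/50 fusion weight are illustrative assumptions, since the abstract only states that SVM results are fused:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b formed by the segments b->a and b->c (radians),
    one plausible 'angle information between skeleton joints' feature."""
    u, v = a - b, c - b
    cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def fuse_scores(score_cnn, score_joint, w=0.5):
    """Late fusion of two classifiers' class scores by weighted averaging
    (the fusion weight here is an assumption)."""
    return w * score_cnn + (1.0 - w) * score_joint
```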

20.
Liu Liying, Si Yain-Whar. The Journal of Supercomputing, 2022, 78(12): 14191-14214

This paper proposes a novel deep learning-based approach to financial chart pattern classification. Convolutional neural networks (CNNs) have made notable achievements in image recognition and computer vision applications, usually as two-dimensional convolutional neural networks (2D CNNs). In this paper, we describe the design and implementation of one-dimensional convolutional neural networks (1D CNNs) for classifying chart patterns in financial time series. The proposed 1D CNN model is compared against support vector machines, extreme learning machines, long short-term memory, rule-based methods, and dynamic time warping. Experimental results on synthetic datasets reveal that the 1D CNN attains the highest accuracy among all the methods evaluated. Results on real datasets also reveal that the chart patterns identified by the 1D CNN are the most widely recognized instances when compared with those classified by the other methods.
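The basic operation of a 1D CNN layer, sliding a kernel along the time series, can be sketched directly. The moving-average kernel below is illustrative; a trained network would instead learn kernels that respond to local chart-pattern shapes:

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid-mode 1D convolution (cross-correlation) over a series,
    the core operation of a 1D CNN layer."""
    k = len(kernel)
    n = (len(x) - k) // stride + 1
    return np.array([x[i * stride:i * stride + k] @ kernel for i in range(n)])

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = conv1d(series, np.ones(3) / 3.0)  # 3-point moving average
```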



Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号