Similar literature
1.
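Three-dimensional perception of road objects in special environments is important for all-day, all-weather autonomous driving. Infrared binocular vision mimics the human eyes to achieve stereoscopic perception of objects in low-light and no-light environments, and object detection and matching are the key technologies behind it. To address the cumbersome process of performing detection and matching in separate steps, a deep learning network that detects and matches infrared objects simultaneously, SODMNet (Synchronous Object Detection and Matching Network), is proposed. SODMNet fuses an object detection network with an object matching module. The detection network is the main architecture; deep features from its classification and regression branches are the input to the matching module, where they are concatenated with a relative position encoding of the feature map and passed through a convolutional network that outputs feature descriptors for the left and right images. Matching results are obtained from the Euclidean distance between descriptors, which yields joint detection and matching for binocular vision. In addition, a nighttime infrared binocular dataset with annotated objects such as people and vehicles was collected and built. Experiments show that on this dataset SODMNet improves the detection mAP (Mean Average Precision) by more than 84.9%, while the matching AP (Average Precision) reaches 0.5777. The results demonstrate that SODMNet performs infrared binocular object detection and matching simultaneously with high accuracy.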
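The matching step in the abstract above, pairing left and right objects by the Euclidean distance between their descriptors, can be illustrated with a minimal sketch. This is not the SODMNet implementation: the greedy one-to-one pairing, the function names, and the `max_dist` rejection threshold are all assumptions made for illustration, and the descriptors are toy 2-D vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(left, right, max_dist=1.0):
    """Greedily pair left/right descriptors in order of increasing distance.

    max_dist is a hypothetical rejection threshold, not a value from the abstract.
    Returns a sorted list of (left_index, right_index) pairs.
    """
    candidates = sorted(
        (euclidean(l, r), i, j)
        for i, l in enumerate(left)
        for j, r in enumerate(right)
    )
    used_left, used_right, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # all remaining candidates are even farther apart
        if i in used_left or j in used_right:
            continue  # each object is matched at most once
        used_left.add(i)
        used_right.add(j)
        pairs.append((i, j))
    return sorted(pairs)


# Toy descriptors: the third left object has no counterpart in the right image.
left = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]
right = [[0.9, 0.1], [0.1, 1.1]]
matches = match_descriptors(left, right)
print(matches)
```

In this toy case the unmatched left object is simply dropped because every distance to it exceeds the threshold.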

2.
吴建耀  程树英  郑茜颖 《半导体光电》2019,40(3):428-432, 437
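To address the weak small-object detection ability of the DSOD object detection algorithm, this work improves DSOD by introducing the RFB_a network module and an atrous convolution layer. First, the feature map produced by the second transition layer of the DSOD network is fed into the RFB_a module, whose atrous convolutions with different sampling rates extract features with different receptive fields and supply the features needed by the subsequent small-object detection step. Second, to enrich the semantic information of the feature map, an atrous convolution layer with a sampling rate of 6 is added after the second transition layer without pooling. Finally, an IOG penalty term is added to the loss function to prevent predicted boxes of the same class from overlapping when densely packed objects of the same type are predicted, which avoids missed detections during NMS post-processing. Experiments show that, compared with the original DSOD algorithm, the improved algorithm achieves higher detection accuracy, detects small objects better, and lowers the hardware requirements for training the network.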
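The abstract does not define its IOG term. A minimal sketch, assuming IOG means intersection over ground truth (the intersection area divided by the area of the ground-truth box) and that boxes are given as `(x1, y1, x2, y2)` corners, looks like this. The function names are hypothetical, and how the value enters the loss is not shown because the abstract does not say.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); empty boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iog(pred, gt):
    """Intersection area of pred and gt divided by the area of gt (assumed definition)."""
    inter = box_area((
        max(pred[0], gt[0]), max(pred[1], gt[1]),
        min(pred[2], gt[2]), min(pred[3], gt[3]),
    ))
    gt_area = box_area(gt)
    return inter / gt_area if gt_area > 0 else 0.0


# A predicted box that covers a quarter of a neighbouring ground-truth box.
print(iog((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike IoU, this ratio cannot be lowered by enlarging the predicted box, which is one reason such a term is used to penalize overlap with neighbouring objects.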

3.
    
Breast cancer is the most common cancer among women worldwide. Ultrasound is widely used as a harmless test for early breast cancer screening. This work presents the ultrasound network (USNet) model, an improved object detection model designed specifically for breast nodule detection in ultrasound images. USNet improves the backbone network, optimizes the generation of feature maps, and adjusts the loss function. USNet was trained with real clinical data. The evaluation results show that the trained model has strong nodule detection ability: the mean average precision (mAP) reaches 0.7349, the nodule detection rate is 95.11%, and the in situ cancer detection rate is 79.65%. Detection speed reaches 27.3 frames per second (FPS), so video data can be processed in real time.

4.
    
Over the past few years, skeleton-based action recognition has achieved great success because skeleton data is immune to illumination variation, viewpoint variation, background clutter, scaling, and camera motion. However, effectively modeling the latent information in skeleton data remains a challenging problem. In this paper, we therefore propose action embedding with a self-attention Transformer network for skeleton-based action recognition. The proposed method comprises two modules: (i) action embedding and (ii) a self-attention Transformer. The action embedding encodes the relationship between corresponding body joints (e.g., the joints of both hands move together when clapping) and thus captures the spatial features of the joints. Temporal features and dependencies of body joints are modeled with the Transformer architecture. Our method works in a single-stream (end-to-end) fashion, with a multilayer perceptron (MLP) used for classification. We carry out an ablation study and evaluate our model on the small-scale SYSU-3D dataset and the large-scale NTU-RGB+D and NTU-RGB+D 120 datasets, where the results show that our method performs better than other state-of-the-art architectures.

5.
    
LiDAR-based 3D object detection is important for scene perception in autonomous driving, but the point clouds produced by LiDAR are irregular and unstructured and cannot be processed directly by conventional Convolutional Neural Networks (CNNs). Recently, Graph Convolutional Networks (GCNs) have proven to be a good way to handle non-Euclidean structured data, including point clouds. However, a GCN requires massive computation to search for adjacent nodes, and this heavy cost limits its use on large-scale LiDAR point clouds in autonomous driving. In this work, we adopt a frustum-based point cloud-image fusion scheme to reduce the number of LiDAR points, making GCN-based feature learning on large-scale LiDAR point clouds feasible. On this basis, we propose an efficient graph attentional network for 3D object detection in autonomous driving, which learns features directly from the raw LiDAR point cloud without any conversion. We evaluate the model on the public KITTI benchmark: the 3D detection mAP is 63.72% on KITTI Cars, Pedestrians, and Cyclists, and inference runs at 7.9 fps on a single GPU, which is faster than other methods of the same type.

6.
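Current salient object detection methods based on deep convolutional neural networks are difficult to apply to irregularly structured data in non-Euclidean space, and in complex visual scenes they tend to lose high-frequency information such as the edges and structure of salient objects, which degrades detection performance. To this end, this paper proposes an end-to-end multi-graph neural network collaborative learning framework for salient object detection, in which salient edge features and salient region features are learned collaboratively. Within the framework, a dynamic information-enhanced graph convolution operator is constructed. By strengthening information passing between different graph nodes and between different channels within the same node, it captures global contextual structure information in non-Euclidean space and fully mines salient edge and salient region information. Furthermore, an attention-aware fusion module is introduced to fuse salient edge and salient region information in a complementary way, providing complementary cues for both mining processes. Finally, salient edge information is explicitly encoded to guide feature learning for salient regions, so that salient regions in complex scenes are localized more precisely. Experiments on four public benchmark datasets show that the proposed method outperforms current mainstream salient object detection methods based on deep convolutional neural networks and has strong robustness and generalization ability.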

7.
    
To address person re-identification (Re-ID) in occluded scenarios, we propose a novel approach that divides the image into local units using high-level semantic information about pedestrians and generates features for the occluded parts. The approach uses a CNN and pose estimation to extract the feature map and key points, and a graph convolutional network to learn the relations between key points. Specifically, we design a Generating Local Part (GLP) module that divides the feature map into different units. The partition mode of GLP is flexible and varies with the occlusion conditions. The features of the non-occluded parts are clustered into an intermediate node, and the spatially correlated features of the occluded parts are then generated by a de-clustering operation. We conduct experiments on both occluded and holistic datasets to demonstrate the effectiveness of the approach.

8.
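To reduce the 3D keypoint pose estimation error caused by the high degrees of freedom of the hand and the structural similarity of its parts, this paper proposes a 3D hand skeleton pose regression network that combines recognition, detection, and pose estimation. A YOLOv3-based preprocessing network is used, and 2D and 3D keypoint detection networks based on cascaded multi-feature heatmaps are proposed. Human-skeleton hand constraints are introduced into the feature extraction architecture, and a progressive graph convolutional neural network feature enhancement module further refines the skeleton keypoint results, adjusting the pose from coarse to fine. The method is compared with several existing algorithms on different public datasets using the PCK and AUC metrics. It achieves the highest AUC on every test set, with an average AUC of 92.9%. Experiments show that the method estimates 3D hand pose accurately and in detail from a single 2D input, and performs well both on the test sets and in natural scenes.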
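For readers unfamiliar with the two metrics named above, the following sketch shows one common way to compute them: PCK as the fraction of keypoints whose predicted position lies within a distance threshold of the ground truth, and AUC as the normalized area under the PCK-versus-threshold curve. The threshold range and the trapezoidal integration are assumptions; the abstract does not state the evaluation protocol used in the paper.

```python
import math


def pck(pred, gt, threshold):
    """Fraction of keypoints whose prediction is within `threshold` of the ground truth."""
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(gt)


def pck_auc(pred, gt, thresholds):
    """Area under the PCK curve (trapezoidal rule), normalized to the range [0, 1]."""
    values = [pck(pred, gt, t) for t in thresholds]
    area = sum(
        (thresholds[k + 1] - thresholds[k]) * (values[k] + values[k + 1]) / 2
        for k in range(len(thresholds) - 1)
    )
    return area / (thresholds[-1] - thresholds[0])


# Three toy 3-D keypoints with errors of 0, 1 and 5 units.
pred = [(0, 0, 0), (1, 0, 0), (0, 3, 4)]
gt = [(0, 0, 0), (0, 0, 0), (0, 0, 0)]
print(pck(pred, gt, threshold=2))
print(pck_auc(pred, gt, thresholds=[0, 2, 4, 6]))
```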

9.
10.
    
Emotion recognition in conversations (ERC) has gained increasing research attention in recent years because of its wide application in emerging tasks such as social media analysis, dialog generation, and recommender systems. Because the utterances in a conversation are closely related semantically, their emotional states are also closely related. We argue that this correlation can guide the emotion recognition of individual utterances. Accordingly, we propose a novel approach named Semantic-correlation Graph Convolutional Network (SC-GCN) that exploits this correlation for the ERC task in multimodal scenarios. Specifically, we first introduce a hierarchical fusion module to model the dynamics among the textual, acoustic, and visual features and fuse the multimodal information. We then construct a graph structure based on the speaker and temporal dependencies of the dialog. We put forward a novel multi-loop architecture that explores the semantic correlations with a self-attention mechanism and enhances the correlation information over multiple loops. Through the graph convolution process, SC-GCN obtains a refined representation of each utterance, which is used for the final prediction. Extensive experiments on two benchmark datasets demonstrate the superiority of SC-GCN.

11.
陈国平  程秋菊  黄超意  周围  王璐 《电讯技术》2019,59(10):1121-1126
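By collecting a large number of millimeter-wave images and building a corresponding human-body dataset for detection, a deep learning method based on Faster R-CNN is proposed to detect dangerous objects concealed on the human body. The method combines a region proposal network with a convolutional neural network model trained with VGG19 to build a deep convolutional neural network for object detection in millimeter-wave images. To improve the processing of millimeter-wave images, the Caffe deep learning framework is used for training and testing on a graphics processing unit. Experimental results show that the Faster R-CNN based object detection method effectively detects dangerous objects in millimeter-wave images, with an average detection accuracy of about 94% and a detection speed of about 6 frame/s, and is a valuable reference for the development of intelligent millimeter-wave security screening systems.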

12.
    
Most object detection methods use a horizontal bounding box, which causes problems between adjacent objects with arbitrary orientations and results in misaligned detections. Hence, the horizontal anchor should be replaced by a rotating anchor to determine oriented bounding boxes. A two-stage process that delineates a horizontal bounding box and then converts it into an oriented bounding box is inefficient. To improve detection, a box-boundary-aware vector can be estimated with a convolutional neural network. Specifically, we propose a ResNeXt101 encoder to overcome the weaknesses of the conventional ResNet, which becomes less effective as network depth and complexity increase. Owing to its cardinality, homogeneous design, and multibranch architecture with few hyperparameters, ResNeXt captures better information than ResNet. Experimental results show that our proposal performs oriented object detection more accurately and faster than a baseline, achieving a mean average precision of 89.41% and an inference rate of 23.67 fps.

13.
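To improve the ability of driverless vehicles to perceive the external environment, this paper proposes a cascaded neural network framework for detecting and classifying road signs in a virtual environment. The framework combines a fully convolutional network augmented with an auxiliary structure and an improved version of the classic LeNet-5 network. The traditional erosion-dilation opening operator is used to handle uneven edges and spurious artifacts in the extracted road-sign regions, so that multiple classes of road signs can be localized and recognized in virtual road images under conditions such as rain and snow. Comparative experiments against the classic invariant-moment and ORB global feature extraction methods and the YOLO and SSD AI methods show that the proposed method offers high detection accuracy and fast computation.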
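The opening operator mentioned above (erosion followed by dilation) can be sketched on a binary mask in a few lines. This is an illustrative pure-Python version with an assumed 3x3 square structuring element and pixels outside the image treated as background; the paper's actual kernel size and implementation are not given in the abstract.

```python
def _window(mask, y, x):
    """Yield the 3x3 neighbourhood of (y, x); pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            yield mask[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return [[int(all(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return [[int(any(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation: removes small specks, keeps larger regions."""
    return dilate(erode(mask))


# A 4x4 "sign region" plus one isolated noise pixel in the corner.
mask = [[1 if 1 <= y <= 4 and 1 <= x <= 4 else 0 for x in range(7)] for y in range(7)]
mask[6][6] = 1
opened = opening(mask)
for row in opened:
    print("".join(str(v) for v in row))
```

The isolated pixel disappears during erosion and is not restored by dilation, while the square block survives unchanged.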

14.
时文华  张雄伟  邹霞  孙蒙 《信号处理》2019,35(4):631-640
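To address the failure of traditional neural networks to fully exploit correlations in the time-frequency domain, a single-channel speech enhancement method using a deep fully convolutional encoder-decoder neural network is proposed. In the encoder, convolutional layers extract features stage by stage from the time-frequency representation of the noisy speech, obtaining a high-level feature representation of the target speech while suppressing background noise layer by layer. The decoder is structurally symmetric to the encoder: it applies deconvolution and upsampling to the high-level representation produced by the encoder and recovers the target speech layer by layer. Skip connections effectively mitigate the vanishing-gradient problem that arises when training very deep networks. Here, skip connections are introduced between corresponding encoder and decoder layers to pass encoder feature-map information to the corresponding decoder layer, which helps recover the detailed features of the target speech. The effects on speech enhancement performance of two skip connection modes (feature fusion and feature concatenation) and two training loss functions (L1 and L2) are studied, and experiments verify the effectiveness of the proposed method.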

15.
徐涛  杨克成  夏珉  李微  郭文平 《激光与红外》2017,47(10):1321-1324
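Based on underwater range-gated laser imaging, an algorithm for detecting underwater linear targets at long range is proposed. To cope with the low contrast, blur, and noise typical of underwater imaging, the algorithm first enhances the image using contrast stretching, median filtering, and the wavelet transform. It then extracts the edge features of the target with the Canny edge detector. Finally, to deal with noisy edges among the edge features, it uses the robust random sample consensus parameter estimation algorithm to detect the linear target from the edge features and computes parameters such as the position and orientation of the target. Experimental results show that the algorithm effectively detects underwater curved linear targets, overcoming the limitation of existing methods that detect only straight-line targets. The detection rate reaches 93%, and the effective detection distance reaches 5 underwater attenuation lengths.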
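The last step described above, robust model fitting with random sample consensus (RANSAC) on noisy edge points, can be sketched as follows. The abstract does not say which curve model the authors fit, so this sketch assumes a parabola y = a*x^2 + b*x + c purely for illustration; the iteration count, tolerance, seed, and sample data are also hypothetical.

```python
import random


def fit_parabola(p1, p2, p3):
    """Coefficients (a, b, c) of y = a*x^2 + b*x + c through three points.

    Returns None when two points share an x value (no unique parabola).
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = (x1 - x2) * (x1 - x3) * (x2 - x3)
    if d == 0:
        return None
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / d
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / d
    c = (x2 * x3 * (x2 - x3) * y1 + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / d
    return a, b, c


def ransac_parabola(points, iterations=200, tol=0.5, seed=0):
    """Return the sampled model with the most inliers and the inliers themselves."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_parabola(*rng.sample(points, 3))
        if model is None:
            continue
        a, b, c = model
        inliers = [(x, y) for x, y in points if abs(a * x * x + b * x + c - y) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Edge points on y = 0.5*x^2 - x + 2, plus four spurious "noise edge" points.
curve = [(x, 0.5 * x * x - x + 2) for x in range(-5, 6)]
noise = [(-4, 30.0), (0, -10.0), (3, 25.0), (5, -7.0)]
model, inliers = ransac_parabola(curve + noise)
print(model, len(inliers))
```

Because each candidate model is fitted to only three points, a few gross outliers cannot pull the estimate away from the curve, which is the property the abstract relies on for noisy edges.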

16.
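Concealed object detection based on millimeter-wave images is important for contactless human security screening. Millimeter-wave devices can now produce 3D images, but concealed object detection algorithms usually compress them into 2D images for detection and fail to make full use of the information along the depth direction. To address this problem, a concealed object detection framework for millimeter-wave images is proposed that treats a 3D image as a sequence of cross-sections and fully exploits the intrinsic logical relationship of in-section features along the sequence (that is, the depth direction). The framework consists of a convolutional neural network and a long short-term memory network: the former extracts coarse- and fine-grained features from each cross-section, and the latter extracts the global correlation of these features along the depth direction, achieving feature-level information fusion and thereby improving the 2D localization accuracy of concealed objects. Experimental results show that, compared with existing mainstream concealed object detection methods for millimeter-wave images, the proposed model substantially improves detection accuracy.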

17.
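Used as auxiliary information, a knowledge graph can effectively alleviate the cold-start problem of traditional recommendation models. However, when extracting structured information, existing models ignore the neighbor relationships between entities in the graph. To address this problem, this paper proposes a knowledge graph convolutional network recommendation model based on common-neighbor ranked sampling (KGCN-PN). The model first ranks and samples the neighborhood of each entity in the knowledge graph according to the number of common neighbors. It then uses a graph convolutional neural network to fuse each entity's own information with its receptive-field information layer by layer along the relation paths in the graph. Finally, the user feature vector and the fused entity feature vector are fed into a prediction function to predict the probability that the user interacts with the entity item. Experimental results show that the model improves on the other baseline models in data-sparse scenarios.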
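The first step described above, ranking an entity's neighbors by how many neighbors they share with it and keeping the top few, can be sketched as follows. This is only an illustration under assumed conventions: the graph is an undirected adjacency map of sets, relation types are ignored, ties are broken alphabetically, and the function name is hypothetical rather than taken from KGCN-PN.

```python
def common_neighbor_sample(graph, entity, k):
    """Return up to k neighbours of `entity`, ranked by number of shared neighbours."""
    neighbours = graph[entity]

    def shared(n):
        # Neighbours that `entity` and `n` have in common.
        return len(neighbours & graph[n])

    ranked = sorted(neighbours, key=lambda n: (-shared(n), n))
    return ranked[:k]


# Toy knowledge graph with edges A-B, A-C, A-D, B-C, B-D, D-E.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"D"},
}
sampled = common_neighbor_sample(graph, "A", k=2)
print(sampled)
```

Here B shares two neighbors (C and D) with A, so it is ranked first; C and D each share one and are ordered by name.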

18.
    
Motivated by the powerful feature learning capability of deep neural networks, a new graph-based neural network is proposed to learn local and global relational information on skeleton sequences represented as spatio-temporal graphs (STGs). The pipeline of our network architecture consists of three main stages. In the first stage, spatio-temporal sub-graphs (sub-STGs) are projected into a latent space in which every point is represented as a linear subspace. The second stage uses message passing to acquire the localized correlated features of the nodes in the latent space. The third stage relies on graph convolutional networks (GCNs) to reason about long-range spatio-temporal dependencies through a graph representation of the latent space. Finally, an average pooling layer and a softmax classifier predict the action categories from the extracted local and global correlations. We validate our model on action recognition using three challenging datasets: the NTU RGB+D, Kinetics Motion, and SBU Kinect Interaction datasets. The experimental results demonstrate the effectiveness of our approach and show that the proposed model outperforms state-of-the-art methods.

19.
20.
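Image saliency detection based on fully convolutional networks has attracted wide attention and achieved remarkable detection performance. However, this type of neural network still has many problems, such as high network complexity that makes training difficult and inaccurate edges of salient objects. To address these problems, this paper proposes a convolutional neural network based on Gabor initialization. The main features of the network include: 1) initializing the convolutional neural network with Gabor features to improve training efficiency; 2) constructing multi...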
