Similar literature
20 similar articles found.
1.
With the prevalence of accessible depth sensors, dynamic skeletons have attracted much attention as a robust modality for action recognition. Convolutional neural networks (CNNs) excel at modeling local relations within local receptive fields but are typically inefficient at capturing global relations. In this article, we first view the dynamic skeletons as a spatio-temporal graph (STG) and then learn the localized correlated features that generate the embedded nodes of the STG by message passing. To better extract global relational information, we propose a novel model called spatial–temporal graph interaction networks (STG-INs), which performs long-range temporal modeling of human body parts. In this model, human body parts are mapped to an interaction space where graph-based reasoning can be efficiently implemented via a graph convolutional network (GCN). After reasoning, global relation-aware features are distributed back to the embedded nodes of the STG. To evaluate our model, we conduct extensive experiments on three large-scale datasets. The experimental results demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance.
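The graph-based reasoning step mentioned above relies on graph convolution. As a rough illustration only, the following sketch implements one generic GCN propagation step, ReLU(Â H W), with the symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2}. It is not the STG-INs architecture; the three-node graph, the feature values, and the weight matrix are made-up toy inputs.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical toy graph: three body-part nodes in a chain 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # one 2-D feature per node
weight = [[1.0, -1.0], [0.5, 1.0]]                # made-up layer weights
out = gcn_layer(adj, features, weight)            # 3 nodes x 2 output channels
```

Each output row mixes a node's own feature with those of its neighbors, which is the local message passing that the abstract contrasts with global reasoning.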

2.
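Group activity recognition identifies the collective behavior of individuals. Group activity is inseparable from the state of the group and closely tied to the spatio-temporal features of the individuals in it; spatio-temporal information describes spatial semantics and, more importantly, reflects how the activity changes over time. To extract effective, fine-grained spatio-temporal features, this paper proposes a group activity recognition method based on an attention mechanism and deep spatio-temporal information. First, ShuffleAttention is introduced into a two-stream feature extraction network to extract individual appearance and motion information. Second, an improved Non-Local network extracts deep temporal information. Finally, the individual features are fed into a graph convolutional network to model spatial interactions and produce the group activity recognition result. The method reaches accuracies of 93.6% and 97.8% on the CAD and CAED datasets, respectively. On CAD, this is 1.2% and 2.6% higher than the cohesive group search algorithm (CCS) and the actor relation graph (ARG) method, respectively, which indicates that the method extracts deep spatio-temporal features effectively and improves group activity recognition accuracy.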

3.
Learning-based shadow detection methods have achieved impressive performance, but they still struggle with complex scenes, especially ambiguous soft shadows. To tackle this issue, this work proposes an efficient shadow detection network (ESDNet) and then applies uncertainty analysis and graph convolutional networks for detection refinement. Specifically, we first aggregate global information from high-level features and harvest shadow details in low-level features to obtain an initial prediction. Second, we analyze the uncertainty of our ESDNet for an input shadow image and then take its intensity, expectation, and entropy into account to formulate a semi-supervised graph learning problem. Finally, we solve this problem by training a graph convolutional network to obtain the refined detection result for every training image. To evaluate our method, we conduct extensive experiments on several benchmark datasets, i.e., SBU, UCF, ISTD, and even on soft shadow scenes. Experimental results demonstrate that our strategy can improve shadow detection performance by suppressing the uncertainties of false positive and false negative regions, achieving state-of-the-art results.

4.
裴晓敏  范慧杰  唐延东 《红外与激光工程》2018,47(2):203007-0203007(6)
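Human action recognition from natural-scene images is degraded by occlusion, background clutter, and uneven illumination; recognition from 3D human skeleton sequences avoids these problems. First, considering the spatio-temporal nature of human actions, a skeleton-based action recognition method built on a deep network that fuses spatial and temporal features is proposed. Second, a view-invariant feature representation is built from the geometric features of the skeleton; a CNN (Convolutional Neural Network) learns local spatial features of the skeleton, an LSTM (Long Short Term Memory) network operating in the spatial domain learns correlations among the skeleton's spatial joints, and an LSTM network operating in the temporal domain learns spatio-temporal correlations across the skeleton sequence. Finally, the algorithm is validated on the NTU RGB+D database. Experimental results show that recognition accuracy improves and that the method is robust to multi-view skeletons.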

5.
姜威  汪洋  尹晶  朱超然 《激光与红外》2023,53(12):1944-1952
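The ability to learn and generalize from a small number of samples is a major difference between artificial intelligence and humans. In few-shot learning, most graph neural networks focus on propagating information from labeled samples to unlabeled query samples and overlook the important role of semantic features in classification. To address this, a semantic feature propagation graph neural network is constructed. First, semantic features are embedded into the graph neural network, which addresses the low classification accuracy caused by the similarity of fine-grained image features. Then, an attention mechanism is merged with the backbone network to emphasize the foreground and improve the quality of feature extraction, and the Mahalanobis distance is used to compute class similarity for better classification performance. Finally, the Funnel ReLU function is used as the activation function to further improve classification accuracy. Experiments on benchmark datasets show that, compared with the baseline algorithm, the proposed algorithm improves accuracy on 5-way 1/2/5-shot tasks by 9.03%, 4.56%, and 4.15%, respectively.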
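For readers unfamiliar with the Mahalanobis distance used above for class similarity, here is a minimal sketch of the standard definition, d(x, μ) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)), restricted to 2-D features so the covariance can be inverted by hand. How the paper estimates the class mean and covariance is not stated in the abstract; the values below are made up for illustration.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance between a 2-D point and a class mean with a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

# Hypothetical class statistics: variance 4 along the first axis, 1 along the second.
class_mean = [0.0, 0.0]
class_cov = [[4.0, 0.0], [0.0, 1.0]]
dist = mahalanobis_2d([2.0, 0.0], class_mean, class_cov)  # -> 1.0 (Euclidean distance would be 2.0)
```

Unlike the Euclidean distance, the result shrinks along directions in which the class varies a lot, so a query is judged relative to the spread of each class.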

6.
Existing action recognition methods based on event cameras have not fully exploited the advantages of event cameras; for example, they compress event streams into frames for subsequent computation, which greatly sacrifices the temporal information of the event streams. Meanwhile, conventional point cloud-based methods suffer from high computational complexity when processing event data, which makes it difficult to handle long-term actions. To tackle the above problems, we propose a dynamic graph memory-bo...

7.
Obstacle detection in single images is a challenging problem in autonomous navigation under low-cost conditions. In this paper, we introduce an approach for obstacle detection in single images with deep neural networks. We propose the following: (1) a deep model combined with another deep neural network for obstacle detection; (2) a method to segment obstacles and infer their depths. In addition, both local and global information are generated in our method for better classification and portability. Experiments are performed on open datasets and on images captured by our autonomous vehicle. The results show that our method is effective in both obstacle detection and depth inference.

8.
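To address insufficient spatio-temporal feature extraction and the difficulty of capturing global context in skeleton-based action recognition, this work studies a human skeleton action recognition scheme that combines a spatio-temporal attention mechanism with an adaptive graph convolutional network. First, a spatio-temporal attention module based on non-local operations is built to help the model focus on the most discriminative frames and regions of the skeleton sequence. Second, an adaptive graph convolutional network is built using a Gaussian embedding function and the feature-learning ability of a lightweight convolutional neural network, taking into account the influence of prior knowledge about the human body at different stages. Finally, with the adaptive graph convolutional network as the basic framework and the spatio-temporal attention module embedded in it, a two-stream fusion model is built from joint information, bone information, and their respective motion information. The algorithm reaches accuracies of 90.2% and 96.2% under the two evaluation protocols of the NTU RGB+D dataset, shows its generality on the large-scale Kinetics dataset, and demonstrates its advantage in extracting spatio-temporal features and capturing global context.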
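The non-local operation with a Gaussian embedding mentioned above can be sketched as a softmax-weighted sum over all positions. The version below is a simplification under stated assumptions: the learned embeddings are replaced by the identity, the residual connection is omitted, and the three frame features are toy values. It is not the paper's attention module.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def non_local(frames):
    """Embedded-Gaussian non-local operation with identity embeddings (a simplification).

    frames: list of feature vectors, one per frame (or joint).
    Returns (outputs, weights); weights[i][j] is how strongly position i attends to j.
    """
    outputs, weights = [], []
    for xi in frames:
        # Pairwise similarity of position i with every position j (dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in frames]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted sum over ALL positions, hence "non-local".
        outputs.append([sum(w[j] * frames[j][k] for j in range(len(frames)))
                        for k in range(len(xi))])
    return outputs, weights

# Toy sequence: frames 0 and 2 are identical, frame 1 is different.
outputs, weights = non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

In the toy sequence, frame 0 puts equal weight on frames 0 and 2 and less on frame 1, regardless of how far apart they are in time.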

9.
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs can capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss–Seidel iterative method. Motivated by this iterative solution, we design a Gauss–Seidel network (GS-Net) architecture, which makes use of weight and adjacency modulation, skip connection, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer’s initial features as the network depth increases. We evaluate our proposed model on two standard benchmark datasets and compare it with a comprehensive set of strong baseline methods for 3D human pose estimation. Our experimental results demonstrate that our approach outperforms the baseline methods on both datasets, achieving state-of-the-art performance. Furthermore, we conduct ablation studies to analyze the contributions of different components of our model architecture and show that the skip connection and adjacency modulation help improve the model performance.
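To make the idea of graph filtering with Laplacian regularization solved by Gauss–Seidel concrete, here is a minimal sketch under a common textbook formulation, which is assumed here rather than taken from the paper: minimize ||x − y||² + α xᵀLx with L = D − A, which gives the linear system (I + αL)x = y. The function name, the chain graph, and the signal are illustrative; this is not the GS-Net layer itself.

```python
def gauss_seidel_graph_filter(adj, signal, alpha=1.0, iterations=50):
    """Solve (I + alpha * L) x = signal by Gauss-Seidel sweeps, with L = D - A.

    adj is assumed symmetric with a zero diagonal (no self-loops).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    x = list(signal)  # start from the unfiltered signal
    for _ in range(iterations):
        for i in range(n):
            # x[j] for j < i was already updated in this sweep; reusing it
            # immediately is what distinguishes Gauss-Seidel from Jacobi.
            neighbor_sum = sum(adj[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (signal[i] + alpha * neighbor_sum) / (1.0 + alpha * degree[i])
    return x

# Hypothetical 4-joint chain 0 - 1 - 2 - 3 carrying an oscillating signal.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
smoothed = gauss_seidel_graph_filter(chain, [0.0, 1.0, 0.0, 1.0], alpha=0.5)
```

Because I + αL is strictly diagonally dominant, the sweeps converge; the filtered signal is smoother across edges, and its sum equals that of the input.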

10.
In this paper, we focus on recognizing person-person interactions using skeletal data captured from depth sensors. First, we propose a novel and efficient view transformation scheme. The skeletal interaction sequence is re-observed under a new coordinate system, which is invariant to the setup and capturing view of the depth cameras as well as to the exchange of position or facing orientation between the two persons. Second, we propose concise and discriminative interaction representations composed simply of the joint locations of the two persons. The proposed representations efficiently describe both the holistic interactive scene and the individual poses performed by each subject separately. Third, we introduce graph convolutional networks (GCNs) to directly learn the proposed skeletal interaction representations. Moreover, we design a multiple-GCN-based model to provide the final class score. Extensive experimental results on three skeletal action datasets, NTU RGB+D 60, NTU RGB+D 120, and SBU, consistently demonstrate the superiority of our interaction recognition method.

11.
Driver distraction has become a global issue, causing a dramatic increase in road accidents and casualties. However, recognizing distracted driving actions remains a challenging task in computer vision, since inter-class variations between different driver action categories are quite subtle. To overcome this difficulty, this paper proposes a novel deep learning-based approach that extracts fine-grained feature representations for image-based driver action recognition. Specifically, we improve the existing convolutional neural network in two respects: (1) we employ a multi-scale convolutional block with kernels of different receptive field sizes to generate hierarchical feature maps and adopt a maximum selection unit to adaptively combine multi-scale information; (2) we incorporate an attention mechanism to learn pixel saliency and channel saliency between convolutional features so that it can guide the network to intensify local detail information and suppress global background information. In experiments, we evaluate the designed architecture on multiple driver action datasets. The quantitative results show that the proposed multi-scale attention convolutional neural network (MSA-CNN) obtains state-of-the-art performance in image-based driver action recognition.

12.
In this paper, we propose a novel deep spatial transformer convolutional neural network (Spatial Net) framework for the detection of salient and abnormal areas in images. The proposed method is general and has three main parts: (1) context information in the image is captured by using convolutional neural networks (CNNs) to automatically learn high-level features; (2) to better adapt the CNN model to the saliency task, we redesign the feature sub-network structure to output a 6-dimensional transformation matrix for affine transformation based on the spatial transformer network. Several local features that effectively capture edge pixels in the salient area are extracted and embedded into the above model to reduce the impact of highlighted background regions; (3) finally, areas of interest are detected by means of a linear combination of global and local feature information. Experimental results demonstrate that Spatial Nets obtain superior detection performance over state-of-the-art algorithms on two popular datasets while requiring less memory and computation.
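The 6-dimensional transformation output mentioned in part (2) corresponds to the six entries of a 2x3 affine matrix. The sketch below shows only how such a parameter vector maps coordinates; the parameter ordering [a, b, tx, c, d, ty], the normalized [-1, 1] coordinate range, and the example values are assumptions for illustration, and the sampling and interpolation steps of a full spatial transformer are left out.

```python
def apply_affine(theta, points):
    """Apply a 2x3 affine matrix, given as [a, b, tx, c, d, ty], to (x, y) points.

    x' = a*x + b*y + tx
    y' = c*x + d*y + ty
    """
    a, b, tx, c, d, ty = theta
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]

# Two opposite corners of an image in normalized coordinates.
corners = [(-1.0, -1.0), (1.0, 1.0)]

identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
zoom_and_shift = [0.5, 0.0, 0.25, 0.0, 0.5, 0.0]  # half-size window, shifted right

print(apply_affine(identity, corners))        # [(-1.0, -1.0), (1.0, 1.0)]
print(apply_affine(zoom_and_shift, corners))  # [(-0.25, -0.5), (0.75, 0.5)]
```

With the second parameter set, the transformed corners bound a half-size window shifted to the right, which is how a predicted affine matrix can steer attention to a sub-region of the image.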

13.
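In recent years, skeleton-based human action recognition has attracted wide attention because skeleton data is robust and generalizes well. Graph convolutional networks that model the human skeleton as a spatio-temporal graph have achieved remarkable performance. However, graph convolution learns long-term interactions mainly through a series of 3D convolutions; these interactions are biased toward local neighborhoods and limited by the kernel size, so long-range dependencies cannot be captured effectively. This paper proposes a collaborative convolutional Transformer network (Co-ConvT), which uses the self-attention mechanism of the Transformer to establish long-range dependencies and combines it with graph convolutional networks (GCNs) for action recognition, so that the model extracts local information through the GCNs and captures rich long-range dependencies through the Transformer. In addition, Transformer self-attention is computed at the pixel level and is therefore very expensive; the model splits the network into two stages, with stage 1 using pure convolution to extract shallow spatial features and stage 2 using the proposed ConvT blocks to capture high-level semantic information, which reduces computational complexity. Furthermore, the linear embedding of the original Transformer is replaced by a convolutional embedding, which enhances local spatial information and allows the positional encoding of the original model to be removed, making the model lighter. Experiments are conducted on two large-scale authoritative datasets, NTU-RGB+D and Kinetics-Skeleton; the model...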

14.
In this paper, we propose a new approach for signal detection in wireless digital communications based on the neural network with transient chaos and time-varying gain (NNTCTG), and give a concrete model of the signal detector after appropriate transformations and mappings. It is well known that maximum likelihood signal detection can be described as a complex optimization problem with so many local optima that conventional Hopfield-type neural networks fail to solve it. To avoid the serious local optima problem of Hopfield-type neural networks, the NNTCTG uses the time-varying parameters of the recurrent neural network to control the evolving behavior of the network so that the network undergoes a transition from chaotic behavior to gradient convergence. It has richer and more flexible dynamics than conventional neural networks with only point attractors, so it can be expected to have a greater ability to search for globally optimal or near-optimal solutions. After going through a transient inverse-bifurcation process, the NNTCTG can approach the global optimum of our problem or its neighborhood. Simulation experiments have been performed to show the effectiveness and validity of the proposed neural network-based method for signal detection in digital communications.

15.
Bag-of-words models have been widely used to obtain a global representation for action recognition. However, these models ignore structural information, such as the spatial and temporal context needed for action representation. In this paper, we propose a novel structured codebook construction method to encode spatial and temporal contextual information among local features for video representation. Given a set of training videos, our method first extracts local motion and appearance features. Next, we encode the spatial and temporal contextual information among local features by constructing correlation matrices for local spatio-temporal features. Then, we discover the common patterns of movements to construct the structured codebook. After that, actions can be represented by a set of sparse coefficients with respect to the structured codebook. Finally, a simple linear SVM classifier is applied to predict the action class based on the action representation. Our method has two main advantages over traditional methods. First, it automatically discovers the mid-level common patterns of movements that capture rich spatial and temporal contextual information. Second, it is robust to unwanted background local features, mainly because most of them cannot be sparsely represented by the common patterns and are treated as residual errors that are not encoded into the action representation. We evaluate the proposed method on two popular benchmarks: the KTH action dataset and the UCF sports dataset. Experimental results demonstrate the advantages of our structured codebook construction.

16.
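Traditional convolutional neural networks (CNNs) are limited by the size of their receptive fields and cannot directly and effectively capture key information such as spatial structure and global semantics. This makes it hard to extract features at the boundaries of wide vessels and in capillary regions and leads to poor retinal vessel segmentation. To address this problem, a graph-convolution-based refinement framework for retinal vessel segmentation is proposed. The framework uses contour extraction and uncertainty analysis to select potentially mis-segmented regions in the CNN's coarse segmentation, builds suitable graph data from them together with the features extracted by the CNN, and feeds the graph into a residual graph convolutional network (Res-GCN) for a second classification, yielding the refined retinal vessel segmentation. The framework can be attached as a plug-and-play module to the end of any retinal vessel segmentation network, so it is highly portable and easy to use. Experiments use the U-shaped network (U-Net) and its representative variants DenseU-Net and AttU-Net as baseline networks and are run on the DRIVE, STARE, and CHASEDB1 datasets. The framework achieves Sp of 98.28%, 99.10%, and 99.04% and Pr of 87.97%, 88.87%, and 90.25%, respectively, showing that it can refine and improve the segmentation of the baseline networks.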
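The abstract does not expand the abbreviations Sp and Pr. Assuming they denote specificity, TN / (TN + FP), and precision, TP / (TP + FP), as is common in vessel segmentation work, the two metrics can be computed from binary pixel labels as in the sketch below. The six-pixel example is made up.

```python
def specificity_and_precision(predicted, truth):
    """Compute (Sp, Pr) from flat lists of 0/1 pixel labels, where 1 means vessel.

    Assumes Sp = TN / (TN + FP) and Pr = TP / (TP + FP).
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    pr = tp / (tp + fp) if (tp + fp) else 0.0
    return sp, pr

# Toy example: one background pixel is wrongly marked as vessel.
predicted = [1, 1, 0, 0, 0, 1]
truth     = [1, 0, 0, 0, 0, 1]
sp, pr = specificity_and_precision(predicted, truth)  # sp = 0.75, pr = 2/3
```

Both metrics fall when background pixels are labeled as vessel, so a refinement step that removes false positives raises both.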

17.
In action recognition, a proper frame sampling method can not only reduce redundant video information but also improve recognition accuracy. In this paper, an action density-based non-isometric frame sampling method, namely NFS, is proposed to discard redundant video information and sample suitable frames from videos so that neural networks achieve high accuracy on human action recognition. Action density is introduced to indicate the intensity of actions in videos. In particular, the NFS method comprises an action density determination mechanism, a focused-clips division mechanism, and a reinforcement learning-based frame sampling (RLFS) mechanism. Evaluations with various neural networks and datasets show that the proposed NFS method is effective at frame sampling and helps achieve better action recognition accuracy than existing methods.

18.
Over the past few years, skeleton-based action recognition has achieved great success because skeleton data is immune to illumination variation, view-point variation, background clutter, scaling, and camera motion. However, effective modeling of the latent information in skeleton data is still a challenging problem. Therefore, in this paper, we propose a novel idea of action embedding with a self-attention Transformer network for skeleton-based action recognition. Our proposed approach mainly comprises two modules: (i) action embedding and (ii) a self-attention Transformer. The action embedding encodes the relationship between corresponding body joints (e.g., the joints of both hands move together when clapping) and thus captures the spatial features of joints. Meanwhile, temporal features and dependencies of body joints are modeled using the Transformer architecture. Our method works in a single-stream (end-to-end) fashion, where a multilayer perceptron (MLP) is used for classification. We carry out an ablation study and evaluate the performance of our model on the small-scale SYSU-3D dataset and the large-scale NTU-RGB+D and NTU-RGB+D 120 datasets, where the results establish that our method performs better than other state-of-the-art architectures.

19.
In this paper, a new hierarchical approach for object detection is proposed. Object detection methods based on the Implicit Shape Model (ISM) efficiently handle deformable objects, occlusions, and clutter. The structure of each object in ISM is defined by a spring-like graph. We introduce a hierarchical ISM in which the structure of each object is defined by a hierarchical star graph. The hierarchical ISM has two layers. In the first layer, a set of local ISMs is used to model object parts. In the second layer, the structure of the parts with respect to the object center is modeled by a global ISM. In the proposed approach, the parts obtained for each object category are highly discriminative. Therefore, our approach does not require a verification stage. We applied the proposed approach to several datasets and compared its performance with that of comparable methods. The results show that our method performs better.

20.
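Commonly used heterogeneous information networks include knowledge graphs and heterogeneous information networks with a simple schema layer, and representation learning for the two usually follows different approaches. This paper summarizes the similarities and differences between knowledge graphs and heterogeneous information networks with a simple schema layer and proposes a general representation learning framework for heterogeneous information networks. The framework has three parts: a base vector model, a propagation model based on graph attention networks, and a task model. The base vector model learns basic network embeddings; the propagation model learns features of high-order neighbors by stacking attention layers; and the replaceable task model suits different application scenarios. Compared with baseline models, the proposed framework achieves relatively good results on both link prediction in knowledge graphs and node classification in heterogeneous information networks.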
