Similar Documents
20 similar documents found (search time: 31 ms).
1.
张昱彤  翟旭平  聂宏 《红外技术》2022,44(3):286-293
Action recognition has become a research hotspot in computer vision in recent years. Unlike work on visible-light video, this paper targets the temperature data collected by low-resolution infrared sensors and proposes a two-stream convolutional neural network action recognition method for such sensors. The spatial and temporal data are fed simultaneously, as raw temperature values, into an improved two-stream convolutional network, and the probability vectors of the spatial-stream and temporal-stream networks are finally fused with weights to obtain the final action class…
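As a rough, hedged illustration of the weighted late fusion this abstract describes (not the authors' code), the sketch below combines the two streams' softmax probability vectors; the fusion weight alpha, the batch size, and the class count are assumed values:

```python
import torch
import torch.nn.functional as F

def fuse_two_stream(spatial_logits, temporal_logits, alpha=0.6):
    """Weighted fusion of spatial- and temporal-stream probability vectors."""
    p_spatial = F.softmax(spatial_logits, dim=1)
    p_temporal = F.softmax(temporal_logits, dim=1)
    # Weighted sum of the two probability vectors; argmax gives the action class.
    p_fused = alpha * p_spatial + (1.0 - alpha) * p_temporal
    return p_fused.argmax(dim=1)

# Dummy logits for a batch of 4 clips over 10 assumed action classes.
pred = fuse_two_stream(torch.randn(4, 10), torch.randn(4, 10))
```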

2.
刘强  张文英  陈恩庆 《信号处理》2020,36(9):1422-1428
Human action recognition has many applications in human-computer interaction, video content retrieval, and other fields, and is an important research direction in multimedia information processing. Most existing two-stream action recognition methods apply the same convolutional network to the RGB and optical-flow streams, making little use of multimodal information and easily causing network redundancy and misclassification of similar actions. In recent years depth video has also been used increasingly for action recognition, but most methods exploit only the spatial information of actions in depth video, not the temporal information. To address these problems, this paper proposes a multimodal action recognition method based on a heterogeneous multi-stream network. The method first obtains a temporal feature representation of actions from the depth video, namely depth optical flow, then selects suitable heterogeneous networks for spatio-temporal feature extraction and classification, and finally performs multimodal fusion of the recognition results from the RGB data, the optical flow extracted from RGB, the depth video, and the depth optical flow. Experiments on the widely used large-scale action recognition dataset NTU RGB+D show that the proposed method outperforms existing state-of-the-art methods.

3.
To address the loss of action features as they propagate through a convolutional neural network and the problem of model overfitting, this paper proposes an action recognition method based on a cross-layer fusion model and multi-model voting. In the preprocessing stage, rank pooling is used to aggregate the motion information in a video and generate approximate dynamic images. A structure that horizontally flips the feature information is placed before the fully connected layer, forming the no-fusion model. On top of the no-fusion model, a structure that fuses the output features of the second layer with those of the fifth layer is added, forming the cross-layer fusion model. During training, the two base models are trained with three data-partitioning schemes and two orders of generating approximate dynamic images, yielding multiple different classifiers. At test time, the predictions of these classifiers are combined by voting to produce the final classification. On the UCF101 dataset, the proposed no-fusion and cross-layer fusion recognition methods achieve considerably higher accuracy than the dynamic image network, and multi-model voting effectively alleviates overfitting, increases the robustness of the algorithm, and yields better average performance.
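The multi-model voting step lends itself to a short sketch. Below is a minimal, assumed illustration of majority voting over several classifiers' predictions; the classifier outputs are random stand-ins, not the paper's trained models:

```python
import torch

def majority_vote(prob_list):
    """prob_list: per-classifier (batch, classes) probability tensors."""
    preds = torch.stack([p.argmax(dim=1) for p in prob_list])  # (n_models, batch)
    return preds.mode(dim=0).values  # most frequent class per video

# Six stand-in classifiers voting over a batch of 4 videos, 101 classes (as in UCF101).
probs = [torch.softmax(torch.randn(4, 101), dim=1) for _ in range(6)]
final = majority_vote(probs)
```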

4.
In video-based action recognition, using videos with different frame numbers to train a two-stream network can cause data skew problems. Moreover, extracting the key frames from a video is crucial for improving the training and recognition efficiency of action recognition systems. However, previous works suffer from information loss and optical-flow interference when handling videos with different frame numbers. In this paper, an augmented two-stream network (ATSNet) is proposed to achieve robust action recognition. A frame-number-unified strategy is first incorporated into the temporal stream network to unify the frame numbers of videos. Subsequently, the grayscale statistics of the optical-flow images are extracted to filter out invalid optical-flow images and produce dynamic fusion weights for the two branch networks, adapting to different action videos. Experiments conducted on the UCF101 dataset demonstrate that ATSNet outperforms previous methods, improving recognition accuracy by 1.13%.
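A hedged sketch of the flow-filtering idea the abstract mentions: invalid optical-flow images are dropped by a grayscale-statistics test, and the kept ratio doubles as a dynamic weight for the temporal branch. The variance threshold and the weighting rule are illustrative assumptions, not ATSNet's actual design:

```python
import numpy as np

def filter_flow_frames(flow_gray, var_thresh=5.0):
    """Keep optical-flow images whose grayscale variance suggests real motion."""
    keep = [f for f in flow_gray if f.var() > var_thresh]
    # Assumed heuristic: the kept ratio serves as the temporal-branch weight.
    temporal_weight = len(keep) / max(len(flow_gray), 1)
    return keep, temporal_weight

# 16 stand-in grayscale flow images of size 224x224.
frames = [np.random.randint(0, 256, (224, 224), dtype=np.uint8) for _ in range(16)]
kept, w_t = filter_flow_frames(frames)
```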

5.
Face anti-spoofing helps a face recognition system judge whether a detected face is real or fake. Traditional face anti-spoofing methods use hand-crafted features to describe the difference between live and fraudulent faces, but such features do not generalize to the variations of unconstrained environments. Convolutional neural networks (CNNs) achieve considerable results on face spoofing detection. However, most existing neural-network-based methods simply extract single-scale features from single-modal data, ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale feature fusion (MMFF) is proposed. Specifically, a residual network (ResNet-34) is first adopted to extract features of different scales from each modality; these features are then fused by a feature pyramid network (FPN); finally, a squeeze-and-excitation fusion (SEF) module and a self-attention network (SAN) are combined to fuse features from the different modalities for classification. Experiments on the CASIA-SURF dataset show that the MMFF-based method achieves better performance than most existing methods.
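Of the modules listed, the squeeze-and-excitation idea is compact enough to sketch. Below is a generic SE block in PyTorch, offered as an assumed illustration of the SEF module's core mechanism; the channel count and reduction ratio are placeholders:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                      # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # excitation: channel weights
        return x * w                                # reweight the channels

out = SEBlock()(torch.randn(2, 256, 7, 7))
```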

6.
Directly applying a convolutional autoencoder ignores the temporal information in video; to address this, this paper proposes a spatio-temporal two-stream anomaly detection model based on Bayesian fusion. The spatial-stream model reconstructs individual video frames with a convolutional autoencoder, while the temporal-stream model reconstructs short optical-flow sequences with a convolutional long short-term memory (LSTM) encoder-decoder network. The per-frame reconstruction errors of the two models are then computed, an adaptive threshold binarizes the reconstruction-error maps, and the spatial and temporal reconstruction errors are fused under a Bayesian criterion to obtain a fused reconstruction-error map, on which abnormal behavior is judged. Experimental results show that the algorithm outperforms existing anomaly detection algorithms on the UCSD and Avenue datasets.
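As a loose illustration only, the sketch below fuses per-pixel spatial and temporal reconstruction errors under an independence assumption; the logistic pseudo-likelihoods, the prior, and the 0.5 threshold are invented placeholders rather than the paper's Bayesian criterion:

```python
import numpy as np

def bayes_fuse(err_s, err_t, prior=0.1, scale=4.0):
    """Combine two reconstruction-error maps into a posterior anomaly map."""
    # Map each error to a pseudo-likelihood in (0, 1) via a logistic squash.
    ls = 1.0 / (1.0 + np.exp(-scale * (err_s - err_s.mean())))
    lt = 1.0 / (1.0 + np.exp(-scale * (err_t - err_t.mean())))
    # Independence assumption: multiply the odds contributed by each stream.
    odds = (prior / (1 - prior)) * (ls / (1 - ls)) * (lt / (1 - lt))
    return odds / (1.0 + odds)  # posterior anomaly probability per pixel

post = bayes_fuse(np.random.rand(120, 160), np.random.rand(120, 160))
anomaly_mask = post > 0.5      # binarized anomaly decision map
```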

7.
康书宁  张良 《信号处理》2020,36(11):1897-1905
Deep-learning-based human action recognition has achieved good results in recent years; in particular, 2D convolutional neural networks can learn the spatial features of human actions fairly well, but they still struggle to capture long-range motion information. To address this problem, a human action recognition model based on slicing a semantic feature cube is proposed to jointly learn the appearance and motion features of actions. Built on Temporal Segment Networks (TSN), the model uses InceptionV4 as the backbone to extract appearance features and splits the resulting 3D feature-map cube into 2D spatial and temporal feature-map slices. A spatio-temporal feature fusion module is also designed to jointly learn the weight assignment over the multi-dimensional slices, yielding spatio-temporal action features and enabling end-to-end training. Compared with TSN, the model improves accuracy on both the UCF101 and HMDB51 datasets. Experimental results show that, without significantly increasing the number of network parameters, the model captures richer motion information and improves human action recognition.
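The cube-slicing idea can be illustrated with tensor operations. The sketch below is an assumed reading of it, not the paper's implementation: a stacked segment feature map is reduced to a (T, H, W) cube (the channel pooling is a simplification of ours) and unbound into 2D spatial and temporal slices:

```python
import torch

feat = torch.randn(8, 256, 14, 14)      # (T segments, C, H, W) from the backbone
cube = feat.mean(dim=1)                 # semantic cube (T, H, W), channels pooled

spatial_slices = cube.unbind(dim=0)     # T slices of shape (H, W): appearance
temporal_slices_h = cube.unbind(dim=1)  # H slices of shape (T, W): motion along rows
temporal_slices_w = cube.unbind(dim=2)  # W slices of shape (T, H): motion along cols
```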

8.
裴晓敏  范慧杰  唐延东 《红外与激光工程》2018,47(2):203007-0203007(6)
In human action recognition from natural-scene images, occlusion, background clutter, and uneven illumination degrade recognition; methods based on 3D human skeleton sequences can overcome these drawbacks. First, considering the spatio-temporal nature of human actions, a deep spatio-temporal feature-fusion network for skeleton-based action recognition is proposed. Second, a view-invariant feature representation is built from the skeleton geometry: a CNN (Convolutional Neural Network) learns local spatial features of the skeleton, an LSTM (Long Short Term Memory) network applied in the spatial domain learns correlations among skeleton joints, and an LSTM applied in the temporal domain learns spatio-temporal correlations across the skeleton sequence. Finally, the algorithm is validated on the NTU RGB+D database. Experimental results show improved recognition accuracy and strong robustness to multi-view skeletons.

9.
Convolutional neural networks exhibit strong feature-learning ability on high-level computer vision tasks and have achieved remarkable results in image semantic segmentation. However, exploiting multi-scale feature information effectively remains difficult. This paper proposes an image semantic segmentation method that fuses multi-scale features effectively. The method comprises four basic modules: a feature fusion module (FFM), a spatial information module (SIM), a global pooling module (GPM), and a boundary refinement module (BRM). The FFM adopts an attention mechanism and a residual structure to improve the efficiency of multi-scale feature fusion; the SIM, composed of convolution and average pooling, supplies additional spatial detail to help locate object edges; the GPM extracts global image information and significantly improves performance; and the BRM, built around a residual structure, refines the boundaries of feature maps. Adding the four modules to a fully convolutional network exploits multi-scale feature information effectively. Experiments on the PASCAL VOC 2012 dataset show an 8.7% improvement in mean intersection-over-union over the fully convolutional network, and comparisons with other methods in the same framework also validate its effectiveness.
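Among the four modules, the GPM is simple enough to sketch. The block below is a generic interpretation of global-context pooling, not the paper's exact module: global average pooling extracts image-level context, which is projected and broadcast back onto the feature map; channel sizes and the additive combination are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalPoolingModule(nn.Module):
    def __init__(self, channels=512):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                   # x: (B, C, H, W)
        g = F.adaptive_avg_pool2d(x, 1)     # global context (B, C, 1, 1)
        g = self.proj(g)                    # project the context vector
        # Broadcast the context back over the spatial map and add it.
        return x + F.interpolate(g, size=x.shape[2:])

y = GlobalPoolingModule()(torch.randn(1, 512, 32, 32))
```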

10.
As a challenging video classification task, action recognition has become a significant topic in the computer vision community. The most popular two-stream methods to date still simply fuse the prediction scores of each stream, so the complementary characteristics of the two streams are not fully utilized and the effect of shallower features is often overlooked. In addition, treating all features equally can weaken the role of the features that contribute most to classification. Accordingly, a novel network called the Multiple Depth-levels Features Fusion Enhanced Network (MDFFEN) is proposed. It improves the two-stream architecture in two respects. For the two-stream interaction mechanism, multiple depth-levels features fusion (MDFF) aggregates the spatial–temporal features extracted from several sub-modules of the original two streams via spatial–temporal features fusion (STFF). To further refine the spatio-temporal features, a group-wise spatial-channel enhance (GSCE) module highlights meaningful regions and expressive channels automatically by priority assignment. Competitive results are achieved when MDFFEN is validated on three public, challenging action recognition datasets: HMDB51, UCF101 and ChaLearn LAP IsoGD.

11.
Inspired by the visual perception mechanism of the human brain, this paper proposes an action recognition method, within a deep learning framework, that fuses spatio-temporal two-stream networks with visual attention. First, a coarse-to-fine Lucas-Kanade estimator extracts frame-by-frame optical-flow features of human motion. Then a GoogLeNet network, fine-tuned from a pretrained model, convolves and aggregates, layer by layer, the appearance images and corresponding optical-flow features within a given temporal window. A multi-layer recurrent LSTM network then performs cross-perception to obtain a spatio-temporal semantic feature sequence with salient high-level structure, decodes the interdependent hidden states within the temporal window, and outputs spatial-stream visual feature descriptions and a per-frame label probability distribution for the window. Next, relative entropy is used to compute a per-frame attention confidence along the temporal dimension, which is fused with the spatial-stream label probability distributions. Finally, softmax classifies the action in the video. Experimental results show that the method has a significant advantage in classification accuracy over other existing methods.
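The relative-entropy attention step can be sketched compactly. Below, per-frame confidence is scored as the KL divergence of each frame's label distribution from the uniform distribution, then normalized into attention weights; the uniform reference and the softmax normalization are our assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def frame_attention(frame_probs, eps=1e-8):
    """frame_probs: (T, num_classes) per-frame label distributions."""
    T, C = frame_probs.shape
    uniform = torch.full_like(frame_probs, 1.0 / C)
    # KL(p || uniform): peaky (confident) frames get a high score.
    kl = (frame_probs * ((frame_probs + eps) / uniform).log()).sum(dim=1)
    return F.softmax(kl, dim=0)  # normalized per-frame attention weights

probs = F.softmax(torch.randn(16, 101), dim=1)        # 16 stand-in frames
w = frame_attention(probs)
video_probs = (w.unsqueeze(1) * probs).sum(dim=0)     # attention-weighted fusion
```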

12.
Video-based person re-identification (Re-ID) is an important capability for artificial intelligence and human–computer interaction. Spatial and temporal features play indispensable roles in comprehensively representing person sequences. In this paper, we propose a comprehensive feature fusion mechanism (CFFM) for video-based Re-ID that uses multiple significance-aware attention modules to learn attention-based spatial–temporal feature fusion and better represent person sequences. Specifically, CFFM consists of spatial attention, periodic attention, significance attention and residual learning. The spatial and periodic attention make the system focus, respectively, on the more useful spatial features extracted by the CNN and the temporal features extracted by the recurrent networks. The significance attention measures how much each of the two features contributes to the sequence representation. Residual learning then operates between the spatial and temporal features, based on the significance scores, for the final significance-aware feature fusion. We apply our approach to different representative state-of-the-art networks, proposing several improved networks for the video-based Re-ID task. We conduct extensive experiments on the widely used PRID-2011, i-LIDS-VID and MARS datasets for video-based Re-ID. The results show that the improved networks perform favorably against existing approaches, demonstrating the effectiveness of the proposed CFFM for comprehensive feature fusion. Furthermore, we compare the performance of the different modules in CFFM, investigating the varied significance of the different networks, features and sequential feature-aggregation modes.
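A minimal sketch of significance-aware fusion with residual learning, in the spirit of CFFM but not its implementation: two feature vectors receive learned significance scores and are combined residually. The feature dimension and the scoring head are assumptions:

```python
import torch
import torch.nn as nn

class SignificanceFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # Scores how much each of the two features should contribute.
        self.score = nn.Sequential(nn.Linear(dim * 2, 2), nn.Softmax(dim=1))

    def forward(self, f_spatial, f_temporal):      # (B, dim) each
        s = self.score(torch.cat([f_spatial, f_temporal], dim=1))  # (B, 2)
        fused = s[:, :1] * f_spatial + s[:, 1:] * f_temporal
        return fused + f_spatial + f_temporal      # residual connection

out = SignificanceFusion()(torch.randn(2, 512), torch.randn(2, 512))
```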

13.
胡正平  邱悦  翟丰鋆  赵梦瑶  毕帅 《信号处理》2021,37(8):1470-1478
During feature extraction, video action recognition algorithms often fail to focus on the salient regions of video frames, which degrades classification performance. To improve the network's ability to attend selectively, a multi-scale temporal action recognition model incorporating attention mechanisms is proposed. Channel-spatial attention and channel attention modules are embedded into the long- and short-term temporal networks, respectively, so that the network redistributes its weights during training, captures points of interest in video content and location, and improves its representational power. Experiments on the Something-Something V1 and Jester datasets show that the multi-scale temporal-fusion action recognition network with lightweight attention modules is effectively improved and shows clear advantages over other action recognition networks.

14.
The two-stream convolutional network has proved to be a milestone in the study of video-based action recognition. Many recent works modify the internal structure of the two-stream network directly and feed the top-level features into a 2D/3D convolution fusion module or something simpler. However, these fusion methods cannot fully utilize the features, and fusing only top-level features misses rich, vital details. To tackle these issues, a novel network called the Diverse Features Fusion Network (DFFN) is proposed. The fusion stream of DFFN contains two uniquely designed module types, the diverse compact bilinear fusion (DCBF) module and the channel-spatial attention (CSA) module, to distill and refine diverse compact spatio-temporal features. The DCBF modules use the diverse compact bilinear algorithm to fuse features extracted from multiple layers of the base network, called diverse features in this paper. The CSA module then leverages channel attention and multi-size spatial attention to boost key information and restrain the noise in the fusion features. We evaluate our three-stream network DFFN on three public, challenging video action benchmarks: UCF101, HMDB51 and Something-Something V1. Experimental results indicate that our method achieves state-of-the-art performance.

15.
Bag-of-words models have been widely used to obtain global representations for action recognition. However, these models ignore structural information, such as the spatial and temporal context of the action representation. In this paper, we propose a novel structured codebook construction method that encodes spatial and temporal contextual information among local features for video representation. Given a set of training videos, our method first extracts local motion and appearance features. Next, we encode the spatial and temporal contextual information among local features by constructing correlation matrices for local spatio-temporal features. Then, we discover the common patterns of movements to construct the structured codebook. After that, actions can be represented by a set of sparse coefficients with respect to the structured codebook. Finally, a simple linear SVM classifier predicts the action class from the action representation. Our method has two main advantages over traditional methods. First, it automatically discovers mid-level common patterns of movements that capture rich spatial and temporal contextual information. Second, it is robust to unwanted background local features, mainly because most of them cannot be sparsely represented by the common patterns and are treated as residual errors that are not encoded into the action representation. We evaluate the proposed method on two popular benchmarks: the KTH action dataset and the UCF sports dataset. Experimental results demonstrate the advantages of our structured codebook construction.
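The final encoding step admits a short sketch. Below, a feature is represented by sparse coefficients over a codebook via orthogonal matching pursuit from scikit-learn, which stands in for whatever solver the authors used; the codebook and feature are random placeholders:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
codebook = rng.standard_normal((128, 500))   # 500 codebook atoms of dimension 128
feature = rng.standard_normal(128)           # a pooled local spatio-temporal feature

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
omp.fit(codebook, feature)                   # solve feature ~ codebook @ coef, sparsely
sparse_code = omp.coef_                      # sparse action representation, shape (500,)
residual = np.linalg.norm(feature - omp.predict(codebook))  # the unencoded remainder
```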

16.
裴晓敏  范慧杰  唐延东 《红外与激光工程》2020,49(5):20190552-20190552-6
A two-person interaction recognition method based on a multi-channel spatio-temporal fusion network is proposed for recognizing actions in two-person skeleton sequences. First, view-invariant features are extracted from the two-person skeletons. Then a two-layer cascaded spatio-temporal fusion network is designed: the first layer learns spatial features with a one-dimensional convolutional neural network (1DCNN) and a bidirectional long short-term memory network (BiLSTM), and the second layer learns temporal features with a long short-term memory network (LSTM), yielding fused spatio-temporal features of the two-person skeletons. Finally, a multi-channel spatio-temporal fusion network learns multiple groups of two-person skeleton features to obtain multi-channel fused features, which are used to recognize the interaction; weights are shared across channels. Applied to the NTU-RGBD human interaction skeleton dataset, the method reaches 96.42% accuracy in the cross-subject experiment and 97.46% in the cross-view experiment, outperforming typical methods in this field on two-person interaction recognition.
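The first cascade layer maps naturally onto a few lines of PyTorch. The sketch below is an assumed illustration of a 1DCNN followed by a BiLSTM over a two-person skeleton sequence, not the paper's network; the joint count and hidden sizes are placeholders:

```python
import torch
import torch.nn as nn

class SpatialLayer(nn.Module):
    def __init__(self, in_ch=150, hidden=128):  # 150 = 2 persons x 25 joints x 3 coords
        super().__init__()
        self.conv = nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                 # x: (B, T, in_ch) skeleton sequence
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (B, T, hidden)
        out, _ = self.bilstm(h)           # (B, T, 2*hidden) spatial features
        return out

y = SpatialLayer()(torch.randn(2, 30, 150))  # batch of 2 sequences, 30 frames
```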

17.
蒋一  侯丽萍  张强 《红外技术》2021,43(9):852-860
To improve pedestrian action recognition accuracy on infrared sequences with complex backgrounds, this paper proposes an improved spatio-temporal two-stream network. The network first replaces the temporal-information network with a deep difference network, improving the representational power and extraction efficiency of spatio-temporal features. The model is then trained with a cost function based on a decision-level feature fusion mechanism, which preserves the spatio-temporal features of inter-frame images across the different networks to a greater extent and reflects pedestrian action classes more faithfully. Simulation results show that the proposed improvement…

18.
To exploit the complementarity between the spatial C3D network and the 2D optical-flow network in video, and to address the cost of computing and storing optical flow, this paper proposes an end-to-end spatio-temporal two-stream fusion video classification algorithm, TVBN-ResNeXt (a TV BN-Inception network plus ResNeXt-101), which combines the advantages of C3D and a self-learned end-to-end optical-flow convolutional network. For the spatial stream, a C3D-based ResNeXt-101 residual network first performs spatial video classification. The other branch is an end-to-end temporal-stream network: a TVnet learns optical flow in real time, and a BN-Inception network then classifies the video from the stacked optical-flow features. Finally, the two streams' classification results are fused with weights to form the final decision. Experiments reach 94.6% and 70.4% accuracy on UCF-101 and HMDB-51, respectively. The results show that the proposed TVBN-ResNeXt complementary two-stream fusion method not only solves the optical-flow self-learning problem and improves runtime efficiency, but also effectively improves video classification performance.

19.
In action recognition, fully learning and exploiting the correlation between the spatial and temporal features of a video is critical to the final result. Traditional action recognition methods ignore this spatio-temporal correlation and fine-grained features, reducing accuracy; this paper therefore proposes a human action recognition method based on a convolutional gated recurrent unit (convolutional GRU, ConvGRU) and attentional feature fusion (AFF). First, an Xception network serves as the spatial feature extractor for video frames, augmented with a spatial-temporal excitation (STE) module and a channel excitation (CE) module to strengthen the modeling of temporal actions while extracting spatial features. In addition, the traditional long short-term memory (LSTM) network is replaced with a ConvGRU network, which uses convolution to further mine the spatial features of video frames while extracting temporal features. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), strengthening the recognition of fine-grained features and raising model accuracy. Experimental results show recognition accuracies of 95.66% on UCF101 and 69.82% on HMDB51; the algorithm captures more complete spatio-temporal features and outperforms current mainstream models.
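A ConvGRU cell is standard enough to sketch: the GRU gates are computed with convolutions so the hidden state remains a spatial feature map. The cell below is a generic formulation, not this paper's exact module; channel counts and kernel size are assumptions:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch=64, hid_ch=64, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)  # update, reset
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)       # candidate

    def forward(self, x, h):                     # x: (B, in_ch, H, W), h: hidden map
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde         # updated hidden state

cell = ConvGRUCell()
h = torch.zeros(1, 64, 14, 14)
for t in range(8):                               # unroll over 8 stand-in frames
    h = cell(torch.randn(1, 64, 14, 14), h)
```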

20.
张宇  张雷 《电讯技术》2021,61(10):1205-1212
To address the tendency of existing deep learning methods for human action recognition to overfit, their susceptibility to distracting information, and their insufficient feature representation, a deep-learning action recognition method incorporating an attention mechanism is proposed. In data preprocessing, a video data augmentation algorithm reduces the risk of model overfitting; the existing frame sampling algorithm is then improved to effectively suppress distracting information; in the feature extraction stage, a residual network with embedded attention improves the model's feature extraction ability. A long short-term memory (LSTM) network then handles the temporal dependencies among the spatial features, and finally softmax performs the action classification. Experimental results show recognition rates of 96.72%, 98.06%, and 64.81% on the UCF YouTube, KTH, and HMDB-51 datasets, respectively.
