Similar Documents
20 similar documents found (search time: 656 ms)
1.
The two-stream convolutional network has proven to be a milestone in the study of video-based action recognition. Many recent works directly modify the internal structure of the two-stream convolutional network and feed top-level features into a 2D/3D convolution fusion module or a simpler one. However, these fusion methods cannot fully utilize the features, and fusing only top-level features loses rich, vital details. To tackle these issues, a novel network called the Diverse Features Fusion Network (DFFN) is proposed. The fusion stream of DFFN contains two types of uniquely designed modules, the diverse compact bilinear fusion (DCBF) module and the channel-spatial attention (CSA) module, to distill and refine diverse compact spatiotemporal features. The DCBF modules use the diverse compact bilinear algorithm to fuse features extracted from multiple layers of the base network, called diverse features in this paper. Further, the CSA module leverages channel attention and multi-size spatial attention to boost key information while restraining the noise in the fused features. We evaluate our three-stream network DFFN on three challenging public video action benchmarks: UCF101, HMDB51, and Something-Something V1. Experimental results indicate that our method achieves state-of-the-art performance.
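Compact bilinear fusion of the kind the DCBF module builds on is commonly realized with the Tensor Sketch trick: count-sketch each feature vector, then approximate their outer product by circular convolution in the frequency domain. The following is a minimal NumPy sketch of that general idea, not the paper's implementation; the function names are ours:

```python
import numpy as np

def compact_bilinear(x1, x2, d, seed=0):
    """Approximate the flattened outer product x1 ⊗ x2 in d dims via
    Tensor Sketch: count-sketch both vectors with independent hashes,
    then circularly convolve the sketches (multiply in FFT domain)."""
    rng = np.random.default_rng(seed)
    sketches = []
    for x in (x1, x2):
        h = rng.integers(0, d, size=x.shape[0])        # hash bucket per dim
        s = rng.choice([-1.0, 1.0], size=x.shape[0])   # random signs
        y = np.zeros(d)
        np.add.at(y, h, s * x)                         # count-sketch projection
        sketches.append(np.fft.fft(y))
    # circular convolution of the sketches ≈ sketch of the outer product
    return np.real(np.fft.ifft(sketches[0] * sketches[1]))
```

With a fixed seed the map is exactly bilinear (linear in each argument), which is the property that makes it a drop-in surrogate for full bilinear pooling at a fraction of the dimensionality.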

2.
In video-based action recognition, using videos with different frame numbers to train a two-stream network can cause data skew problems. Moreover, extracting the key frames from a video is crucial for improving the training and recognition efficiency of action recognition systems. However, previous works suffer from information loss and optical-flow interference when handling videos with different frame numbers. In this paper, an augmented two-stream network (ATSNet) is proposed to achieve robust action recognition. A frame-number unification strategy is first incorporated into the temporal stream network to unify the frame numbers of videos. Subsequently, grayscale statistics of the optical-flow images are extracted to filter out invalid optical-flow images and to produce dynamic fusion weights for the two branch networks, adapting to different action videos. Experiments on the UCF101 dataset demonstrate that ATSNet outperforms previous methods, improving recognition accuracy by 1.13%.
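Filtering flow frames by grayscale statistics and deriving dynamic fusion weights could look roughly like the sketch below. This is our own illustrative reading, not ATSNet's actual rule; the threshold, function name, and weighting scheme are assumptions:

```python
import numpy as np

def flow_stats_filter(flow_frames, min_mean=2.0):
    """Keep only optical-flow images whose mean grayscale intensity
    exceeds min_mean (near-zero flow carries no motion information),
    and derive fusion weights for the temporal/spatial branches."""
    means = np.array([f.mean() for f in flow_frames])
    valid = means >= min_mean
    kept = [f for f, v in zip(flow_frames, valid) if v]
    # dynamic weight for the temporal stream: fraction of valid flow
    # frames, so near-static videos lean on the spatial stream instead
    w_temporal = valid.mean() if len(flow_frames) else 0.0
    return kept, w_temporal, 1.0 - w_temporal
```

A video with mostly empty flow fields thus contributes little through its temporal branch, while the spatial branch's score is up-weighted accordingly.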

3.
张昱彤  翟旭平  聂宏 《红外技术》2022,44(3):286-293
In recent years, action recognition has become a research hotspot in computer vision. Unlike studies of video images, this paper targets temperature data collected by low-resolution infrared sensors and proposes a two-stream convolutional neural network action recognition method for such sensors. The spatial and temporal data are fed simultaneously, as raw temperature values, into an improved two-stream convolutional network, and the probability vectors of the spatial-stream and temporal-stream networks are fused by weighting to obtain the final action category…

4.
As a challenging video classification task, action recognition has become a significant topic in the computer vision community. The most popular two-stream methods to date still simply fuse the prediction scores of each stream. In that case, the complementary characteristics of the two streams cannot be fully utilized, and the effect of shallower features is often overlooked. In addition, treating features equally may weaken the role of the features that contribute most to classification. Accordingly, a novel network called the Multiple Depth-levels Features Fusion Enhanced Network (MDFFEN) is proposed. It improves the two-stream architecture in two respects. For the two-stream interaction mechanism, multiple depth-levels features fusion (MDFF) aggregates spatial-temporal features extracted from several sub-modules of the original two streams via spatial-temporal features fusion (STFF). To further refine the spatiotemporal features, we propose a group-wise spatial-channel enhance (GSCE) module that automatically highlights meaningful regions and expressive channels through priority assignment. Competitive results are achieved when we validate MDFFEN on three challenging public action recognition datasets: HMDB51, UCF101, and ChaLearn LAP IsoGD.

5.
林森  赵振禹  任晓奎  陶志勇 《红外与激光工程》2022,51(8):20210702-1-20210702-12
3D point cloud processing plays an important role in object segmentation, medical image segmentation, virtual reality, and other fields. However, existing 3D point cloud learning networks have a small global feature extraction range and struggle to describe local high-level semantic information, leading to incomplete point cloud feature representation. To address these problems, an object point cloud classification and segmentation network that compensates global features with semantic information is proposed. First, the input point cloud data are aligned to a canonical space as input transformation preprocessing. Then, a dilated edge convolution module extracts features of the transformed data at each layer, which are stacked to generate global features. During local feature extraction, the extracted low-level semantic information is used to describe high-level semantic information and effective geometric features, compensating for point cloud features missed in the global features. Finally, the global features and the local high-level semantic information are fused to obtain the overall point cloud features. Experimental results show that the proposed method outperforms current classic and novel algorithms in classification and segmentation performance.

6.
To better model the long-term temporal information of human actions, a human action recognition algorithm combining temporal dynamic images and a two-stream convolutional network is proposed. First, a bidirectional rank pooling algorithm is used to construct temporal dynamic images, mapping a video from 3D space to 2D space to extract appearance and long-term temporal information. Then a two-stream convolutional network based on Inception-V3 is proposed, comprising an appearance and long-term motion stream and a short-term motion stream, which take the temporal dynamic images and stacked optical-flow frame sequences as inputs, respectively, combined with data augmentation, modality pretraining, and sparse sampling. Finally, the class scores output by the streams are fused by average pooling. Experimental results on the UCF101 and HMDB51 datasets show that, compared with the traditional two-stream convolutional network, the method effectively exploits the spatiotemporal information of actions and achieves a considerably higher recognition rate, demonstrating its effectiveness and robustness.
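The rank-pooled "dynamic image" construction this abstract relies on is often computed with the approximate rank pooling weights of Bilen et al., alpha_t = 2t − T − 1; running it over the forward and reversed frame orders gives the bidirectional pair. A minimal NumPy sketch of that standard construction (not necessarily this paper's exact variant):

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a T-frame clip into one 2D image via approximate rank
    pooling: weighted sum with alpha_t = 2t - T - 1, t = 1..T."""
    T = len(frames)
    alphas = 2 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, np.stack(frames), axes=1)

def bidirectional_dynamic_images(frames):
    # forward and reversed orderings capture both temporal directions
    return dynamic_image(frames), dynamic_image(frames[::-1])
```

Because the weights sum to zero, static background cancels out and the dynamic image emphasizes how appearance evolves over time.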

7.
3D hand pose estimation from point cloud input has received increasing attention recently. In this paper, a new module for point cloud processing, the Local-aware Point Processing Module (LPPM), is designed. Able to extract local information, it is permutation invariant w.r.t. neighboring points in the input point cloud, and it is an independent module that is easy to implement and flexible for constructing point cloud networks. Based on this module, an LPPM-Net is constructed to estimate 3D hand pose. To normalize the orientation of the point cloud while properly maintaining diversity in a controllable manner, we transform the point cloud into an oriented bounding box coordinate system (OBB C.S.) and then rotate it randomly around the principal axis during training. In addition, a simple but effective technique called sampling ensemble is used at test time, which compensates for the resolution degradation caused by downsampling and improves performance without extra parameters. We evaluate the proposed method on three public hand datasets: NYU, ICVL, and MSRA. Results show that our approach achieves competitive performance on all three.
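The OBB normalization plus random spin about the principal axis can be sketched with PCA: take the eigenvectors of the point covariance as box axes, express the points in that frame, and optionally rotate about the first (principal) axis. This is our illustrative reconstruction of the general technique, with hypothetical function names:

```python
import numpy as np

def to_obb_frame(points, rng=None):
    """Express an (N, 3) cloud in an oriented-bounding-box frame whose
    axes come from PCA; optionally apply a random rotation about the
    principal axis (train-time diversity in a controlled manner)."""
    c = points.mean(axis=0)
    cov = np.cov((points - c).T)
    _, vecs = np.linalg.eigh(cov)      # columns: axes, ascending variance
    axes = vecs[:, ::-1]               # principal axis first
    local = (points - c) @ axes
    if rng is not None:
        a = rng.uniform(0, 2 * np.pi)  # random spin about principal axis
        rot = np.array([[1, 0, 0],
                        [0, np.cos(a), -np.sin(a)],
                        [0, np.sin(a),  np.cos(a)]])
        local = local @ rot.T
    return local
```

Note that the coordinate along the principal axis is unchanged by the spin, so the normalization stays canonical in the direction that matters most.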

8.
Huilan LUO  Kang TONG 《通信学报》2019,40(10):189-198
Aiming at the shortcomings of shallow networks and general deep models in the two-stream architecture, which cannot effectively learn spatial and temporal information, a squeeze-and-excitation residual network with a spatial stream and a temporal stream was proposed for action recognition. Long-term temporal dependence was captured by injecting an identity mapping kernel into the network as a temporal filter. Spatiotemporal feature multiplication fusion was used to further enhance the interaction between the spatial and temporal information of the squeeze-and-excitation residual networks, and the influence of the spatial-temporal multiplication fusion method, its frequency, and its location on recognition performance was studied. Given the limited performance achievable by a single model, three different strategies were proposed to generate multiple models, and the final recognition result was obtained by integrating these models through averaging and weighted averaging. Experimental results on the HMDB51 and UCF101 datasets show that the proposed spatiotemporal squeeze-and-excitation residual multiplier networks effectively improve action recognition performance.
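For reference, the squeeze-and-excitation operation these networks are built on is small enough to show in full: global-average-pool each channel, pass the statistics through a two-layer bottleneck, and rescale the channels by the resulting gates. A minimal NumPy sketch of the standard SE block (weights are placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights."""
    z = x.mean(axis=(1, 2))                        # squeeze: per-channel stat
    gates = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: bottleneck MLP
    return x * gates[:, None, None]                # channel-wise rescaling
```

With zero weights the gates are all sigmoid(0) = 0.5, i.e. a uniform rescaling; training learns to amplify informative channels and suppress the rest.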

9.
This paper proposes an FMCW-radar continuous human action recognition method based on dual-stream feature fusion. First, the radar echo signals of human actions are preprocessed to obtain range-time maps and micro-Doppler time-frequency spectrograms. Principal component analysis is then applied to these two images of different dimensions to extract the corresponding features, and the analysis results from the same time period are fused to obtain dual-stream fused features. Finally, the fused features are fed into a Bi-LSTM network for training and testing; for each time period, the network outputs the corresponding action category, achieving continuous human action recognition. Experimental results show that the average recognition accuracy with the dual-stream fused features as Bi-LSTM input is higher than when only range-time features or only micro-Doppler features are used.
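The per-segment PCA-and-concatenate fusion described here can be sketched as follows; this is our illustrative reading with hypothetical names and dimensions, not the paper's exact pipeline:

```python
import numpy as np

def pca_features(X, k):
    """Project the rows of X (segments x measurements) onto the top-k
    principal components, computed via SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fuse_dual_stream(range_time, micro_doppler, k=4):
    # one fused vector per time segment: concatenate the PCA projections
    # of the two modalities computed over the same segments
    return np.concatenate([pca_features(range_time, k),
                           pca_features(micro_doppler, k)], axis=1)
```

The fused sequence (one 2k-dimensional vector per segment) is what a Bi-LSTM would then consume step by step.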

10.
刘桂玉  刘佩林  钱久超 《信息技术》2020,(5):121-124,130
Action recognition based on 3D skeletons has become an important means of human-computer interaction. To improve the accuracy of 3D action recognition, this paper proposes a two-stream neural network that fuses 3D skeleton features and 2D image features: one network processes the 3D skeleton sequences and the other processes the 2D images, and their features are then fused to improve recognition accuracy. Compared with action recognition using 3D skeletons alone, the method achieves substantial accuracy gains on both the NTU_RGBD and SYSU datasets.

11.
To exploit the complementarity between the spatial C3D network and the optical-flow 2D network in video, and to address efficient optical-flow computation and storage, an end-to-end video classification algorithm based on spatiotemporal two-stream convolutional network fusion (TVBN-ResNeXt, combining a TVnet BN-Inception branch with ResNeXt-101) is proposed, fusing the advantages of C3D and a self-learned end-to-end optical-flow convolutional network. For the spatial stream, spatial video classification is first performed with a C3D-based ResNeXt-101 residual network. The other branch is an end-to-end temporal-stream network: TVnet learns optical flow in real time, and a BN-Inception network then classifies the stacked optical-flow features. Finally, the classification results of the two streams are fused by weighting to form the final decision. Experiments on the UCF-101 and HMDB-51 datasets achieve accuracies of 94.6% and 70.4%, respectively. The results show that the proposed TVBN-ResNeXt complementary two-stream fusion method not only solves the optical-flow self-learning problem and improves runtime efficiency, but also effectively improves video classification performance.
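Weighted late fusion of two streams' decisions, as used here and in several other entries on this page, amounts to averaging class probabilities with stream weights. A minimal NumPy sketch (the weight value is an assumption for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_streams(spatial_logits, temporal_logits, w_spatial=0.6):
    """Weighted late fusion: convert each stream's logits to class
    probabilities, then take a convex combination of the two."""
    p = (w_spatial * softmax(spatial_logits)
         + (1.0 - w_spatial) * softmax(temporal_logits))
    return p.argmax(axis=-1), p
```

Because the combination is convex, the fused scores remain valid probabilities, and the weight can be tuned on a validation split per dataset.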

12.
To address the low segmentation accuracy of point clouds in complex scenes and the lack of direct supervision of hidden units in neural networks, which makes it difficult to extract semantically explicit point cloud features, a point cloud semantic segmentation network combining multi-scale supervision with SCF-Net is proposed. First, a category-information generation module is built to record the categories within the receptive fields of the encoder's hidden units, to be used for the supervised learning of auxiliary classifiers in the decoder. Second, the point cloud category prediction task in the decoding stage is decomposed into a series of receptive-field category prediction tasks: an auxiliary classifier added to each decoder layer predicts the receptive-field categories at that stage, with the category information generated during encoding serving as labels to supervise learning. The model infers receptive-field categories from coarse to fine and finally predicts the point-wise semantic labels. Experimental results show that the method effectively extracts key point cloud information and improves semantic segmentation accuracy.

13.
Action recognition in video is one of the most important and challenging tasks in computer vision, and efficiently combining spatial-temporal information to represent video plays a crucial role in it. In this paper, a recurrent hybrid network architecture is designed for action recognition by fusing multi-source features: a two-stream CNN for learning semantic features, a two-stream single-layer LSTM for learning long-term temporal features, and an Improved Dense Trajectories (IDT) stream for learning short-term temporal motion features. To mitigate overfitting on small-scale datasets, a video data augmentation method is used to increase the amount of training data, and a two-step training strategy is adopted to train the recurrent hybrid network. Experimental results on two challenging datasets, UCF-101 and HMDB-51, demonstrate that the proposed method reaches state-of-the-art performance.

14.
Laser point clouds are the output of 3D sensors, and their semantic segmentation is fundamental to understanding the real world. Graph-convolution-based point cloud segmentation networks have shown excellent performance in many scenarios. However, existing graph convolution methods have several shortcomings: the capacity for local point cloud representation is not strengthened, global geometric information is ignored, and the aggregation operation keeps only the local maximum response while sub-maximum information is lost. To handle these problems, this paper proposes the GRes-Net network. A Local Geometry Augment (LGA) module makes the network rotation-invariant about the Z axis, strengthening the local point cloud representation. A Global Geometry Feature (GGF) module computes the local-to-global sphere volume ratio and concatenates it with the coordinate features X, preserving global geometric information. Multiple symmetric aggregation operations retain local information from several aspects. Every layer uses a residual connection to pass information to the next, and a Reversed Residual MLP (RevResMLP) module mines deeper semantic information. Semantic scene segmentation experiments on the S3DIS dataset verify the network's segmentation performance: the method achieves 61% segmentation accuracy, a 14% improvement over the baseline network DGCNN, effectively improving model performance.

15.
Considering the sparsity and spatially discrete distribution of LiDAR point clouds, a new graph-convolution feature extraction module is designed by combining voxel partitioning with a graph representation, and a LiDAR 3D point cloud object detection algorithm based on a voxelized graph convolutional neural network is proposed. By eliminating the computational redundancy of traditional 3D convolutional neural networks, the method improves both object detection capability and the analysis of point cloud topological information. On the public KITTI dataset, the method improves detection performance over the baseline network on both 3D and bird's-eye-view object detection for cars, pedestrians, and cyclists, with the largest gain of 13.75% on car 3D detection. Experiments show that the graph-convolution feature extraction module effectively improves overall detection performance and the ability to learn the topological relations in the data, providing a new method for 3D point cloud object detection.
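The voxel partitioning step that precedes graph construction is straightforward to sketch: map each point to integer grid coordinates, group points sharing a cell, and summarize each cell (here by its centroid). A minimal NumPy sketch of this generic preprocessing, not the paper's specific module:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Partition an (N, 3) cloud into voxels of edge length voxel_size.
    Returns unique voxel grid coords, a per-point voxel index, and
    per-voxel centroids."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    inv = inv.ravel()
    centroids = np.zeros((len(uniq), points.shape[1]))
    np.add.at(centroids, inv, points)          # sum points per voxel
    counts = np.bincount(inv).astype(float)
    centroids /= counts[:, None]               # mean point per voxel
    return uniq, inv, centroids
```

The voxel centroids (or richer per-voxel features) then serve as graph nodes, so the network operates on far fewer elements than the raw cloud.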

16.
Fan Weibei, Han Zhijie, Li Peng, Zhou Jingya, Fan Jianxi, Wang Ruchuan 《Journal of Signal Processing Systems》2019,91(10):1077-1089

With the wide application of cloud computing, the scale of cloud data center networks is growing. Virtual machine (VM) live migration is becoming more crucial in cloud data centers for load balancing and efficient resource utilization. Lightweight virtualization has made virtual machines more portable, more efficient, and easier to manage. Unlike virtual machines, containers bring more lightweight, more flexible, and more intensive service capabilities to the cloud. Research on container migration is still in its infancy, and live migration in particular remains immature. In this paper, we present a locality live migration model that takes into account the distance, available bandwidth, and costs between containers. Furthermore, we conduct comprehensive experiments on a cluster. Extensive simulation results show that the proposed method improves the resource utilization of servers and the balance of all kinds of resources on the physical machines.

17.
RGB-D saliency detection identifies the most visually salient object regions in a pair of RGB and depth images. Existing two-stream networks treat the multimodal RGB and depth data equally and extract their features in almost identical ways. However, low-level depth features contain considerable noise and cannot represent image features well. This paper therefore proposes an RGB-D saliency detection network supervised by multimodal feature fusion. Two independent streams learn the RGB and depth data separately; a two-stream side-supervision module obtains per-layer saliency maps from the RGB and depth features; a multimodal feature fusion module then fuses the high-dimensional information of the last three RGB and depth layers to generate the high-level saliency prediction. The network generates RGB and depth modality features progressively from layer 1 to layer 5; from layer 5 down to layer 3 it produces multimodal fused features, with higher layers guiding lower ones; then, from layer 2 to layer 1, the fused features from layer 3 progressively refine the RGB features of the first two layers. The final output is a saliency map containing both low-level RGB information and fused high-level RGB-D multimodal information. Experiments on three public datasets show that, thanks to the two-stream side-supervision and multimodal feature fusion modules, the network outperforms current mainstream RGB-D saliency detection models and is highly robust.

18.
To address the low efficiency of most action recognition algorithms on large-scale imagery recognition tasks, this paper makes two contributions. First, a two-stream multi-fiber network combining the two-stream structure with multi-fiber networks is proposed: it takes RGB sequences and optical-flow sequences as inputs to extract the spatiotemporal information of videos, then sums the decisions of the two branch networks, improving the detection efficiency and recognition accuracy for battlefield target gathering behavior. Second, a two-stream separable-convolution multi-fiber network combining the idea of separable convolutions with multi-fiber networks is proposed, further improving detection efficiency and resistance to overfitting. Experiments on a constructed intelligence-imagery simulation dataset show that the above algorithms effectively recognize battlefield target gathering behavior, improving recognition accuracy while greatly increasing detection efficiency.

19.
Because of their large data volume, transmitting point cloud surfaces directly takes considerable time. This paper proposes a progressive network transmission method for point cloud surfaces based on geometry-image representation. Representing point cloud surfaces as geometry images reduces their storage cost and greatly saves network transmission time. To achieve progressive transmission, the mean-pyramid hierarchical image structure is improved, and the progressive transmission of the geometry images realizes the progressive network transmission of the point cloud surface.

20.
桑海峰  赵子裕  何大阔 《电子学报》2020,48(6):1052-1061
Action-irrelevant visual information in video frames, such as complex backgrounds and lighting conditions, introduces considerable redundancy and noise into spatial action features, affecting recognition accuracy to some extent. To address this, this paper proposes a recurrent region attention unit to capture the action-related regional visual information in spatial features and, given the temporal nature of video, a recurrent region attention model. It further proposes a video-frame attention model that highlights the more important frames in an action video sequence, reducing the interference caused by similar temporal context across video sequences of different action classes. Finally, an end-to-end trainable network is proposed: the Recurrent Region Attention and Video Frame Attention based video action recognition Network (RFANet). Experiments on the UCF101 and HMDB51 benchmarks show that the end-to-end RFANet reliably recognizes the action classes in videos. Inspired by the two-stream structure, a dual-modality RFANet is also constructed, which achieves the best performance on both datasets under the same training settings.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号