Similar Literature
Found 20 similar records.
1.
In recent years, skeleton-based human action recognition has drawn wide attention because of the robustness and generalization ability of skeleton data. Among existing approaches, graph convolutional networks (GCNs), which model the human skeleton as a spatiotemporal graph, have achieved remarkable performance. However, graph convolution learns long-term interactions mainly through a series of local 3D convolutions, which are biased toward local neighborhoods and limited by the kernel size, and therefore cannot capture long-range dependencies effectively. This paper proposes a cooperative convolution Transformer network (Co-ConvT), which introduces the Transformer's self-attention mechanism to establish long-range dependencies and combines it with GCNs for action recognition, so that the model extracts local information through graph convolution while capturing rich long-range dependencies through the Transformer. Because the Transformer's self-attention is computed at the pixel level and thus incurs a very large computational cost, the network is split into two stages: the first uses pure convolutions to extract shallow spatial features, and the second uses the proposed ConvT blocks to capture high-level semantic information, which reduces the computational complexity. In addition, the linear embedding of the original Transformer is replaced with a convolutional embedding, which strengthens local spatial information and makes it possible to drop the positional encoding of the original model, yielding a lighter model. Experiments on two large-scale benchmark datasets, NTU-RGB+D and Kinetics-Skeleton, show that the model...
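Below is a minimal PyTorch sketch, not the authors' code, of what such a ConvT-style block could look like: a convolutional embedding (standing in for both the linear embedding and the positional encoding) followed by multi-head self-attention for long-range dependencies. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvTBlock(nn.Module):  # hypothetical name
    def __init__(self, channels, heads=4):
        super().__init__()
        # Convolutional embedding replaces the Transformer's linear
        # embedding; its locality also stands in for positional encoding.
        self.embed = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):            # x: (B, C, T, V) spatiotemporal map
        b, c, t, v = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, T*V, C)
        out, _ = self.attn(tokens, tokens, tokens)         # long-range links
        out = self.norm(out + tokens)                      # residual
        return out.transpose(1, 2).reshape(b, c, t, v)

x = torch.randn(2, 64, 16, 25)      # batch, channels, frames, joints
print(ConvTBlock(64)(x).shape)      # torch.Size([2, 64, 16, 25])
```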

2.
In this paper, we propose a robust tracking algorithm with an appearance model based on random ferns and a template library. We adopt random Gaussian differences to generate binary features, each of which depends on two randomly selected points and their corresponding Gaussian blur kernels. Semi-naive-Bayes-based random ferns serve as the discriminative model, and a template library containing both positive and negative templates serves as the generative model; co-training the two models gives our tracker the ability to separate foreground from background samples accurately. Besides, we propose a fragment-based method that combines global ferns and local ferns to handle occlusion. Experimental results demonstrate that the proposed algorithm performs well in terms of accuracy and robustness.
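A toy sketch of the Gaussian-difference binary test such a fern could use; the point pairs, blur scales, patch size, and the omitted semi-naive-Bayes bookkeeping are simplified or invented, and scipy is assumed available.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def make_fern(patch_shape, n_bits=8):
    h, w = patch_shape
    pts = rng.integers(0, [h, w], size=(n_bits, 2, 2))   # two points per bit
    sigmas = rng.uniform(0.5, 2.0, size=(n_bits, 2))     # two blur scales
    return pts, sigmas

def fern_code(patch, fern):
    pts, sigmas = fern
    code = 0
    for b, ((p1, p2), (s1, s2)) in enumerate(zip(pts, sigmas)):
        v1 = gaussian_filter(patch, s1)[tuple(p1)]
        v2 = gaussian_filter(patch, s2)[tuple(p2)]
        code |= int(v1 > v2) << b     # one binary Gaussian-difference test
    return code                        # index into the fern's class histogram

patch = rng.random((32, 32))
print(fern_code(patch, make_fern(patch.shape)))
```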

3.
NLNet is considered a milestone in the study of capturing long-range dependencies. Many recent studies modify the internal structure of NLNet directly and apply it to video object detection and semantic segmentation tasks. The dependencies between local and global features have been well developed, but the dependencies between the global features of different convolution layers are rarely considered. Convolution is a local operation, so the global features of different convolution layers cannot be directly related, resulting in the loss of dependencies between global features. Given this limitation, this study designs a network that can efficiently capture the dependencies between the global features of different convolution layers, potentially further improving accuracy. Furthermore, for the calculation of the dependency matrix, we build on the Dot-product used in NLNet and propose the RELU-Dot-product, which achieves higher accuracy. We evaluate the proposed method on image classification and object detection tasks. The datasets used are CIFAR10, CIFAR100, Tiny-ImageNet, VOC2007, VOC2012, and MS COCO. Experiments show that our method can significantly improve network performance while introducing only a few additional parameters.
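A hedged PyTorch sketch of our reading of the RELU-Dot-product affinity inside a non-local block: ReLU is applied to the embeddings before the product, and the nonnegative affinities are row-normalized. This is an interpretation, not the paper's released code; layer names and the reduction ratio are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReluDotNonLocal(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.theta = nn.Conv2d(c, c // 2, 1)
        self.phi = nn.Conv2d(c, c // 2, 1)
        self.g = nn.Conv2d(c, c // 2, 1)
        self.out = nn.Conv2d(c // 2, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = F.relu(self.theta(x)).flatten(2)            # (B, C/2, HW)
        k = F.relu(self.phi(x)).flatten(2)
        v = self.g(x).flatten(2)
        aff = torch.bmm(q.transpose(1, 2), k)           # (B, HW, HW), >= 0
        aff = aff / (aff.sum(-1, keepdim=True) + 1e-6)  # row-normalize
        y = torch.bmm(v, aff.transpose(1, 2)).view(b, c // 2, h, w)
        return x + self.out(y)                          # residual, as in NLNet

print(ReluDotNonLocal(64)(torch.randn(1, 64, 14, 14)).shape)
```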

4.
Image denoising requires both spatial detail and global contextual information to recover a clean image from a degraded one. Previous deep convolutional networks usually focus on modeling local features and stack convolution blocks to expand the receptive field, hoping to catch long-distance dependencies. Contrary to this expectation, however, local features extracted by traditional convolution cannot recover global details, and the stacked blocks hinder the information flow. To tackle these issues, we introduce the Matrix Factorization Denoising Module (MD) to model the interrelationship between the global context aggregation process and the reconstruction process, so as to capture contextual details. Besides, we redesign a new basic block to ease the information flow while maintaining network performance. In addition, we conceive the Feature Fusion Module (FFU) to fuse information from different sources. Inspired by multi-stage progressive restoration architectures, we adopt two convolution branches that progressively reconstruct the denoised image. In this paper, we propose an original and efficient neural convolution network dubbed MFU. Experimental results on various image denoising datasets (SIDD, DND, and synthetic Gaussian noise datasets) show that MFU produces visual quality and accuracy comparable with state-of-the-art methods.
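A rough sketch of one way a matrix-factorization context module can work; this is our interpretation, not the MFU release, and the rank and layer names are assumptions. Features over all HW positions are softly assigned to a few bases (rank r much smaller than HW), so the reconstruction carries global context back to every pixel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatFactContext(nn.Module):
    def __init__(self, c, rank=8):
        super().__init__()
        self.to_coef = nn.Conv2d(c, rank, 1)   # soft assignment to bases

    def forward(self, x):                      # (B, C, H, W)
        b, c, h, w = x.shape
        coef = F.softmax(self.to_coef(x).flatten(2), dim=-1)   # (B, r, HW)
        feats = x.flatten(2)                                   # (B, C, HW)
        bases = torch.bmm(feats, coef.transpose(1, 2))         # (B, C, r)
        recon = torch.bmm(bases, coef)                         # low-rank context
        return x + recon.view(b, c, h, w)                      # residual add

print(MatFactContext(64)(torch.randn(1, 64, 32, 32)).shape)
```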

5.
陈莹  龚苏明 《电子与信息学报》2021,43(12):3538-3545
To address the problem that existing channel attention mechanisms apply global average pooling to each channel directly and thereby ignore its local spatial information, this paper proposes two improved channel attention modules for human action recognition: a spatiotemporal (ST) interaction module based on matrix operations and a depthwise separable convolution (DS) module. The ST module extracts a spatiotemporally weighted information sequence for each channel through convolution and dimension permutation, then obtains the per-channel attention weights by convolution. The DS module first uses depthwise separable convolution to capture the local spatial information of each channel, then compresses the spatial size so that each channel has a global receptive field, and finally obtains the per-channel attention weights through a convolution operation, completing feature recalibration under the channel attention mechanism. Inserting the improved attention modules into backbone networks and evaluating on the common human action recognition datasets UCF101 and HMDB51 yields improved accuracy.
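A minimal PyTorch sketch of the DS-style module as we read it (layer sizes assumed): a depthwise convolution keeps per-channel local spatial cues, pooling then gives each channel a global receptive field, and a convolution produces the attention weights used for recalibration.

```python
import torch
import torch.nn as nn

class DSChannelAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.depthwise = nn.Conv2d(c, c, 3, padding=1, groups=c)  # local, per channel
        self.pool = nn.AdaptiveAvgPool2d(1)                       # global view
        self.fc = nn.Conv2d(c, c, 1)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        w = self.gate(self.fc(self.pool(self.depthwise(x))))      # (B, C, 1, 1)
        return x * w                                              # recalibrate

x = torch.randn(2, 32, 56, 56)
print(DSChannelAttention(32)(x).shape)
```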

6.
In recent years, the use of Siamese networks in visual object tracking has greatly improved tracker performance, offering both accuracy and real-time speed. However, the accuracy of Siamese trackers is still limited to a large extent. To address this problem, and building on the channel attention mechanism, this paper proposes a key-feature-aware module that strengthens the discriminative power of the network model and makes it focus on changes in the target's convolutional features. On this basis, an online adaptive mask strategy is also proposed: according to the output state of the cross-correlation layer learned online, subsequent frames are masked adaptively so as to highlight the foreground target. Experiments on the OTB100 and GOT-10k datasets show that, without sacrificing real-time performance, the proposed tracker is significantly more accurate than the baseline and tracks robustly in complex scenes with occlusion, scale variation, and background clutter.
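A loose sketch of channel-attention reweighting before the Siamese cross-correlation; the function and the weighting scheme are ours for illustration, not the paper's exact module. Channels that respond strongly on the template are emphasized in both branches before matching.

```python
import torch
import torch.nn.functional as F

def attention_xcorr(template_feat, search_feat):
    # template_feat: (C, th, tw), search_feat: (C, sh, sw)
    w = torch.sigmoid(template_feat.mean(dim=(1, 2)))   # per-channel weight
    t = template_feat * w[:, None, None]
    s = search_feat * w[:, None, None]
    # depthwise cross-correlation: template acts as conv kernel over search
    return F.conv2d(s[None], t[None].transpose(0, 1), groups=t.shape[0])

resp = attention_xcorr(torch.randn(256, 7, 7), torch.randn(256, 31, 31))
print(resp.shape)   # torch.Size([1, 256, 25, 25])
```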

7.
Multiscale Bayesian segmentation using a trainable context model   (Cited by: 12)
Multiscale Bayesian approaches have attracted increasing attention for use in image segmentation. Generally, these methods tend to offer improved segmentation accuracy with reduced computational burden. Existing Bayesian segmentation methods use simple models of context designed to encourage large uniformly classified regions. Consequently, these context models have a limited ability to capture the complex contextual dependencies that are important in applications such as document segmentation. We propose a multiscale Bayesian segmentation algorithm which can effectively model complex aspects of both local and global contextual behavior. The model uses a Markov chain in scale to model the class labels that form the segmentation, but augments this Markov chain structure by incorporating tree-based classifiers to model the transition probabilities between adjacent scales. The tree-based classifier models complex transition rules with only a moderate number of parameters. One advantage of our segmentation algorithm is that it can be trained for specific segmentation applications by simply providing examples of images with their corresponding accurate segmentations. This makes the method flexible by allowing both the context and the image models to be adapted without modification of the basic algorithm. We illustrate the value of our approach with examples from document segmentation in which text, picture, and background classes must be separated.
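A schematic sketch of the scale-to-scale transition idea, with scikit-learn's decision tree standing in for the paper's tree-based classifiers and entirely synthetic data: given a coarse-scale label plus local features, the classifier supplies the label distribution one scale finer.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, n_classes = 1000, 3

# training pairs: (coarse label, local features) -> fine label
coarse = rng.integers(0, n_classes, n)
feats = rng.random((n, 4))
fine = (coarse + (feats[:, 0] > 0.7)) % n_classes    # toy ground truth

transition = DecisionTreeClassifier(max_depth=5)
transition.fit(np.column_stack([coarse, feats]), fine)

# at test time, sweep coarse to fine, reading P(fine | coarse, features)
probs = transition.predict_proba(np.column_stack([coarse[:5], feats[:5]]))
print(np.round(probs, 2))
```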

8.
杨真真  孙雪  邵静  杨永鹏 《信号处理》2022,38(9):1912-1921
To improve the performance of U-Net while adding as little extra computation as possible, this paper proposes a new Multiscale Even Convolution Attention U-Net (MECAU-Net). On the encoder side, the network uses 2x2 even convolutions instead of 3x3 convolutions for feature extraction and, borrowing the multiscale idea, passes information extracted with 4x4 even convolutions directly to the backbone, obtaining more complete image information at little extra cost; symmetric padding is adopted to resolve the offset problem that arises when extracting information with even-sized kernels. In addition, a convolutional attention module combining spatial and channel attention is inserted after the 2x2 even convolution module, extracting richer information with almost no extra overhead. Finally, simulation experiments on two medical image datasets show that, at a slightly higher computational cost, the proposed MECAU-Net achieves a considerable improvement in segmentation performance over U-Net, and it attains better segmentation performance than the other compared networks while using fewer parameters.
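A small PyTorch sketch of the even-kernel padding trick, in our rendering of "symmetric padding": a 2x2 convolution shifts the output grid by half a pixel, so alternating the one-pixel pad between opposite sides in consecutive blocks cancels the drift. The module name and pad scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvenConv2x2(nn.Module):
    def __init__(self, cin, cout, pad_left=True):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, kernel_size=2)
        # pad one pixel on alternating sides in consecutive blocks
        self.pad = (1, 0, 1, 0) if pad_left else (0, 1, 0, 1)

    def forward(self, x):
        return self.conv(F.pad(x, self.pad))   # output size == input size

x = torch.randn(1, 16, 64, 64)
y = EvenConv2x2(16, 32, pad_left=True)(x)
z = EvenConv2x2(32, 32, pad_left=False)(y)     # opposite side: offsets cancel
print(z.shape)    # torch.Size([1, 32, 64, 64])
```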

9.
Mainstream deep fusion methods use only convolution operations to extract local image features; the interaction between the image and the convolution kernel is content-independent and cannot effectively establish long-range dependencies, which inevitably loses contextual information and limits the fusion performance for infrared and visible images. This paper therefore proposes a multiscale Transformer fusion method for infrared and visible images. Taking the Swin Transformer as a component, a Conv Swin Transformer Block is constructed in which convolutional layers enhance the representation of global features. A multiscale self-attention encoder-decoder network performs global feature extraction and global feature reconstruction, and a feature-sequence fusion layer computes attention weights for the feature sequences with a SoftMax operation, highlighting the salient features of each source image and achieving end-to-end infrared-visible image fusion. Experiments on the TNO and Roadscene datasets show that the method outperforms other typical traditional and deep-learning fusion methods in both subjective visual quality and objective metrics. By combining the self-attention mechanism and using the Transformer to build long-range dependencies in images, the method constructs a global feature fusion model with better fusion performance and stronger generalization than other deep-learning fusion methods.
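A bare-bones sketch, our guess at the fusion layer with invented shapes, of SoftMax-weighted fusion of two feature sequences: per-token energies are normalized across the two sources and used as mixing weights, so each source's salient tokens dominate where they are strong.

```python
import torch

def fuse_sequences(ir_tokens, vis_tokens):
    # tokens: (B, N, C); energy = mean |activation| per token as saliency proxy
    e = torch.stack([ir_tokens.abs().mean(-1), vis_tokens.abs().mean(-1)])  # (2, B, N)
    w = torch.softmax(e, dim=0)                                             # across sources
    return w[0, ..., None] * ir_tokens + w[1, ..., None] * vis_tokens

fused = fuse_sequences(torch.randn(1, 196, 96), torch.randn(1, 196, 96))
print(fused.shape)   # torch.Size([1, 196, 96])
```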

10.

In recent years, to improve the nonlinear feature-mapping ability of image super-resolution networks, convolutional neural networks have grown deeper and deeper. In existing residual networks, the residual block's output and input are added directly through the skip connection to deepen the nonlinear mapping layers, but there is no guarantee that every such addition improves the network's performance. In this paper, based on Dirac convolution, an improved Dirac residual block is proposed which uses trainable parameters to adaptively balance the convolution and the skip connection, increasing the nonlinear mapping ability of the model. The main body of the network uses multiple Dirac residual blocks to learn the nonlinear mapping of high-frequency information between LR and HR images. In addition, a global skip connection is realized by sub-pixel convolution, which learns a linear mapping of the low-frequency features of the input LR image. In the training stage, the model uses the Adam optimizer and L1 as the loss function. The experiments compare our algorithm with other state-of-the-art models in PSNR, SSIM, IFC, and visual effect on five benchmark datasets. The results show that the proposed model performs excellently in both subjective and objective evaluation.

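A compact PyTorch sketch of a Dirac-style residual block; the parameter names and the ReLU placement are ours. Learnable scalars blend the identity path (a Dirac delta kernel) with the convolution path, instead of the fixed skip-plus-conv sum of a standard residual block.

```python
import torch
import torch.nn as nn

class DiracResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Conv2d(c, c, 3, padding=1)
        self.alpha = nn.Parameter(torch.ones(1))        # weight on identity
        self.beta = nn.Parameter(torch.ones(1) * 0.1)   # weight on conv path

    def forward(self, x):
        # equivalent to convolving with alpha*delta + beta*W
        return self.alpha * x + self.beta * torch.relu(self.conv(x))

x = torch.randn(1, 64, 32, 32)
print(DiracResBlock(64)(x).shape)
```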

11.
In this paper, we address a complex image registration issue that arises when the dependencies between intensities of the images to be registered are not spatially homogeneous. Such a situation is frequently encountered in medical imaging when a pathology present in one of the images locally modifies the intensity dependencies observed on normal tissues. Usual image registration models, which are based on a single global intensity similarity criterion, fail to register such images, as they are blind to local deviations of intensity dependencies. The same limitation is encountered in contrast-enhanced images, where multiple pixel classes have different properties of contrast agent absorption. In this paper, we propose a new model in which the similarity criterion is adapted locally to the images by classifying image intensity dependencies. Defined in a Bayesian framework, the similarity criterion is a mixture of probability distributions describing the dependencies on two classes. The model also includes a class map which locates pixels of the two classes and weighs the two mixture components. The registration problem is formulated both as an energy minimization problem and as a maximum a posteriori estimation problem, and is solved using a gradient descent algorithm. In the problem formulation and resolution, the image deformation and the class map are estimated simultaneously, leading to an original combination of registration and classification that we call image classifying registration. Whenever sufficient information about class location is available in an application, the registration can also be performed on its own by fixing a given class map. Finally, we illustrate the value of our model on two real applications from medical imaging: template-based segmentation of contrast-enhanced images and lesion detection in mammograms. We also conduct an evaluation of our model on simulated medical data and show its ability to take into account spatial variations of intensity dependencies while keeping good registration accuracy.
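A toy numpy sketch of the mixture similarity term; the linear Gaussian intensity models and the fixed class map are invented stand-ins for the paper's learned dependency models, just to show how the class map weighs two per-class criteria.

```python
import numpy as np

def mixture_energy(fixed, warped, class_map, params):
    # params: per-class (slope, intercept, sigma) of a linear intensity model
    e = 0.0
    for k, (a, b, s) in enumerate(params):
        resid = (fixed - (a * warped + b)) ** 2 / (2 * s ** 2)
        w = class_map if k == 1 else 1.0 - class_map   # soft class weights
        e += np.sum(w * resid)
    return e

rng = np.random.default_rng(0)
fixed = rng.random((64, 64))
warped = 0.9 * fixed + 0.05
cmap = np.zeros((64, 64)); cmap[20:40, 20:40] = 1.0     # "lesion" region
print(mixture_energy(fixed, warped, cmap, [(0.9, 0.05, 0.1), (0.5, 0.3, 0.3)]))
```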

12.
Siamese trackers have attracted considerable attention in the field of object tracking because of their high precision and speed. However, one of their main disadvantages is that the feature extraction network offers little variety: they often use AlexNet or ResNet50 as the backbone. AlexNet is shallow and thus cannot easily extract abundant semantic information, whereas ResNet50 has many convolutional layers, reducing the real-time performance of Siamese trackers. We propose a multi-branch feature aggregation network with different designs in the shallow and deep convolutional layers. We use the residual module to build the shallow convolutional layers, which extract textural and edge features. The deep convolution layers, designed with two independent branches, are built with residual and parallel modules to extract different semantic features. The proposed network has a depth of only nine modules, making it simple and effective. We then apply the network to a Siamese tracker to form SiamMBFAN. We design multi-layer classification and regression subnetworks in the Siamese tracker by aggregating the last three modules of the two branches, improving the localization ability of the tracker. Our tracker achieves a better balance between performance and speed. Finally, SiamMBFAN is tested on four challenging benchmarks: OTB100, VOT2016, VOT2018, and UAV123. Compared with other trackers, our tracker improves performance by 7% on OTB100.
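A schematic sketch, with invented shapes and fusion weights, of aggregating the last three backbone modules into one multi-layer head of the kind such trackers use for classification or regression.

```python
import torch
import torch.nn as nn

class MultiLayerHead(nn.Module):
    def __init__(self, c, n_layers=3):
        super().__init__()
        # one learnable fusion weight per aggregated layer
        self.weights = nn.Parameter(torch.ones(n_layers) / n_layers)
        self.head = nn.Conv2d(c, 2, 1)   # e.g. a 2-channel cls map

    def forward(self, feats):            # list of (B, C, H, W), same shape
        w = torch.softmax(self.weights, dim=0)
        fused = sum(wi * f for wi, f in zip(w, feats))
        return self.head(fused)

feats = [torch.randn(1, 256, 25, 25) for _ in range(3)]
print(MultiLayerHead(256)(feats).shape)   # torch.Size([1, 2, 25, 25])
```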

13.
Tracking objects in video using the mean shift (MS) technique has been the subject of considerable attention. In this work, we aim to remedy one of its shortcomings. MS, like other gradient ascent optimization methods, is designed to find local modes. In many situations, however, we seek the global mode of a density function. The standard MS tracker assumes that the initialization point falls within the basin of attraction of the desired mode. When tracking objects in video, this assumption may not hold, particularly when the target's displacement between successive frames is large. In this case, the local and global modes do not correspond and the tracker is likely to fail. A novel multibandwidth MS procedure is proposed which converges to the global mode of the density function, regardless of the initialization point. We term the procedure annealed MS, as it shares similarities with the annealed importance sampling procedure. The bandwidth of the procedure plays the same role as the temperature in conventional annealing. We observe that an over-smoothed density function with a sufficiently large bandwidth is unimodal. Using a continuation principle, the influence of the global peak in the density function is introduced gradually. In this way, the global maximum is more reliably located. Since it is imperative that the computational complexity be minimal for real-time applications such as visual tracking, we also propose an accelerated version of the algorithm, which significantly decreases the number of iterations required to achieve convergence. We show on various data sets that the proposed algorithm offers considerable promise in reliably and rapidly finding the true object location when initialized from a distant point.
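The bandwidth schedule is easiest to see in one dimension; below is a toy numpy sketch with an invented two-mode density, not the paper's tracker. A bandwidth large enough to make the smoothed density unimodal is shrunk step by step, each fixed point seeding the next run, so a badly initialized start still reaches the global mode.

```python
import numpy as np

def mean_shift(samples, x0, h, iters=100):
    x = x0
    for _ in range(iters):
        w = np.exp(-0.5 * ((samples - x) / h) ** 2)   # Gaussian kernel
        x = np.sum(w * samples) / np.sum(w)           # shift to weighted mean
    return x

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 0.3, 100),       # small local mode
                       rng.normal(5, 0.3, 400)])      # global mode
x = -1.0                                              # bad initialization
for h in [8.0, 4.0, 2.0, 1.0, 0.5]:                   # annealing schedule
    x = mean_shift(data, x, h)
print(round(x, 2))   # near 5: escaped the basin of the local mode
```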

14.
Computing discrete two-dimensional (2-D) convolutions is an important problem in image processing. In mathematical morphology, an important variant is that of computing binary convolutions, where the kernel of the convolution is a 0-1 valued function. This operation can be quite costly, especially when large kernels are involved. We present an algorithm for computing convolutions of this form, where the kernel of the binary convolution is derived from a convex polygon. Because the kernel is a geometric object, we allow the algorithm some flexibility in how it elects to digitize the convex kernel at each placement, as long as the digitization satisfies certain reasonable requirements. We say that such a convolution is valid. Given this flexibility, we show that it is possible to compute binary convolutions more efficiently than would normally be possible for large kernels. Our main result is an algorithm which, given an m×n image and a k-sided convex polygonal kernel K, computes a valid convolution in O(kmn) time. Unlike standard algorithms for computing correlations and convolutions, the running time is independent of the area or perimeter of K, and our techniques do not rely on computing fast Fourier transforms. Our algorithm is based on a novel use of Bresenham's (1965) line-drawing algorithm and prefix-sums to update the convolution incrementally as the kernel is moved from one position to another across the image.
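A simplified numpy sketch of the prefix-sum idea under the stated convexity assumption (the span offsets are invented): because each kernel row covers a contiguous interval, one placement costs k interval lookups into per-row prefix sums, independent of the kernel's area, giving the O(kmn) total. The paper's incremental Bresenham-based update is not reproduced here.

```python
import numpy as np

def convex_binary_convolution(img, spans):
    # spans: list of (row_offset, left_offset, right_offset), one per kernel row
    h, w = img.shape
    ps = np.zeros((h, w + 1))
    ps[:, 1:] = np.cumsum(img, axis=1)        # per-row prefix sums
    out = np.zeros_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy, l, r in spans:            # k interval lookups per placement
                yy = y + dy
                if 0 <= yy < h:
                    lo, hi = max(0, x + l), min(w, x + r + 1)
                    if lo < hi:
                        s += ps[yy, hi] - ps[yy, lo]
            out[y, x] = s
    return out

img = np.eye(5)
spans = [(-1, 0, 0), (0, -1, 1), (1, 0, 0)]   # a small diamond kernel
print(convex_binary_convolution(img, spans))
```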

15.
The accurate fitting of a circle to noisy measurements of circumferential points is a much studied problem in the literature. In this paper, we present an interpretation of the maximum-likelihood estimator (MLE) and the Delogne-Kåsa estimator (DKE) for circle-center and radius estimation in terms of convolution on an image which is ideal in a certain sense. We use our convolution-based MLE approach to find good estimates for the parameters of a circle in digital images. In digital images, it is then possible to feed these estimates as initial values into various other numerical techniques which refine them further to achieve subpixel accuracy. We also investigate the relationship between the convolution of an ideal image with a "phase-coded kernel" (PCK) and the MLE. This is related to the "phase-coded annulus" introduced by Atherton and Kerbyson, who proposed it as one of a number of new convolution kernels for estimating circle center and radius. We show that the PCK is an approximate MLE (AMLE). We compare our AMLE method to the MLE and the DKE as well as the Cramér-Rao Lower Bound in ideal images and in both real and synthetic digital images.
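An illustrative sketch of convolution-based center estimation with a plain annulus kernel; this is not the phase-coded kernel itself, and the synthetic edge image is invented. Correlating the edge map with an annulus of the sought radius and taking the peak recovers the center.

```python
import numpy as np
from scipy.signal import fftconvolve

def annulus_kernel(radius, width=1.0, size=None):
    size = size or int(2 * radius + 5)
    y, x = np.mgrid[:size, :size] - size // 2
    r = np.hypot(x, y)
    return (np.abs(r - radius) <= width).astype(float)

# synthetic edge image: a circle of radius 10 centered at (40, 30)
img = np.zeros((80, 80))
t = np.linspace(0, 2 * np.pi, 400)
img[(40 + 10 * np.sin(t)).astype(int), (30 + 10 * np.cos(t)).astype(int)] = 1

score = fftconvolve(img, annulus_kernel(10), mode='same')
print(np.unravel_index(score.argmax(), score.shape))   # approximately (40, 30)
```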

16.
We present an explicit formula for B-spline convolution kernels; these are defined as the convolution of several B-splines of variable widths h(i) and degrees n(i). We apply our results to derive spline-convolution-based algorithms for two closely related problems: the computation of the Radon transform and of its inverse. First, we present an efficient discrete implementation of the Radon transform that is optimal in the least-squares sense. We then consider the reverse problem and introduce a new spline-convolution version of the filtered back-projection algorithm for tomographic reconstruction. In both cases, our explicit kernel formula allows for the use of high-degree splines; these offer better approximation performance than the conventional lower-degree formulations (e.g., piecewise constant or piecewise linear models). We present multiple experiments to validate our approach and to find the parameters that give the best tradeoff between image quality and computational complexity. In particular, we find that it can be computationally more efficient to increase the approximation degree than to increase the sampling rate.
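A numerical sketch of the underlying definition (the discretization step and normalization are ours): a B-spline of degree n and width h is the (n+1)-fold convolution of a box of width h, so a mixed-width, mixed-degree kernel is obtained by convolving all the boxes. The paper's closed-form formula avoids this numerical route.

```python
import numpy as np

def bspline_kernel(widths_degrees, dx=0.01):
    # widths_degrees: list of (h_i, n_i); degree n_i B-spline of width h_i
    kernel = np.array([1.0 / dx])                    # discrete unit-area delta
    for h, n in widths_degrees:
        box = np.full(int(round(h / dx)), 1.0 / h)   # box of width h, area 1
        for _ in range(n + 1):                       # degree n = (n+1) boxes
            kernel = dx * np.convolve(kernel, box)   # continuous convolution
    return kernel

k = bspline_kernel([(1.0, 1), (0.5, 0)])   # linear B-spline convolved with a box
print(len(k), round(k.sum() * 0.01, 3))    # support samples, unit area (approx.)
```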

17.
The Mumford–Shah model is one of the most successful image segmentation models. However, existing algorithms for the model are often very sensitive to the choice of the initial guess. To make use of the model effectively, it is essential to develop an algorithm which can compute a global or near-global optimal solution efficiently. While gradient descent based methods are well known to find only a local minimum, even many stochastic methods do not provide a practical solution to this problem either. In this paper, we consider the computation of a global minimum of the multiphase piecewise constant Mumford–Shah model. We propose a hybrid approach which combines gradient-based and stochastic optimization methods to resolve the sensitivity to the initial guess. At the heart of our algorithm is a well-designed basin hopping scheme which uses global updates to escape from local traps far more effectively than standard stochastic methods. In our experiments, a very high-quality solution is obtained within a few stochastic hops, whereas the solutions obtained with simulated annealing do not come close even after thousands of steps. We also propose a multiresolution approach to reduce the computational cost and enhance the search for a global minimum. Furthermore, we derive a simple but useful theoretical result relating solutions at different spatial resolutions.
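A toy numpy sketch of the basin-hopping idea on a two-phase piecewise-constant energy (a Chan-Vese-like data term; the block-flip hop is our choice, not the paper's scheme): local descent alternates mean and label updates, and a hop is kept only if the energy after re-descent drops.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(img, labels, lam=0.5):
    c = [img[labels == k].mean() if np.any(labels == k) else 0 for k in (0, 1)]
    fit = sum(((img - c[k]) ** 2 * (labels == k)).sum() for k in (0, 1))
    edges = np.abs(np.diff(labels, axis=0)).sum() + np.abs(np.diff(labels, axis=1)).sum()
    return fit + lam * edges

def local_descent(img, labels, iters=10):
    for _ in range(iters):
        c = [img[labels == k].mean() if np.any(labels == k) else 0 for k in (0, 1)]
        labels = ((img - c[1]) ** 2 < (img - c[0]) ** 2).astype(int)
    return labels

img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
img += rng.normal(0, 0.2, img.shape)
labels = local_descent(img, rng.integers(0, 2, img.shape))   # poor init
best_e = energy(img, labels)
for _ in range(20):                              # basin hops
    trial = labels.copy()
    y, x = rng.integers(0, 24, 2)
    trial[y:y+8, x:x+8] ^= 1                     # global update: flip a block
    trial = local_descent(img, trial)
    e = energy(img, trial)
    if e < best_e:                               # keep only improving hops
        labels, best_e = trial, e
print(best_e)
```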

18.
Sparse representation has been attracting increasing attention in visual tracking. However, most sparse-representation-based trackers focus only on how to model the target appearance and do not consider the learning of the sparse representation when the training samples are imprecise; hence they may drift or fail in challenging scenes. In this paper, we present a novel online tracking algorithm. The tracker integrates online multiple instance learning into the recent sparse representation scheme. For tracking, an integrated sparse representation combining texture, intensity, and local spatial information is proposed to model the target. This representation takes both occlusion and appearance change into account. Then, an efficient online learning approach is proposed to select the most distinguishable features to separate the target from the background samples. In addition, the sparse representation is dynamically updated online with respect to the current context. Both qualitative and quantitative evaluations on challenging benchmark video sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.
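A minimal sketch of the shared building block of such trackers: scoring a candidate patch by sparse coding over positive and background templates. sklearn's Lasso stands in for the paper's solver, and all sizes and the confidence measure are invented.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
pos = rng.random((256, 10))          # 10 positive templates (columns)
neg = rng.random((256, 10))          # 10 background templates
D = np.hstack([pos, neg])            # template dictionary

candidate = pos[:, 0] + 0.05 * rng.standard_normal(256)
coef = Lasso(alpha=0.01, max_iter=5000).fit(D, candidate).coef_

# confidence: fraction of the representation on positive templates
score = np.abs(coef[:10]).sum() / (np.abs(coef).sum() + 1e-12)
print(round(score, 2))   # close to 1 for a target-like candidate
```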

19.
Face recognition algorithm based on an improved deep network   (Cited by: 4)
Existing face recognition algorithms extract features either with hand-crafted descriptors or automatically through deep learning. This paper proposes a face recognition algorithm based on an improved deep network that extracts features automatically and captures the discriminative features of the target more accurately. The algorithm first preprocesses the images with ZCA (Zero-phase Component Analysis) whitening, reducing the correlation between features and lowering the training complexity of the network. A deep feature extractor is then built from convolution, pooling, and multi-layer sparse autoencoders, where the convolution kernels are obtained by separate unsupervised learning. Through pre-training and fine-tuning, this improved deep network yields an automatic deep feature extractor. Finally, a Softmax regression model classifies the extracted features. Experiments on several common face databases show that the algorithm outperforms both traditional methods and ordinary deep learning methods.
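A compact numpy sketch of the ZCA whitening step described above (matrix sizes and epsilon are illustrative): rotate to PCA space, rescale by the eigenvalues, and rotate back, so the result stays image-like while the features are decorrelated.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    # X: (n_samples, n_features)
    X = X - X.mean(axis=0)                          # zero-mean per feature
    cov = X.T @ X / X.shape[0]
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # ZCA transform
    return X @ W

rng = np.random.default_rng(0)
Xw = zca_whiten(rng.random((100, 64)))
print(np.round(np.cov(Xw, rowvar=False)[:2, :2], 2))  # approximately identity
```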

20.
To address the insufficient extraction of spatiotemporal features and the difficulty of capturing global contextual information in skeleton-based action recognition, this paper studies a scheme that combines a spatiotemporal attention mechanism with an adaptive graph convolutional network. First, a spatiotemporal attention module based on non-local operations is built to help the model focus on the most discriminative frames and regions in the skeleton sequence. Second, an adaptive graph convolutional network is constructed using the feature-learning capacity of a Gaussian embedding function and a lightweight convolutional neural network, taking into account the influence of human-body prior knowledge at different stages. Finally, with the adaptive graph convolutional network as the basic framework and the spatiotemporal attention module embedded, a two-stream fusion model is built from joint information, bone information, and their respective motion information. The algorithm reaches accuracies of 90.2% and 96.2% under the two evaluation protocols of the NTU RGB+D dataset, and its generality is demonstrated on the large-scale Kinetics dataset, verifying its advantages in extracting spatiotemporal features and capturing global contextual information.
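A condensed PyTorch sketch, our reading rather than the paper's code, of an adaptive graph convolution: the fixed skeleton adjacency A is complemented by a fully learnable matrix B and a data-dependent matrix C from embedded-Gaussian similarity of the joints.

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, cin, cout, A):
        super().__init__()
        self.A = A                                  # (V, V) skeleton prior
        self.B = nn.Parameter(torch.zeros_like(A))  # learned offsets
        self.theta = nn.Conv2d(cin, cin // 4, 1)    # Gaussian embedding
        self.phi = nn.Conv2d(cin, cin // 4, 1)
        self.out = nn.Conv2d(cin, cout, 1)

    def forward(self, x):                           # x: (B, C, T, V)
        q = self.theta(x).mean(2)                   # (B, C/4, V), pool time
        k = self.phi(x).mean(2)
        C = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (B, V, V)
        adj = self.A + self.B + C                   # adaptive adjacency
        y = torch.einsum('bctv,bvw->bctw', x, adj)  # graph aggregation
        return self.out(y)

A = torch.eye(25)                                   # toy skeleton prior
print(AdaptiveGraphConv(64, 128, A)(torch.randn(2, 64, 16, 25)).shape)
```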
