Similar Literature
20 similar documents found.
1.
Objective: Visual object tracking algorithms fall into two main categories: correlation-filter-based and Siamese-network-based. The former achieve high accuracy but run slowly and cannot meet real-time requirements, while the latter deliver excellent tracking performance in both speed and accuracy. However, most Siamese-network-based trackers still rely on a single fixed template, which makes it difficult to handle target occlusion, appearance changes, and similar distractors. To address these shortcomings, this paper proposes an efficient and robust double-template-fusion tracker (siamese tracker with double template fusion, Siam-DTF). Method: The annotated bounding box in the first frame serves as the initial template. An appearance-template branch then uses an appearance-template search module to obtain a suitable, high-quality appearance template for the target during tracking. Finally, a double-template fusion module performs response-map fusion and feature fusion, combining the respective strengths of the initial template and the appearance template to improve robustness. Results: The method is compared with nine recent trackers on three mainstream public tracking datasets. On OTB2015 (object tracking benchmark 2015), it achieves an AUC (area under curve) of 0.701 and a precision of 0.918, improvements of 0.6% and 1.3% over the second-best SiamRPN++ (siamese region proposal network++). On VOT2016 (visual object tracking 2016), it attains the highest expected average overlap (EAO) of 0.477 and the fewest failures (0.172); its EAO is 1.6% higher than the baseline SiamRPN++ and 1.1% higher than the second-best SiamMask_E. On VOT2018, its EAO and accuracy are 0.403 and 0.608, ranking second and first among all compared algorithms, respectively. The average running speed reaches 47 frames/s, well above the real-time requirement for tracking. Conclusion: The proposed double-template-fusion tracker effectively overcomes the shortcomings of current Siamese-network-based trackers, improving tracking accuracy and robustness while maintaining speed, and is suitable for engineering deployment and application.
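A minimal sketch of the double-template idea at the response-map level, assuming a PyTorch setting: the responses of the fixed initial template and the dynamically selected appearance template are combined by a learnable weight. The `xcorr` helper, the sigmoid-gated weight, and all tensor shapes are illustrative assumptions, not the released Siam-DTF implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def xcorr(search_feat, template_feat):
    """Plain cross-correlation: slide the template feature over the search feature."""
    # search_feat: (1, C, Hs, Ws), template_feat: (1, C, Ht, Wt) -> (1, 1, Hs-Ht+1, Ws-Wt+1)
    return F.conv2d(search_feat, template_feat)


class DoubleTemplateFusion(nn.Module):
    """Combine the response maps of the initial template and the appearance template."""

    def __init__(self):
        super().__init__()
        # Learnable scalar balancing the two templates (illustrative choice).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, search_feat, init_template, appear_template):
        r_init = xcorr(search_feat, init_template)
        r_appear = xcorr(search_feat, appear_template)
        w = torch.sigmoid(self.alpha)               # keep the fusion weight in (0, 1)
        return w * r_init + (1.0 - w) * r_appear


if __name__ == "__main__":
    fuse = DoubleTemplateFusion()
    search = torch.randn(1, 256, 31, 31)            # search-region feature
    z_init = torch.randn(1, 256, 7, 7)              # fixed initial-template feature
    z_appear = torch.randn(1, 256, 7, 7)            # dynamically chosen appearance template
    print(fuse(search, z_init, z_appear).shape)     # torch.Size([1, 1, 25, 25])
```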

2.

Information on social media is multi-modal, and much of it is sarcastic. In recent years, sarcasm detection has attracted considerable research attention. Many traditional methods have been proposed in this field, but the study of deep learning methods for sarcasm detection is still insufficient. Detecting sarcasm requires jointly considering the text, changes in the tone of the audio signal, and the facial expressions and body posture in the image. This paper proposes a multi-level late-fusion learning framework with residual connections, a more reasonable split of the experimental dataset, and two model variants based on different experimental settings. Extensive experiments on the MUStARD dataset show that our methods outperform other fusion models. In our speaker-independent split, the multi-modal model improves over the single-modal model by 4.85% and achieves an 11.8% error-rate reduction. The latest code will be updated at this URL: https://github.com/DingNing123/m_fusion
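A minimal sketch of a late-fusion block with a residual connection for the three modalities named above (text, audio, image), assuming PyTorch. The embedding sizes, the projection layers, and the choice of adding the residual to the text view are hypothetical; the authors' actual model is in the linked repository.

```python
import torch
import torch.nn as nn


class LateFusionWithResidual(nn.Module):
    """Project text/audio/video vectors to a shared size, fuse them, and add a residual path."""

    def __init__(self, d_text=768, d_audio=128, d_video=2048, d_hidden=256, n_classes=2):
        super().__init__()
        self.proj_t = nn.Linear(d_text, d_hidden)
        self.proj_a = nn.Linear(d_audio, d_hidden)
        self.proj_v = nn.Linear(d_video, d_hidden)
        self.fuse = nn.Sequential(nn.Linear(3 * d_hidden, d_hidden), nn.ReLU())
        self.classifier = nn.Linear(d_hidden, n_classes)

    def forward(self, text, audio, video):
        t, a, v = self.proj_t(text), self.proj_a(audio), self.proj_v(video)
        fused = self.fuse(torch.cat([t, a, v], dim=-1))
        # Residual connection: the fused feature corrects, rather than replaces, the text view.
        fused = fused + t
        return self.classifier(fused)


if __name__ == "__main__":
    model = LateFusionWithResidual()
    logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 2048))
    print(logits.shape)  # torch.Size([4, 2])
```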


3.
To address the problem that Siamese tracking algorithms tend to drift or lose the target under challenging conditions such as target deformation and similar-object interference, this paper proposes a tracking algorithm that combines residual connections with a channel attention mechanism. First, residual connections are used to effectively fuse the shallow structural features and deep semantic features extracted by the template branch, improving the representational power of the model. Second, a channel attention module is introduced so that the model adaptively re-weights the feature channels of different semantic targets, improving the model's...
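A minimal sketch, assuming PyTorch, of fusing a shallow structural feature with a deep semantic feature through a residual-style addition, as described above. The channel counts, the 1×1 projection, and bilinear upsampling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShallowDeepFusion(nn.Module):
    """Residual-style fusion of a shallow structural feature with a deep semantic feature."""

    def __init__(self, c_shallow=256, c_deep=1024):
        super().__init__()
        self.proj = nn.Conv2d(c_deep, c_shallow, kernel_size=1)

    def forward(self, shallow, deep):
        # Upsample the deep feature to the shallow resolution, project its channels, then add.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear", align_corners=False)
        return shallow + self.proj(deep)


if __name__ == "__main__":
    fusion = ShallowDeepFusion()
    out = fusion(torch.randn(1, 256, 31, 31), torch.randn(1, 1024, 15, 15))
    print(out.shape)  # torch.Size([1, 256, 31, 31])
```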

4.
5.

Deep learning has been extensively researched in the field of document analysis and has shown excellent performance across a wide range of document-related tasks. As a result, a great deal of emphasis is now being placed on its practical deployment and integration into modern industrial document processing pipelines. It is well known, however, that deep learning models are data-hungry and often require huge volumes of annotated data in order to achieve competitive performance. Since data annotation is a costly and labor-intensive process, it remains one of the major hurdles to their practical deployment. This study investigates the possibility of using active learning to reduce the cost of data annotation in the context of document image classification, one of the core components of modern document processing pipelines. The results demonstrate that by utilizing active learning (AL), deep document classification models can achieve performance competitive with models trained on fully annotated datasets and, in some cases, even surpass them while annotating only 15–40% of the total training dataset. Furthermore, this study demonstrates that modern AL strategies significantly outperform random querying and, in many cases, achieve performance comparable to models trained on fully annotated datasets even in the presence of practical deployment issues such as data imbalance and annotation noise, and thus offer tremendous benefits in real-world deployment of deep document classification models. The code to reproduce our experiments is publicly available at https://github.com/saifullah3396/doc_al.
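A minimal sketch of the kind of uncertainty-based active-learning loop such a study compares against random querying: train on the labeled pool, query the most uncertain (highest-entropy) unlabeled samples, and repeat. The linear classifier, synthetic features, and budgets are toy assumptions standing in for a deep document classifier; the authors' actual experiments are in the linked repository.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy pool of "documents" (feature vectors stand in for document-image features).
X, y = make_classification(n_samples=2000, n_features=64, n_informative=20,
                           n_classes=4, random_state=0)

rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=50, replace=False))   # small seed set
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

budget_per_round, n_rounds = 100, 5
for _ in range(n_rounds):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[unlabeled])
    # Entropy-based uncertainty: query the samples the model is least sure about.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    query = np.argsort(entropy)[-budget_per_round:]
    newly_labeled = [unlabeled[i] for i in query]
    labeled += newly_labeled                      # "annotate" the queried samples
    unlabeled = [i for i in unlabeled if i not in set(newly_labeled)]

print(f"labeled {len(labeled)} of {len(X)} samples "
      f"({100 * len(labeled) / len(X):.0f}% of the pool)")
```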


6.
Zhang Yuteng, Lu Wenpeng, Ou Weihua, Zhang Guoqiang, Zhang Xu, Cheng Jinyong, Zhang Weiyu 《Multimedia Tools and Applications》2020,79(21-22):14751-14776

Question answer selection in the Chinese medical field is very challenging, since it requires effective text representations to capture the complex semantic relationships between Chinese questions and answers. Recent deep learning approaches, e.g., CNN and RNN, have shown their potential for improving selection quality. However, these existing methods can only capture a part or one side of the semantic relationships while ignoring the other rich and sophisticated ones, leading to limited performance improvement. In this paper, a series of neural network models are proposed to address the Chinese medical question answer selection problem. In order to model the complex relationships between questions and answers, we develop both single and hybrid models with CNN and GRU to combine the merits of different neural network architectures; this differs from existing works that capture only partial relationships by utilizing a single network structure. Extensive experimental results on the cMedQA dataset demonstrate that the proposed hybrid models, especially BiGRU-CNN, significantly outperform the state-of-the-art methods. The source code of our models is available on GitHub (https://github.com/zhangyuteng/MedicalQA-CNN-BiGRU).
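A minimal sketch of a BiGRU-CNN style encoder for question/answer token sequences, assuming PyTorch: a bidirectional GRU models sequential context and a 1-D convolution with max-over-time pooling extracts local n-gram features. The vocabulary size, dimensions, and cosine-similarity matching are illustrative assumptions, not the released model.

```python
import torch
import torch.nn as nn


class BiGRUCNNEncoder(nn.Module):
    """Encode a question or answer sequence with a BiGRU followed by a 1-D CNN."""

    def __init__(self, vocab_size=10000, emb_dim=128, hidden=128, n_filters=128, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, n_filters, kernel_size=kernel, padding=1)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embed(token_ids)                 # (batch, seq_len, emb_dim)
        x, _ = self.bigru(x)                      # (batch, seq_len, 2*hidden)
        x = self.conv(x.transpose(1, 2))          # (batch, n_filters, seq_len)
        return torch.max(x, dim=2).values         # max-over-time pooling -> (batch, n_filters)


if __name__ == "__main__":
    enc = BiGRUCNNEncoder()
    q = enc(torch.randint(0, 10000, (4, 40)))     # question representation
    a = enc(torch.randint(0, 10000, (4, 60)))     # candidate-answer representation
    score = torch.cosine_similarity(q, a, dim=1)  # rank answers by similarity to the question
    print(score.shape)  # torch.Size([4])
```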


7.
Building on the fully convolutional Siamese tracking algorithm (SiamFC), this paper proposes a Siamese tracker that incorporates an attention mechanism. In the template branch, the fused attention mechanism lets the network learn the channel and spatial correlations of the template image, thereby amplifying foreground contributions, suppressing background features, and improving the network's ability to discriminate positive-sample features. Meanwhile, VggNet-19 is used to extract shallow and deep features of the template image, and the two kinds of features are fused adaptively. Experimental results on the OTB2015 and VOT2018 datasets show that, compared with SiamFC, the proposed algorithm copes better with motion blur, target drift, and changing backgrounds, achieving higher precision and success rates.
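A minimal sketch, assuming PyTorch, of SiamFC-style cross-correlation plus an adaptive (softmax-weighted) fusion of the shallow- and deep-feature response maps mentioned above. The grouped-convolution trick is the common way to batch the correlation; the learnable fusion weights and all shapes are assumptions, and the two feature levels are assumed to have been resized to a common resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def batch_xcorr(x, z):
    """SiamFC-style cross-correlation for a whole batch via grouped convolution."""
    b = x.size(0)                                   # x: (b, c, Hx, Wx), z: (b, c, Hz, Wz)
    out = F.conv2d(x.reshape(1, -1, x.size(2), x.size(3)), z, groups=b)
    return out.reshape(b, 1, out.size(2), out.size(3))


class AdaptiveResponseFusion(nn.Module):
    """Fuse shallow- and deep-feature response maps with learnable softmax weights."""

    def __init__(self, n_branches=2):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_branches))

    def forward(self, responses):                   # list of (b, 1, H, W) response maps
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * r for wi, r in zip(w, responses))


if __name__ == "__main__":
    # For the sketch, assume both feature levels were resized to the same spatial size.
    xs, zs = torch.randn(2, 64, 61, 61), torch.randn(2, 64, 13, 13)    # shallow features
    xd, zd = torch.randn(2, 512, 61, 61), torch.randn(2, 512, 13, 13)  # deep features
    fuse = AdaptiveResponseFusion()
    score = fuse([batch_xcorr(xs, zs), batch_xcorr(xd, zd)])
    print(score.shape)  # torch.Size([2, 1, 49, 49])
```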

8.
Liu Caifeng, Feng Lin, Liu Guochao, Wang Huibing, Liu Shenglan 《Multimedia Tools and Applications》2021,80(5):7313-7331

Music genre classification based on visual representations has been successfully explored over recent years. Recently, there has been increasing interest in applying convolutional neural networks (CNNs) to this task. However, most existing methods employ mature CNN structures proposed for image recognition without any modification, which results in learned features that are not adequate for music genre classification. To address this issue, we fully exploit the low-level information from spectrograms of audio and develop a novel CNN architecture in this paper. The proposed CNN architecture takes multi-scale time-frequency information into consideration, providing more suitable semantic features for the decision-making layer to discriminate the genre of an unknown music clip. The experiments are evaluated on the benchmark datasets GTZAN, Ballroom, and Extended Ballroom. The experimental results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2%, respectively, which, to the best of our knowledge, are the best results reported on these public datasets so far. Notably, the model trained with the proposed network is tiny (only 0.18M) and can therefore be deployed on mobile phones or other devices with limited computational resources. Code and models will be available at https://github.com/CaifengLiu/music-genre-classification.
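A minimal sketch of a multi-scale time-frequency convolution block over a spectrogram, assuming PyTorch: parallel time-oriented, frequency-oriented, and square kernels whose outputs are concatenated. The kernel shapes and channel counts are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class MultiScaleTimeFreqBlock(nn.Module):
    """Parallel convolutions with time-oriented, frequency-oriented, and square kernels."""

    def __init__(self, in_ch=1, out_ch=32):
        super().__init__()
        self.time_conv = nn.Conv2d(in_ch, out_ch, kernel_size=(1, 9), padding=(0, 4))
        self.freq_conv = nn.Conv2d(in_ch, out_ch, kernel_size=(9, 1), padding=(4, 0))
        self.square_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(3 * out_ch)

    def forward(self, spec):                        # spec: (batch, 1, n_mels, n_frames)
        feats = [self.time_conv(spec), self.freq_conv(spec), self.square_conv(spec)]
        return torch.relu(self.bn(torch.cat(feats, dim=1)))


if __name__ == "__main__":
    block = MultiScaleTimeFreqBlock()
    print(block(torch.randn(4, 1, 128, 256)).shape)  # torch.Size([4, 96, 128, 256])
```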


9.

Person re-identification (Re-ID) in real-world scenarios suffers from various degradations, e.g., low resolution, weak lighting, and bad weather. These degradations hinder identity feature learning and significantly degrade Re-ID performance. To address these issues, in this paper we propose a degradation invariance learning framework for robust person Re-ID. Concretely, we first design a content-degradation feature disentanglement strategy to capture and isolate the task-irrelevant features contained in the degraded image. Then, to avoid the catastrophic forgetting problem, we introduce a memory replay algorithm to further consolidate the invariance knowledge learned during previous pre-training and improve subsequent identity feature learning. In this way, our framework is able to continuously maintain degradation-invariant priors from one or more datasets to improve the robustness of identity features, achieving state-of-the-art Re-ID performance on several challenging real-world benchmarks with a unified model. Furthermore, the proposed framework can be extended to low-level image processing, e.g., low-light image enhancement, demonstrating the potential of our method as a general framework for various vision tasks. Code and trained models will be available at: https://github.com/hyk1996/Degradation-Invariant-Re-ID-pytorch.
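A minimal sketch of a memory-replay buffer (reservoir sampling) of the kind used to revisit earlier data while training on new data and so mitigate catastrophic forgetting. The buffer size, sampling rule, and mixing strategy are illustrative assumptions, not the paper's algorithm.

```python
import random
import torch


class ReplayMemory:
    """Fixed-size replay buffer filled by reservoir sampling."""

    def __init__(self, capacity=2048, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample          # every seen sample is kept with equal probability

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))


if __name__ == "__main__":
    memory = ReplayMemory(capacity=4)
    for i in range(100):
        memory.add(torch.full((3,), float(i)))   # pretend these are stored image features
    replayed = memory.sample(2)                  # mix these into the current training batch
    print(len(memory.buffer), [t[0].item() for t in replayed])
```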


10.
Accurate estimation of the remaining useful life (RUL) of lithium-ion batteries is critical for their large-scale deployment as energy storage devices in electric vehicles and stationary storage. A fundamental understanding of the factors affecting RUL is crucial for accelerating battery technology development. However, it is very challenging to predict RUL accurately because of the complex degradation mechanisms occurring within the batteries, as well as dynamic operating conditions in practical applications. Moreover, because capacity degradation is insignificant in the early stages, early prediction of battery life from early-cycle data is even more difficult. In this paper, we propose a hybrid deep learning model for early prediction of battery RUL. The proposed method effectively combines handcrafted features based on domain knowledge with latent features learned by deep networks to boost the performance of early RUL prediction. We also design a non-linear correlation-based method to select effective domain-knowledge-based features. Moreover, a novel snapshot ensemble learning strategy is proposed to further enhance model generalization ability without any additional training cost. Our experimental results show that the proposed method not only outperforms other approaches on the primary test set, which has a distribution similar to the training set, but also generalizes well to the secondary test set, whose distribution is clearly different from that of the training set. The PyTorch implementation of our proposed approach is available at https://github.com/batteryrul/battery_rul_early_prediction.
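A minimal sketch of standard snapshot ensembling on toy data, assuming PyTorch: a cyclic cosine learning-rate schedule, a checkpoint saved at the end of each cycle, and averaged predictions at test time, which adds no extra training cost. The model, data, and schedule lengths are toy assumptions and may differ from the authors' snapshot strategy.

```python
import copy
import torch
import torch.nn as nn

# Toy regression data standing in for (features, RUL) pairs.
X, y = torch.randn(512, 16), torch.randn(512, 1)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Cyclic cosine schedule: the learning rate is re-warmed every `cycle_len` epochs.
cycle_len, n_cycles = 10, 4
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=cycle_len)

snapshots = []
loss_fn = nn.MSELoss()
for epoch in range(cycle_len * n_cycles):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    sched.step()
    if (epoch + 1) % cycle_len == 0:
        # Keep a snapshot at the end of each cycle (a local minimum under the cyclic LR).
        snapshots.append(copy.deepcopy(model.state_dict()))


def ensemble_predict(x):
    """Average the predictions of all saved snapshots."""
    preds = []
    for state in snapshots:
        model.load_state_dict(state)
        with torch.no_grad():
            preds.append(model(x))
    return torch.stack(preds).mean(dim=0)


print(ensemble_predict(torch.randn(8, 16)).shape)  # torch.Size([8, 1])
```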

11.
费大胜  宋慧慧  张开华 《计算机应用》2020,40(11):3300-3305
To address the problem that the fully convolutional Siamese visual tracking network (SiamFC) drifts and fails when distractors with similar semantic information appear, a real-time visual tracking network based on multi-layer feature enhancement (MFESiam) is designed, which separately enhances the representational power of the high-level and shallow features to improve the robustness of the algorithm. First, for the shallow features, a lightweight yet effective feature-fusion strategy is used, in which a data-augmentation technique simulates changes in complex scenes such as occlusion, similar-object interference, and fast motion to enhance the texture properties of the shallow features. Second, for the high-level features, a pixel-aware global context attention module (PCAM) is proposed to improve the long-term localization ability for the target. Finally, extensive experiments are conducted on three challenging tracking benchmarks: OTB2015, GOT-10K, and the Visual Object Tracking 2018 benchmark (VOT2018). The results show that the success rate of the proposed algorithm on OTB2015 and GOT-10K is 6.3 and 4.1 percentage points higher, respectively, than that of the baseline SiamFC, while running in real time at 45 frames per second. On the VOT2018 real-time challenge, the proposed algorithm's expected average overlap exceeds that of the 2018 winner, the high-performance region-proposal Siamese tracker (SiamRPN), verifying its effectiveness.

12.
费大胜  宋慧慧  张开华 《计算机应用》2020,40(11):3300-3305
To address the problem that the fully convolutional Siamese visual tracking network (SiamFC) drifts and fails when distractors with similar semantic information appear, a real-time visual tracking network based on multi-layer feature enhancement (MFESiam) is designed, which separately enhances the representational power of the high-level and shallow features to improve the robustness of the algorithm. First, for the shallow features, a lightweight yet effective feature-fusion strategy is used, in which a data-augmentation technique simulates changes in complex scenes such as occlusion, similar-object interference, and fast motion to enhance the texture properties of the shallow features. Second, for the high-level features, a pixel-aware global context attention module (PCAM) is proposed to improve the long-term localization ability for the target. Finally, extensive experiments are conducted on three challenging tracking benchmarks: OTB2015, GOT-10K, and the Visual Object Tracking 2018 benchmark (VOT2018). The results show that the success rate of the proposed algorithm on OTB2015 and GOT-10K is 6.3 and 4.1 percentage points higher, respectively, than that of the baseline SiamFC, while running in real time at 45 frames per second. On the VOT2018 real-time challenge, the proposed algorithm's expected average overlap exceeds that of the 2018 winner, the high-performance region-proposal Siamese tracker (SiamRPN), verifying its effectiveness.
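A minimal sketch of a global-context attention block in the spirit of the pixel-aware module (PCAM) described above, assuming PyTorch: a softmax over all pixels produces one global context vector, which is transformed and added back to every position. The channel sizes and bottleneck ratio are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn


class GlobalContextAttention(nn.Module):
    """Pixel-aware global context: pool the whole feature map through a learned attention mask."""

    def __init__(self, channels=256, reduction=4):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):                            # x: (b, c, h, w)
        b, c, h, w = x.shape
        mask = self.context_mask(x).reshape(b, 1, h * w)
        mask = torch.softmax(mask, dim=-1)           # attention weights over all pixels
        context = torch.bmm(x.reshape(b, c, h * w), mask.transpose(1, 2))  # (b, c, 1)
        context = context.reshape(b, c, 1, 1)
        return x + self.transform(context)           # broadcast the context to every position


if __name__ == "__main__":
    pcam = GlobalContextAttention()
    print(pcam(torch.randn(2, 256, 22, 22)).shape)  # torch.Size([2, 256, 22, 22])
```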

13.
Yang Lianping, Zhang Hongliang, Wei Panpan, Sun Yubo, Zhang Xiangde 《Applied Intelligence》2021,51(7):5025-5039

Accurate and fast face alignment algorithms play an important role in many face-related applications. Generally, model speed is inversely related to the number of parameters. We construct our network from densely connected encoder-decoders, an efficient way to balance the number of parameters against localization quality. In each encoder-decoder, we introduce stacked depthwise convolution and depthwise feature fusion within the same channel, which greatly improves the performance of depthwise convolution and reduces the number of model parameters. In addition, we enhance the mean-square loss function by assigning a different penalty weight to each coordinate according to its distance from the position of the maximum value in the label heatmap. Experiments show that the model with the improved loss function obtains better localization results. We compare our method to state-of-the-art methods on 300W and WFLW. The localization error is 2.76% on the common subset of 300W, and the model is small (0.7M), using only about 1% of the parameters of the other models. The dataset and model based on WFLW are publicly available at https://github.com/iam-zhanghongliang/DC-EDN.
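A minimal sketch of a distance-weighted heatmap MSE loss in the spirit of the improved loss described above, assuming PyTorch: each pixel's squared error is weighted by its distance from the peak of the label heatmap. The weighting form `1 + alpha * distance` and the value of `alpha` are assumptions, not the paper's exact formulation.

```python
import torch


def weighted_heatmap_mse(pred, target, alpha=0.1):
    """MSE over heatmaps, with larger penalties for pixels farther from the label peak."""
    b, k, h, w = target.shape                         # k landmark heatmaps per image
    flat = target.reshape(b, k, -1)
    idx = flat.argmax(dim=-1)                         # index of the peak of each heatmap
    peak_y = torch.div(idx, w, rounding_mode="floor") # (b, k)
    peak_x = idx % w                                  # (b, k)

    ys = torch.arange(h, dtype=pred.dtype).view(1, 1, h, 1)
    xs = torch.arange(w, dtype=pred.dtype).view(1, 1, 1, w)
    dist = torch.sqrt((ys - peak_y.view(b, k, 1, 1)) ** 2 +
                      (xs - peak_x.view(b, k, 1, 1)) ** 2)
    weight = 1.0 + alpha * dist                       # heavier penalty farther from the peak
    return (weight * (pred - target) ** 2).mean()


if __name__ == "__main__":
    pred = torch.rand(2, 68, 64, 64)                  # 68 predicted landmark heatmaps
    target = torch.rand(2, 68, 64, 64)
    print(weighted_heatmap_mse(pred, target))
```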


14.
束平  许克应  鲍华 《计算机应用研究》2022,39(4):1237-1241+1246
Object tracking is an important topic in computer vision, where scale variation, deformation, and rotation remain difficult problems. To address these challenges, a Siamese tracker with multi-layer feature fusion and parallel self-attention (MPSiamRPN) is proposed on the basis of existing Siamese-network algorithms. First, a modified ResNet50 extracts features from the template and search images; to compensate for the loss of some target features caused by the deep network, a multi-layer feature fusion module (MLFF) is proposed to fuse the features of the last three ResNet stages. Second, a parallel self-attention module (PSA) consisting of channel self-attention and spatial self-attention is introduced: the channel self-attention selectively emphasizes channel features that benefit tracking, while the spatial self-attention learns rich spatial information about the target. Finally, a region proposal network (RPN) performs classification and regression to determine the position and shape of the target. Experiments show that the proposed MPSiamRPN achieves competitive results on the OTB100 and VOT2018 benchmark datasets.
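A minimal sketch, assuming PyTorch, of channel self-attention and spatial self-attention run in parallel and merged, in the spirit of the PSA module described above. The squeeze-excitation-style channel branch, the pooled 7×7 spatial branch, and the merge-by-sum are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn


class ParallelSelfAttention(nn.Module):
    """Channel attention and spatial attention applied in parallel, then summed."""

    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        channel_out = x * self.channel_fc(x)                       # re-weight channels
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        spatial_out = x * self.spatial_conv(pooled)                # re-weight spatial positions
        return channel_out + spatial_out                           # merge the parallel branches


if __name__ == "__main__":
    psa = ParallelSelfAttention()
    print(psa(torch.randn(2, 256, 25, 25)).shape)  # torch.Size([2, 256, 25, 25])
```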

15.

The C-Mantec constructive neural network algorithm (Ortega, 2015; MATLAB implementation at https://github.com/IvanGGomez/CmantecPaco) creates very compact architectures with generalization capabilities similar to feed-forward networks trained by the well-known back-propagation algorithm. Nevertheless, constructive algorithms suffer considerably from overfitting, and thus in this work the learning procedure of networks created by this algorithm is first analyzed, with the aim of understanding the training dynamics and identifying optimization opportunities. Secondly, several strategies for optimizing the position of the class-separating hyperplanes are analyzed, and the results are evaluated on a set of public-domain benchmark data sets. The results indicate that these modifications yield a small increase in the prediction accuracy of C-Mantec, but in general the accuracy was not better than that of a standard support vector machine, except in some cases when a mixed strategy is used.


16.
To address the problem that traditional Siamese-network trackers cannot track robustly under similar-object interference, target deformation, complex backgrounds, and other challenging conditions, an attention-guided Siamese tracking method is proposed to remedy the performance deficiencies of traditional Siamese trackers. First, different layers of the convolutional neural network ResNet50 are used to extract multi-resolution target features, and a mutual-attention module is designed so that information can flow between the template branch and the search branch. Then...
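A minimal sketch, assuming PyTorch, of a mutual-attention module that lets information flow between the template branch and the search branch via cross-attention in both directions. Single-head attention, the embedding size, and the residual updates are illustrative assumptions; the outputs are returned in flattened token form.

```python
import torch
import torch.nn as nn


class MutualAttention(nn.Module):
    """Cross-attention in both directions between template and search features."""

    def __init__(self, channels=256):
        super().__init__()
        self.t2s = nn.MultiheadAttention(channels, num_heads=1, batch_first=True)
        self.s2t = nn.MultiheadAttention(channels, num_heads=1, batch_first=True)

    @staticmethod
    def flatten(x):                                   # (b, c, h, w) -> (b, h*w, c)
        return x.flatten(2).transpose(1, 2)

    def forward(self, template, search):
        t, s = self.flatten(template), self.flatten(search)
        s_att, _ = self.t2s(query=s, key=t, value=t)  # search attends to the template
        t_att, _ = self.s2t(query=t, key=s, value=s)  # template attends to the search region
        return t + t_att, s + s_att                   # residual updates of both branches


if __name__ == "__main__":
    mix = MutualAttention()
    t_out, s_out = mix(torch.randn(2, 256, 7, 7), torch.randn(2, 256, 31, 31))
    print(t_out.shape, s_out.shape)  # torch.Size([2, 49, 256]) torch.Size([2, 961, 256])
```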

17.
温静  李强 《计算机应用》2021,41(12):3565-3570
Making full use of the spatio-temporal context information in a video can significantly improve tracking performance, but most current deep-learning-based trackers use only the features of the current frame to locate the target, ignoring the spatio-temporal context features of the same target across preceding and following frames. As a result, the tracked target is easily disturbed by nearby similar objects, which introduces a potential cumulative error during localization. To retain spatio-temporal context information, a short-term memory pool is introduced on top of the SiamMask algorithm to store features of historical frames; in addition, an appearance saliency enhancement module (ASBM) is proposed, which on the one hand enhances the salient features of the tracked target and on the other hand suppresses interference from similar surrounding objects. On this basis, a tracking algorithm based on spatio-temporal context enhancement is proposed. Experiments and analysis on four datasets (VOT2016, VOT2018, DAVIS-2016, and DAVIS-2017) show that, compared with SiamMask, the proposed algorithm improves accuracy and expected average overlap (EAO) on VOT2016 by 4 and 2 percentage points, respectively; improves accuracy, robustness, and EAO on VOT2018 by 3.7, 2.8, and 1 percentage points, respectively; reduces the decay rates of region similarity and contour accuracy on DAVIS-2016 by 0.2 percentage points each; and reduces them on DAVIS-2017 by 1.3 and 0.9 percentage points, respectively.
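A minimal sketch of a short-term memory pool that stores historical frame features and blends their average into the current template. The pool length, averaging rule, and blending weights are illustrative assumptions, not the exact design built on SiamMask.

```python
from collections import deque

import torch


class ShortTermMemoryPool:
    """Fixed-length pool of historical frame features."""

    def __init__(self, max_len=5):
        self.pool = deque(maxlen=max_len)            # the oldest frame is dropped automatically

    def update(self, feat):
        self.pool.append(feat.detach())

    def context(self):
        if not self.pool:
            return None
        return torch.stack(list(self.pool)).mean(dim=0)   # average the stored features


if __name__ == "__main__":
    memory = ShortTermMemoryPool(max_len=3)
    template = torch.randn(1, 256, 7, 7)
    for _ in range(10):                               # simulate ten tracked frames
        memory.update(torch.randn(1, 256, 7, 7))
    enhanced = 0.7 * template + 0.3 * memory.context()  # blend history into the current template
    print(enhanced.shape)  # torch.Size([1, 256, 7, 7])
```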

18.

Despite noticeable progress in perceptual tasks such as detection, instance segmentation, and human parsing, computers still perform unsatisfactorily at visually understanding humans in crowded scenes, which is required in applications such as group behavior analysis, person re-identification, e-commerce, media editing, video surveillance, autonomous driving, and virtual reality. To perform well, models need to comprehensively perceive the semantic information and the differences between instances in a multi-human image, a problem recently defined as the multi-human parsing task. In this paper, we first present a new large-scale database, “Multi-human Parsing (MHP v2.0)”, for algorithm development and evaluation to advance research on understanding humans in crowded scenes. MHP v2.0 contains 25,403 elaborately annotated images with 58 fine-grained semantic category labels and 16 dense pose key point labels, involving 2–26 persons per image captured in real-world scenes with various viewpoints, poses, occlusions, interactions, and backgrounds. We further propose a novel deep Nested Adversarial Network (NAN) model for multi-human parsing. NAN consists of three Generative Adversarial Network-like sub-nets, respectively performing semantic saliency prediction, instance-agnostic parsing, and instance-aware clustering. These sub-nets form a nested structure and are carefully designed to learn jointly in an end-to-end way. NAN consistently outperforms existing state-of-the-art solutions on our MHP and several other datasets, including MHP v1.0, PASCAL-Person-Part, and Buffy. NAN serves as a strong baseline to shed light on generic instance-level semantic part prediction and to drive future research on multi-human parsing. With the above innovations and contributions, we have organized the CVPR 2018 Workshop on Visual Understanding of Humans in Crowd Scene (VUHCS 2018) and the Fine-Grained Multi-human Parsing and Pose Estimation Challenge. These contributions together significantly benefit the community. Code and pre-trained models are available at https://github.com/ZhaoJ9014/Multi-Human-Parsing_MHP.


19.
In recent years, deep learning has been successfully applied to diverse multimedia research areas, with the aim of learning powerful and informative representations for a variety of visual recognition tasks. In this work, we propose convolutional fusion networks (CFN) to integrate multi-level deep features and fuse a richer visual representation. Despite recent advances in deep fusion networks, they still have limitations due to their expensive parameters and weak fusion modules. Instead, CFN uses 1 × 1 convolutional layers and global average pooling to generate side branches with few parameters, and employs a locally-connected fusion module, which can learn adaptive weights for different side branches and form a better fused feature. Specifically, we introduce the three key components of the proposed CFN and discuss its differences from other deep models. Moreover, we propose fully convolutional fusion networks (FCFN), an extension of CFN for pixel-level classification applied to several tasks, such as semantic segmentation and edge detection. Our experiments demonstrate that CFN (and FCFN) achieve promising performance, with consistent improvements over a plain CNN for both image-level and pixel-level classification tasks. We release our code at https://github.com/yuLiu24/CFN and provide a live demo (goliath.liacs.nl) using a CFN model trained on the ImageNet dataset.
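A minimal sketch, assuming PyTorch, of side branches built from 1 × 1 convolutions and global average pooling, fused by learnable per-branch weights. The softmax-weighted sum is a simplified stand-in for the paper's locally-connected fusion module, and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn


class SideBranchFusion(nn.Module):
    """Reduce each intermediate feature map with a 1x1 conv + GAP, then fuse adaptively."""

    def __init__(self, branch_channels=(128, 256, 512), d_out=256):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(c, d_out, kernel_size=1) for c in branch_channels])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.branch_logits = nn.Parameter(torch.zeros(len(branch_channels)))

    def forward(self, feature_maps):                 # list of (b, c_i, h_i, w_i)
        branches = [self.pool(conv(f)).flatten(1)    # each branch -> (b, d_out)
                    for conv, f in zip(self.reduce, feature_maps)]
        weights = torch.softmax(self.branch_logits, dim=0)
        return sum(w * b for w, b in zip(weights, branches))


if __name__ == "__main__":
    fusion = SideBranchFusion()
    fused = fusion([torch.randn(2, 128, 56, 56),
                    torch.randn(2, 256, 28, 28),
                    torch.randn(2, 512, 14, 14)])
    print(fused.shape)  # torch.Size([2, 256])
```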

20.
Sadrfaridpour Ehsan, Razzaghi Talayeh, Safro Ilya 《Machine Learning》2019,108(11):1879-1917

The computational complexity of solving a nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes especially acute when the data presents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality than linear kernels but introduce extra kernel and model parameters which require computationally expensive fitting. This improves quality but also reduces performance dramatically. We introduce a generalized fast multilevel framework for regular and weighted SVM and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc, which allows easy integration with scientific computing tasks. The experimental results demonstrate significant speed-ups compared to the state-of-the-art nonlinear SVM libraries. Reproducibility: our source code, documentation, and parameters are available at https://github.com/esadr/mlsvm.
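The multilevel framework itself is beyond a short sketch; the minimal example below only illustrates the weighted nonlinear SVM formulation on imbalanced data that the framework accelerates, using scikit-learn rather than the authors' PETSc-based mlsvm code. The dataset, imbalance ratio, and metric are toy assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Imbalanced toy data (about 95% vs 5%) standing in for a large-scale problem.
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Weighted nonlinear SVM: class_weight="balanced" up-weights the minority class.
plain = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
weighted = SVC(kernel="rbf", gamma="scale", class_weight="balanced").fit(X_tr, y_tr)

print("plain   :", balanced_accuracy_score(y_te, plain.predict(X_te)))
print("weighted:", balanced_accuracy_score(y_te, weighted.predict(X_te)))
```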


