Similar documents
20 similar documents retrieved (search time: 46 ms)
1.
In person re-identification, pedestrian images are usually obtained automatically by a pedestrian detector, so they contain not only the pedestrian body but also distracting information (e.g., background and occlusion). Attention-based person re-identification strengthens the focus on salient body parts and suppresses parts carrying distracting information, which helps extract more discriminative pedestrian feature representations. In deep learning, convolutional neural networks obtain attention features by re-weighting feature maps; this paper proposes a novel cluster-based global attention module (CGAM). In CGAM, attention-weight learning is recast as cluster-center learning: the spatial positions of the feature map are treated as feature nodes, and a clustering algorithm yields an importance score for each node, which is normalized to serve as its attention weight. An improved ResNet-50 is used as the backbone, and the attention module is embedded to form the attention network; only a global branch is used, making the design simple and efficient. In summary, the cluster-based attention design not only fully exploits the pairwise correlations between feature nodes but also mines rich global structural information, yielding a more reliable set of attention weights. Experimental results show that the proposed person re-identification algorithm performs markedly well on the two popular datasets Market-1501 and DukeMTMC-reID.
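A minimal NumPy sketch of the clustering-as-attention idea in CGAM described above: spatial positions become feature nodes, plain k-means stands in for the unspecified clustering algorithm, and each node's closeness to its nearest center is normalized into an attention weight. All function and parameter names are hypothetical.

```python
import numpy as np

def cluster_attention(feat, k=4, iters=10, seed=0):
    """Treat each spatial position of a C x H x W feature map as a feature
    node, cluster the nodes with plain k-means, score each node by its
    closeness to the nearest cluster center, and softmax-normalize the
    scores into attention weights used to re-weight the feature map."""
    c, h, w = feat.shape
    nodes = feat.reshape(c, -1).T                      # (H*W, C) feature nodes
    rng = np.random.default_rng(seed)
    centers = nodes[rng.choice(len(nodes), size=k, replace=False)]
    for _ in range(iters):
        dist = ((nodes[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = nodes[assign == j].mean(0)
    score = -dist.min(1)                               # nearer a center -> higher score
    score = np.exp(score - score.max())
    attn = (score / score.sum()).reshape(h, w)         # normalized attention weights
    return feat * attn, attn
```

The softmax over positions guarantees the weights form a distribution over the spatial grid, so the re-weighted map emphasizes positions close to cluster centers.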

2.
Li  Zhi  Guo  Jun  Jiao  Wenli  Xu  Pengfei  Liu  Baoying  Zhao  Xiaowei 《Multimedia Tools and Applications》2020,79(7-8):4931-4947

Person Re-Identification (person re-ID) is an image retrieval task that identifies the same person across different camera views. Generally, a good person re-ID model requires a large dataset containing over 100,000 images to reduce the risk of over-fitting. Most current handcrafted person re-ID datasets, however, are insufficient for training a learning model with high generalization ability, and images with various levels of occlusion are still lacking in most existing datasets. Motivated by these two problems, this paper proposes a new data augmentation method called Random Linear Interpolation that can enlarge person re-ID datasets and improve the generalization ability of the learning model. The key enabler of our approach is generating fused images by interpolating pairs of original images; in other words, the innovation is performing data augmentation between two random samples. Extensive experimental results demonstrate that the proposed method effectively improves baseline models. On the Market1501 and DukeMTMC-reID datasets, our approach achieves 92.71% and 82.19% rank-1 accuracy, respectively.
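The fusion step can be sketched as a mixup-style interpolation. The Beta-distributed mixing coefficient is an assumption, since the abstract does not state how the coefficient is drawn, and the names are hypothetical.

```python
import numpy as np

def random_linear_interpolation(img_a, img_b, alpha=0.5, rng=None):
    """Fuse two person images by pixel-wise linear interpolation with a
    random coefficient drawn from a Beta distribution (a mixup-style
    sketch of the augmentation; parameter names are hypothetical)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # random mixing weight in (0, 1)
    fused = lam * img_a + (1.0 - lam) * img_b
    return fused, lam
```

Each fused image lies on the line segment between the two originals in pixel space, which is what enlarges the dataset without leaving the data manifold too far behind.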


3.
The Convolutional Neural Network (CNN) has significantly improved the state of the art in person re-identification (re-ID). In existing identification CNN models, the softmax loss function is employed as the supervision signal to train the model. However, the softmax loss only encourages separability of the learned deep features between different identities; intra-class variations are not considered during training. To minimize intra-class variations and thereby improve the discriminative ability of the CNN model, this paper combines a new supervision signal with the original softmax loss for person re-ID. Specifically, during training, a center of deep features is learned for each pedestrian identity, and each deep feature is simultaneously pulled toward its corresponding identity center. With this combination of loss functions, inter-class dispersion and intra-class aggregation are constrained as much as possible. In this way, a more discriminative CNN model with two key learning objectives can be learned to extract deep features for the person re-ID task. We evaluate our method in two identification CNN models (i.e., CaffeNet and ResNet-50). It is encouraging to see that our method yields a stable improvement over the baseline and competitive performance against state-of-the-art person re-ID methods on three important benchmarks (i.e., Market-1501, CUHK03 and MARS).
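A hedged sketch of the center-based supervision described above, in the spirit of center loss; the update rule and learning rate are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Auxiliary supervision from the abstract: mean squared distance
    between each deep feature and the center of its identity."""
    diff = features - centers[labels]
    return 0.5 * (diff ** 2).sum(axis=1).mean()

def update_centers(features, labels, centers, lr=0.5):
    """Move each identity center toward the mean of its assigned
    features (an assumed update rule for the sketch)."""
    new = centers.copy()
    for c in np.unique(labels):
        new[c] += lr * (features[labels == c].mean(axis=0) - centers[c])
    return new
```

In training, this term would be added to the softmax loss with a small weight, so features of one identity aggregate while the softmax term keeps identities apart.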

4.
5.
Huang  Lei  Zhang  Wenfeng  Nie  Jie  Wei  Zhiqiang 《Multimedia Tools and Applications》2021,80(11):16413-16423

Person re-identification plays an important role in many practical applications. Owing to varied human poses, complex backgrounds and the similarity of clothing, person re-identification remains a challenging task. In this paper, we focus on robust and discriminative appearance feature representation and propose a novel multi-appearance method for person re-identification. First, we propose a deep feature fusion method and obtain the multi-appearance feature by combining two Convolutional Neural Networks. Then, to further enhance the representation of the appearance feature, a multi-part model is constructed by combining the whole body with six body parts. Additionally, we optimize the feature extraction process by adding a pooling layer. Comprehensive comparative experiments against state-of-the-art methods on publicly available datasets demonstrate that the proposed method achieves promising results.


6.

In this paper, we propose a new feature selection method, a kernel Fisher discriminant analysis and regression learning based algorithm, for unsupervised feature selection. Existing feature selection methods are based on either manifold learning or discriminative techniques, each of which has shortcomings. Although some studies show the advantages of two-step methods that benefit from both manifold learning and discriminative techniques, a joint formulation has been shown to be more efficient. To this end, we construct a global discriminant objective term of a clustering framework based on the kernel method, and add a regression learning term to the objective function, which forces the optimization to select a low-dimensional representation of the original dataset. We use the L2,1-norm of the features to impose a sparse structure on them, which results in more discriminative features. We propose an algorithm to solve the resulting optimization problem and further discuss its convergence, parameter sensitivity, computational complexity, and clustering and classification accuracy. To demonstrate its effectiveness, we perform a set of experiments on different available datasets and compare the results against state-of-the-art algorithms. These results show that our method outperforms the existing state of the art in many cases on different datasets, but the improved performance comes at the cost of increased time complexity.
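The row-sparsity penalty mentioned above can be written down directly; this tiny helper computes the L2,1-norm the method minimizes (treating rows of the projection matrix as features is an assumption about the layout).

```python
import numpy as np

def l21_norm(W):
    """L2,1-norm of a projection matrix W: the sum of the L2 norms of its
    rows. Penalizing it drives entire rows to zero, so the surviving
    nonzero rows select a sparse, discriminative subset of features."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()
```

Unlike an element-wise L1 penalty, the L2,1-norm zeroes whole rows at once, which is what makes it a feature *selection* regularizer rather than a generic sparsity one.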


7.

In this work, we present a novel multi-scale feature fusion network (M-FFN) for the image captioning task that incorporates discriminative features and scene contextual information of an image. We construct the network by leveraging spatial transformation and multi-scale feature pyramid networks via a feature fusion block to enrich spatial and global semantic information. In particular, the multi-scale feature pyramid network incorporates global contextual information by employing atrous convolutions on the top layers of a convolutional neural network (CNN), while the spatial transformation network is applied to the early layers of the CNN to remove intra-class variability caused by spatial transformations. The feature fusion block then integrates both global contextual information and spatial features to encode the visual information of an input image. Moreover, a spatial-semantic attention module is incorporated to learn attentive contextual features that guide the captioning module. The efficacy of the proposed model is evaluated on the COCO dataset.


8.
Hu  Zheng-ping  Zhang  Rui-xue  Qiu  Yue  Zhao  Meng-yao  Sun  Zhe 《Multimedia Tools and Applications》2021,80(24):33179-33192

C3D has been widely used for video representation and understanding. However, it operates on spatio-temporal context in a global view, which often weakens its capacity to learn local representations. To alleviate this problem, a concise and novel multi-layer feature fusion network that couples local and global views is introduced. The global view branch learns the core video semantics, while the local view branch captures contextual local semantics. Unlike the traditional C3D model, the global view branch supplies the local view branch with the most activated video features drawn from a broader 3D receptive field; by adding such shallow-view context, the local view branch learns more robust and discriminative spatio-temporal representations for video classification. We therefore propose a 3D convolutional network with multi-layer-pooling selection fusion for video classification: the integrated deep global feature is combined with information from the shallow layers of the local feature extraction network, and three different pooling units (space-time pyramid pooling, adaptive pooling and attention pooling) yield different spatio-temporal feature information, which is finally cascaded and used for classification. Experiments on the UCF-101 and HMDB-51 datasets achieve classification accuracies of 95.0% and 72.2%, respectively. The results show that the proposed network has better classification performance.
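A rough NumPy sketch of cascading the three pooling units named above (space-time pyramid, adaptive, and attention pooling); the exact pooling configurations are assumptions made for illustration.

```python
import numpy as np

def multi_pool_fusion(feat):
    """Sketch of the multi-layer-pooling selection idea: pool a
    C x T x H x W clip feature three ways (global adaptive average,
    a 2x2 spatial pyramid averaged over time, and a softmax-attention
    pool), then cascade the results into one descriptor."""
    c, t, h, w = feat.shape
    gap = feat.mean(axis=(1, 2, 3))                       # adaptive pool -> (C,)
    pyr = feat.reshape(c, t, 2, h // 2, 2, w // 2).mean(axis=(1, 3, 5))  # 2x2 pyramid -> (C,2,2)
    energy = feat.reshape(c, -1)
    attn = np.exp(energy - energy.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    att = (energy * attn).sum(axis=1)                     # attention pool -> (C,)
    return np.concatenate([gap, pyr.reshape(-1), att])    # cascaded descriptor
```

The three views summarize the same clip feature at different granularities; concatenating them is the "cascaded and used for classification" step.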


9.
10.
Person re-identification retrieves and matches images of a target pedestrian across cameras over large spatio-temporal ranges, enabling pedestrian association when biometric cues such as faces are unavailable. It has become a key component and supporting technology of intelligent video surveillance systems and plays an important role in smart policing, smart cities, and other areas of national economic development. In recent years, person re-identification has attracted increasing attention and made rapid progress. After a brief introduction to the technology, this paper surveys frontier progress in hot directions driven by technical development and real-world deployment needs, including occluded person re-identification, unsupervised person re-identification, virtual data generation, domain-generalized person re-identification, cloth-changing person re-identification, cross-modality person re-identification, and person search; it summarizes the current state and open problems of each, and finally discusses future trends. We hope this summary and analysis will serve as a reference for researchers conducting related studies and help advance person re-identification technology.

11.
Objective: Person re-identification matches images of the same pedestrian captured by different cameras at different times and places. Owing to illumination, background, occlusion, viewpoint and pose, the appearance of the same target under different cameras varies greatly. Current research focuses on feature representation and metric learning. Many metric learning methods have achieved good results on person re-identification, but for diverse datasets a single global metric can hardly fit differentiated features. Local metric learning has been proposed in response, yet such methods usually require solving complex convex optimization problems and are computationally expensive. Method: Building on the idea of local metric learning and on recently proposed global metric learning methods such as XQDA (cross-view quadratic discriminant analysis) and MLAPG (metric learning by accelerated proximal gradient), we propose a framework that integrates global and local metric learning. Training samples are clustered with a Gaussian mixture model, local metric learning is performed within each cluster, and global metric learning is performed on the full training set. For a test sample, the local and global metric matrices are combined, weighted by the sample's posterior probabilities under the mixture components, to measure similarity. In particular, for MLAPG, the posterior probabilities are further used to re-weight the per-sample losses in the objective function, improving that method's performance. Result: Experiments on the VIPeR, PRID 450S and QMUL GRID datasets verify the effectiveness of the proposed integrated global-local metric learning method. Compared with global methods such as XQDA and MLAPG, matching accuracy on VIPeR (with a test set of 316 pedestrian image pairs) improves by about 2.0%, with gains of varying degrees on the other datasets. With different feature representations, matching accuracy improves by about 1.3%-3.4% over the global methods. Conclusion: The proposed framework effectively integrates global and local metric learning: it improves several global metric learning algorithms while avoiding the complex computation of local metric learning. Experiments show that the framework improves global metric learning methods regardless of the feature representation used.
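The posterior-weighted combination of global and local metrics described in the abstract above can be sketched as follows; the mixing coefficient `beta` and the exact weighting scheme are illustrative assumptions.

```python
import numpy as np

def fused_distance(x1, x2, M_global, M_locals, posteriors, beta=0.5):
    """Mix one global metric matrix with per-cluster local matrices,
    weighted by the sample's GMM posterior probabilities, and compute
    a Mahalanobis-style distance under the fused metric."""
    M_local = sum(p * Mk for p, Mk in zip(posteriors, M_locals))
    M = beta * M_global + (1.0 - beta) * M_local
    d = x1 - x2
    return float(d @ M @ d)
```

When every component metric is the identity and the posteriors sum to one, the fused distance reduces to the ordinary squared Euclidean distance, a useful sanity check.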

12.
Huang  Junchu  Zhou  Zhiheng  Shang  Junyuan  Niu  Chang 《Multimedia Tools and Applications》2020,79(25-26):17923-17943

Heterogeneous domain adaptation is a challenging problem because it requires generalizing a learning model across training and testing data with different distributions and features. To ease this task, most researchers perform data preprocessing operations, but such operations may discard shareable information before domain adaptation. Moreover, most current work neglects the structural information between data, which is crucial for classification. To overcome these limitations, we propose a novel algorithm, named heterogeneous discriminative features learning and label propagation (HDL), which includes i) feature learning with label consistency through two domain-specific projections, and ii) label propagation through exploiting the structural information of the data; the two components reinforce each other. For each objective function, the corresponding analytical solutions are presented. Comprehensive experimental evidence on a large number of text categorization, image classification and text-to-image recognition datasets verifies the effectiveness and efficiency of the proposed approach over several state-of-the-art methods.


13.
Objective: Pose variation and occlusion cause pedestrians to exhibit marked appearance differences, posing great challenges for person re-identification. To address these problems, this paper proposes a person re-identification algorithm that integrates deformation and occlusion mechanisms. Method: To model pedestrian pose variation, two offsets (horizontal and vertical) are learned via convolution for every position of the feature map output by the backbone network; subsequent convolutions take these per-position offsets into account to extract deformation-aware features, improving the network's robustness to pose change. To handle occlusion, the feature regions corresponding to high spatial-attention responses are erased while only low-response regions are kept, simulating occluded pedestrian samples and further improving the network's robustness to occlusion. At test time, the features extracted by the two methods are concatenated with the backbone features to guarantee the robustness of the feature descriptor. Result: The method is evaluated on three large public person re-identification datasets, Market-1501, DukeMTMC-reID and CUHK03 (both detected and labeled), achieving rank-1 accuracies of 89.52%, 81.96%, 48.79% and 50.29% and mean average precision (mAP) of 73.98%, 64.45%, 43.77% and 45.58%, respectively. Conclusion: The proposed algorithm learns a more discriminative person re-identification model and extracts more distinctive pedestrian features, maintaining high recognition accuracy in complex scenes even under pose change and occlusion.
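The occlusion-simulating erasing step described in the abstract above can be sketched as follows; the quantile threshold rule is an assumption standing in for the exact high-response selection.

```python
import numpy as np

def erase_high_response(feat, attn, ratio=0.25):
    """Zero out the `ratio` of spatial positions with the highest
    attention response and keep only low-response regions, producing
    a feature map that mimics an occluded training sample."""
    thresh = np.quantile(attn, 1.0 - ratio)     # cutoff for the top `ratio`
    mask = (attn < thresh).astype(feat.dtype)   # keep low-response positions
    return feat * mask[None]                    # broadcast over channels
```

Because the erased regions are exactly the ones the network relied on most, training on these masked features forces it to also learn from less salient body regions.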

14.
Person re-identification via multi-directional saliency weight learning
Objective: To address the inconsistency of salient appearance features between matched patches in person re-identification, we propose a re-identification algorithm based on multi-directional saliency-similarity fusion learning that is robust to viewpoint and background changes. Method: First, the intrinsic saliency of the target is estimated by manifold ranking and fused with inter-class saliency to obtain patch-level saliency. Then, according to the four saliency distribution cases of matched patches, the visual similarity of an image pair is built by multi-directional saliency-weighted fusion, while a metric learning method based on structural SVM ranking learns the weight of each saliency direction, forming a comprehensive similarity measure between image pairs. Result: Re-identification experiments on two public databases show that the algorithm obtains a more comprehensive similarity measure than comparable methods, achieves a higher re-identification rate, and is unaffected by background change. On the VIPeR test set of 316 pedestrian image pairs, the rank-1 rate (the proportion of queries whose top-ranked result is the correct match) is 30% and the rank-15 rate (correct match within the top 15 results) is 72%, which is of practical value. Conclusion: Multi-directional saliency-weighted fusion describes the saliency distribution of an image pair fairly comprehensively and thus yields a comprehensive similarity measure. The algorithm enables person re-identification across non-overlapping cameras in large scenes, with high discriminative power and accuracy and strong robustness to background change.

15.
Ma  You  Liu  Zhi  Chen Chen  C. L. Philip 《Applied Intelligence》2022,52(3):2801-2812

Hyperspectral image (HSI) classification has attracted a great deal of attention recently owing to its wide range of practical applications in numerous fields. Spatial-spectral fusion features are widely used in HSI classification to obtain better performance, but most existing methods fuse the spatial and spectral information by simple linear addition with a combined hyper-parameter; a more suitable fusion method is needed. To solve this problem, we propose a novel HSI classification approach based on hybrid spatial-spectral features in a broad learning system (HSFBLS). First, an adaptive weighted mean filter is employed to obtain the spatial feature. The weights of the spatial and spectral channels in the hybrid module are computed by two BLSs and united with a weighted linear function. Then the spatial-spectral features are fused by a sparse autoencoder, and the weighted fusion features serve as the feature nodes for classifying HSI data in the BLS. Through this two-stage fusion of spatial and spectral information, classification accuracy increases compared with simple combination. Very satisfactory classification results on typical HSI datasets illustrate the effectiveness of the proposed HSFBLS, which also greatly reduces training time compared with time-consuming networks.
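The weighted linear uniting of the spatial and spectral channels might look like the sketch below; the softmax normalization of the two branch weights is an assumption, since the abstract only says the weights are united with a weighted linear function.

```python
import numpy as np

def weighted_linear_fusion(spatial, spectral, w):
    """First-stage fusion step: combine the spatial and spectral
    channels with a weighted linear function. The two weights
    (learned per branch in the paper; plain inputs here) are
    softmax-normalized before mixing."""
    w = np.exp(w - np.max(w))
    w_spa, w_spe = w / w.sum()
    return w_spa * spatial + w_spe * spectral
```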


16.
Du  Haizhou  Duan  Ziyi 《Applied Intelligence》2022,52(3):2496-2509

Multivariate time series often contain complex mixed inputs with intricate correlations between them. Detecting change points in multivariate time series is of great importance: it can reveal anomalies early and reduce losses, yet it is very challenging because of many complex factors, e.g., dynamic correlations and external factors, and the performance of traditional methods typically scales poorly. In this paper, we propose Finder, a novel approach to change point detection via multivariate fusion attention networks. Our model consists of two key modules. First, in the time series prediction module, we employ multi-level attention networks based on the Transformer and integrate an external-factor fusion component, achieving feature extraction and fusion of multivariate data. Second, in the change point detection module, a deep learning classifier is used to detect change points, improving efficiency and accuracy. Extensive experiments demonstrate the superiority and effectiveness of Finder on two real-world datasets. Our approach outperforms the state-of-the-art methods by up to 10.50% on the F1 score.


17.
Objective: Vehicle re-identification is the retrieval problem of deciding whether vehicle images captured by different cameras belong to the same vehicle. Existing algorithms use global vehicle features or extra annotations and neglect the effective extraction of multi-scale contextual information. This paper proposes a vehicle re-identification model that fuses global and spatial multi-scale contextual information. Method: First, a global contextual feature selection module extracts fine-grained discriminative information of the vehicle, and a multi-scale spatial contextual feature selection module is further designed that, via multi-scale down-sampling, derives the corresponding multi-scale features from the discriminative features output by the global module. Spatial contextual information from multi-level features is then selectively integrated to generate a foreground feature response map of the vehicle image, improving the model's perception of the vehicle's spatial location features. Finally, the model combines a label-smoothed cross-entropy loss with a triplet loss to strengthen the overall learning of highly discriminative vehicle features. Result: On the VeRi-776 (vehicle re-identification-776) dataset, compared with the PNVR (part-regularized near-duplicate vehicle re-identification) model, the proposed model's mAP (mean average precision) and rank-1 (cumulative...
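The combined objective named in the abstract above, a label-smoothed cross-entropy plus a triplet loss, can be sketched as follows; the smoothing variant and margin value are common choices, not necessarily the ones used in the paper.

```python
import numpy as np

def smoothed_ce(logits, label, eps=0.1):
    """Cross-entropy against a label-smoothed target: the true class
    gets probability 1 - eps + eps/n and the rest share eps uniformly
    (one common smoothing variant)."""
    n = logits.shape[0]
    m = logits.max()
    logp = logits - (m + np.log(np.exp(logits - m).sum()))  # log-softmax
    target = np.full(n, eps / n)
    target[label] += 1.0 - eps
    return float(-(target * logp).sum())

def triplet_loss(anchor, pos, neg, margin=0.3):
    """Hinge on the gap between positive and negative distances."""
    d_ap = np.linalg.norm(anchor - pos)
    d_an = np.linalg.norm(anchor - neg)
    return max(d_ap - d_an + margin, 0.0)
```

The smoothed cross-entropy discourages over-confident identity predictions, while the triplet term directly shapes the embedding so that same-vehicle pairs sit closer than different-vehicle pairs by at least the margin.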

18.

Feature extraction (FE) methods play a central role in the classification of hyperspectral images (HSIs). However, traditional FE methods work in the original feature space (OFS), which may suffer from noise, outliers and poorly discriminative features. This paper presents a feature-space enriching technique to address these problems. The proposed method is based on low-rank representation (LRR) with the capability of pairwise constraint preserving (PCP), termed LRR-PCP. LRR-PCP does not change the dimension of the OFS and can be used as a preprocessing step for any classification algorithm or dimensionality reduction (DR) method. It aims to enrich the OFS into an extracted feature space (EFS) whose features are richer than those of the OFS. Noise and outliers can be reduced using LRR, but LRR captures only the global structure of the data and cannot preserve its intrinsic local structure. Therefore, two additional penalty terms are added to the LRR objective to keep local discriminative ability and preserve data diversity. LRR-PCP can be used not only in supervised learning but also in unsupervised and semi-supervised frameworks. Its effectiveness is investigated on three HSI datasets with several existing DR methods and as a denoising step before classification. All experimental results and quantitative analyses demonstrate that applying LRR-PCP to the OFS improves the performance of classification and DR methods in supervised, unsupervised, and semi-supervised conditions.

19.
Lin  Chuan  Zhang  Zhenguang  Hu  Yihua 《Applied Intelligence》2022,52(10):11027-11042

As the basis of mid- and high-level vision tasks, edge detection is of great significance in computer vision. Edge detection methods based on deep learning usually adopt an encoding-decoding structure, in which a deep convolutional neural network is generally adopted as the encoding network while the decoding network is designed by researchers. In such designs, researchers pay more attention to the decoding network and ignore the influence of the encoding network, so existing edge detection methods suffer from weak feature extraction and insufficient edge information extraction. To improve on these methods, this work combines the information transmission mechanism of the retina/lateral geniculate nucleus with a CNN-based edge detection network and proposes a bionic feature enhancement network consisting of a pre-enhancement network, an encoding network, and a decoding network. By simulating the information transfer mechanism of the retina/lateral geniculate nucleus, the pre-enhancement network strengthens the encoding network's ability to extract details and local features. Based on the hierarchical structure of the visual pathway and the feature-integration function of the inferior temporal (IT) cortex, we design a novel feature fusion network as the decoding network, into which a down-sampling enhancement module is introduced to boost its feature integration ability. Experimental results demonstrate state-of-the-art performance on several available datasets.


20.
Minyoung Kim 《Pattern recognition》2011,44(10-11):2325-2333
We introduce novel discriminative semi-supervised learning algorithms for dynamical systems and apply them to the problem of 3D human motion estimation. Our recent work on discriminative learning of dynamical systems has been shown to achieve performance superior to traditional generative learning approaches. However, one of the main issues in learning dynamical systems is gathering labeled output sequences, which are typically obtained from precise motion-capture tools and are hence expensive. In this paper we utilize a large amount of unlabeled (input) video data to significantly improve the prediction performance of the dynamical systems. We suggest two discriminative semi-supervised learning approaches that extend well-known algorithms from static domains to sequential, real-valued multivariate output domains: (i) self-training, which we derive as coordinate-ascent optimization of a proper discriminative objective over both the model parameters and the unlabeled state sequences, and (ii) a minimum-entropy approach, which maximally reduces the model's uncertainty in state prediction for unlabeled data points. These approaches achieve significant improvement over traditional generative semi-supervised learning methods. We demonstrate the benefits of our approaches on 3D human motion estimation problems.
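The minimum-entropy term for unlabeled data can be sketched directly; applying it to a discrete predictive distribution here is a simplification of the real-valued state setting in the abstract above.

```python
import numpy as np

def prediction_entropy(p):
    """Entropy of a model's predictive distribution. The minimum-entropy
    approach adds this quantity, summed over unlabeled points, as a term
    to minimize, pushing the model toward confident predictions on
    unlabeled data."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p + 1e-12)).sum())
```

Minimizing this term leaves labeled-data fit to the discriminative objective while nudging decision boundaries away from dense regions of unlabeled inputs.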

