Similar literature
20 similar documents found (search time: 843 ms)
1.
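Cross-modality person re-identification (Re-ID) is a highly challenging problem for intelligent surveillance systems. Existing cross-modality methods mainly learn discriminative modality-shared features from either global or local representations, and few studies have attempted to fuse the two. This paper proposes a new multi-granularity shared feature fusion (MSFF) network that combines global and local features to learn representations of the two modalities at different granularities. Multi-scale, multi-level features are extracted from the backbone network, and the coarse-grained information in the global representation works together with the fine-grained information in the local representation to form a more discriminative feature descriptor. In addition, so that the network can extract more effective shared features, the paper proposes an improved subspace shared feature module for embedding the two modalities, changing the conventional feature embedding based on modality feature weights. The module is placed early in the backbone network so that the features of the two modalities are mapped into the same subspace and the backbone produces richer shared weights. Experiments on two public datasets demonstrate the effectiveness of the proposed method: on SYSU-MM01, under the most difficult all-search single-shot mode, the mean average precision (mAP) reaches 60.62%.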
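As a rough illustration of fusing coarse global and fine local descriptors, the sketch below concatenates one globally pooled vector with a few stripe-pooled vectors. The abstract does not say how MSFF forms its local features; the horizontal-stripe pooling, the function names `mean_vector` and `multi_granularity_descriptor`, and the `num_parts` parameter are assumptions made only for illustration, not the authors' method.

```python
def mean_vector(rows):
    """Average a list of equal-length feature vectors element-wise."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def multi_granularity_descriptor(feature_rows, num_parts=2):
    """Concatenate one global vector with num_parts local stripe vectors.

    feature_rows is assumed to hold one feature vector per horizontal band
    of a backbone feature map (a hypothetical input layout).
    """
    stripe = len(feature_rows) // num_parts
    if stripe == 0:
        raise ValueError("num_parts must not exceed the number of rows")
    # Coarse-grained information: pool over every row.
    descriptor = list(mean_vector(feature_rows))
    # Fine-grained information: pool each stripe separately.
    # Rows left over when the count is not divisible are ignored here.
    for index in range(num_parts):
        part = feature_rows[index * stripe:(index + 1) * stripe]
        descriptor.extend(mean_vector(part))
    return descriptor


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(multi_granularity_descriptor(rows, num_parts=2))
```

Concatenation keeps the global and local parts separable, so a downstream metric can weight them differently; other fusion choices (summation, attention) are equally plausible.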

2.
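In person re-identification, mid-level features that carry semantic information provide stronger discriminative power. However, because mid-level features also rely on local matching, they suffer, like low-level features, from mismatches caused by similar appearance regions on different pedestrians. Since pedestrians are almost always standing, the vertical appearance sequence of the same pedestrian is more similar than that of different pedestrians. Based on this observation, a recognition method is proposed that adds a vertical global appearance constraint on pedestrians to the mid-level features and fuses low-level dense patch matching. Experimental results show that the algorithm achieves higher hit rates than existing methods on the highly challenging public VIPeR and CUHK01 databases.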

3.
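Gong Ping, Cheng Yuhu, Wang Xuesong. Acta Electronica Sinica, 2015, 43(12): 2476-2483
Existing computer-aided diagnosis of benign versus malignant pulmonary nodules is usually based on low-level features of lung CT images, whereas clinicians base their diagnosis on high-level semantic features. To overcome this inconsistency between low-level image features and high-level semantic features, a method for classifying pulmonary nodules as benign or malignant based on semantic attributes is proposed. First, pulmonary nodule images are extracted with a threshold probability map method. Second, low-level features of the nodule images, such as shape, gray level, texture, size, and position, are extracted to form a sample feature set, and a nodule attribute set is derived from expert annotations of nodule attributes. Then, an attribute prediction model is built from the feature set and the attribute set to map one to the other. Finally, the predicted attributes are used to classify nodules as benign or malignant. Experimental results on the LIDC database show that the proposed method achieves high classification accuracy and AUC.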

4.
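Lin Sen, Zhao Zhenyu, Ren Xiaokui, Tao Zhiyong. Infrared and Laser Engineering, 2022, 51(8): 20210702-1-20210702-12
3D point cloud processing plays an important role in fields such as object segmentation, medical image segmentation, and virtual reality. However, existing 3D point cloud learning networks extract global features over a small range and have difficulty describing local high-level semantic information, which leaves the point cloud feature representation incomplete. To address these problems, an object point cloud classification and segmentation network is proposed in which semantic information compensates the global features. First, the input point cloud is aligned to a canonical space as an input-transform preprocessing step. Then, a dilated edge convolution module extracts features from the transformed data at each layer, and these are stacked to produce the global feature. During local feature extraction, the extracted low-level semantic information is used to describe high-level semantic information and effective geometric features, which compensate for point cloud features missing from the global feature. Finally, the global feature and the local high-level semantic information are fused to obtain the overall feature of the point cloud. Experimental results show that the method outperforms current classical and recent algorithms in both classification and segmentation.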

5.
Humans tend to allocate attention to semantic entities. Objects are important in fixation selection, but not all objects are equally attractive. In this paper, we introduce the concept of attribute bias to characterize the influence of semantic attributes, compared with low-level saliency, on fixation distribution. Two sets of semantic attributes are obtained in two different ways. In both cases, most semantic attributes have a positive influence on drawing attention and contribute more than low-level saliency in object areas. We also find that attribute bias is robust to low-level saliency and consistently reflects the relative attractiveness of objects with different semantic attributes. Such bias is shown to improve fixation prediction by distinguishing the importance of objects, although low-level saliency models that already perform well improve less. These findings indicate the role of conceptual meaning, as opposed to features, in visual attention.

6.
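Vehicles seen from an unmanned aerial vehicle (UAV) are small and low in resolution, which makes them difficult to classify and localize precisely. To address this, a lightweight feature extraction network is designed to supply multi-scale low- and mid-level vehicle information, which is injected into the backbone neural network to pass low- and mid-level features along. The backbone network extracts high-level semantic information that helps separate vehicles from the background and from other classes, and the deep high-level semantic features are then fused with shallow features to pass high-level semantic information along. The resulting design resembles a bidirectional network that transfers information across levels effectively and strengthens the feature representation of vehicles. In addition, multi-branch dilated convolution is used for feature extraction to make the low- and mid-level information richer and more diverse, and a flexible and effective fusion module is designed to merge this information into the backbone network and enhance the discriminative features of target vehicles. Experimental results show that the algorithm achieves good detection performance on UAV datasets while also meeting real-time requirements.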

7.
With the rapid development of the mobile Internet and digital technology, people are increasingly keen to share pictures on social networks, and the number of online pictures has exploded. Retrieving similar images from large-scale collections has long been a central issue in image retrieval, and the choice of image features largely determines retrieval performance. Convolutional neural networks (CNNs), which contain more hidden layers, have a more complex network structure and a stronger capacity for feature learning and expression than traditional feature extraction methods. Because global CNN features cannot effectively describe local details in image retrieval tasks, a strategy of aggregating low-level CNN feature maps to generate local features is proposed. The high-level features of a CNN model pay more attention to semantic information, whereas the low-level features pay more attention to local details. Exploiting the increasingly abstract character of CNN features from low to high layers, this paper presents a probabilistic semantic retrieval algorithm, proposes a probabilistic semantic hash retrieval method based on CNNs, and designs a new end-to-end supervised learning framework that learns semantic features and hash features simultaneously to achieve fast image retrieval. With the convolutional network, the error rate on the test set is reduced to 14.41%. On three open image libraries, namely Oxford, Holidays, and ImageNet, the performance of traditional SIFT-based retrieval algorithms and other CNN-based image retrieval algorithms is compared and analyzed. The experimental results show that the proposed algorithm is superior to the compared algorithms in overall retrieval quality and retrieval time.

8.
The task of multimodal sentiment classification aims to associate multimodal information, such as images and texts, with appropriate sentiment polarities. Features at various levels of the visual and textual modalities can affect human sentiment. However, most existing methods treat features at different levels independently and lack an effective method for feature fusion. In this paper, we propose a multi-level fusion classification (MFC) model that predicts sentiment polarity by fusing features from different levels and exploiting the dependencies among them. The proposed architecture uses multi-layer convolutional neural networks (CNNs) to extract features at several levels from the image and text modalities. Considering the dependencies between low-level and high-level features, a bi-directional (Bi) recurrent neural network (RNN) is adopted to integrate the features learned by different CNN layers. In addition, a conflict detection module is incorporated to address conflicts between modalities. Experiments on the Flickr dataset demonstrate that the MFC method achieves performance comparable to strong baseline methods.

9.
Most current salient object detection (SOD) methods focus on well-lit scenes, and their performance drops in low-light scenes because of limitations such as blurred boundaries and low contrast. To solve this problem, we propose a global guidance-based integration network (G2INet) customized for low-light SOD. First, we propose a Global Information Flow (GIF) that extracts comprehensive global information to guide the fusion of multi-level features. To facilitate information integration, we design a Multi-level features Cross Integration (MCI) module, which progressively fuses low-level details, high-level semantics, and global information by interweaving them. Furthermore, a U-shaped Attention Refinement (UAR) module is proposed to further refine edges and details for accurate saliency predictions. Extensive experimental results demonstrate that our method outperforms twelve existing state-of-the-art models on five metrics.

10.
Software-defined networking (SDN) is now regarded as the best solution for the centralized handling and monitoring of large networks. However, the SDN architecture suffers from the same security issues as conventional networks. In particular, SDNs are highly vulnerable to distributed denial of service (DDoS) attacks and similar attacks. Anomaly detection systems have been considered as a way to deal with these attacks. The challenges in designing such systems include gathering data, extracting effective features, and selecting the best model for anomaly detection. In this paper, a novel combined approach is proposed. The method uses the NetFlow protocol to gather information and generate the dataset, the information gain ratio (IGR) to select effective and relevant features, and an ensemble learning scheme (Stacking) to build a structure with desirable performance and efficiency for detecting anomalies in an SDN environment. The experimental results show that the proposed method performs better than other methods, raising accuracy (AC) and detection rate (DR) and lowering classification error (CE) and false alarm rate (FAR). The AC, DR, CE, and FAR of the proposed model were measured as 99.92%, 99.83%, 0.08%, and 0.03%, respectively. Furthermore, the proposed method prevents excessive overload on the controller and OpenFlow.
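The feature selection step named in this abstract, information gain ratio, can be sketched for discrete features as information gain divided by split information. The toy flow records, the feature names `packet_rate` and `protocol`, and the helper names below are hypothetical and serve only to show the calculation; the abstract does not describe the authors' feature set or implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature has zero split information and carries no signal.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical flow records: one label and two discrete features per flow.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "packet_rate": ["high", "high", "low", "low"],
    "protocol": ["tcp", "udp", "tcp", "udp"],
}
ranked = sorted(
    features,
    key=lambda name: information_gain_ratio(features[name], labels),
    reverse=True,
)
print(ranked)
```

Dividing by split information penalizes features with many distinct values, which plain information gain tends to favor. Continuous NetFlow fields would need to be discretized before this calculation applies.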

11.
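To improve the accuracy of pedestrian attribute recognition, a pedestrian attribute recognition algorithm based on a multi-scale attention network is proposed. To strengthen the algorithm's feature representation and attribute discrimination, a top-down feature pyramid and attention modules are first added to the residual network ResNet50, with the top-down feature pyramid built from the visual features extracted bottom-up. Then, features at different scales in the feature pyramid are fused, and the channel attention of each layer's features is assigned a different weight. Finally, the model's loss function is improved to reduce the effect of data imbalance on the attribute recognition rate. Experimental results on the RAP and PA-100K datasets show that, compared with existing algorithms, the proposed algorithm performs better in mean accuracy, accuracy, and F1 score.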

12.
Learning-based shadow detection methods have achieved impressive performance, but they still struggle on complex scenes, especially those with ambiguous soft shadows. To tackle this issue, this work proposes an efficient shadow detection network (ESDNet) and then applies uncertainty analysis and graph convolutional networks to refine the detection. Specifically, we first aggregate global information from high-level features and harvest shadow details from low-level features to obtain an initial prediction. Second, we analyze the uncertainty of ESDNet for an input shadow image and take its intensity, expectation, and entropy into account to formulate a semi-supervised graph learning problem. Finally, we solve this problem by training a graph convolutional network to obtain the refined detection result for every training image. To evaluate our method, we conduct extensive experiments on several benchmark datasets, i.e., SBU, UCF, and ISTD, and also on soft shadow scenes. Experimental results demonstrate that our strategy improves shadow detection performance by suppressing the uncertainties of false positive and false negative regions, achieving state-of-the-art results.

13.
With the widespread deployment of cloud services, data center networks are developing toward large-scale, multi-path networks. Conventional switching-oriented data center networks have difficulty providing the scalability and flexibility needed to support the increasing bandwidth requirements of cloud services. To solve this problem, a simple and scalable architecture, MatrixDCN, is proposed in this paper. MatrixDCN is an approximately non-blocking network in which switches and servers are arranged in rows and columns that compose a matrix structure. A MatrixDCN network can accommodate up to hundreds of thousands of servers without bandwidth bottlenecks. Furthermore, the physical topology of a MatrixDCN network can be designed to match its logical topology, which helps to reduce the complexity of managing and maintaining a data center. An efficient routing algorithm, named fault-avoidance routing (FAR), is designed for MatrixDCN to fully leverage the regularity of the topology. FAR builds two routing tables for a router. A BRT is built based on local topology, and a novel negative routing table (NRT) is built incrementally from learned partial network failures, which avoids the network convergence problem and shortens the time needed to calculate routing tables. FAR also greatly reduces the size of routing tables by introducing NRTs at routers. Theoretical analysis and simulations show that MatrixDCN has advantages in topology scalability, network throughput, and the performance of FAR. Copyright © 2015 John Wiley & Sons, Ltd.
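One way to picture the two-table idea is a lookup that takes candidate next hops from the BRT and drops any that the NRT rules out. This is only a sketch under assumptions: the abstract gives neither the table formats nor the lookup procedure, so the dictionary layout, the function names `select_next_hops` and `record_failure`, and the switch and prefix names are all hypothetical.

```python
def select_next_hops(destination, brt, nrt):
    """Return the BRT candidates for a destination minus those excluded by the NRT.

    brt maps a destination to an ordered list of candidate next hops
    (assumed layout); nrt maps a destination to a set of next hops known
    to be unusable because of learned failures (assumed layout).
    """
    candidates = brt.get(destination, [])
    excluded = nrt.get(destination, set())
    return [hop for hop in candidates if hop not in excluded]


def record_failure(nrt, destination, failed_hop):
    """Add a negative entry after learning that failed_hop cannot reach destination."""
    nrt.setdefault(destination, set()).add(failed_hop)


brt = {"10.1.2.0/24": ["switch-a", "switch-b"]}  # derived from local topology
nrt = {}                                          # grows as failures are learned
print(select_next_hops("10.1.2.0/24", brt, nrt))
record_failure(nrt, "10.1.2.0/24", "switch-a")
print(select_next_hops("10.1.2.0/24", brt, nrt))
```

Under this reading, a failure adds one small negative entry instead of triggering a recomputation of the positive table, which is consistent with the abstract's claim of smaller tables and no convergence delay.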

14.
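Facial expression recognition plays an important role in human-computer interaction and other areas of artificial intelligence, yet current research neglects the semantic information of the face. This paper proposes a facial expression recognition network that fuses local semantics with global information. It consists of two branches: a local semantic region extraction branch and a local-global feature fusion branch. First, a semantic segmentation network is trained on a face parsing dataset to obtain face semantic parsing, and the semantic parsing of the facial expression dataset is obtained through transfer training. Regions meaningful for expression recognition and their semantic features are taken from the semantic parsing, and the local semantic features are fused with global features to construct semantic local features. Finally, the semantic local features are fused with the global features to form a global semantic composite feature of the expression, which a classifier assigns to one of seven basic expressions. The paper also proposes a training strategy that unfreezes part of the layers, which makes the semantic features better suited to expression recognition and reduces the redundancy of semantic information. The average recognition accuracy on the two public datasets JAFFE and KDEF reaches 93.81% and 88.78%, respectively, outperforming current deep learning methods and traditional methods. The experimental results show that the proposed network, which fuses local semantics and global information, describes expression information well.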

15.
To extract decisive features from gesture images and reduce the information redundancy of existing gesture recognition methods, we propose a new multi-scale feature extraction module named densely connected Res2Net (DC-Res2Net) and design a feature fusion attention (FFA) module. First, building on the new dimension residual network (Res2Net), DC-Res2Net uses channel grouping to extract fine-grained multi-scale features and adopts dense connections to extract stronger features at different scales. Then, we apply a selective kernel network (SK-Net) to enhance the representation of effective features. Next, the FFA module removes redundant information by fusing low-level location features with high-level semantic features. Finally, experiments on the OUHANDS, ASL, and NUS-II datasets validate our method. The results demonstrate the superiority of DC-Res2Net and FFA, which extract more decisive features and remove redundant information while maintaining high recognition accuracy and low computational complexity.

16.
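For unsupervised cross-domain transfer in person re-identification, a person re-identification algorithm based on a domain discrimination network and domain adaptation is proposed. First, an improved ResNet-50 is used to train a supervised domain discrimination network model; a shared-space component is added to obtain feature-invariant attributes for distinguishing images of different classes, and contrastive loss and difference loss are used to improve the model's classification performance. Second, a domain-adaptive unsupervised transfer method derives the feature-invariant attributes from the source-domain dataset and applies them to the unlabeled target-domain dataset. Finally, cross-domain person re-identification is performed by matching query images against gallery images in the shared space. To verify the algorithm, experiments were conducted on the CUHK03, Market-1501, and DukeMTMC-reID datasets, where Rank-1 accuracy reaches 34.1%, 38.1%, and 28.3% and mAP reaches 34.2%, 17.1%, and 17.5%, respectively. The necessity of each model component during training is also verified. The results show that the algorithm outperforms some existing unsupervised person re-identification methods on large-scale datasets and even approaches the performance of some traditional supervised learning methods.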

17.
Aggregating local and global contextual information by exploiting multi-level features in a fully convolutional network is a challenge for pixel-wise salient object detection. Most existing methods still suffer from inaccurate salient regions and blurry boundaries. In this paper, we propose a novel edge-aware global and local information aggregation network (GLNet) that integrates side-output local features with global contextual information and makes use of the contour information of salient objects. A global guidance module (GGM) is proposed to learn discriminative multi-level information under the direct guidance of global semantic knowledge for more accurate saliency prediction. Specifically, the GGM consists of two key components: the global feature discrimination module exploits the inter-channel relationship of global semantic features to boost representation power, and the local feature discrimination module enables different side-output local features to selectively learn informative locations by fusing with global attentive features. In addition, we propose an edge-aware aggregation module (EAM) that uses the correlation between salient edge information and salient object information to generate saliency maps with explicit boundaries. We evaluate GLNet on six widely used saliency detection benchmark datasets against 17 state-of-the-art methods. Experimental results show the effectiveness and superiority of the proposed method on all six benchmark datasets.

18.
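Current mainstream deep fusion methods rely only on convolution to extract local image features. However, the interaction between the image and the convolution kernel is content-independent, and convolution cannot effectively establish long-range feature dependencies, which inevitably loses contextual image information and limits the fusion performance for infrared and visible images. To address this, a multi-scale Transformer fusion method for infrared and visible images is proposed. With the Swin Transformer as a building block, a Conv Swin Transformer Block is constructed, in which convolutional layers strengthen the representation of global image features. A multi-scale self-attention encoder-decoder network is built to extract and reconstruct global image features. A feature sequence fusion layer is designed that uses the SoftMax operation to compute attention weight coefficients for the feature sequences, highlighting the salient features of each source image and achieving end-to-end infrared and visible image fusion. Experimental results on the TNO and Roadscene datasets show that the method outperforms other typical traditional and deep learning fusion methods in both subjective visual assessment and objective metrics. By incorporating self-attention and using the Transformer to establish long-range dependencies, the method builds a global feature fusion model with better fusion performance and stronger generalization than other deep learning fusion methods.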
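The SoftMax-weighted fusion of two feature sequences can be sketched as follows. The abstract states only that SoftMax yields the attention weight coefficients; scoring each token by its L1 norm, the token-by-token pairing, and the names `softmax` and `fuse_sequences` are assumptions chosen for illustration, not the published fusion layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def fuse_sequences(infrared, visible):
    """Fuse two equally shaped token sequences with per-token softmax weights."""
    fused = []
    for ir_token, vis_token in zip(infrared, visible):
        # Activity level of each token, taken here as its L1 norm (assumption).
        ir_score = sum(abs(x) for x in ir_token)
        vis_score = sum(abs(x) for x in vis_token)
        w_ir, w_vis = softmax([ir_score, vis_score])
        # The more active source dominates the fused token.
        fused.append([w_ir * a + w_vis * b for a, b in zip(ir_token, vis_token)])
    return fused


infrared = [[1.0, 1.0], [10.0, 0.0]]
visible = [[1.0, 1.0], [0.0, 0.0]]
print(fuse_sequences(infrared, visible))
```

When both sources are equally active the weights are 0.5 each, so the fused token is their average; when one source is much more active, its weight approaches 1 and its features are passed through almost unchanged.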

19.
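To address the low recognition rate of small-scale traffic signs in intelligent transportation systems, a traffic sign recognition method based on an improved convolutional neural network is proposed. The method adds an optimized region proposal network (RPN) on the low-level feature maps of the Faster R-CNN algorithm, which raises the detection rate for small-scale traffic signs. It also uses max pooling to fully fuse the local detail features of the image with its global semantic features. Experimental results on the TT-100K dataset show that the new method clearly improves the recognition rate of small-scale traffic signs.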

20.
This paper proposes a novel two-stream encoder-decoder network that uses both high-level and low-level image features to precisely localize forged regions in a manipulated image. The design is motivated by the fact that forgery creation generally introduces both high-level artefacts (e.g., unnatural contrast) and low-level artefacts (e.g., noise inconsistency) into forged images. In the proposed two-stream network, one stream learns low-level manipulation-related features on the encoder side by extracting noise residuals through a set of high-pass filters in the first layer. In the second stream, the encoder learns high-level image manipulation features from the RGB values of the input image. The coarse feature maps from each encoder are upsampled by the corresponding decoder network to produce dense feature maps. The dense feature maps of the two streams are concatenated and fed to a final convolutional layer with sigmoid activation to produce the pixel-wise prediction. We have carried out experimental analyses on multiple standard forensics datasets to evaluate the proposed method. The experimental results show the efficacy of the proposed method with respect to the state of the art.
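Extracting a noise residual with a high-pass filter can be illustrated with a single fixed kernel. The abstract mentions a set of high-pass filters but does not list them; the 3x3 kernel `HIGH_PASS`, the division by 8, the lack of padding, and the function name `noise_residual` are assumptions for this sketch.

```python
# A common 3x3 high-pass kernel; its weights sum to zero, so flat regions
# produce no response (assumed here, not taken from the paper).
HIGH_PASS = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]


def noise_residual(image):
    """Filter a 2D grayscale image (list of rows) with HIGH_PASS.

    No padding is applied, so the output is two rows and two columns smaller.
    """
    height, width = len(image), len(image[0])
    residual = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            acc = sum(
                HIGH_PASS[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
            row.append(acc / 8)
        residual.append(row)
    return residual


flat = [[5] * 4 for _ in range(4)]
spike = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
print(noise_residual(flat))
print(noise_residual(spike))
```

Smooth image content is suppressed while isolated pixel-level deviations survive, which is why such residuals are a natural input for a stream that looks for noise inconsistency.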
