Similar Documents
1.
产思贤 (Chan Sixian), 刘鹏 (Liu Peng), 张卓 (Zhang Zhuo). Optoelectronics Letters (光电子快报), 2021, 17(6): 349-353
In object detection, handling small objects well is a major challenge, and the accuracy on small objects strongly affects overall detection performance. We propose WeBox, a detection framework based on weak edges for small object detection in dense scenes, and train the richer convolutional features (RCF) edge detection network in a weakly supervised way to generate multi-instance proposals. A region proposal network (RPN) then locates each object within the multi-instance proposals; to ensure that these proposals are effective, we also propose a corresponding multi-instance proposal evaluation criterion. Finally, we use Faster R-CNN (faster region-based convolutional neural network) to process the WeBox single-instance proposals and fine-tune the final results at the pixel level. Experiments on BDCI and TT100K show that our method maintains high computational efficiency while effectively improving small-object detection accuracy.

2.
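Feature Pyramid Networks (FPN) are commonly used in deep learning models to detect multiple ship targets in synthetic aperture radar (SAR) images. To address multi-target ship detection in complex scenes, an FPN model with improved anchor boxes is proposed. First, the feature pyramid is embedded into a conventional RPN (Region Proposal Network) and mapped into a new feature space for object detection; then a K-means clustering algorithm based on a Shape Similar Distance (SSD) metric is used to optimize the FPN's initial anchor boxes, and the model is tested on a SAR ship dataset. Experimental results show that the proposed algorithm reaches a detection precision of 98.62%. Compared in complex scenes with YOLO, Faster RCNN, FPN based on VGG/ResNet, and other models, it achieves higher accuracy and better overall performance.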
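The abstract optimizes the FPN's initial anchor boxes by K-means clustering of ground-truth box shapes under a Shape Similar Distance metric, but it does not give that metric's formula. The following minimal sketch of anchor clustering substitutes the widely used distance d = 1 - IoU between (width, height) pairs, under the assumption that all boxes share a common center. `wh_iou` and `kmeans_anchors` are hypothetical names, not the authors' code.

```python
import numpy as np

def wh_iou(boxes, centroids):
    """IoU between (w, h) pairs, assuming every box is centered at the same point."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors using d = 1 - IoU (swapped in for SSD)."""
    rng = np.random.default_rng(seed)
    # initialize centroids with k distinct ground-truth shapes
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = 1.0 - wh_iou(boxes, centroids)
        assign = dist.argmin(axis=1)          # nearest centroid per box
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j]     # keep an empty cluster's old centroid
                        for j in range(k)])
        if np.allclose(new, centroids):       # converged
            break
        centroids = new
    return centroids
```

Replacing `1.0 - wh_iou(...)` with another shape distance is the only change the SSD variant would need, provided that the distance is again computed between (w, h) pairs.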

3.
To verify the authenticity and integrity of video content and to locate tampered regions, a deep learning detection algorithm based on video noise flow is proposed. First, based on SRM (spatial rich model) and C3D (3D convolution) neural networks, a feature extractor, a frame discriminator, and an RPN (region proposal network)-based spatial locator were constructed. Second, the feature extractor was combined with the frame discriminator and with the spatial locator, respectively, to build two neural networks. Finally, the two deep learning models were trained on augmented data and used to locate tampered areas in the temporal domain and the spatial domain, respectively. Test results show that temporal-domain localization accuracy increases to 98.5%, and the average intersection over union between the spatial localization and the labeled tampered area is 49%, indicating that the method can effectively locate tampered areas in both the temporal and spatial domains.
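The spatial-localization result above is reported as an average intersection over union (IoU) between predicted and labeled tamper regions. The following minimal sketch assumes that both regions are axis-aligned boxes in (x1, y1, x2, y2) form. The abstract does not say whether boxes or pixel masks are compared. The function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_iou(preds, labels):
    """Average IoU over paired predicted and labeled regions."""
    return sum(box_iou(p, g) for p, g in zip(preds, labels)) / len(preds)
```

For example, two 10 x 10 boxes offset by half their width overlap in 50 units of area out of a union of 150, which gives an IoU of 1/3.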

4.
Analysis of first-person (egocentric) videos involving human actions could help solve many problems. These videos include a large number of fine-grained action categories with hand–object interactions. In this paper, a compositional verb–noun model with two complementary temporal streams is proposed, together with various fusion strategies, to recognize egocentric actions. The first step constructs verb and object video models as a decomposition of actions, with special attention to hands. In particular, the verb video model (a spatial–temporal encoding of hand actions) and the object video model (object scores with hand–object layout) are represented as two separate pathways. The second step is a fusion stage that identifies the action category by combining the judgments of the distinct verb and object models. We propose fusion strategies with recurrent steps that collect verb and object label judgments along a temporal video sequence. We evaluate recognition performance for the individual verb and object models, and we present extensive experimental evaluations of action recognition with recurrent fusion approaches on the EGTEA Gaze+ dataset.

5.
Computer vision tasks are often expected to run on compressed images. Classical image compression standards such as JPEG 2000 are widely used; however, they do not account for the specific end task. Motivated by work on recurrent neural network (RNN)-based image compression and three-dimensional (3D) reconstruction, we propose unified network architectures that solve both tasks jointly. These joint models provide image compression tailored to the specific task of 3D reconstruction. Images compressed by our proposed models yield better 3D reconstruction performance than JPEG 2000-compressed images, and our models significantly extend the range of compression rates at which 3D reconstruction is possible. We also show that the compression can be obtained very efficiently, at almost no additional cost beyond the computation already required for the 3D reconstruction task.

6.
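Siamese network trackers already perform fairly well, but they still make poor use of the characteristics of the features extracted by convolutional neural networks. In addition, because Siamese networks track through similarity learning, their accuracy and robustness are limited. A pyramid feature fusion method is proposed that improves the network's representation of the target by exploiting the fact that feature extraction layers at different depths of the backbone emphasize different information. An attention mechanism is then used to enhance the Region Proposal Network (RPN), yielding more accurate and more robust tracking. In experiments on the OTB100 dataset, the proposed SiamERPN (Siamese Enhanced RPN) algorithm achieved a success rate of 0.668 and a precision of 0.876, outperforming the baseline and the other compared algorithms.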
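Siamese trackers such as SiamFC and SiamRPN compare template features with search-region features by correlation. That comparison is the similarity learning referred to above. The following sketch shows only this basic step, as a plain dense cross-correlation on NumPy feature maps. It does not implement the pyramid feature fusion or the attention-enhanced RPN, and `xcorr` is a hypothetical name.

```python
import numpy as np

def xcorr(template, search):
    """Slide a (C, h, w) template over a (C, H, W) search map and return the
    (H - h + 1, W - w + 1) response map of summed elementwise products."""
    _, h, w = template.shape
    _, H, W = search.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out
```

The location of the peak in the response map is the tracker's estimate of where the target lies within the search region. Real trackers compute the same quantity with a batched convolution instead of Python loops.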

7.
覃剑 (Qin Jian), 肖婷 (Xiao Ting). Acta Electronica Sinica (电子学报), 2018, 46(7): 1719-1725
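Pedestrian detection is widely used in driver assistance systems and intelligent video surveillance, and generating pedestrian candidate boxes is an important preliminary step for pedestrian recognition, localization, and tracking. This paper proposes an online method for generating pedestrian candidate boxes based on a Local Mixture Probability (LMP) model. The method divides the surveillance scene into multiple sub-regions according to regional similarity, then models pedestrian positions and scales within each region with a Poisson model and a Gaussian model, respectively. By learning and updating these models, the method obtains the probability that a target appears and the distribution of target scales; this information guides candidate-box generation and avoids blind exhaustive search. Experimental results show that the algorithm achieves high coverage while generating relatively few candidate boxes.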
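The per-region modeling described above can be illustrated with a small sketch. It makes the following assumptions, none of which the abstract specifies:

- The Poisson rate is estimated as the mean pedestrian count per frame.
- Box height is the scale variable, with a Gaussian fitted by maximum likelihood.
- A fixed budget of candidate boxes is split across regions in proportion to the probability that at least one pedestrian appears, 1 - exp(-lambda).

`fit_region`, `allocate`, and `sample_heights` are hypothetical helpers, not the LMP algorithm itself.

```python
import math
import random
from statistics import mean, pstdev

def fit_region(counts, heights):
    """Fit one sub-region: Poisson rate from per-frame counts, Gaussian over box heights."""
    lam = mean(counts)                           # Poisson MLE is the sample mean
    mu, sigma = mean(heights), pstdev(heights)   # Gaussian MLE (population std)
    return lam, mu, sigma

def allocate(regions, budget):
    """Split a candidate-box budget by P(at least one pedestrian) = 1 - exp(-lambda).
    Rounding means the parts may not sum exactly to the budget."""
    weights = [1.0 - math.exp(-lam) for lam, _, _ in regions]
    total = sum(weights)
    if total == 0:
        return [0] * len(regions)
    return [round(budget * w / total) for w in weights]

def sample_heights(mu, sigma, n, rng):
    """Draw n candidate-box heights from the region's Gaussian, clipped to >= 1 pixel."""
    return [max(1.0, rng.gauss(mu, sigma)) for _ in range(n)]
```

An online version would update the counts and heights as new detections arrive and then refit. Regions where pedestrians never appear receive no boxes, which matches the abstract's goal of avoiding exhaustive search.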

8.
In this paper, a novel framework named the global-local feature attention network with reranking strategy (GLAN-RS) is presented for image captioning. Rather than using only the unitary (global) visual information of classical models, GLAN-RS uses an attention mechanism to capture local convolutional salient image maps. A reranking strategy then adjusts the priority of the candidate captions and selects the best one. The proposed model is evaluated on the Microsoft Common Objects in Context (MSCOCO) benchmark dataset across seven standard evaluation metrics. Experimental results show that GLAN-RS significantly outperforms state-of-the-art approaches such as the multimodal recurrent neural network (MRNN) and Google NIC, with an improvement of 20% in BLEU-4 score and 13 points in CIDEr score.

9.
We propose a framework consisting of several algorithms for recognizing human activities that involve manipulating objects. The proposed algorithm identifies the objects being manipulated and models the high-level tasks being performed accordingly. Realistic settings for such tasks pose several problems for computer vision, including sporadic occlusion by subjects, non-frontal poses, and objects with few local features. We show how size and segmentation information derived from depth data can address these challenges using simple and fast techniques. In particular, through careful inclusion of depth cues, we show how to find the manipulating hand robustly and without supervision, how to detect and recognize objects properly, and how to use temporal information to fill the gaps between sporadically detected objects. We evaluate our approach on a challenging dataset of 12 kitchen tasks, involving 24 objects and performed by 2 subjects. The entire framework yields 82%/84% precision (74%/83% recall) for task/object recognition, and our techniques significantly outperform the state of the art in activity/object recognition.

10.
Uplink Macro Diversity of Limited Backhaul Cellular Network (total citations: 1; self-citations: 0; citations by others: 1)
In this work, new achievable rates are derived for the uplink channel of a cellular network with joint multicell processing (MCP), where, unlike in previous results, the otherwise ideal backhaul network has finite capacity per cell. Namely, the cell sites are linked to the central joint processor via lossless links with finite capacity. The new rates are based on compress-and-forward schemes combined with local decoding. Further, the cellular network is abstracted by symmetric models, which make analytical treatment feasible. For this family of idealistic models, achievable rates are presented for both Gaussian and fading channels. The rates are given in closed form for the classical Wyner model and the soft-handover model. These rates are then shown to be rather close to the optimal joint processing rates with unlimited backhaul, even for modest backhaul capacities, which supports the potential gain offered by the joint MCP approach. Particular attention is also given to the low signal-to-noise ratio (SNR) characterization of these rates, which explicitly reveals the effect of the limited backhaul network. In addition, the rate at which the backhaul capacity should scale to maintain the high-SNR characterization of a system with unlimited backhaul capacity is found.

11.
Data Mining of Temporal Sequence Patterns in Temporal Space (total citations: 2; self-citations: 2; citations by others: 0)
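Temporal data mining is currently an active research topic in data mining. Unlike other related work, this paper uses temporal sequence pattern mining for prediction and decision-making. It first introduces a classification of temporal types. It then defines a new temporal space model that describes the states of different objects in terms of different temporal types and attributes, providing support for efficient prediction and decision-making. Finally, it presents four temporal sequence patterns for data mining in the temporal space model, contributing to research on temporal data mining.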

12.
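In vehicle re-identification (Re-ID), jointly extracting global and local information has become the mainstream approach, but many re-identification models consider only the richness of local information when extracting it and ignore its completeness. To address this problem, an algorithm based on relation fusion and feature decomposition is proposed. Working in the spatial and channel dimensions, the algorithm splits the features extracted by the backbone network along the vertical, horizontal, and channel dimensions. First, a mixed attention module (MAM) is proposed to better highlight the vehicle's foreground region. Next, graph-based relation fusion is applied to the features split along the vertical and horizontal directions, so that the network mines rich feature information in the spatial dimension while attending to more complete regions of interest. To enable the network to capture more discriminative information, feature decomposition is applied to the local features split along the channel direction. Finally, vehicle re-identification is performed using the global-branch features together with the robust features extracted by the local branches. Experimental results show that the proposed algorithm achieves state-of-the-art performance on two mainstream vehicle re-identification datasets.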

13.
Linear Reasoning for Workflow System Models Based on Time Petri Nets (total citations: 24; self-citations: 5; citations by others: 24)
刘婷 (Liu Ting), 林闯 (Lin Chuang), 刘卫东 (Liu Weidong). Acta Electronica Sinica (电子学报), 2002, 30(2): 245-248
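Current research on workflow theory focuses mainly on the structure and correctness analysis of workflow management models. Few studies address the time-related properties of workflow models, especially temporal-relation reasoning and performance computation. This paper focuses on these problems. It represents workflow models with time Petri nets, performs temporal analysis of basic workflow models, and gives rules for linear-time reasoning. With these rules, complex workflow models can be reduced step by step, and the temporal reasoning problem can be solved in linear time.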
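Step-by-step reduction can be illustrated with interval arithmetic on [earliest, latest] completion times. The following sketch assumes three common reduction rules; the abstract does not list the paper's actual rules:

- **Sequence:** add the bounds.
- **AND-split/AND-join (parallel):** take the maximum of each bound, because all branches must finish.
- **XOR-split/XOR-join (choice):** take the minimum of the lower bounds and the maximum of the upper bounds.

Each rule merges its sub-blocks in time proportional to their number, so reducing a well-structured workflow bottom-up visits each block once. The names `Interval`, `seq`, `par`, and `choice` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float  # earliest completion time
    hi: float  # latest completion time

def seq(*parts):
    """Tasks executed one after another: delays add up."""
    return Interval(sum(p.lo for p in parts), sum(p.hi for p in parts))

def par(*parts):
    """AND-split ... AND-join: the join waits for the slowest branch."""
    return Interval(max(p.lo for p in parts), max(p.hi for p in parts))

def choice(*parts):
    """XOR-split ... XOR-join: exactly one branch runs, so either bound may occur."""
    return Interval(min(p.lo for p in parts), max(p.hi for p in parts))
```

For example, task A in [1, 2], followed by B in [2, 4] running in parallel with C in [3, 5], followed by a choice between D in [1, 1] and E in [2, 6], reduces to [1 + 3 + 1, 2 + 5 + 6] = [5, 13].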

14.
Theories and models for Internet quality of service (total citations: 8; self-citations: 0; citations by others: 8)
We survey advances in theories and models for Internet quality of service (QoS). We start with the theory of network calculus, which lays the foundation for supporting deterministic performance guarantees in networks, and illustrate its applications to integrated services, differentiated services, and streaming media playback delays. We also present mechanisms and an architecture for scalable support of guaranteed services in the Internet, based on the concept of a stateless core. Methods for scalable control operations are also discussed. We then turn to statistical performance guarantees and describe several new probabilistic results that can be used for statistical dimensioning of differentiated services. Lastly, we review proposals and results for supporting performance guarantees in a best-effort context. These include models for elastic throughput guarantees based on TCP performance modeling, techniques for some QoS differentiation without access control, and methods that allow an application to control the performance it receives in the absence of network support.
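A standard network-calculus result shows how deterministic guarantees are computed. Take a flow constrained by a token bucket, with arrival curve alpha(t) = sigma + rho*t, served by a rate-latency node with service curve beta(t) = R*max(0, t - T). If rho <= R, then:

- delay is at most T + sigma/R
- backlog is at most sigma + rho*T

The following sketch simply evaluates these textbook bounds. The function name and the example numbers are illustrative assumptions, not values from the survey.

```python
def rate_latency_bounds(sigma, rho, R, T):
    """Worst-case delay and backlog for a token-bucket flow (burst sigma, rate rho)
    through a rate-latency server (rate R, latency T). Units must be consistent,
    e.g. bits and seconds."""
    if rho > R:
        raise ValueError("unstable: long-term arrival rate exceeds service rate")
    delay = T + sigma / R        # horizontal deviation between alpha and beta
    backlog = sigma + rho * T    # vertical deviation between alpha and beta
    return delay, backlog
```

With a 10 kbit burst, rho = 1 Mbit/s, R = 2 Mbit/s, and T = 5 ms, the delay bound is 5 ms + 5 ms = 10 ms and the backlog bound is 10 kbit + 5 kbit = 15 kbit.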

15.
Weakly supervised temporal action localization (WSTAL) is crucial for real-world applications, as it relieves the huge burden of the frame-level annotations needed for fully supervised action detection. Most existing WSTAL methods focus on classifying video snippets or detecting action boundaries, but the predictions of these well-designed models have not been fully exploited. We therefore propose a weakly supervised framework called the progressive enhancement network (PEN), which takes full advantage of the predictions generated by preceding models to enhance subsequent ones. Specifically, snippet-level pseudo labels are generated from the preceding predictions by considering the similarity and temporal distance between action snippets. Subsequent models are then progressively enhanced by using the pseudo labels as supervision and by exploiting their underlying semantics to make the feature representation better suited to temporal localization. Extensive experiments on two popular benchmarks, THUMOS’14 and ActivityNet v1.2, demonstrate the effectiveness of our method.

16.
Efficient message dissemination in vehicular ad hoc networks (VANETs) is crucial for supporting communication among vehicles, and between users and the Internet, with minimal delay and overhead but maximum reachability. To improve message dissemination in these networks, we show the need to study the graph-theoretic properties of VANETs, since they follow neither the small-world nor the scale-free characteristics often found in large self-organized networks. We consider three fundamental properties: connectivity, node degree, and clustering coefficient. For each property, we develop and validate analytical models for both urban and highway scenarios, building an extensive graph-structure perspective on VANETs. These models show how connectivity changes with network density, that VANETs exhibit truncated Gaussian node-degree distributions, and that network clustering coefficients do not depend on network size or density. We then show how these results can be used to derive, from local information alone, individual node behavior that benefits the whole network. We demonstrate the usefulness of this approach by proposing new mechanisms that enhance the urban vehicular broadcasting protocol UV-CAST. Our results show that these mechanisms achieve excellent performance while reducing overhead in UV-CAST.
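The properties studied above can be measured on a simple snapshot model. The following sketch treats vehicles as points and links any two vehicles within radio range; this unit-disk assumption is ours and is not the paper's analytical model. It then computes node degrees and the average local clustering coefficient, assigning 0 to nodes with fewer than two neighbors. The helper names and the highway scenario are hypothetical.

```python
import itertools
import random

def unit_disk_graph(positions, radio_range):
    """Adjacency sets: vehicles i and j are linked if within radio_range of each other."""
    adj = {i: set() for i in range(len(positions))}
    for i, j in itertools.combinations(range(len(positions)), 2):
        (x1, y1), (x2, y2) = positions[i], positions[j]
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radio_range ** 2:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in itertools.combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

if __name__ == "__main__":
    rng = random.Random(42)
    # hypothetical scenario: 60 vehicles on a 2 km, 3-lane highway, 250 m radio range
    cars = [(rng.uniform(0, 2000), 3.5 * rng.randrange(3)) for _ in range(60)]
    g = unit_disk_graph(cars, radio_range=250)
    print("mean degree:", sum(len(n) for n in g.values()) / len(g))
    print("average clustering:", average_clustering(g))
```

Repeating the random placement many times and collecting the degree list gives an empirical degree distribution to compare against an analytical one.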

17.
Weakly supervised temporal action localization is a challenging computer vision problem that uses only video-level labels and lacks temporal annotations for supervision. Most existing methods for this task identify only the most discriminative snippets and ignore other relevant ones. To address this problem, we propose a deep feature enhancing and selecting network. It generates multiple masks that capture more complete temporal intervals of actions while maintaining high classification accuracy. We further propose a novel selection strategy that balances the influence of the multiple masks and improves model performance. We evaluate the proposed method on the THUMOS’14 and ActivityNet datasets, and the results show the effectiveness of our approach for weakly supervised temporal action localization.

18.
钱夔 (Qian Kui), 宋爱国 (Song Aiguo). Acta Electronica Sinica (电子学报), 2015, 43(6): 1084-1089
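To better simulate attention selection in the human visual system, this paper proposes an improved bio-inspired cognitive neural network for robots. First, an improved bio-inspired cognitive neural network model that mimics the structure of the human visual cortex is built on an existing model. Top-down visual attention is added from the Position Motor (PM) layer to the Receptive Field (RF). The Inferior Temporal (IT) area no longer receives global visual information; instead, it receives local information carrying bottom-up visual attention. This both reduces data-processing complexity and agrees better with human Gestalt psychology. Finally, the model is used for robot object recognition and tracking against complex backgrounds. Experimental results show that the method effectively reduces data redundancy and processing time while also improving the recognition accuracy of the robot's vision system.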

19.
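Speech time-frequency features exhibit temporal dependency, local correlation, and global correlation, so conventional neural network architectures are not fully suited to time-frequency-domain speech enhancement. To address this, convolutional layers first replace the fully connected layers of a gated recurrent unit (GRU) network, forming a convolutional gated recurrent network. This resolves the GRU network's inability to extract local correlations along the frequency dimension while it models the time dimension. Because convolutional layers cannot extract global correlations along the frequency dimension, an attention mechanism, which can capture global correlations, is then used to overcome this limitation of the convolutional gated recurrent network. The result is a self-attention convolutional gated recurrent network that deeply integrates the self-attention mechanism. Experiments show that, by attending to multiple properties of time-frequency features, the network effectively improves speech enhancement performance.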
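The core idea above is to replace the GRU's fully connected maps with convolutions along the frequency axis. The following minimal NumPy sketch shows one time step of a single-channel convolutional GRU over F frequency bins, under several simplifying assumptions:

- Convolutions use "same" padding.
- Biases and multiple channels are omitted.
- The update follows one common GRU convention, h' = (1 - z) * h + z * h_tilde.

It does not include the self-attention part. The parameter names in `params` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_f(v, kernel):
    """1-D convolution along the frequency axis, output length = len(v)."""
    return np.convolve(v, kernel, mode="same")

def conv_gru_step(x, h, params):
    """One time step of a single-channel convolutional GRU.
    x: input spectrum frame (F,), h: previous hidden state (F,)."""
    z = sigmoid(conv_f(x, params["Wz"]) + conv_f(h, params["Uz"]))   # update gate
    r = sigmoid(conv_f(x, params["Wr"]) + conv_f(h, params["Ur"]))   # reset gate
    h_tilde = np.tanh(conv_f(x, params["Wh"]) + conv_f(r * h, params["Uh"]))
    return (1.0 - z) * h + z * h_tilde    # blend old state and candidate state
```

Running `conv_gru_step` frame by frame over a spectrogram models the time dimension recurrently. Because each gate sees only a few neighboring bins, it captures local frequency correlation, which is why the authors add attention for global correlation.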

20.
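Building on the HK network model, two improved HK models with tunable degree distribution and clustering coefficient are proposed. The improved models jointly consider the evolution mechanisms of "preferential attachment", "triangle structure", and "internal evolution". When new nodes join, the cases of adding a single node and adding a whole community are considered separately, and the TF (triad formation) mechanism is moved to operate between existing nodes during network evolution. Simulation results show that the two improved models inherit the high-clustering, scale-free properties of the HK model. They also overcome two of its limitations: during evolution, the HK model adds only one node at a time and applies the TF mechanism only between new and old nodes.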
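The baseline HK (Holme-Kim) model grows a network in which each new node adds m edges. The first edge is chosen by preferential attachment (PA). Each further edge is, with probability p_tf, a triad formation (TF) step linking to a random neighbor of the most recent PA target, which closes a triangle; otherwise it is another PA step. The following sketch implements only this baseline, not the two improved models. The function name and the choice of m isolated seed nodes are assumptions.

```python
import random

def holme_kim(n, m, p_tf, seed=None):
    """Grow an n-node Holme-Kim graph; returns adjacency sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(m)}       # m seed nodes
    # a node appears in `pool` once per incident edge (seed nodes once initially),
    # so a uniform draw from `pool` approximates degree-proportional selection
    pool = list(range(m))
    for new in range(m, n):
        adj[new] = set()
        targets = set()
        last_pa = None
        while len(targets) < m:
            if last_pa is not None and rng.random() < p_tf:
                # triad formation: link to a neighbor of the last PA target
                candidates = [u for u in adj[last_pa] if u not in targets]
                if candidates:
                    targets.add(rng.choice(candidates))
                    continue
            # preferential attachment step (also the fallback when TF is impossible)
            u = rng.choice(pool)
            if u not in targets:
                targets.add(u)
                last_pa = u
        for u in targets:
            adj[new].add(u)
            adj[u].add(new)
            pool.extend([new, u])
    return adj
```

Setting p_tf = 0 reduces the model to plain preferential attachment. Raising p_tf adds triangles and therefore raises the clustering coefficient, while the degree distribution stays heavy-tailed.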

