Similar Articles
20 similar articles found (search time: 31 ms).
1.
To address the problems of complex outdoor scenes with pronounced illumination changes or occlusions, where existing deep-learning-based loop closure detection methods for visual simultaneous localization and mapping (visual SLAM) make poor use of image semantics and scene detail and have poor real-time performance, this paper proposes a YOLO-NKLT loop closure detection method for visual SLAM. A YOLOv5 network with an improved loss function is used to extract image features carrying semantic information; a training set is constructed and the network is retrained so that the extracted features are better suited to loop closure detection in complex scenes. To further improve the real-time performance of loop closure detection, a KLT dimensionality reduction method based on non-dominated sorting is proposed. Experiments on the New College dataset and on the Nordland dataset, whose illumination and other conditions vary more severely, show that in complex outdoor scenes the proposed method is more robust than other traditional and deep-learning-based methods and achieves better accuracy and real-time performance.
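The reduction step named in the abstract is a KLT (Karhunen-Loève transform, i.e. a PCA-style projection); the sketch below shows only that generic projection plus a cosine-similarity loop-closure test. The YOLOv5 feature extractor, the improved loss, and the non-dominated sorting are not reproduced; `features`, `klt_reduce`, and the 0.9 threshold are illustrative assumptions.

```python
import numpy as np

def klt_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project (N, D) per-frame descriptors onto the top-k principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    basis = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k directions
    return centered @ basis

def is_loop_closure(query: np.ndarray, candidate: np.ndarray,
                    threshold: float = 0.9) -> bool:
    """Declare a loop closure when cosine similarity exceeds a threshold."""
    sim = query @ candidate / (np.linalg.norm(query) * np.linalg.norm(candidate))
    return sim > threshold
```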

2.
The detection of near-duplicate video clips (NDVCs) is an area of current research interest and intense development. Most NDVC detection methods represent video clips with a unique set of low-level visual features, typically describing color or texture information. However, low-level visual features are sensitive to transformations of the video content. Given the observation that transformations tend to preserve the semantic information conveyed by the video content, we propose a novel approach for identifying NDVCs, making use of both low-level visual features (that is, MPEG-7 visual features) and high-level semantic features (that is, 32 semantic concepts detected using trained classifiers). Experimental results obtained for the publicly available MUSCLE-VCD-2007 and TRECVID 2008 video sets show that bimodal fusion of visual and semantic features facilitates robust NDVC detection. In particular, the proposed method is able to identify NDVCs with a low missed detection rate (3% on average) and a low false alarm rate (2% on average). In addition, the combined use of visual and semantic features outperforms the separate use of either of them in terms of NDVC detection effectiveness. Further, we demonstrate that the effectiveness of the proposed method is on par with or better than that of three state-of-the-art NDVC detection methods making use of temporal ordinal measurement, features computed using the Scale-Invariant Feature Transform (SIFT), or bags of visual words (BoVW). We also show that the influence of the effectiveness of semantic concept detection on the effectiveness of NDVC detection is limited, as long as the mean average precision (MAP) of the semantic concept detectors used is higher than 0.3. Finally, we illustrate that the computational complexity of our NDVC detection method is competitive with that of the three aforementioned NDVC detection methods.
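The abstract does not give the exact fusion rule, so the following is only a minimal sketch of bimodal late fusion: a weighted sum of a low-level visual distance and a distance between semantic concept-score vectors, with `alpha` and the function name as assumptions.

```python
import numpy as np

def ndvc_distance(visual_q, visual_c, sem_q, sem_c, alpha=0.5):
    """Late fusion of a visual and a semantic distance; a clip pair is
    flagged as a near-duplicate when the fused distance is small."""
    d_visual = np.linalg.norm(visual_q - visual_c)    # e.g. MPEG-7 descriptors
    d_semantic = np.linalg.norm(sem_q - sem_c)        # e.g. 32 concept scores
    return alpha * d_visual + (1 - alpha) * d_semantic
```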

3.
When a robot performs simultaneous localization and mapping (SLAM) in large-scale scenes, its loop closure detection stage suffers from severe false matches and missed matches. Therefore, a residual network (ResNet) is used to extract features from the image sequence, and a new loop closure detection algorithm is proposed. A pretrained ResNet extracts global features of the input image, and the features of the current frame and of a preceding image sequence of fixed length are concatenated by downsampling; the result serves as the feature of the current frame, ensuring rich and accurate image features. A two-level query method is designed to retrieve the most similar frame, and a consistency check is applied to the most similar image to guarantee the correctness of the loop closure. On the mainstream public loop closure detection datasets New College and City Centre, the proposed algorithm achieves a recall of 83% at 100% precision and a recall of 85% at 99% precision, a significant improvement over the traditional bag-of-words method and the VGG16 method.
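A minimal sketch of the two steps the abstract names: building a frame descriptor by downsampling and concatenating the features of the current frame and its preceding sequence, and a coarse first-level query that shortlists candidates for the consistency check. The stride and `top_k` are assumed parameters, not values from the paper.

```python
import numpy as np

def sequence_descriptor(frame_feats, stride=4):
    """Downsample each per-frame ResNet feature vector and concatenate the
    current frame with its preceding sequence into one descriptor."""
    return np.concatenate([f[::stride] for f in frame_feats])

def shortlist_candidates(query, database, top_k=5):
    """First query level: rank stored descriptors by cosine similarity and
    return the top_k candidates; the second level would run the paper's
    consistency check on this shortlist."""
    db = np.stack(database)
    sims = db @ query / (np.linalg.norm(db, axis=1) * np.linalg.norm(query) + 1e-8)
    return np.argsort(sims)[::-1][:top_k]
```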

4.
5.
The human visual system analyzes complex scenes rapidly, devoting its limited perceptual resources to the most salient subsets and/or objects of a scene while ignoring its less salient parts. Gaze prediction models try to predict human eye fixations under free-viewing conditions while imitating this attentive mechanism. Previous studies on saliency benchmark datasets have shown that visual attention is affected by the salient objects of a scene and their features, including the identity, location, and visual features of the objects, in addition to the context of the input image. Moreover, human eye fixations often converge on specific parts of the salient objects in a scene. In this paper, we propose a deep gaze prediction model using object detection via image segmentation. It uses several deep neural modules to find the identity, location, and visual features of the salient objects in a scene. In addition, we introduce a deep module to capture the prior bias of human eye fixations. To evaluate our model, several challenging saliency benchmark datasets are used in the experiments. We also conduct an ablation study to show the effectiveness of the proposed modules and architecture. Despite having fewer parameters, our model performs comparably to, or on some datasets even better than, state-of-the-art saliency models.
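The abstract mentions a module that captures the prior bias of human fixations. A common baseline for that bias is a center-weighted Gaussian prior; the sketch below is that baseline, not the paper's learned module, and `sigma` is an assumption.

```python
import numpy as np

def center_prior(height, width, sigma=0.25):
    """Gaussian center-bias map in [0, 1]: fixations under free viewing
    tend to concentrate near the image center."""
    ys, xs = np.mgrid[0:height, 0:width]
    ys = ys / (height - 1) - 0.5
    xs = xs / (width - 1) - 0.5
    return np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
```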

6.
This paper proposes a scene text detection method to address the challenges of text detection in complex natural scenes. The method adopts dual attention and multi-scale feature fusion: a dual-attention fusion mechanism strengthens the correlation between text feature channels and improves overall detection performance. Considering the loss of semantic information caused by up- and down-sampling of deep feature maps, a dilated convolution multi-scale feature fusion pyramid (MFPN) is proposed; its dual fusion mechanism enhances semantic features and counteracts the effect of scale variation. To resolve the semantic conflicts caused by fusing information of different densities and the limited expressiveness of multi-scale features, a multi-scale feature fusion module (MFFM) is introduced. In addition, a feature refinement module (FRM) is introduced for small text that is easily masked by conflicting information. Experiments show that the method is effective for text detection in complex scenes, achieving F-measures of 85.6%, 87.1%, and 86.3% on the CTW1500, ICDAR2015, and Total-Text datasets, respectively.
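A minimal PyTorch sketch of the core idea behind a dilated-convolution multi-scale fusion block: parallel 3 × 3 convolutions with increasing dilation rates enlarge the receptive field without downsampling, and a 1 × 1 convolution fuses the branches. The channel sizes and dilation rates are assumptions; the MFPN's dual fusion mechanism, the MFFM, and the FRM are not reproduced here.

```python
import torch
import torch.nn as nn

class DilatedFusionBlock(nn.Module):
    """Parallel dilated 3x3 convolutions fused by concatenation + 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Each branch keeps the spatial size; dilation widens the context.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```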

7.
8.
Human-object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. It is useful in many applications that require a deeper understanding of semantic scenes. Current HOI detection networks typically consist of a feature extractor followed by detection layers comprising small filters (e.g., 1 × 1 or 3 × 3). Although small filters can capture local spatial features with few parameters, their small receptive regions prevent them from capturing the larger context needed to recognize interactions between humans and distant objects. Hence, we propose a three-stream HOI detection network that employs a context convolution module (CCM) in each stream branch. The CCM captures larger contexts from input feature maps by combining large separable convolution layers with residual-based convolution layers, using a few large separable filters so that the number of parameters does not grow. We evaluate our HOI detection method on two benchmark datasets, V-COCO and HICO-DET, and demonstrate its state-of-the-art performance.
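A hedged sketch of a context convolution module in the spirit described: a k × 1 followed by a 1 × k separable convolution approximates a k × k receptive field with 2k instead of k² weights per channel pair, and a residual 3 × 3 branch preserves local detail. The kernel size k = 15 and the exact branch layout are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class ContextConvModule(nn.Module):
    """Large separable convolution (k x 1 then 1 x k) plus a residual 3x3
    branch: a wide receptive field at a low parameter cost."""
    def __init__(self, channels, k=15):
        super().__init__()
        self.sep = nn.Sequential(
            nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
            nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
        )
        self.res = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.sep(x) + self.res(x)
```

Stacking the two one-dimensional kernels is what keeps the parameter count flat while widening the receptive region.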

9.
欧先锋, 晏鹏程, 王汉谱, 涂兵, 何伟, 张国云, 徐智. 电子学报 (Acta Electronica Sinica), 2000, 48(12): 2384-2393
Moving object detection in complex scenes is an important problem in computer vision, and its accuracy remains a challenge. This paper proposes a deep frame-difference convolutional neural network (Deep Frame Difference Convolutional Neural Network, DFDCNN) for moving object detection in complex scenes. DFDCNN consists of DifferenceNet and AppearanceNet and predicts the segmented foreground pixels without any post-processing. DifferenceNet has a siamese encoder-decoder structure that learns the changes between two consecutive frames, capturing temporal information from the input (frames t and t+1); AppearanceNet extracts spatial information from the input (frame t) and fuses it with the temporal information. Multi-scale feature map fusion and progressive upsampling preserve multi-scale spatial information and increase the network's sensitivity to small objects. Experimental results on the public benchmark datasets CDnet2014 and I2R show that DFDCNN not only detects better in complex scenes with dynamic backgrounds, illumination changes, and shadows, but also performs well in scenes containing small objects.
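A toy two-stream layout in PyTorch illustrating the division of labor the abstract describes: a difference stream over consecutive frames (t, t+1) for temporal change and an appearance stream over frame t for spatial cues, fused into a per-pixel foreground probability. The channel counts are placeholders; DFDCNN's siamese encoder-decoder, multi-scale fusion, and progressive upsampling are omitted.

```python
import torch
import torch.nn as nn

class TinyTwoStream(nn.Module):
    """Difference stream over (t, t+1) + appearance stream over t,
    fused into a per-pixel foreground probability map."""
    def __init__(self):
        super().__init__()
        self.diff = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU())
        self.app = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, frame_t, frame_t1):
        temporal = self.diff(torch.cat([frame_t, frame_t1], dim=1))
        spatial = self.app(frame_t)
        return torch.sigmoid(self.head(torch.cat([temporal, spatial], dim=1)))
```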

10.
11.
Research on a part-based automatic target detection method   Total citations: 3 (self-citations: 1, citations by others: 3)
This paper proposes a new automatic target detection algorithm that detects structurally complex man-made targets in natural scene images and high-resolution remote sensing images. The method reasons over the geometric parts that compose an object, which reduces the number of training samples required. First, two types of typical features are selected and the corresponding classifiers are trained by machine learning, effectively reducing the impact on detection accuracy of background objects that partially resemble foreground targets. The problem is then modeled with a marked point process, with prior constraints on the target distribution and the classifier responses serving as the data energy, and targets are detected automatically in a top-down manner. Experimental results show that the method is accurate and robust and has strong practical value.
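A sketch of the kind of energy a marked-point-process formulation typically minimizes: a data term rewarding strong classifier responses for the selected detections and a prior term penalizing pairwise overlap between them. The overlap penalty stands in for the paper's prior constraints on the target distribution; `w_prior` is an assumed weight.

```python
import numpy as np

def configuration_energy(responses, pairwise_iou, w_prior=1.0):
    """Energy of a candidate target configuration; lower is better.
    responses: (n,) classifier scores of the selected detections;
    pairwise_iou: (n, n) overlap matrix between those detections."""
    data_term = -np.sum(responses)
    prior_term = w_prior * np.sum(np.triu(pairwise_iou, k=1))
    return data_term + prior_term
```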

12.
周炫余, 刘娟, 卢笑, 邵鹏, 罗飞. 电子学报 (Acta Electronica Sinica), 2017, 45(1): 140-146
To address the high false detection and missed detection rates of purely visual pedestrian detection methods and their poor accuracy on occluded and small-scale targets, this paper proposes a pedestrian detection method that combines textual and visual information. The method first obtains candidate boxes for image targets through image analysis, then extracts textual entity expressions referring to image targets through text analysis, and proposes a Markov-random-field-based model to infer the coreference relation between image candidate boxes and textual entity expressions, thereby combining image and text information to help machine vision improve pedestrian detection accuracy in traffic scenes. Evaluation on the Caltech pedestrian detection dataset augmented with textual image descriptions shows that the method not only uses textual information on top of image information to improve pedestrian detection accuracy in traffic scenes, but also uses image information on top of textual information to improve the accuracy of anaphora resolution in the text.
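A toy version of the joint inference the abstract describes: assign each candidate box to a textual entity (or to none), score the assignment by unary box-entity compatibilities, and penalize two boxes claiming the same entity. Exhaustive minimization of this energy over small assignments stands in for the paper's MRF inference; the weight `w_clash` is an assumption.

```python
def coreference_energy(assign, unary, w_clash=1.0):
    """assign[b] = index of the entity claimed by box b, or -1 for none;
    unary[b][e] = compatibility of box b with entity e. Lower energy =
    better labelling; minimize over assignments (exhaustively for toy sizes)."""
    data = sum(unary[b][a] for b, a in enumerate(assign) if a >= 0)
    clash = sum(1 for i in range(len(assign)) for j in range(i + 1, len(assign))
                if assign[i] >= 0 and assign[i] == assign[j])
    return -data + w_clash * clash
```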

13.
郑云飞, 张雄伟, 曹铁勇, 孙蒙. 电子学报 (Acta Electronica Sinica), 2017, 45(11): 2593-2601
Saliency detection algorithms based on low-level visual features and prior knowledge struggle with some complex salient objects; the human visual system can distinguish such objects because it draws on rich semantic knowledge. This paper builds a fully convolutional semantic saliency detection network that learns, in a data-driven way, a mapping from low-level image features to human semantic cognition and extracts semantically salient regions. To remedy the weaknesses of the semantic saliency regions the network extracts, color information, object boundary information, and spatial consistency information are further introduced to obtain accurate superpixel-level foreground and background probabilities. Finally, an optimization model fuses the foreground and background probabilities, semantic information, and spatial consistency information to produce the final saliency map. Comparisons with 15 recent algorithms on six datasets demonstrate the effectiveness and robustness of the proposed algorithm.
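The abstract's final fusion is an optimization model; the sketch below replaces it with a simple weighted pixel-wise product of the three cues it names, which conveys the idea but is not the paper's model. The weights are assumptions.

```python
import numpy as np

def fuse_saliency(fg_prob, bg_prob, semantic_sal, weights=(1.0, 1.0, 1.0)):
    """Pixel-wise fusion: pixels are salient when the foreground probability
    and semantic saliency are high and the background probability is low."""
    a, b, c = weights
    s = (fg_prob ** a) * ((1.0 - bg_prob) ** b) * (semantic_sal ** c)
    return (s - s.min()) / (s.max() - s.min() + 1e-8)   # normalize to [0, 1]
```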

14.
Aggregation of local and global contextual information by exploiting multi-level features in a fully convolutional network is a challenge for the pixel-wise salient object detection task. Most existing methods still suffer from inaccurate salient regions and blurry boundaries. In this paper, we propose a novel edge-aware global and local information aggregation network (GLNet) to fully integrate side-output local features with global contextual information and to exploit the contour information of salient objects. The global guidance module (GGM) is proposed to learn discriminative multi-level information under the direct guidance of global semantic knowledge for more accurate saliency prediction. Specifically, the GGM consists of two key components: the global feature discrimination module exploits the inter-channel relationship of global semantic features to boost representation power, and the local feature discrimination module enables different side-output local features to selectively learn informative locations by fusing with global attentive features. Besides, we propose an edge-aware aggregation module (EAM) that employs the correlation between salient edge information and salient object information to generate estimated saliency maps with explicit boundaries. We evaluate our proposed GLNet on six widely used saliency detection benchmark datasets against 17 state-of-the-art methods. Experimental results show the effectiveness and superiority of our proposed method on all six benchmark datasets.
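The global feature discrimination module "exploits the inter-channel relationship" of global features; a squeeze-and-excitation style channel gate is the standard construction for that idea and is sketched below as an approximation, with the reduction ratio as an assumption.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """SE-style gate: pool each channel globally, predict per-channel
    weights, and rescale the feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # (B, C) channel descriptors
        return x * w.unsqueeze(-1).unsqueeze(-1)  # reweight channels
```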

15.
Mining outliers, the data objects hidden in a network that differ from normal objects, is one of the important tasks of data mining. Relatively little work has addressed outlier detection in bi-typed heterogeneous information networks, and outlier detection methods designed for homogeneous networks are difficult to apply to them. This paper therefore proposes RKBOutlier, a ranking- and clustering-based outlier detection method for heterogeneous information networks. Two types of objects, together with the semantic information of the links between them, are extracted from the network; the data to be tested are treated as attribute objects and the other type as target objects. The target objects are clustered, and the distribution of each attribute object across the clusters is examined; objects with anomalous distributions are the outliers. Combining ranking with clustering significantly improves clustering accuracy. Experimental results show that RKBOutlier can effectively detect outliers in bi-typed heterogeneous information networks.
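A minimal sketch of the distribution test at the heart of the method: link each attribute object to clustered target objects, normalize its per-cluster link counts, and score how far that distribution deviates from the population average. The ranking step that RKBOutlier interleaves with clustering is omitted, and the L1 deviation score is an assumption.

```python
import numpy as np

def outlier_scores(links, clusters, n_clusters):
    """links: (A, T) 0/1 matrix linking A attribute objects to T target
    objects; clusters: (T,) cluster label per target object. Each attribute
    object is scored by the L1 deviation of its per-cluster link distribution
    from the population average; high scores flag candidate outliers."""
    dist = np.zeros((links.shape[0], n_clusters))
    for c in range(n_clusters):
        dist[:, c] = links[:, clusters == c].sum(axis=1)
    dist = dist / (dist.sum(axis=1, keepdims=True) + 1e-8)
    population = dist.mean(axis=0)
    return np.abs(dist - population).sum(axis=1)
```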

16.
A content-based image retrieval mechanism to support complex similarity queries is presented. The image content is defined by three kinds of features: quantifiable features describing the visual information, nonquantifiable features describing the semantic information, and keywords describing more abstract semantic information. In correspondence with these feature sets, we construct three types of indexes: visual indexes, semantic indexes, and keyword indexes. Index structures are elaborated to provide effective and efficient retrieval of images based on their contents. The underlying index structure used for all indexes is the HG-tree. In addition to the HG-tree, the signature file and hashing technique are also employed to index keywords and semantic features. The proposed indexing scheme combines and extends the HG-tree, the signature file, and the hashing scheme to support complex similarity queries. We also propose a new evaluation strategy to process the complex similarity queries. Experiments have been carried out on large image collections to demonstrate the effectiveness of the proposed retrieval mechanism.
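A minimal sketch of the keyword signature-file component: superimposed coding ORs a few hash-selected bits per keyword into one bit signature, and a query signature not covered by an image signature safely rules the image out (candidate matches still need verification against the stored keywords). The signature width and bits-per-keyword are assumptions.

```python
import hashlib

def signature(keywords, bits=64, bits_per_kw=3):
    """Superimposed coding: OR a few hash-derived bit positions per keyword
    into a single integer signature."""
    sig = 0
    for kw in keywords:
        h = int(hashlib.md5(kw.encode()).hexdigest(), 16)
        for i in range(bits_per_kw):
            sig |= 1 << ((h >> (i * 8)) % bits)
    return sig

def may_match(query_sig, image_sig):
    """Necessary but not sufficient: false drops must be verified later."""
    return query_sig & image_sig == query_sig
```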

17.
LiDAR-based 3D object detection has been a popular topic in recent years and is widely used in autonomous driving and robot control. However, due to the scanning pattern of LiDAR, the point clouds of objects at far distances are sparse and harder to detect. To solve this problem, we propose a two-stage network based on spatial context information, named SC-RCNN (Spatial Context RCNN), for object detection in 3D point cloud scenes. SC-RCNN first uses a backbone with sparse convolutions and submanifold sparse convolutions to extract voxel features of the point scene and generate a series of candidate boxes. To cope with the sparsity of far-distance point clouds, we design local grid point pooling (LGP Pooling) to extract features and spatial context information around candidate regions for subsequent box refinement. In addition, we propose pyramid candidate box augmentation (PCB Augmentation) to expand the candidate boxes in a multi-scale manner, enriching the feature encoding. Experimental results show that SC-RCNN significantly outperforms previous methods on the KITTI and Waymo datasets and is particularly robust to the sparsity of point clouds.
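A NumPy sketch of the grid-pooling idea behind LGP Pooling: lay a regular lattice inside a candidate box and pool, at each lattice node, the feature of the nearest raw point, so that sparse regions still yield a fixed-size feature. Nearest-neighbor pooling and the grid resolution are assumptions; the real module also aggregates surrounding spatial context.

```python
import numpy as np

def local_grid_pool(points, feats, center, size, grid=4):
    """points: (N, 3) coordinates; feats: (N, C) point features;
    center, size: candidate box center and extents. Returns (grid**3, C):
    the nearest point feature at each lattice node."""
    lin = [np.linspace(-s / 2, s / 2, grid) for s in size]
    nodes = np.stack(np.meshgrid(*lin, indexing="ij"), -1).reshape(-1, 3) + center
    d = np.linalg.norm(points[None, :, :] - nodes[:, None, :], axis=-1)
    return feats[d.argmin(axis=1)]
```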

18.
Content-based image retrieval (CBIR) is a valuable computer vision technique which is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems only deliver visual outputs, i.e., images having a similar appearance to the query, which is not directly interpretable by physicians. Our objective is to provide a system for endomicroscopy video retrieval which delivers both visual and semantic outputs that are consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called "Dense-Sift," that computes a visual signature for each video. In this paper, we present a novel approach to complement visual similarity learning with semantic knowledge extraction in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts in order to transform these visual signatures into semantic signatures that reflect how much the presence of each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method transforming visual-word histograms into semantic estimations outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived similarity ground truth. As a result, our distance learning method statistically improves the correlation with the perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of the visual signatures and significantly better than that of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while being consistent with the low-level visual signatures and much more compact. In our resulting retrieval system, we use visual signatures for perceived similarity learning and retrieval, and semantic signatures to output additional information, expressed in the endoscopist's own language, which provides a relevant semantic translation of the visual retrieval outputs.
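A simplified stand-in for the transformation of visual-word histograms into semantic signatures: score each binary concept by a log-likelihood-ratio between the concept's mean histogram and the global mean. This is not the paper's Fisher-based method, only an intuition-preserving sketch, and all inputs are assumed to be normalized histograms.

```python
import numpy as np

def semantic_signature(hist, concept_means, global_mean, eps=1e-8):
    """hist: (V,) visual-word histogram of one video; concept_means: list of
    (V,) mean histograms, one per binary concept. Returns one score per
    concept: positive when the video's words resemble the concept's words
    more than the population average."""
    return np.array([
        np.sum(hist * np.log((m + eps) / (global_mean + eps)))
        for m in concept_means
    ])
```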

19.
Zero-shot learning (ZSL) aims to recognize new objects that have never been seen before by associating categories with their semantic knowledge. Existing works mainly focus on learning a better visual-semantic mapping to align the visual and semantic spaces, while the learning of discriminative visual features is neglected. In this paper, we propose an object-centric complementary features (OCF) learning model that takes full advantage of the visual information of objects under the guidance of semantic knowledge. The model automatically discovers the object region and obtains fine-scale samples without any human annotation. An attention mechanism is then used to capture long-range visual features corresponding to semantic knowledge such as 'four legs' and subtle visual differences between similar categories. Finally, we train our model end-to-end under the guidance of semantic knowledge. Our method is evaluated on three widely used ZSL datasets, CUB, AwA2, and FLO; the experimental results demonstrate the efficacy of the object-centric complementary features, and our proposed method outperforms state-of-the-art methods.
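A PyTorch sketch of the standard ZSL compatibility function that underlies such models: project visual features into the attribute space and score each class by a dot product with its semantic vector. The OCF model's object-discovery and attention modules are not reproduced; this is only the scoring backbone they would feed.

```python
import torch
import torch.nn as nn

class CompatibilityHead(nn.Module):
    """Score classes, including unseen ones, by compatibility between
    projected visual features and per-class attribute vectors."""
    def __init__(self, feat_dim, attr_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim, attr_dim)

    def forward(self, feats, class_attrs):
        # feats: (B, feat_dim); class_attrs: (num_classes, attr_dim)
        return self.proj(feats) @ class_attrs.t()   # (B, num_classes) logits
```

At test time the same head scores unseen classes simply by supplying their attribute vectors, which is what lets the model generalize without visual training samples for those classes.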

20.
Cutting-edge RGB saliency models are prone to fail in some complex scenes, while RGB-D saliency models are often hampered by inaccurate depth maps. Fortunately, light field images provide a rich depiction of the spatial layout of 3D scenes. This paper therefore focuses on salient object detection in light field images and proposes a Similarity Retrieval-based Inference Network (SRI-Net). Because of their differing focus points, not all focal slices extracted from a light field image are beneficial for salient object detection; the key idea of our model is to select the most valuable focal slice, which contributes the most complementary information to the RGB image. Specifically, we first design a focal slice retrieval module (FSRM) that chooses an appropriate focal slice by measuring the foreground similarity between each focal slice and the RGB image. Second, to combine the original RGB image and the selected focal slice, we design a U-shaped saliency inference module (SIM), in which a two-stream encoder extracts multi-level features and a decoder aggregates the multi-level deep features. Extensive experiments on two widely used light field datasets firmly demonstrate the superiority and effectiveness of the proposed SRI-Net.
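A rough sketch of the slice-selection criterion: compare a coarse foreground map of each focal slice against that of the RGB image and keep the best match. Histogram intersection is used here as the similarity measure, which is an assumption; FSRM's learned retrieval and the SIM decoder are omitted.

```python
import numpy as np

def select_focal_slice(rgb_fg, slice_fgs):
    """rgb_fg: (H, W) coarse foreground map of the RGB image in [0, 1];
    slice_fgs: list of (H, W) foreground maps, one per focal slice.
    Returns the index of the slice whose foreground best matches the RGB."""
    def hist(x):
        h, _ = np.histogram(x, bins=32, range=(0, 1), density=True)
        return h / (h.sum() + 1e-8)
    q = hist(rgb_fg)
    sims = [np.minimum(q, hist(s)).sum() for s in slice_fgs]  # intersection
    return int(np.argmax(sims))
```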
