Similar Documents (20 results)
1.
2.
Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture, and discrepant detection rates have been reported for these methods in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, which is impressive given the findings of a recently conducted psychophysical study. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.
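To make the confidence-based integration concrete, the following is a minimal sketch (not the paper's implementation) of fusing per-cue posteriors over the four candidate orientations (0, 90, 180, 270 degrees); the cue posteriors and confidence weights are hypothetical inputs.

    import numpy as np

    def fuse_orientation_cues(cue_posteriors, confidences, prior=None):
        """cue_posteriors: list of length-4 arrays P(orientation | cue).
        confidences: per-cue weights in [0, 1] that temper each cue's vote."""
        log_p = np.log(prior if prior is not None else np.full(4, 0.25))
        for post, w in zip(cue_posteriors, confidences):
            # A low-confidence cue is flattened toward the uniform distribution before fusion.
            tempered = w * np.asarray(post) + (1.0 - w) * 0.25
            log_p += np.log(tempered)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

    # Example: a texture cue mildly favours 0 deg, a face detector strongly favours 90 deg.
    print(fuse_orientation_cues([[0.4, 0.2, 0.2, 0.2], [0.05, 0.85, 0.05, 0.05]], [0.5, 0.9]))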

3.
Natural scene recognition based on local salient regions
Scene recognition is the key to topological navigation for mobile robots. For unknown environments, a natural scene recognition method based on local visually salient regions is proposed. First, a feedback saliency detection model (FSDM) is proposed for bottom-up image analysis; then, at each salient location, automatic scale selection based on fractals is used to construct local salient regions of appropriate size. The salient regions in the scene image are given an invariant representation using three features, namely gradient orientation, second-order invariant moments, and normalized hue, and scene recognition is performed according to their matching rate. Experimental results show that the FSDM achieves high saliency detection accuracy. Repeated scene recognition experiments in indoor and outdoor environments further show that, compared with global appearance methods, this approach better tolerates differences caused by changes in scale, viewpoint, and the like, and achieves high accuracy for static scene recognition.
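As a rough illustration of the matching-rate idea (not the FSDM pipeline itself), a scene could be recognized by matching salient-region descriptors against each stored scene and scoring the fraction that find a close match; the descriptor arrays and distance threshold below are assumptions.

    import numpy as np

    def matching_rate(query_desc, scene_desc, thresh=0.3):
        """Fraction of query region descriptors whose nearest stored descriptor is close."""
        d = np.linalg.norm(query_desc[:, None, :] - scene_desc[None, :, :], axis=2)
        return float(np.mean(d.min(axis=1) < thresh))

    def recognize(query_desc, scene_library):
        """scene_library: dict mapping scene name to an array of stored region descriptors."""
        return max(scene_library, key=lambda name: matching_rate(query_desc, scene_library[name]))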

4.
Spatial judgments with monoscopic and stereoscopic presentation of perspective displays were investigated in the present study. The stimulus configuration emulated a visual scene consisting of a volume of airspace above a ground reference plane. Two target symbols were situated at various positions in the space, and observers were instructed to identify the relative depth or altitude of the two symbols. Three viewing orientations (15, 45, or 90 deg elevation angle) were implemented in the perspective projection. In the monoscopic view, depth cues in size, brightness, occlusion, and linear perspective were provided in the format. In the stereoscopic view, binocular disparity was added along the line of sight from the center of projection to reinforce the relative depth in the visual scene. Results revealed that spatial judgments were affected by manipulation of the relative spatial positions of the two target symbols and by the interaction between relative position and viewing orientation. The addition of binocular disparity improved judgments of three-dimensional spatial relationships, and the enhancement was greater when monocular depth cues were less effective and/or ambiguous in recovering the three-dimensional spatial characteristics.

5.
Detection of visual saliency is valuable for applications like robot navigation, adaptive image compression, and object recognition. In this paper, we propose a fast frequency-domain visual saliency method based on the binary spectrum of the Walsh–Hadamard transform (WHT). The method achieves saliency detection by simply exploiting the WHT components of the scene under view. Unlike space-domain approaches, our method performs cortical center-surround suppression in the frequency domain and thus has implicit biological plausibility. By virtue of the simplicity and speed of the WHT, the proposed method is very fast in computation and outperforms existing state-of-the-art saliency detection methods when evaluated on the capability of eye fixation prediction. In the demanding task of ship detection in multispectral imagery, large amounts of multispectral data require real-time processing and analysis. As a very fast and effective saliency detection technique, the proposed method is modified and applied to automatic ship detection in multispectral imagery. The robustness of the method against sea clutter is further demonstrated.
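A minimal sketch of a binary-spectrum saliency map in the spirit of this method (not the authors' code; the 64x64 working size and the Gaussian smoothing width are assumptions):

    import numpy as np
    from scipy.linalg import hadamard
    from scipy.ndimage import gaussian_filter

    def wht_saliency(gray, size=64, sigma=3.0):
        """Illustrative binary-spectrum saliency map from a grayscale image."""
        # Nearest-neighbour subsampling to a power-of-two grid for the WHT.
        ys = np.linspace(0, gray.shape[0] - 1, size).astype(int)
        xs = np.linspace(0, gray.shape[1] - 1, size).astype(int)
        small = gray[np.ix_(ys, xs)].astype(float)

        H = hadamard(size)                        # 2D WHT via separable 1D transforms
        spectrum = H @ small @ H
        binary = np.sign(spectrum)                # keep only the binary (sign) spectrum
        recon = H @ binary @ H                    # inverse WHT (up to a scale factor)
        sal = gaussian_filter(recon ** 2, sigma)  # square and smooth, as in signature-style methods
        return sal / sal.max()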

6.
丛润民  张晨  徐迈  刘鸿羽  赵耀 《软件学报》2023,34(4):1711-1731
Inspired by the human visual attention mechanism, the salient object detection task aims to locate the objects or regions that most attract attention in a given scene. In recent years, with the development and popularization of depth cameras, depth images have been successfully applied to various computer vision tasks, which also provides new ideas for salient object detection. Introducing depth images not only allows a computer to simulate the human visual system more comprehensively, but the complementary structure and position information they provide also offers new solutions for detection in difficult scenes such as low contrast and complex backgrounds. Given the rapid development of RGB-D salient object detection in the deep learning era, this survey starts from the solutions to the key problems of the task, summarizes and organizes existing research, and provides quantitative analysis and qualitative comparison of different methods on commonly used RGB-D SOD datasets. Finally, the challenges facing the field and its future development trends are summarized and discussed.

7.
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images include occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes up to a certain extent. Context information, based on the interaction among objects in the scene or global scene statistics, can help successfully disambiguate appearance inputs in recognition tasks. In this work we address the problem of incorporating different types of contextual information for robust object categorization in computer vision. We review different ways of using contextual information in the field of object categorization, considering the most common levels of extraction of context and the different levels of contextual interactions. We also examine common machine learning models that integrate context information into object recognition frameworks and discuss scalability, optimizations and possible future approaches.

8.
A neural network architecture for the segmentation and recognition of colored and textured visual stimuli is presented. The architecture is based on the Boundary Contour System and Feature Contour System (BCS/FCS) of S. Grossberg and E. Mingolla. The architecture proposes a biologically inspired mechanism for color processing based on antagonist interactions. It suggests how information from different modalities (i.e. color or texture) can be fused together to form a coherent segmentation of the visual scene. It identifies two stages of visual pattern recognition, namely, a global preattentive recognition of the visual scene followed by a local attentive recognition within a particular visual context. The global and local classification and recognition of visual stimuli use ART-type models of G. Carpenter and S. Grossberg for pattern learning and recognition based on color and texture. One example is presented corresponding to a figure-figure separation task. The architecture provides a mechanism for segmentation, categorization and recognition of images from different classes based on self-organizing principles of perception and pattern recognition.

9.
We investigated human perceptual performance allowed by relatively impoverished information conveyed in nighttime natural scenes. We used images of nighttime outdoor scenes rendered by image-intensified low-light visible (i2) sensors, thermal infrared (ir) sensors, and an i2/ir fusion technique with information added. We found that nighttime imagery provides adequate low-level image information for effective perceptual organization on a classification task, but that performance for exemplars within a given object category is dependent on the image type. Overall performance was best with the false-color fused images. This is consistent with the suggestion in the literature that color plays a predominant role in perceptual grouping and segmenting of objects in a scene, and it supports the suggestion that the addition of color in complex achromatic scenes aids the perceptual organization required for visual search. In the present study, we address the issue of assessment of perceptual performance with alternative night-vision sensors and fusion methods and begin to characterize perceptual organization abilities permitted by the information in relatively impoverished images of complex scenes. Applications of this research include improving night vision, medical, and other devices that use alternative sensors or degraded imagery.

10.
Salient object detection aims to identify the most visually prominent objects or regions in an image, and it brings significant help and improvement to many computer vision tasks. Although many methods have been proposed for salient object detection, the problem is still not fully solved, especially when the background scene is complex or the salient object is small. In this paper, we propose a novel Weak Feature Boosting Network (WFBNet) for the salient object detection task. In the WFBNet, we extract the unpredictable regions (low-confidence regions) of the image via a polynomial function and enhance the features of these regions through a well-designed weak feature boosting module (WFBM). Starting from a coarse saliency map, we gradually refine it according to the boosted features to obtain the final saliency map, and our network does not need any post-processing step. We conduct extensive experiments on five benchmark datasets using comprehensive evaluation metrics. The results show that our algorithm has considerable advantages over existing state-of-the-art methods.

11.
Accuracy of memory performance per se is an imperfect reflection of the cognitive activity (awareness states) that underlies performance in memory tasks. The aim of this research is to investigate the effect of varied visual and interaction fidelity of immersive virtual environments on memory awareness states. A between-groups experiment was carried out to explore the effect of rendering quality on location-based recognition memory for objects and associated states of awareness. The experimental space, consisting of two interconnected rooms, was rendered either flat-shaded or using radiosity rendering. The computer graphics simulations were displayed on a stereo head-tracked head mounted display. Participants completed a recognition memory task after exposure to the experimental space and reported one of four states of awareness following object recognition. These reflected the level of visual mental imagery involved during retrieval, the familiarity of the recollection, and also included guesses. Experimental results revealed variations in the distribution of participants' awareness states across conditions, while memory performance failed to reveal any. Interestingly, results revealed a higher proportion of recollections associated with mental imagery in the flat-shaded condition. These findings are consistent with similar effects revealed in two earlier studies summarized here, which demonstrated that a less "naturalistic" interaction interface, i.e. one of low interaction fidelity, provoked a higher proportion of recognitions based on visual mental images.

12.
Building (street) orientation is one of the important parameters for estimation of building bulk size (height and width) from corner reflector effects using remotely sensed radar image data. However, this parameter is difficult to obtain directly from radar data. Other sensor data such as optical and near infrared data may provide possibilities. This paper reports on a method for detection and recognition of street orientation in remotely sensed Landsat TM and/or SPOT HRV imagery. The methodology includes two steps: (1) multiscale wavelet transform techniques are employed to detect edges; (2) the predominant street orientation for each 20 × 20 pixel block is then recognised by applying a simple algorithm to the detected edges, which contain most of the information about street orientations.
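An illustrative sketch of step (2), block-wise dominant edge orientation. The paper detects edges with a multiscale wavelet transform; here a plain Sobel gradient stands in for that step, and the bin count is an assumption.

    import numpy as np
    from scipy.ndimage import sobel

    def dominant_block_orientations(gray, block=20, n_bins=18):
        gx = sobel(gray.astype(float), axis=1)
        gy = sobel(gray.astype(float), axis=0)
        mag = np.hypot(gx, gy)
        # Edge orientation is perpendicular to the gradient; fold into [0, 180) degrees.
        theta = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0
        h, w = gray.shape
        out = np.zeros((h // block, w // block))
        for i in range(h // block):
            for j in range(w // block):
                sl = np.s_[i * block:(i + 1) * block, j * block:(j + 1) * block]
                hist, edges = np.histogram(theta[sl], bins=n_bins, range=(0, 180),
                                           weights=mag[sl])
                # Dominant street orientation (degrees) = midpoint of the strongest bin.
                out[i, j] = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
        return out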

13.
马苗  王伯龙  吴琦  武杰  郭敏 《软件学报》2019,30(4):867-883
As a cross-disciplinary research topic spanning computer vision, multimedia, artificial intelligence, and natural language processing, visual scene description studies the automatic generation of one or more sentences that describe the visual scene presented in an image or video. The richness of visual scene content and the diversity of natural language expression make visual scene description a challenging task. This paper surveys existing visual scene description methods and the evaluation of their performance. First, the definition, research tasks, and method categories of visual scene description are discussed, and its relationship to related techniques such as multimodal retrieval, cross-modal learning, scene classification, and visual relationship detection is briefly analyzed. The main methods, models, and research progress are then discussed by category, and the growing number of benchmark datasets is summarized. Next, the main metrics for objective evaluation of visual scene description and the problems and challenges facing the technology are reviewed, and finally future application prospects are discussed.

14.
Remote sensing image scene classification is important for land resource management. However, ground objects in high-resolution remote sensing images are distributed in complex ways, and the images contain redundant information irrelevant to the current scene, which affects accurate scene classification. To address this, a scene classification method based on the sparse representation of a spiking convolutional neural network (SCNN) is proposed. Starting from sparse representation, the sparse spike output of spiking neurons is exploited to design a spiking convolutional neural network that removes scene-irrelevant redundant information from remote sensing images and produces a sparse representation of the image. A backpropagation algorithm based on a cross-entropy loss over the spike output is proposed, and on this basis the SCNN is trained with gradient descent to optimize the network parameters and perform remote sensing scene classification. The effectiveness of the method is verified experimentally by applying it to two remote sensing image datasets, Google and UCM, and comparing it with a conventional convolutional neural network (CNN). The results show that the proposed method can sparsely represent remote sensing images and perform scene classification, and that it has an advantage over the CNN on the remote sensing scene classification task.
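As a rough illustration of the sparse spike output the method relies on (a generic integrate-and-fire update, not the proposed SCNN), a single neuron layer emits a binary, sparse activation map at each time step:

    import numpy as np

    def if_step(membrane, input_current, threshold=1.0):
        """One integrate-and-fire update; spikes are binary and typically sparse."""
        membrane = membrane + input_current
        spikes = (membrane >= threshold).astype(float)   # sparse binary output
        membrane = membrane * (1.0 - spikes)             # reset the neurons that fired
        return membrane, spikes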

15.
M. Maltz, D. Shinar. Human Factors, 1999, 41(1): 15-25
This 2-part study focuses on eye movements to explain driving-related visual performance in younger and older persons. In the first task, participants' eye movements were monitored as they viewed a traffic scene image with a numeric overlay and visually located the numbers in their sequential order. The results showed that older participants had significantly longer search episodes than younger participants, and that the visual search of older adults was characterized by more fixations and shorter saccades, although the average fixation durations remained the same. In the second task, participants viewed pictures of traffic scenes photographed from the driver's perspective. Their task was to assume the role of the driver and regard the image accordingly. Results in the second task showed that older participants allocated a larger percentage of their visual scan time to a small subset of areas in the image, whereas younger participants scanned the images more evenly. Also, older participants revisited the same areas and younger participants did not. The results suggest how aging might affect the efficacy of visual information processing. Potential applications of this research include training older drivers for a more effective visual search, and providing older drivers with redundant information in case some information is missed.

16.
More than 80 percent of the information humans perceive from the outside world comes through vision, so image processing applications inevitably touch every aspect of human life and work. Image enhancement belongs to the preprocessing stage of image processing; it is an important link that bridges earlier and later stages of the overall processing pipeline. Its purpose is to improve image quality and visual effect, or to convert an image into a form better suited to human observation or machine analysis and recognition, so that more useful information can be obtained from the image. Enhancing images effectively is a difficult problem in image analysis; this paper summarizes and analyzes several classes of classic image enhancement techniques and describes some of their algorithms.
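As one concrete example of the classic enhancement techniques such surveys cover (chosen here purely for illustration, not singled out by the paper), global histogram equalization of an 8-bit grayscale image:

    import numpy as np

    def equalize_histogram(gray):
        hist = np.bincount(gray.ravel(), minlength=256)
        cdf = hist.cumsum()
        cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)  # normalized cumulative distribution
        lut = np.round(255 * cdf).astype(np.uint8)               # intensity mapping table
        return lut[gray]                                         # spreads intensities across the full range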

17.
Railway inspection and monitoring produce massive amounts of image data, and classifying these images by scene is valuable for their subsequent analysis and management. This paper proposes a visualizable scene classification model that combines a deep convolutional neural network (DCNN) with gradient-weighted class activation mapping (Grad-CAM). The DCNN performs transfer learning on a railway scene classification image dataset for feature extraction, while Grad-CAM computes channel weights from the global average of the gradients to produce class-weighted heat maps and activation scores, improving the interpretability of the classification model. Experiments compare the effect of different DCNN architectures on the railway image scene classification task and provide visual explanations of the scene classification model; based on the visualization, an optimization workflow is proposed that improves classification ability by reducing internal bias in the dataset, verifying the effectiveness of deep learning for image scene classification.
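The Grad-CAM weighting described above can be sketched as follows; this is an illustrative PyTorch implementation under assumed interfaces (a classification model, one of its convolutional layers, and a CHW image tensor), not the paper's code.

    import torch.nn.functional as F

    def grad_cam(model, conv_layer, image, target_class):
        feats, grads = {}, {}
        h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
        h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
        score = model(image.unsqueeze(0))[0, target_class]
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()
        weights = grads['a'].mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
        cam = F.relu((weights * feats['a']).sum(dim=1))       # weighted sum over feature channels
        cam = F.interpolate(cam.unsqueeze(1), size=image.shape[1:],
                            mode='bilinear', align_corners=False)
        return (cam / cam.max()).squeeze().detach()           # normalized class activation map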

18.
Affordance refers to the set of possible interactions offered by objects in an environment, describing the connection between environmental properties and an agent. Visual affordance research uses visual data such as images and videos to study the possible interactions between a visual agent and the environment or objects, and it involves related fields such as scene recognition, action recognition, and object detection. Visual affordance has broad applications in robotics, scene understanding, and other areas. Based on existing research, this survey classifies visual affordance into three categories, namely functional affordance, behavioral affordance, and social affordance, and for each category discusses detection methods in detail, divided into traditional machine learning methods and deep learning methods. Typical current visual affordance datasets are summarized and analyzed, and application directions and possible future research directions are discussed.

19.
In methods that combine an object detection network (ObjectNet) with a scene recognition network, the object features extracted by ObjectNet and the scene features extracted by the scene network differ in dimensionality and nature, and the object features contain redundant information that interferes with the scene judgment, which leads to low scene recognition accuracy. To address this problem, an improved indoor scene recognition method incorporating object detection is proposed. First, a class conversion matrix (CCM) is introduced into ObjectNet to transform the object features it outputs so that their dimensionality matches that of the scene features, reducing the information loss caused by the dimensionality mismatch. Then a context gating (CG) mechanism is used to suppress redundant information in the features, lowering the weight of irrelevant information and increasing the contribution of object features to scene recognition. The method achieves a recognition accuracy of 90.28% on the MIT Indoor67 dataset, 0.77 percentage points higher than the spatial-layout-preserving object semantic features (SOSF) method, and 81.15% on the SUN397 dataset, 1.49 percentage points higher than the hierarchy of alternating specialists (HoAS) method. The experimental results show that the proposed method improves the accuracy of indoor scene recognition.
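A minimal sketch of the context gating idea used to suppress redundant object-feature dimensions (not the paper's implementation; the feature dimension and the final fusion step are assumptions):

    import torch
    import torch.nn as nn

    class ContextGating(nn.Module):
        """Re-weights each feature dimension with a learned sigmoid gate."""
        def __init__(self, dim):
            super().__init__()
            self.fc = nn.Linear(dim, dim)

        def forward(self, x):
            gate = torch.sigmoid(self.fc(x))   # per-dimension weight in (0, 1)
            return gate * x                    # down-weights irrelevant object evidence

    # Hypothetical use: object features (after the class conversion matrix) are gated,
    # then concatenated with the scene features for classification.
    obj_feats = torch.randn(8, 512)
    fused = torch.cat([ContextGating(512)(obj_feats), torch.randn(8, 512)], dim=1)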

20.
This paper proposes a novel method based on Spectral Regression (SR) for efficient scene recognition. First, a new SR approach, called Extended Spectral Regression (ESR), is proposed to perform manifold learning on a huge number of data samples. Then, an efficient Bag-of-Words (BOW) based method is developed which employs ESR to encapsulate local visual features with their semantic, spatial, scale, and orientation information for scene recognition. In many applications, such as image classification and multimedia analysis, there are a huge number of low-level feature samples in a training set, which prohibits direct application of SR to perform manifold learning on such a dataset. In ESR, we first group the samples into tiny clusters, and then devise an approach to reduce the size of the similarity matrix for graph learning. In this way, subspace learning on the graph Laplacian for a vast dataset becomes computationally feasible on a personal computer. In the ESR-based scene recognition, we first propose an enhanced low-level feature representation which combines the scale, orientation, spatial position, and local appearance of a local feature. Then, ESR is applied to embed the enhanced low-level image features. The ESR-based feature embedding not only generates a low-dimensional feature representation but also integrates various aspects of the low-level features into the compact representation. The bag-of-words is then generated from the embedded features for image classification. Comparative experiments on open benchmark datasets for scene recognition demonstrate that the proposed method outperforms baseline approaches. It is suitable for real-time applications on mobile platforms, e.g., tablets and smartphones.
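The bag-of-words stage can be sketched as below; this is an illustrative quantize-and-pool step with an assumed MiniBatchKMeans codebook over (embedded) local descriptors, not the ESR embedding itself.

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def build_codebook(all_descriptors, n_words=256, seed=0):
        """Learn a visual vocabulary from a stack of local descriptors."""
        return MiniBatchKMeans(n_clusters=n_words, random_state=seed).fit(all_descriptors)

    def bow_histogram(codebook, descriptors):
        """Quantize an image's descriptors and pool them into an L1-normalized histogram."""
        words = codebook.predict(descriptors)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)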
