Similar Literature (20 records)
1.
史静  朱虹  王栋  杜森 《中国图象图形学报》2017,22(12):1750-1757
Objective: In scene classification, the diversity and complexity of scene structure, together with variations in illumination and viewing angle, mean that most existing algorithms model scenes by feature extraction alone, ignoring the relationships among the objects in a scene image, so classification performance remains short of ideal. Addressing these key difficulties, this paper takes full account of the perceptual characteristics of human vision and combines saliency detection with the traditional bag-of-visual-words model to obtain a scene classification algorithm that fuses visual perception characteristics. Method: The image is first decomposed at multiple scales and features are extracted at each scale; next, the visually salient regions at each scale are detected; finally, the saliency information is fused with the multi-scale features to form multi-scale fused, window-selected, weighted SIFT features (WSSIFT), which are used to classify the scene. Results: The algorithm was tested on three standard data sets (SE, LS, and IS) and compared with other methods; classification accuracy improved by roughly 3% to 17%. Conclusion: The proposed algorithm alleviates the limitations of pure feature description and improves the overall representation of the image. Experiments show good classification performance on multiple data sets, making the method suitable for machine-vision applications such as scene analysis, understanding, and classification.
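A minimal sketch of the central idea, weighting local descriptors by a saliency map before pooling them into a bag-of-visual-words histogram; the contrast-based saliency proxy and the pre-built codebook below are illustrative assumptions, not the paper's WSSIFT pipeline:

    import numpy as np

    def contrast_saliency(gray, patch=16):
        # crude saliency proxy: local standard deviation (contrast) per patch
        h, w = gray.shape
        sal = np.zeros_like(gray, dtype=float)
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                sal[y:y+patch, x:x+patch] = gray[y:y+patch, x:x+patch].std()
        return sal / (sal.max() + 1e-8)

    def weighted_bow(descriptors, positions, saliency, codebook):
        # assign each local descriptor to its nearest codeword and accumulate
        # the saliency value at its image position as the vote weight
        hist = np.zeros(len(codebook))
        for d, (y, x) in zip(descriptors, positions):
            k = np.argmin(((codebook - d) ** 2).sum(axis=1))
            hist[k] += saliency[y, x]
        return hist / (hist.sum() + 1e-8)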

2.
Natural scene statistics at the centre of gaze
Early stages of visual processing may exploit the characteristic structure of natural visual stimuli. This structure may differ from the intrinsic structure of natural scenes, because sampling of the environment is an active process. For example, humans move their eyes several times a second when looking at a scene. The portions of a scene that fall on the fovea are sampled at high spatial resolution and receive a disproportionate fraction of cortical processing. We recorded the eye positions of human subjects while they viewed images of natural scenes. We report that active selection affected the statistics of the stimuli encountered by the fovea, and also by the parafovea up to eccentricities of 4 degrees. We found two related effects. First, subjects looked at image regions with high spatial contrast. Second, in these regions, the intensities of nearby image points (pixels) were less correlated with each other than in images selected at random. These effects could increase the information available to the visual system for further processing. We show that both effects can be reproduced simply by constructing an artificial ensemble composed of the highest-contrast regions of images.
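The reported measurements can be approximated with a short NumPy sketch that compares RMS contrast and neighbour-pixel correlation at fixated patches against randomly sampled patches; the patch size and sampling scheme are assumptions, and patches are assumed to lie in the image interior:

    import numpy as np

    def patch_stats(img, cy, cx, r=8):
        # RMS contrast and horizontal neighbour-pixel correlation of one patch
        p = img[cy-r:cy+r, cx-r:cx+r].astype(float)
        contrast = p.std() / (p.mean() + 1e-8)
        corr = np.corrcoef(p[:, :-1].ravel(), p[:, 1:].ravel())[0, 1]
        return contrast, corr

    def compare(img, fixations, n_random=1000, r=8, seed=0):
        # mean (contrast, correlation) at fixated vs. randomly chosen points
        rng = np.random.default_rng(seed)
        h, w = img.shape
        rand = zip(rng.integers(r, h - r, n_random),
                   rng.integers(r, w - r, n_random))
        fix = np.array([patch_stats(img, y, x, r) for y, x in fixations])
        rnd = np.array([patch_stats(img, y, x, r) for y, x in rand])
        return fix.mean(axis=0), rnd.mean(axis=0)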

3.
The assumption that a real scene is a single sample under an assumed model allows the simulation of scenes with stochastic properties similar to those of the actual scene; such simulations can be used to evaluate and validate proposed models and to investigate the reliability of results. For this purpose, an appropriate model-based approach that accounts for the stochastic properties of scenes is required. This research focused on developing a hierarchical stochastic model to characterize the processes observed in remotely sensed imagery, and on simulating scenes based on the developed models, to provide a general methodology for dynamic spatial landscape modeling and a variety of image processing research. The new model is based on a comprehensive stochastic representation of the scene. At the higher level, the region-formation process is modeled as a large-scale characteristic of the scene using a Markov random field, and the boundary variation between adjacent regions is handled with a fuzzy approach. The natural variability within each region is represented at the lower level of the hierarchy. For this, two approaches based on different assumptions are proposed for modeling the statistical features of the continuous radiance field: in the first model, pixel intensities are assumed to be independently and identically distributed; the second model employs a continuous random field that can include contextual information. Finally, this integrated simulation process produces the multispectral images.
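A toy version of the two-level simulation, assuming a Potts-type Markov random field for the region process and the simpler i.i.d. Gaussian model for within-region radiance; the fuzzy boundary component and multispectral rendering are omitted:

    import numpy as np

    def potts_gibbs(h, w, k=4, beta=1.2, iters=30, seed=0):
        # large-scale region process: Gibbs sampling of a Potts MRF
        rng = np.random.default_rng(seed)
        lab = rng.integers(0, k, (h, w))
        for _ in range(iters):
            for y in range(h):
                for x in range(w):
                    nb = [lab[y2, x2] for y2, x2 in
                          ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                          if 0 <= y2 < h and 0 <= x2 < w]
                    # conditional distribution favours agreement with neighbours
                    e = np.array([beta * sum(n == c for n in nb) for c in range(k)])
                    p = np.exp(e - e.max()); p /= p.sum()
                    lab[y, x] = rng.choice(k, p=p)
        return lab

    def render(lab, means, sigma=5.0, seed=0):
        # within-region variability: the i.i.d. Gaussian low-level model
        rng = np.random.default_rng(seed)
        return means[lab] + rng.normal(0, sigma, lab.shape)

    # usage: img = render(potts_gibbs(64, 64), np.array([50., 100., 150., 200.]))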

4.
Automatic image annotation is a challenging task of great importance for image analysis, understanding, and retrieval. In this field, a relation model between the semantic concept space and the visual feature space is learned from an annotated image set and then used to annotate unlabeled images. Because the correspondence between low-level features and high-level semantics is intricate, the accuracy of current automatic annotation remains low. Under scene constraints, however, the mapping between annotations and visual features can be simplified, improving annotation reliability. This paper therefore proposes an image annotation method based on scene semantic trees. Training images are first clustered automatically into semantic scene categories, a visual scene space is generated for each category, and a semantic tree is built for each scene space. For an image to be annotated, its scene category is determined first, and the final annotation is obtained through the corresponding scene semantic tree. On the Corel5K image set, the method outperforms the TM (translation model), CMRM (cross-media relevance model), CRM (continuous-space relevance model), and PLSA-GMM (probabilistic latent semantic analysis with Gaussian mixture model) approaches.
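A hedged sketch of the scene-constrained idea: cluster training images into scene categories, then restrict candidate annotations to the target image's category. The k-means clustering and keyword-frequency ranking below stand in for the paper's semantic trees:

    import numpy as np
    from collections import Counter

    def train(features, labels, n_scenes=10, seed=0):
        # features: (N, D) float image descriptors; labels: list of keyword lists.
        # Cluster training images into semantic scene categories with k-means.
        rng = np.random.default_rng(seed)
        centers = features[rng.choice(len(features), n_scenes, replace=False)]
        for _ in range(20):
            assign = np.argmin(((features[:, None] - centers) ** 2).sum(-1), axis=1)
            for c in range(n_scenes):
                if (assign == c).any():
                    centers[c] = features[assign == c].mean(axis=0)
        return centers, assign

    def annotate(x, centers, assign, labels, top=5):
        # restrict candidate annotations to the image's own scene category
        scene = np.argmin(((centers - x) ** 2).sum(axis=1))
        words = Counter(w for i in np.flatnonzero(assign == scene)
                        for w in labels[i])
        return [w for w, _ in words.most_common(top)]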

5.
This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data. A scene model is proposed that labels regions according to an identifiable activity in each region, such as entry/exit zones, junctions, paths, and stop zones. We present several unsupervised methods that learn these scene elements and report results showing the efficiency of our approach. Finally, we describe how the models can be used to support the interpretation of moving objects in a visual surveillance environment.
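One plausible way to learn entry/exit zones in this spirit is to cluster the first and last points of observed trajectories; the k-means formulation below is an assumption, not the paper's exact method:

    import numpy as np

    def learn_zones(tracks, k=4, iters=20, seed=0):
        # unsupervised entry/exit zone discovery: k-means over the first and
        # last points of object trajectories. tracks: list of (T, 2) arrays.
        rng = np.random.default_rng(seed)
        ends = np.array([p for t in tracks for p in (t[0], t[-1])], dtype=float)
        centers = ends[rng.choice(len(ends), k, replace=False)]
        for _ in range(iters):
            assign = np.argmin(((ends[:, None] - centers) ** 2).sum(-1), axis=1)
            for c in range(k):
                if (assign == c).any():
                    centers[c] = ends[assign == c].mean(axis=0)
        return centers  # zone centres; paths can then be modelled between them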

6.
魏彤  李绪 《机器人》2020,42(3):336-345
Existing simultaneous localization and mapping (SLAM) algorithms typically suffer a large drop in localization and mapping accuracy in dynamic environments. This paper therefore proposes a stereo visual SLAM algorithm based on dynamic-region elimination. First, dynamic sparse feature points are identified using stereo geometric constraints, and the scene is segmented into regions using depth and color information. Dynamic regions are then marked by combining the dynamic points with the segmentation result, and the feature points inside those regions are removed from the stereo ORB-SLAM pipeline, eliminating the influence of moving objects on SLAM accuracy. In experiments, the dynamic-region segmentation recall reaches 92.31% on the KITTI data set. In outdoor dynamic environments, tests on a visual guidance device for the blind achieve a segmentation recall of 93.62%, straight-line walking localization accuracy improves by 82.75% over the original stereo ORB-SLAM, mapping quality improves markedly, and the average processing speed reaches 4.6 frames/s. The results show that the proposed algorithm significantly improves the localization and mapping accuracy of stereo visual SLAM in dynamic scenes and meets the real-time requirements of visual navigation for the blind.
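A simplified sketch of the geometric-constraint test for dynamic points: back-project a feature with its stereo depth, transform it by the estimated camera motion, reproject, and flag large residuals. The threshold and notation are assumptions:

    import numpy as np

    def dynamic_mask(pts_prev, pts_curr, depths, R, t, K, thresh=2.0):
        # flag features whose reprojection error under the estimated camera
        # motion (R, t) exceeds a threshold: static points should be
        # consistent, while large residuals suggest independent motion.
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        dynamic = np.zeros(len(pts_prev), dtype=bool)
        for i, ((u, v), z) in enumerate(zip(pts_prev, depths)):
            p = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])  # back-project
            q = R @ p + t                                            # apply motion
            proj = np.array([fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy])
            dynamic[i] = np.linalg.norm(proj - pts_curr[i]) > thresh
        return dynamic  # features in marked dynamic regions are then discarded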

7.

In this paper, we propose two approaches to analyzing crowd scenes. The first is a motion-unit and meta-tracking based approach (the MUDAM approach). The scene is divided into a number of dynamic divisions with coherent motion dynamics, called motion units (MUs). By analyzing the relationships between these MUs using a proposed continuation likelihood, the scene entrance and exit gates are retrieved. A meta-tracking procedure is then applied, and the scene's dominant motion pathways are retrieved. To overcome the limitations of the MUDAM approach and to detect some of the anomalies that may occur in these scenes, we propose a second, LSTM-based approach. Here the scene is divided into a number of static, overlapping spatial regions named super regions (SRs), which cover the whole scene. A Long Short-Term Memory (LSTM) network defines a predictive model for each SR; each model is trained on its SR's tracklets so that it captures the full motion dynamics of that SR. Using a priori known scene entrance segments, the LSTM predictive models are applied and the dominant motion pathways are retrieved. An anomaly metric is formulated for use with the predictive models to detect scene anomalies. Prototypes of both approaches were developed and evaluated on the challenging New York Grand Central station scene, in addition to four other crowded scenes. Four types of anomalies that may occur in crowded scenes were defined, and the LSTM-based approach was used to detect them; anomaly detection experiments were also run on a number of data sets. Overall, the proposed approaches outperformed state-of-the-art methods in retrieving scene gates and common pathways, as well as in detecting motion anomalies.
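A minimal PyTorch sketch of a per-SR predictive model and the associated anomaly score; the network size, input encoding, and distance-based metric are assumptions rather than the paper's exact design:

    import torch
    import torch.nn as nn

    class SRPredictor(nn.Module):
        # one predictive model per super region (SR): given a tracklet of
        # (x, y) points, predict the next position
        def __init__(self, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)

        def forward(self, tracklets):          # (batch, steps, 2)
            out, _ = self.lstm(tracklets)
            return self.head(out[:, -1])       # predicted next point

    def anomaly_score(model, tracklet):
        # anomaly metric (assumed form): distance between the observed final
        # point and the point predicted from the preceding steps
        with torch.no_grad():
            pred = model(tracklet[None, :-1])
        return torch.norm(pred[0] - tracklet[-1]).item()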


8.
9.

The objective of image segmentation in remote sensing is to define regions in an image that correspond to objects in the ground scene. Traditional scene models underlying image segmentation procedures have assumed that objects, as manifest in images, have internal variances that are both low and equal. This scene model is unrealistically simple. An alternative scene model recognizes objects at different scales, arranged in a nested hierarchy: each level is composed of objects or categories of objects from the preceding level. Different objects may have distinct attributes, allowing assumptions such as equal variance to be relaxed.

A multiple-pass, region-based segmentation algorithm improves the segmentation of images from scenes better modelled as a nested hierarchy. A multiple-pass approach allows slow and careful growth of regions while inter-region distances are below a global threshold. Past the global threshold, a minimum region size parameter forces development of regions in areas of high local variance. Maximum and viable region size parameters limit the development of undesirably large regions.

Application of the segmentation algorithm to forest stand delineation in Landsat TM imagery yields regions corresponding to identifiable features in the landscape. Using a local-variance, adaptive-window texture channel in conjunction with the spectral bands improves the ability to define regions corresponding to sparsely stocked forest stands, which have high internal variance.
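A toy multiple-pass merging loop in the spirit of the first stage described above; it keeps only the global threshold and omits the minimum/maximum region size parameters and the texture channel (and is practical only for small images):

    import numpy as np

    def multipass_merge(img, global_thresh=8.0, passes=5):
        # simplified multiple-pass region growing: every pixel starts as its
        # own region; on each pass, adjacent regions whose mean difference is
        # below the global threshold are merged (slow, careful growth)
        h, w = img.shape
        labels = np.arange(h * w).reshape(h, w)
        for _ in range(passes):
            means = {l: img[labels == l].mean() for l in np.unique(labels)}
            merged = False
            for y in range(h):
                for x in range(w):
                    for y2, x2 in ((y + 1, x), (y, x + 1)):
                        if y2 < h and x2 < w and labels[y, x] != labels[y2, x2]:
                            a, b = labels[y, x], labels[y2, x2]
                            if abs(means[a] - means[b]) < global_thresh:
                                labels[labels == b] = a
                                means[a] = img[labels == a].mean()
                                merged = True
            if not merged:
                break
        return labels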

10.
Bottom-up spatiotemporal visual attention model for video analysis
The human visual system (HVS) has the ability to fixate quickly on the most informative (salient) regions of a scene, thereby reducing the inherent visual uncertainty. Computational visual attention (VA) schemes have been proposed to account for this important characteristic of the HVS. A video analysis framework based on a spatiotemporal VA model is presented. A novel scheme is proposed for generating saliency in video sequences that takes into account both the spatial extent and the dynamic evolution of regions. To achieve this goal, a common, image-oriented computational model of saliency-based visual attention is extended to handle spatiotemporal analysis of video in a volumetric framework. The main claim is that attention acts as an efficient preprocessing step for obtaining a compact representation of the visual content in the form of salient events/objects. The model has been implemented, and qualitative as well as quantitative examples illustrating its performance are shown.
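A compact sketch of volumetric spatiotemporal saliency, combining a per-frame centre-surround (difference-of-Gaussians) term with a frame-difference motion term; the specific operators and equal weighting are assumptions:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def spatiotemporal_saliency(video, sigma_c=2, sigma_s=8):
        # video: (frames, h, w) grayscale volume.
        # spatial conspicuity: centre-surround difference per frame;
        # temporal conspicuity: absolute frame difference (motion energy)
        v = video.astype(float)
        spatial = np.abs(gaussian_filter(v, (0, sigma_c, sigma_c)) -
                         gaussian_filter(v, (0, sigma_s, sigma_s)))
        temporal = np.zeros_like(v)
        temporal[1:] = np.abs(np.diff(v, axis=0))
        s = (spatial / (spatial.max() + 1e-8) +
             temporal / (temporal.max() + 1e-8))
        return s / 2.0  # salient events/objects = high values in the volume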

11.
A two-level classification algorithm that fuses scene context information is proposed to recover scene structure from a single image. The structured nature of outdoor scenes allows their 3-D structure to be coarsely divided into three classes: "ground", "sky", and "vertical objects". The image is first segmented into regions with consistent intensity and color. The structure of regions with salient features ("determined regions") is then classified, while regions with weak features are marked as "unknown regions". The possible structure of each unknown region is voted on according to its similarity to the determined regions and their structure labels, and the structure type with the most votes is assigned to the unknown region. Finally, an application of scene structure recovery to constructing 3-D scene models is described. Owing to the use of scene-structure context, the algorithm achieves a recovery accuracy of 92.3% in experiments, better than the 88.1% of an existing algorithm.
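The voting step can be sketched as similarity-weighted voting from the "determined regions" to an "unknown region"; the Gaussian similarity kernel is an assumption:

    import numpy as np

    CLASSES = ("ground", "sky", "vertical")

    def vote_unknown(unknown_feat, det_feats, det_labels, sigma=1.0):
        # each determined region votes for the unknown region's class with a
        # weight that decays with feature distance (similarity-based voting)
        votes = np.zeros(len(CLASSES))
        for f, lab in zip(det_feats, det_labels):
            w = np.exp(-((unknown_feat - f) ** 2).sum() / (2 * sigma ** 2))
            votes[CLASSES.index(lab)] += w
        return CLASSES[int(np.argmax(votes))]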

12.
Visual surveillance of traffic scenes based on 3-D models
Visual surveillance is a frontier research direction in computer vision. Visual surveillance of dynamic scenes applies theories and methods from computer vision and artificial intelligence to automatically analyze image sequences recorded by cameras: moving objects in the scene are located, tracked, and recognized, and their behavior is judged or interpreted, so as to achieve the goal of surveillance. For the specific task of traffic-scene surveillance, this paper implements a visual surveillance system comprising camera calibration, model visualization, pose optimization and localization of moving vehicles, tracking and prediction, and trajectory-based behavior understanding. Each functional module of the system is described and discussed in detail from the perspectives of algorithms and implementation.

13.
Pattern Recognition Letters, 2003, 24(9-10): 1261-1274
The aim of this paper is to develop a rich set of visual primitives that can be used by a camera-endowed robot as it explores a scene, thereby generating an attentional sequence: spatio-temporally related sets of visual features. Our starting point is inspired by the work of Gallant et al. on the response of area V4 in macaque monkeys to Cartesian and non-Cartesian stimuli. The novelty of these stimuli is that, in addition to conventional sinusoidal gratings, they include non-Cartesian stimuli such as circular, polar, and hyperbolic gratings. Based on this stimulus set, and introducing frequency as a parameter, we obtain a rich set of visual primitives. These primitives are biologically motivated, nearly orthogonal with some degree of redundancy, can be made complete as required, and are implementable on off-the-shelf hardware for real-time selective vision-robot applications. Attentional sequences are then formed as spatio-temporal sequences of observations, each of which encodes the filter responses of each fovea as an observation vector consisting of the responses of 50 filters. A series of experiments demonstrates the use of these visual primitives in attention-based real-life scene recognition tasks: (1) modeling complex scenes based on average attentional-sequence responses, and (2) fast real-time recognition of relatively complex scenes with a few saccades, based on comparing the current attentional sequence to a priori learned average observation vectors.
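The grating stimuli are easy to reproduce; the sketch below generates Cartesian, circular (polar), and hyperbolic gratings in the spirit of that stimulus set and applies them as linear filters to a foveal patch. Only 9 filters are built here, not the 50 used in the paper, and the exact grating formulas are assumptions:

    import numpy as np

    def gratings(size=64, freq=4.0):
        # Cartesian, polar (concentric) and hyperbolic gratings over [-1, 1]^2
        y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
        cartesian = np.cos(2 * np.pi * freq * x)
        circular = np.cos(2 * np.pi * freq * np.hypot(x, y))
        hyperbolic = np.cos(2 * np.pi * freq * np.sqrt(np.abs(x * y)))
        return cartesian, circular, hyperbolic

    def fovea_vector(patch, freqs=(2, 4, 8)):
        # observation vector: response of each grating filter to a square
        # foveal patch (inner product of filter and patch)
        return np.array([(g * patch).sum()
                         for f in freqs for g in gratings(patch.shape[0], f)])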

14.
We present a computational scene model and derive novel algorithms for computing audio and visual scenes and within-scene structures in films. Our computational scene model uses constraints derived from film-making rules and from experimental results in the psychology of audition. Central to the model is the notion of a causal, finite-memory viewer. We segment the audio and video data separately; in each case, we determine the degree of correlation of the most recent data in the memory with the past. Audio and video scene boundaries are placed at local maxima and local minima of this correlation, respectively. We derive four types of computable scenes that arise from different kinds of audio and video scene boundary synchronization. We show how to exploit the local topology of an image sequence, in conjunction with statistical tests, to detect dialogues, and we derive a simple algorithm to detect silences in audio. An important feature of our work is the introduction of semantic constraints based on structure and silence into the computational model, yielding computable scenes that are more consistent with human observations. The algorithms were tested on a difficult data set: the first hour of each of three commercial films. The best results: computable scene detection, 94%; dialogue detection, 91% recall at 100% precision.
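A minimal version of the finite-memory correlation computation: average the correlation of the newest buffered frame features with each earlier item in the memory, then look for extrema in the resulting curve. The buffer length and feature choice are assumptions:

    import numpy as np

    def correlation_curve(features, memory=12):
        # causal, finite-memory viewer model: correlate the most recent
        # feature vector with each earlier item in the memory buffer.
        # video scene boundaries ~ local minima of the curve,
        # audio scene boundaries ~ local maxima.
        scores = []
        for i in range(memory, len(features)):
            buf = features[i - memory:i]
            cur = features[i]
            c = [np.corrcoef(cur, past)[0, 1] for past in buf]
            scores.append(float(np.mean(c)))
        return np.array(scores)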

15.
In visual recognition tasks such as scene categorization, representing an image by its local features (e.g., the bag-of-visual-words (BOVW) model and the bag-of-contextual-visual-words (BOCVW) model) has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification. The proposed method fuses multiple features at the stage where the best contextual visual word is selected to represent a local region (hard assignment) or where the probabilities of the candidate contextual visual words used to represent the unknown region are estimated (soft assignment). The merits of the proposed method are that (1) errors caused by the ambiguity of any single feature when assigning local regions to contextual visual words can be corrected, or the probabilities of the candidate contextual visual words can be estimated more accurately; and (2) it offers a more flexible way of fusing these features by determining the similarity metric locally through localized maximum-margin learning. The proposed method has been evaluated experimentally, and the results indicate its effectiveness.

16.
17.
We describe and validate a simple context-based scene recognition algorithm for mobile robotics applications. The system can differentiate outdoor scenes from various sites on a college campus using a multiscale set of early-visual features, which capture the "gist" of the scene in a low-dimensional signature vector. Distinct from previous approaches, the algorithm has the advantages of being biologically plausible and of low computational complexity, sharing its low-level features with a model of visual attention that may operate concurrently on a robot. We compare classification accuracy using scenes filmed at three outdoor sites on campus (13,965 to 34,711 frames per site). Dividing each site into nine segments, we obtain segment classification rates between 84.21 percent and 88.62 percent. Combining scenes from all sites (75,073 frames in total) yields 86.45 percent correct classification, demonstrating the generalization and scalability of the approach.
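A hedged sketch of a gist-style signature: average a few early-visual channels over a coarse spatial grid and classify by nearest class mean. The real model uses many more channels and scales; the channels and grid size here are assumptions:

    import numpy as np

    def gist_signature(gray, grid=4):
        # low-dimensional "gist" signature: average intensity and two
        # gradient-orientation channels over a coarse spatial grid
        g = gray.astype(float)
        gy, gx = np.gradient(g)
        channels = [g, np.abs(gx), np.abs(gy)]
        h, w = g.shape
        sig = [ch[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].mean()
               for ch in channels for i in range(grid) for j in range(grid)]
        return np.array(sig)  # 3 channels x 4 x 4 grid = 48 dimensions

    def classify(sig, class_means):
        # nearest-mean segment classification; class_means: {name: vector}
        return min(class_means, key=lambda k: np.linalg.norm(sig - class_means[k]))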

18.
This paper presents a novel method for virtual view synthesis that allows viewers to virtually fly through real soccer scenes captured by multiple cameras in a stadium. The proposed method generates images of arbitrary viewpoints by view interpolation of real camera images near the chosen viewpoints. Cameras do not need to be strongly calibrated, since projective geometry between cameras is employed for the interpolation. To avoid the complex and unreliable process of 3-D recovery, object scenes are segmented into several regions according to the geometric properties of the scene. Dense correspondence between real views, which is necessary for intermediate view generation, is obtained automatically by applying projective geometry to each region. By superimposing the intermediate images for all regions, virtual views of the entire soccer scene are generated. The effort required for camera calibration is reduced and correspondence matching needs no manual operation, so the method can easily be applied to dynamic events in a large space. An application for fly-through observation of soccer match replays is introduced, along with the view synthesis algorithm and experimental results. This is a new approach to providing arbitrary views of an entire dynamic event.

19.
Current techniques for generating animated scenes involve either videos (whose resolution is limited) or a single image (which requires a significant amount of user interaction). In this paper, we describe a system that allows the user to quickly and easily produce a compelling-looking animation from a small collection of high-resolution stills. Our system has two unique features. First, it applies an automatic partial temporal order recovery algorithm to the stills to approximate the original scene dynamics; the output sequence is then extracted using a second-order Markov chain model. Second, a region with large motion variation can be automatically decomposed into semiautonomous regions whose temporal orderings are softly constrained, ensuring motion smoothness throughout the original region. The final animation is obtained by frame interpolation and feathering. The system also provides a simple-to-use interface to help the user fine-tune the motion of the animated scene. Using our system, an animated scene can be generated in minutes. We show results for a variety of scenes.
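Frame ordering with a second-order Markov chain can be sketched as follows, with transition weights derived from a pairwise dissimilarity matrix between the stills; the weighting scheme is an assumption:

    import numpy as np

    def order_stills(dissim, length=30, seed=0):
        # sample an output sequence from a second-order Markov chain whose
        # transition weights favour visually similar consecutive stills.
        # dissim: (n, n) pairwise dissimilarity matrix between the stills.
        rng = np.random.default_rng(seed)
        n = len(dissim)
        seq = list(rng.choice(n, 2, replace=False))
        for _ in range(length - 2):
            a, b = seq[-2], seq[-1]
            # second order: the next still depends on the last two chosen
            w = np.exp(-(dissim[b] + 0.5 * dissim[a]))
            w[b] = 0.0                     # avoid repeating the current frame
            seq.append(int(rng.choice(n, p=w / w.sum())))
        return seq  # frames are then interpolated and feathered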

20.
Most successful approaches to scene recognition efficiently combine global image features with spatially local appearance and shape cues, while less attention has been devoted to studying spatial texture features within scenes. Our method is based on the insight that scenes can be seen as compositions of micro-texture patterns. This paper analyzes the role of texture, along with its spatial layout, for scene recognition. One main drawback of the resulting spatial representation, however, is its huge dimensionality. We therefore propose a technique that addresses this problem with a compact Spatial Pyramid (SP) representation. The basis of our compact representation, the Compact Adaptive Spatial Pyramid (CASP), is a two-stage compression strategy based on Agglomerative Information Bottleneck (AIB) theory: it (i) compresses the least informative SP features and (ii) automatically learns the most appropriate pyramid shape for each category. Our method exceeds state-of-the-art results on several challenging scene recognition data sets.
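A greedy Agglomerative Information Bottleneck merge can be sketched directly from its definition: repeatedly fuse the pair of features whose merger loses the least mutual information about the class variable. This illustrates stage (i) only, not the full CASP procedure, and the O(n^3) greedy search is an assumption made for clarity:

    import numpy as np

    def kl(a, b):
        # Kullback-Leibler divergence between two discrete distributions
        m = a > 0
        return float(np.sum(a[m] * np.log(a[m] / b[m])))

    def aib_compress(p_wc, target_dim):
        # p_wc[i, c] = joint probability P(w=i, c); rows are merged until
        # only target_dim features remain, each merge chosen to minimize
        # the loss of mutual information I(W; C)
        rows = [r.copy() for r in p_wc]
        while len(rows) > target_dim:
            best, cost = None, np.inf
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    pi, pj = rows[i].sum(), rows[j].sum()
                    merged = (rows[i] + rows[j]) / (pi + pj)
                    d = (pi * kl(rows[i] / pi, merged) +
                         pj * kl(rows[j] / pj, merged))  # information loss
                    if d < cost:
                        best, cost = (i, j), d
            i, j = best
            rows[i] = rows[i] + rows[j]
            del rows[j]
        return np.array(rows)  # compressed joint distribution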
