Similar References
A total of 20 similar references were found.
1.
2.
Statistics of natural image categories
In this paper we study the statistical properties of natural images belonging to different categories and their relevance for scene and object categorization tasks. We discuss how second-order statistics are correlated with image categories, scene scale and objects. We propose how scene categorization could be computed in a feedforward manner in order to provide top-down and contextual information very early in the visual processing chain. Results show how visual categorization based directly on low-level features, without grouping or segmentation stages, can benefit object localization and identification. We show how simple image statistics can be used to predict the presence and absence of objects in the scene before exploring the image.
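The second-order statistics referred to above are commonly summarized by the image power spectrum. Below is a minimal sketch, not the authors' actual model, of how such a global spectral signature can be computed for a grayscale image and used as a low-level scene feature; the number of radial bands is an arbitrary choice.

```python
import numpy as np

def spectral_signature(gray, n_bands=20):
    """Radially averaged power spectrum of a grayscale image: a coarse
    summary of its second-order statistics, usable as a scene-level feature."""
    img = gray.astype(np.float64)
    img -= img.mean()                                   # drop the DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = power.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2.0, xx - w / 2.0)       # distance from spectrum center
    bins = np.linspace(0.0, radius.max(), n_bands + 1)
    idx = np.clip(np.digitize(radius.ravel(), bins) - 1, 0, n_bands - 1)
    total = np.bincount(idx, weights=power.ravel(), minlength=n_bands)
    count = np.bincount(idx, minlength=n_bands)
    return total / np.maximum(count, 1)                 # mean power per frequency band
```

A simple feedforward classifier trained on such signatures is one way to predict a coarse scene category before any grouping or segmentation, in the spirit of the abstract.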

3.
The visual vocabulary representation approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind the visual vocabulary representation is that an image can be represented by visual words, a collection of local features of images. In this work, we develop a new scheme for the construction of visual vocabulary based on the analysis of visual word contents. By considering the content homogeneity of visual words, we design a visual vocabulary which contains macro-sense and micro-sense visual words. The two types of visual words are further combined to describe an image effectively. We also apply the visual vocabulary to construct image retrieval and categorization systems. The performance evaluation for the two systems indicates that the proposed visual vocabulary achieves promising results.
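The abstract builds on the standard visual-vocabulary (bag-of-visual-words) representation; the macro-sense/micro-sense construction itself is not reproduced here. A minimal sketch of the underlying pipeline, assuming local descriptors (e.g., SIFT) have already been extracted for each training image:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, n_words=200, seed=0):
    """Cluster local descriptors from all training images into visual words."""
    all_descriptors = np.vstack(descriptor_sets)
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(all_descriptors)

def bow_histogram(descriptors, vocabulary):
    """Represent one image as a normalized histogram of visual-word occurrences."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Retrieval then compares histograms (e.g., by cosine similarity), and categorization trains any standard classifier on them.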

4.
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images include occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes up to a certain extent. Context information, based on the interaction among objects in the scene or global scene statistics, can help successfully disambiguate appearance inputs in recognition tasks. In this work we address the problem of incorporating different types of contextual information for robust object categorization in computer vision. We review different ways of using contextual information in the field of object categorization, considering the most common levels of extraction of context and the different levels of contextual interactions. We also examine common machine learning models that integrate context information into object recognition frameworks and discuss scalability, optimizations and possible future approaches.

5.
Photo‐realistic rendering of virtual objects into real scenes is one of the most important research problems in computer graphics. Methods for capture and rendering of mixed reality scenes are driven by a large number of applications, ranging from augmented reality to visual effects and product visualization. Recent developments in computer graphics, computer vision, and imaging technology have enabled a wide range of new mixed reality techniques including methods for advanced image based lighting, capturing spatially varying lighting conditions, and algorithms for seamlessly rendering virtual objects directly into photographs without explicit measurements of the scene lighting. This report gives an overview of the state‐of‐the‐art in this field, and presents a categorization and comparison of current methods. Our in‐depth survey provides a tool for understanding the advantages and disadvantages of each method, and gives an overview of which technique is best suited to a specific problem.

6.
Scenes are closely related to the kinds of objects that may appear in them. Objects are widely used as features for scene categorization. On the other hand, landscapes with more spatial structures of scenes are representative of scene categories. In this paper, we propose a deep learning based algorithm for scene categorization. Specifically, we design two-pathway convolutional neural networks for exploiting both object attributes and spatial structures of scene images. Different from conventional deep learning methods, which usually focus on only one aspect of images, each pathway of the proposed architecture is tuned to capture a different aspect of images. As a result, complementary information of image contents can be utilized effectively. In addition, to deal with the feature redundancy problem caused by combining features from different sources, we adopt the ℓ2,1 norm during classifier training to control the selectivity of each type of features. Extensive experiments are conducted to evaluate the proposed method. The obtained results demonstrate that the proposed approach achieves superior performance over conventional methods. Moreover, the proposed method is a general framework, which can be easily extended to more pathways and applied to solve other problems.
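The ℓ2,1 norm mentioned in the abstract imposes group sparsity: whole rows of the classifier weight matrix, here one row per feature type or pathway, are driven to zero together. A small illustrative sketch of the penalty and its proximal update, under the assumption that features are stacked row-wise; it is not the authors' training code:

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the l2 norms of the rows of W."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

def l21_prox(W, step):
    """Row-wise soft thresholding, the proximal operator of step * ||W||_{2,1};
    rows whose norm falls below `step` are zeroed, deselecting that feature group."""
    row_norms = np.sqrt((W ** 2).sum(axis=1, keepdims=True))
    shrink = np.maximum(1.0 - step / np.maximum(row_norms, 1e-12), 0.0)
    return W * shrink
```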

7.
Recently, various bag-of-features (BoF) methods show their good resistance to within-class variations and occlusions in object categorization. In this paper, we present a novel approach for multi-object categorization within the BoF framework. The approach addresses two issues in BoF related methods simultaneously: how to avoid scene modeling and how to predict labels of an image when multiple categories of objects are co-existing. We employ a biased sampling strategy which combines the bottom-up, biologically inspired saliency information and loose, top-down class prior information for object class modeling. Then this biased sampling component is further integrated with a multi-instance multi-label learning and classification algorithm. With the proposed biased sampling strategy, we can perform multi-object categorization within an image without semantic segmentation. The experimental results on PASCAL VOC2007 and SUN09 show that the proposed method significantly improves the discriminative ability of BoF methods and achieves good performance in multi-object categorization tasks.

8.
We propose a probability-signature description of image distributions and a corresponding image classification algorithm. The algorithm first models the distribution of an image's local features with a Gaussian mixture model. It then forms an initial probability signature by taking the mean of each mixture mode as a cluster center, and the sum of the posterior probabilities, over the local features that satisfy a constraint condition, of the corresponding mode as the cluster size. Finally, a compression step determines the final probability-signature feature, and an SVM classifier with an Earth Mover's Distance (EMD) kernel is trained to perform image classification. The probability signature allows one local feature to respond to several clusters, so it encodes more discriminative information and captures more similarity in terms of visual perception. Comparative experiments with other image classification methods on scene recognition and object classification tasks verify the effectiveness of the proposed classification method.
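A minimal sketch of the signature construction described above, assuming local features have already been extracted from an image; the constraint on which features contribute and the final compression step are simplified here to a posterior threshold and the removal of empty modes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def probability_signature(local_features, n_modes=32, min_posterior=0.01, seed=0):
    """Fit a GMM to an image's local features; each mode's mean is a cluster
    center and the summed posteriors of the features give the cluster weight."""
    gmm = GaussianMixture(n_components=n_modes, covariance_type="diag",
                          random_state=seed).fit(local_features)
    posteriors = gmm.predict_proba(local_features)       # (n_features, n_modes)
    posteriors[posteriors < min_posterior] = 0.0          # simplified constraint
    weights = posteriors.sum(axis=0)
    keep = weights > 0                                     # crude compression step
    return gmm.means_[keep], weights[keep]
```

The resulting (centers, weights) pairs are exactly the kind of signature that the Earth Mover's Distance compares, so an EMD-kernel SVM can then be trained on them as described.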

9.
Motion based Painterly Rendering
Previous painterly rendering techniques normally use image gradients for deciding stroke orientations. Image gradients are good for expressing object shapes, but have difficulty expressing the flow or movement of objects. In real painting, the use of brush strokes corresponding to the actual movement of objects allows viewers to recognize the objects' motion better and thus to get an impression of the scene's dynamics. In this paper, we propose a novel painterly rendering algorithm to express dynamic objects based on their motion information. We first extract motion information (magnitude, direction, standard deviation) of a scene from a set of consecutive image sequences taken from the same view. Then the motion directions are used for determining stroke orientations in the regions with significant motion, while image gradients determine stroke orientations where little motion is observed. Our algorithm is useful for realistically and dynamically representing moving objects. We have applied our algorithm to rendering landscapes. We could segment a scene into dynamic and static regions, and express the actual movement of dynamic objects using motion based strokes.
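A sketch of the orientation-selection rule described above, using OpenCV's Farnebäck optical flow between two consecutive grayscale frames; the motion threshold and the actual brush-stroke rendering are placeholders, not values from the paper:

```python
import cv2
import numpy as np

def stroke_orientations(prev_gray, curr_gray, motion_thresh=1.0):
    """Per-pixel stroke angles (radians): follow the motion direction where the
    flow magnitude is significant, otherwise follow the edge direction."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, flow_angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    gx = cv2.Sobel(curr_gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(curr_gray, cv2.CV_64F, 0, 1, ksize=3)
    edge_angle = np.arctan2(gy, gx) + np.pi / 2            # strokes run along edges
    return np.where(magnitude > motion_thresh, flow_angle, edge_angle)
```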

10.
Working in the VS .NET 2003 environment, this article uses C#'s image and video-stream processing capabilities to implement a method for detecting moving objects. Building on a "background frame", the method applies filtering techniques to detect moving objects automatically; it is simple and effective, and lays the groundwork for further recognition of the moving objects.
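The article works in VS .NET; the same background-frame idea is sketched below in Python with OpenCV for brevity. The threshold, median blur, and morphological opening stand in for the article's unspecified filtering step and are assumptions:

```python
import cv2

def detect_moving_objects(background_gray, frame_gray, diff_thresh=25, min_area=100):
    """Flag regions of the current frame that differ from the background frame."""
    diff = cv2.absdiff(frame_gray, background_gray)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.medianBlur(mask, 5)                                   # drop pixel noise
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)            # remove small blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]
```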

11.
Traditional computer graphics methods render images that appear sharp at all depths. Adding blur can add realism to a scene, provide a sense of scale, and draw a viewer’s attention to a particular region of a scene. Our image-based blur algorithm needs to distinguish whether a portion of an image comes from a single object or is part of more than one object. This motivates two approaches to identify objects after an image has been rendered. We illustrate how these techniques can be used in conjunction with our image space method to add blur to a scene.

12.
Extraction of the main regions of an image is the basis of image semantics extraction and its applications. To better extract image semantics, this paper proposes a semantics-oriented method for automatically extracting the main regions of an image. The method first partitions the image into blocks of fixed size and obtains an initial region segmentation by clustering the block features. A series of post-processing steps then refines the segmentation result and separates foreground from background. Finally, the importance of each background region is analyzed and irrelevant background regions are removed. Experiments on outdoor images containing salient objects show that the method not only removes a large amount of content unrelated to the image semantics, but also preserves the main information of the image, which lays a solid foundation for further image-semantics applications.

13.
14.
With the continuously increasing needs of location information for users around the world, applications of geospatial information have gained a lot of attention in both research and commercial organizations. Extraction of semantics from image content for geospatial information seeking and knowledge discovery has thus become a critical process. Unfortunately, the available geographic images may be blurred, too light, or too dark. It is therefore often hard to extract geographic features directly from images. In this paper, we describe our developed methods in applying local scale-invariant features and bag-of-keypoints techniques to annotating images, in order to carry out image categorization and geographic knowledge discovery tasks. First, local scale-invariant features are extracted from geographic images as representative geographic features. Subsequently, the bag-of-keypoints methods are used to construct a visual vocabulary and generate feature vectors to support image categorization and annotation. The annotated images are classified by using geographic nouns. The experimental results show that the proposed approach is sensible and can effectively enhance the tasks of geographic knowledge discovery.

15.
This article addresses the problem of creating interactive mixed reality applications where virtual objects interact in images of real world scenarios. This is relevant to create games and architectural or space planning applications that interact with visual elements in the images such as walls, floors and empty spaces. These scenarios are intended to be captured by the users with regular cameras or using previously taken photographs. Introducing virtual objects in photographs presents several challenges, such as pose estimation and the creation of a visually correct interaction between virtual objects and the boundaries of the scene. The two main research questions addressed in this article are whether it is feasible to create interactive augmented reality (AR) applications in which virtual objects interact with a real world scenario using high-level features detected in the image, and whether untrained users are capable of, and motivated enough for, performing the AR initialization steps. The proposed system detects the scene automatically from an image, with additional features obtained using basic annotations from the user. This operation is kept simple to accommodate the needs of non-expert users. The system analyzes one or more photos captured by the user and detects high-level features such as vanishing points, floor and scene orientation. Using these features it is possible to create mixed and augmented reality applications where the user interactively introduces virtual objects that blend with the picture in real time and respond to the physical environment. To validate the solution, several system tests are described and compared using available external image datasets.

16.
An image caption generation model fusing image scene and object prior knowledge
Objective: Current image captioning methods based on deep convolutional neural networks (CNN) and long short-term memory (LSTM) networks generally use object-category information as prior knowledge when extracting CNN features from an image, while ignoring the scene prior knowledge contained in the image. As a result, the generated sentences lack an accurate description of the scene and tend to misjudge, for example, the positional relations among objects. To address this problem, we design an image caption generation model that fuses scene and object-category prior information (F-SOCPK); both kinds of prior information are incorporated into the model so that they cooperate in generating the caption and improve sentence quality. Method: The parameters of the CNN-S model are first trained on the large-scale scene-category dataset Place205 so that CNN-S carries more scene prior information, and these parameters are then transferred to CNNd-S via transfer learning to capture the scene information of the image to be described. In parallel, the parameters of the CNN-O model are trained on the large-scale object-category dataset Imagenet and then transferred to CNNd-O to capture the object information in the image. After the scene and object information is extracted, it is fed into the language models LM-S and LM-O, respectively. The outputs of LM-S and LM-O are transformed by the Softmax function to obtain a probability score for every word in the vocabulary; finally, a weighted fusion computes the final score of each word, the word with the highest probability is taken as the output at the current time step, and the caption of the image is generated step by step. Results: Experiments are conducted on three public datasets: MSCOCO, Flickr30k, and Flickr8k. The proposed model outperforms the model that uses object-category information alone on several metrics, including BLEU (reflecting sentence fluency and precision), METEOR (reflecting word-level precision and recall), and CIDEr (reflecting semantic richness). On Flickr8k in particular, it improves CIDEr by 9% over the Object-based model (object category only) and by nearly 11% over the Scene-based model (scene category only). Conclusion: The proposed method is clearly effective, improving substantially over the baseline model and comparing very favorably with other mainstream methods. Its advantage is more pronounced on larger datasets such as MSCOCO, whereas on smaller datasets such as Flickr8k there is still room for improvement. Future work will incorporate more visual prior information into the model, such as action categories and object-to-object relations, to further improve caption quality, and will combine additional vision techniques, such as deeper CNN models, object detection, and scene understanding, to further improve sentence accuracy.
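The fusion described in the Method part above reduces, at each time step, to a weighted combination of the two Softmax distributions produced by LM-S and LM-O. A minimal sketch; the fusion weight is an illustrative assumption, not a value taken from the paper:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def fused_word(scene_logits, object_logits, vocabulary, w_scene=0.5):
    """Weighted fusion of the LM-S and LM-O word distributions; the word with
    the highest fused score is emitted at the current time step."""
    p = w_scene * softmax(scene_logits) + (1.0 - w_scene) * softmax(object_logits)
    return vocabulary[int(np.argmax(p))]
```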

17.
This paper proposes a method of detecting movable paths during visual navigation for a robot operating in an unknown structured environment. The proposed approach detects and segments the floor by computing plane normals from motion fields in image sequences. A floor is a useful object for mobile robots in structured environments, because it presents traversable paths if existing static or dynamic objects are removed effectively. In spite of this advantage, it cannot be easily detected from a 2D image. In this paper, some geometric features observed in the scene and assumptions about images are exploited so that a plane normal can be employed as an effective clue to separate the floor from the scene. In order to use the plane normal, two methods are proposed and integrated with a designed iterative refinement process. Then, the floor can be accurately detected even when mismatched point correspondences are obtained. The results of preliminary experiments on real data demonstrate the effectiveness of the proposed methods.
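One concrete way to obtain a plane normal from image motion, as the abstract relies on, is to estimate a homography between matched floor points in consecutive frames and decompose it with known camera intrinsics. The sketch below uses OpenCV for this step only; the paper's two estimation methods and its iterative refinement are not reproduced:

```python
import cv2
import numpy as np

def candidate_floor_normals(pts_prev, pts_curr, K):
    """Plane-normal candidates from point correspondences assumed to lie on the floor.
    pts_prev, pts_curr: (N, 2) matched pixel coordinates; K: 3x3 camera intrinsics."""
    H, _ = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 3.0)
    # The decomposition returns up to four (R, t, n) solutions; the physically
    # plausible one (e.g., a normal consistent with the camera's up direction)
    # still has to be selected downstream.
    _, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    return [n.ravel() / np.linalg.norm(n) for n in normals]
```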

18.
This paper presents a novel approach based on contextual Bayesian networks (CBN) for natural scene modeling and classification. The structure of the CBN is derived based on domain knowledge, and parameters are learned from training images. For test images, the hybrid streams of semantic features of image content and spatial information are piped into the CBN-based inference engine, which is capable of incorporating domain knowledge as well as dealing with a number of input evidences, producing the category labels of the entire image. We demonstrate the promise of this approach for natural scene classification, comparing it with several state-of-the-art approaches.

19.
唐榕  李骞  唐绍恩 《计算机工程》2023,49(2):314-320
Visibility has an important influence on human production and daily life, transportation safety, and related activities, and it is one of the key items in automatic surface meteorological observation; however, because many factors affect it, visibility measurement still lacks a unified standard and verification procedure. Most existing image-based visibility detection methods estimate visibility from visual features extracted from the whole image or from a local region, without considering that the sub-images corresponding to targets at different scene depths degrade in quality to different degrees, which limits the accuracy and stability of the results. This paper proposes a new visibility detection method: a pretrained neural image assessment model extracts visual features from the sub-images corresponding to targets at different scene depths, and the extracted features, together with the ground-truth visibility values, are fed into a fully connected network to train a visibility mapping model for the sub-images. Based on the relation between the sub-images and the global image, a weight regression model is built dynamically for each target's contribution to the overall visibility estimate, and the per-target visibility estimates are fused according to these weights to obtain the visibility value of the whole image. Experimental results show that the method effectively improves the prediction accuracy of the regression model, with detection accuracy exceeding 85% in every visibility range.
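A minimal sketch of the two later stages described above: a fully connected regressor that maps a sub-image's visual features to a visibility value, and the weighted fusion of per-target estimates. The network size and the use of scikit-learn are assumptions for illustration; feature extraction and the weight-regression model are omitted:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_subimage_mapper(features, visibility_truth, seed=0):
    """Fully connected network mapping sub-image features to visibility values."""
    return MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                        random_state=seed).fit(features, visibility_truth)

def fuse_visibility(per_target_estimates, weights):
    """Weighted fusion of per-target visibility estimates into one image-level value."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    w /= max(w.sum(), 1e-12)
    return float(w @ np.asarray(per_target_estimates, dtype=float))
```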

20.
