Similar Literature (20 results)
1.
The boundaries of objects in an image are often considered a nuisance to be “handled” due to the occlusion they exhibit. Since most, if not all, computer vision techniques aggregate information spatially within a scene, information spanning these boundaries, and therefore from different physical surfaces, is invariably and erroneously considered together. In addition, these boundaries convey important perceptual information about 3D scene structure and shape. Consequently, their identification can benefit many different computer vision pursuits, from low-level processing techniques to high-level reasoning tasks. While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In this paper, we focus on the exploitation of subtle relative-motion cues present at occlusion boundaries. When combined with more standard appearance information, we demonstrate these cues’ utility in detecting occlusion boundaries locally. We also present a novel, mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries.
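A toy illustration (not the paper's detector) of the idea of pairing a relative-motion cue with an appearance cue at a candidate boundary pixel: optical flow is discontinuous across a true occlusion boundary, so the flow difference between the two sides can be scored together with color contrast. The array layouts, window size, and mixing weight are all assumptions; no bounds checking is done.

    import numpy as np

    def boundary_score(flow, image, y, x, normal, half=3, w_motion=0.7):
        """Score pixel (y, x) as an occlusion boundary by comparing flow
        (H x W x 2) and color (H x W x 3) statistics on the two sides of
        the edge normal. Indices are assumed to stay inside the image."""
        n = np.asarray(normal, dtype=float)
        n /= np.linalg.norm(n)
        offs = np.round(np.outer(np.arange(1, half + 1), n)).astype(int)
        side_a = [(y + dy, x + dx) for dy, dx in offs]   # one side of the edge
        side_b = [(y - dy, x - dx) for dy, dx in offs]   # opposite side
        fa = np.mean([flow[p] for p in side_a], axis=0)
        fb = np.mean([flow[p] for p in side_b], axis=0)
        ca = np.mean([image[p] for p in side_a], axis=0)
        cb = np.mean([image[p] for p in side_b], axis=0)
        motion_cue = np.linalg.norm(fa - fb)    # flow discontinuity
        appear_cue = np.linalg.norm(ca - cb)    # color contrast
        return w_motion * motion_cue + (1.0 - w_motion) * appear_cue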

2.
The central problem in vision is to determine scene properties from image properties. This is difficult because the problem, formally posed, is underconstrained. Methods that infer scene properties from images make assumptions about how the world determines what we see. In remote sensing, some of these assumptions can be dealt with explicitly. Available scene knowledge, in the form of a digital terrain model and a ground cover map, is used to synthesize an image for a given date and time. The synthesis process assumes that the surface is a perfectly diffuse or "Lambertian" reflector. A scene radiance equation is described based on simple models of direct solar irradiance, diffuse sky irradiance, and atmospheric path radiance. Parameters of the model are estimated from the real image. A statistical comparison of the real image and the synthetic image is used to judge how well the model represents the mapping from scene to image.
The methods presented for image synthesis are similar to those used in computer graphics. The motivation, however, is different. In graphics, the goal is to produce an effective rendering of the scene for a human viewer. Here, the goal is to predict properties of real images. In vision, one must deal with a confounding of effects due to surface shape, surface material, illumination, shadows, and atmosphere. These effects often detract from, rather than enhance, the determination of invariant scene characteristics.
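A minimal sketch of a scene radiance equation of the kind described above, under the stated Lambertian assumption: a direct solar term, a diffuse sky term, and an additive path-radiance term. The exact parameterization and units are assumptions for illustration, not the paper's model.

    import numpy as np

    def predicted_radiance(albedo, cos_sun, shadow, E_sun, E_sky, L_path):
        """Synthesize radiance for a Lambertian surface:
           L = (albedo / pi) * (E_sun * max(cos_sun, 0) * shadow + E_sky) + L_path
        albedo  : surface reflectance in [0, 1] (from the ground-cover map)
        cos_sun : cosine of the solar incidence angle (from the terrain model)
        shadow  : 1 where the sun is visible, 0 in cast shadow
        """
        direct = E_sun * np.maximum(cos_sun, 0.0) * shadow   # direct solar term
        return (albedo / np.pi) * (direct + E_sky) + L_path  # sky + path terms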

3.
Statistical cue integration in DAG deformable models
Deformable models are a useful modeling paradigm in computer vision. A deformable model is a curve, a surface, or a volume, whose shape, position, and orientation are controlled through a set of parameters. They can represent manufactured objects, human faces and skeletons, and even bodies of fluid. With low-level computer vision and image processing techniques, such as optical flow, we extract relevant information from images. Then, we use this information to change the parameters of the model iteratively until we find a good approximation of the object in the images. When we have multiple computer vision algorithms providing distinct sources of information (cues), we have to deal with the difficult problem of combining these sometimes conflicting contributions in a sensible way. In this paper, we introduce the use of a directed acyclic graph (DAG) to describe the position and Jacobian of each point of a deformable model. This representation is dynamic, flexible, and allows computational optimizations that would be difficult to achieve otherwise. We then describe a new statistical cue integration method for tracking deformable models that scales well with the dimension of the parameter space. We use affine forms and affine arithmetic to represent and propagate the cues and their regions of confidence. We show that we can apply the Lindeberg theorem to approximate each cue with a Gaussian distribution, and can use a maximum-likelihood estimator to integrate them. Finally, we demonstrate the technique at work in a 3D deformable face tracking system on monocular image sequences with thousands of frames.
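A minimal sketch of the final fusion step the abstract describes: once each cue has been approximated by a Gaussian over the model parameters, the maximum-likelihood combination of independent Gaussians is the standard inverse-covariance weighted estimate. The affine-arithmetic propagation is omitted, and all names are illustrative.

    import numpy as np

    def fuse_gaussian_cues(means, covariances):
        """Maximum-likelihood fusion of independent Gaussian cues
        N(mean_i, cov_i) over the same parameter vector."""
        info = np.zeros_like(covariances[0])      # summed information matrix
        info_mean = np.zeros_like(means[0])
        for m, c in zip(means, covariances):
            c_inv = np.linalg.inv(c)
            info += c_inv
            info_mean += c_inv @ m
        fused_cov = np.linalg.inv(info)
        return fused_cov @ info_mean, fused_cov   # fused mean and covariance

    # Two conflicting 2D cues; the tighter (more confident) one dominates.
    m, c = fuse_gaussian_cues(
        [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
        [np.eye(2) * 0.1, np.eye(2) * 1.0])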

4.
Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture, and widely discrepant detection rates have been reported for them in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, an impressive result given the findings of a recent psychophysical study. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.
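A minimal sketch of confidence-based Bayesian integration for this task: each cue contributes a likelihood over the four candidate rotations (0, 90, 180, 270 degrees), tempered by its own confidence, and a prior completes the posterior. The tempering rule and the example numbers are assumptions, not the paper's exact formulation.

    import numpy as np

    def orientation_posterior(cue_likelihoods, confidences, prior=None):
        """Combine per-cue likelihoods over the 4 candidate orientations.
        cue_likelihoods : list of length-4 arrays, one per cue
        confidences     : per-cue weights in [0, 1]; 0 makes a cue inert
        """
        log_post = np.zeros(4) if prior is None else np.log(np.asarray(prior))
        for lik, conf in zip(cue_likelihoods, confidences):
            # Confidence-weighted (tempered) likelihood: lik ** conf.
            log_post += conf * np.log(np.asarray(lik) + 1e-12)
        post = np.exp(log_post - log_post.max())
        return post / post.sum()

    # A low-level cue mildly favors 90 deg; a confident semantic cue
    # favoring 0 deg overrides it in the posterior.
    p = orientation_posterior(
        [[0.2, 0.4, 0.2, 0.2], [0.7, 0.1, 0.1, 0.1]], [0.3, 0.9])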

5.
Intrinsic images are a mid-level representation that decomposes an image into reflectance and illumination layers. The reflectance layer captures the color/texture of surfaces in the scene, while the illumination layer captures shading effects caused by interactions between scene illumination and surface geometry. Intrinsic images have a long history in computer vision and, more recently, in computer graphics, and have been shown to be a useful representation for tasks ranging from scene understanding and reconstruction to image editing. In this report, we review and evaluate past work on this problem. Specifically, we discuss each work in terms of the priors it imposes on the intrinsic image problem. We introduce a new synthetic ground-truth dataset that we use to evaluate the validity of these priors and the performance of the methods. Finally, we evaluate the performance of the different methods in the context of image-editing applications.
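A minimal sketch of the decomposition itself: in the log domain the model I = R × S becomes additive, and the classical Retinex-style prior (shading varies smoothly; reflectance accounts for the rest) yields a one-line baseline. This is a toy baseline illustrating the representation, not any of the surveyed methods; the grayscale input and sigma value are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def retinex_baseline(image, sigma=15.0):
        """Split a grayscale image into reflectance and shading layers
        under the smooth-shading prior: log I = log R + log S."""
        log_i = np.log(np.asarray(image, dtype=float) + 1e-6)
        log_s = gaussian_filter(log_i, sigma=sigma)   # smooth part -> shading
        log_r = log_i - log_s                         # residual -> reflectance
        # I == R * S by construction (up to the stabilizing epsilon).
        return np.exp(log_r), np.exp(log_s)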

6.

The local occlusion cue has been successfully exploited to infer depth ordering from a monocular image. However, due to the uncertainty of occlusion relations, inconsistent results frequently arise, especially for images of complex scenes. We propose a depth propagation mechanism that incorporates local occlusion and global ground cues within a probabilistic-to-energetic Bayesian framework. By maximizing the posterior, i.e., minimizing the energy of latent relative-depth variables under a well-defined pairwise occlusion prior, we recover the correct depth ordering in the monocular setting. Our model guarantees consistent relative-depth labeling in an automatically constructed topological graph by transferring the more confident aligned multi-depth cues among different segments. Experiments demonstrate that our depth propagation mechanism achieves more reasonable and accurate outcomes, superior to commonly used occlusion-based approaches on complex natural scenes.
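A loose illustration of the kind of energy this abstract alludes to: latent relative-depth orderings over segments, with a pairwise term from local occlusion cues and a per-segment term from the global ground cue, minimized here by brute force since the example graph is tiny. The cost functions are placeholders; the paper's model and inference are more involved.

    import itertools

    def depth_ordering(segments, occlusion_cost, ground_cost):
        """Exhaustively pick the depth order of a few segments minimizing
        E = sum of pairwise occlusion terms + per-segment ground terms."""
        best, best_e = None, float("inf")
        for order in itertools.permutations(segments):
            # occlusion_cost(a, b): penalty for placing a in front of b,
            # e.g. high when local T-junction cues say b occludes a.
            e = sum(occlusion_cost(a, b)
                    for a, b in itertools.combinations(order, 2))
            # ground_cost(s, rank): penalty from the global ground cue,
            # e.g. segments touching lower ground pixels should be nearer.
            e += sum(ground_cost(s, rank) for rank, s in enumerate(order))
            if e < best_e:
                best, best_e = order, e
        return best, best_e      # order[0] is the nearest segment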


7.
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images include occlusion, poor quality, noise, or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can identify object classes to a certain extent. Context information, based on the interaction among objects in the scene or on global scene statistics, can help disambiguate appearance inputs in recognition tasks. In this work we address the problem of incorporating different types of contextual information for robust object categorization in computer vision. We review different ways of using contextual information in the field of object categorization, considering the most common levels of extraction of context and the different levels of contextual interactions. We also examine common machine learning models that integrate context information into object recognition frameworks and discuss scalability, optimizations, and possible future approaches.

8.
We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis. The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., Adaboost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.
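One plausible reading of the "flexible norm" for spatial integration, sketched minimally: pooling a cue map with a generalized power mean, where a small exponent behaves like averaging (rewarding dense weak support) and a large exponent approaches max pooling (rewarding sparse strong responses). The exponent values are assumptions, not the paper's.

    import numpy as np

    def pooled_evidence(cue_map, p):
        """Generalized (power) mean of non-negative per-pixel evidence:
        p = 1 is plain averaging; large p approaches max pooling."""
        x = np.clip(np.asarray(cue_map, dtype=float), 0.0, None)
        return np.mean(x ** p) ** (1.0 / p)

    dense = np.full((8, 8), 0.4)                   # weak evidence everywhere
    sparse = np.zeros((8, 8)); sparse[4, 4] = 1.0  # one strong response
    for p in (1, 8):
        # p = 1 favors the dense cue; p = 8 favors the sparse one.
        print(p, pooled_evidence(dense, p), pooled_evidence(sparse, p))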

9.
Extracting object contours from complex natural images is a classic hard problem in computer vision, and a cue-combination model that matches human perceptual characteristics and the statistics of natural images is key to improving contour quality. Using continuity and similarity cues for contour grouping, we propose a cue-combination model that fits the joint conditional probability of the continuity and similarity statistics in the Gestalt rules. The model explains how the particular form of the joint distribution of two correlated cues can be obtained from two mutually independent cue variables, addressing the correlated-cue combination problem that discriminative models deliberately avoid; it is thus a quantitative Gestalt cue model that better matches natural image statistics and human perceptual characteristics. We apply the model to contour extraction from natural images, and experimental results confirm its effectiveness.

10.
This letter presents an improved cue integration approach to reliably separate coherent moving objects from their background scene in video sequences. The proposed method uses a probabilistic framework to unify bottom-up and top-down cues in a parallel, "democratic" fashion. The algorithm makes use of a modified Bayes rule where each pixel's posterior probabilities of figure or ground layer assignment are derived from likelihood models of three bottom-up cues and a prior model provided by a top-down cue. Each cue is treated as independent evidence for figure-ground separation. The cues compete with and complement each other dynamically by adjusting their relative weights from frame to frame according to cue quality measured against the overall integration. At the same time, the likelihood or prior models of individual cues adapt toward the integrated result. These mechanisms enable the system to organize under the influence of visual scene structure without manual intervention. A novel contribution here is the incorporation of a top-down cue, which improves the system's robustness and accuracy and helps handle difficult and ambiguous situations, such as abrupt lighting changes or occlusion among multiple objects. Results on various video sequences are demonstrated and discussed. (Video demos are available at http://organic.usc.edu:8376/~tangx/neco/index.html.)
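A minimal sketch of the scheme the letter describes: per-pixel figure/ground posteriors from weighted bottom-up likelihoods with a top-down prior, and cue weights re-adapted by agreement with the previous frame's integrated result. The tempering and re-weighting rules here are simplified assumptions.

    import numpy as np

    def figure_posterior(likelihoods, weights, prior):
        """Per-pixel P(figure) from bottom-up cue likelihood maps in [0, 1],
        each entering as a weighted (tempered) factor, with the top-down
        cue supplying the prior map."""
        prior = np.asarray(prior, dtype=float)
        log_f, log_g = np.log(prior + 1e-9), np.log(1.0 - prior + 1e-9)
        for lik, w in zip(likelihoods, weights):
            lik = np.asarray(lik, dtype=float)
            log_f += w * np.log(lik + 1e-9)
            log_g += w * np.log(1.0 - lik + 1e-9)
        return 1.0 / (1.0 + np.exp(log_g - log_f))   # normalized posterior

    def updated_weights(likelihoods, integrated, sharpness=4.0):
        """Re-weight each cue by how well it agreed with the previous
        frame's integrated figure/ground map, so good cues gain influence."""
        quality = np.array([1.0 - np.mean(np.abs(np.asarray(l) - integrated))
                            for l in likelihoods])
        w = np.exp(sharpness * quality)
        return w / w.sum() * len(likelihoods)        # average weight stays 1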

11.
Occlusion reasoning is a fundamental problem in computer vision. In this paper, we propose an algorithm to recover the occlusion boundaries and depth ordering of free-standing structures in the scene. Rather than viewing the problem as one of pure image processing, our approach employs cues from an estimated surface layout and applies Gestalt grouping principles using a conditional random field (CRF) model. We propose a hierarchical segmentation process, based on agglomerative merging, that re-estimates boundary strength as the segmentation progresses. Our experiments on the Geometric Context dataset validate our choices for features, our iterative refinement of classifiers, and our CRF model. In experiments on the Berkeley Segmentation Dataset, PASCAL VOC 2008, and LabelMe, we also show that the trained algorithm generalizes to other datasets and can be used as an object boundary predictor with figure/ground labels.
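A minimal sketch of the agglomerative merging loop described above: repeatedly merge the two regions joined by the weakest boundary, re-querying boundary strength as regions grow. The strength function is a placeholder; the paper re-estimates strengths with trained classifiers inside a CRF.

    def agglomerate(regions, edges, strength, stop=0.5):
        """regions: iterable of comparable region ids; edges: pairs (a, b)
        of adjacent regions; strength(a, b): boundary score in [0, 1],
        higher = more likely a true boundary. strength may consult the
        current merge state, which is how boundary strength gets
        re-estimated as the segmentation progresses."""
        parent = {r: r for r in regions}
        find = lambda r: r if parent[r] == r else find(parent[r])
        while True:
            live = {frozenset((find(a), find(b))) for a, b in edges
                    if find(a) != find(b)}
            if not live:
                break
            weakest = min(live, key=lambda e: strength(*sorted(e)))
            if strength(*sorted(weakest)) >= stop:
                break                      # every remaining boundary looks real
            a, b = sorted(weakest)
            parent[b] = a                  # merge region b into region a
        return {r: find(r) for r in regions}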

12.
Motion blur due to camera shake is a common occurrence. During image capture, the apparent motion of a scene point in the image plane varies according to both camera motion and scene structure. Our objective is to infer the camera motion and the depth map of static scenes using motion blur as a cue. To this end, we use an unblurred–blurred image pair. We first develop a technique to estimate the transformation spread function (TSF), which represents the camera shake; this technique uses blur kernels estimated at different points across the image. Based on the estimated TSF, we then recover the complete depth map of the scene within a regularization framework.

13.
《Image and vision computing》2014,32(6-7):405-413
Finding regions of interest (ROIs) is a fundamentally important problem in the area of computer vision and image processing. Previous studies addressing this issue have mainly focused on investigating chromatic cues to characterize visually salient image regions, while less attention has been devoted to monochromatic cues. The purpose of this paper is the study of monochromatic cues, which have the potential to complement chromatic cues, for the detection of ROIs in an image. This paper first presents a taxonomy of existing ROI detection approaches using monochromatic cues, ranging from well-known algorithms to the most recently published techniques. We then propose a novel monochromatic cue for ROI detection. Finally, a comparative evaluation has been conducted on large-scale, challenging test sets of real-world natural scenes. Experimental results demonstrate that the use of our proposed monochromatic cue yields a more accurate identification of ROIs. This paper serves as a benchmark for future research on this particular topic and a stepping stone for developers and practitioners interested in adopting monochromatic cues in ROI detection systems and methodologies.

14.
Objective: Traditional monocular depth-measurement methods have the advantages of simple equipment, low cost, and fast computation, but they require complex camera calibration and work only under specific scene conditions. We therefore propose an object depth-measurement method based on motion-parallax cues, which extracts feature points from images and obtains the measurement from the relationship between the feature points and image depth. Method: Two images are first segmented to obtain the region containing the measured object; the improved scale-invariant feature transform (SIFT) algorithm proposed in this paper then matches the two images, and the matching result for the object is obtained by combining the matching and segmentation results; the convex hull of the matched feature points is found with the Graham scan, giving the length of the longest segment on the hull; finally, image depth is computed from basic camera-imaging principles and triangle geometry. Results: Experiments show improvements in both measurement accuracy and speed. When the object is unoccluded, the error between the actual and measured distance is 2.60% and the measurement takes 1.577 s; when the object is partially occluded the method still performs well, with an error of 3.19% and a measurement time of 1.689 s. Conclusion: Estimating image depth from feature points in two images is robust to partial occlusion, avoids the complex camera-calibration process, and has practical application value.
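A minimal sketch of the geometric step: take the matched keypoints on the object, compute their convex hull (here Andrew's monotone chain, a Graham-scan variant), measure the longest hull chord in pixels, and convert it to depth with the pinhole relation Z = f · W / w. The known object width W and focal length in pixels are assumptions for illustration; at least three non-collinear matched points are required.

    import itertools, math

    def convex_hull(points):
        """Andrew's monotone chain over (x, y) tuples; returns hull vertices."""
        pts = sorted(set(points))
        cross = lambda o, a, b: ((a[0]-o[0])*(b[1]-o[1])
                                 - (a[1]-o[1])*(b[0]-o[0]))
        hull = []
        for seq in (pts, list(reversed(pts))):     # lower hull, then upper
            part = []
            for p in seq:
                while len(part) > 1 and cross(part[-2], part[-1], p) <= 0:
                    part.pop()
                part.append(p)
            hull += part[:-1]
        return hull

    def depth_from_keypoints(matched_points, focal_px, object_width_m):
        hull = convex_hull(matched_points)
        w_px = max(math.dist(a, b)                 # longest hull chord
                   for a, b in itertools.combinations(hull, 2))
        return focal_px * object_width_m / w_px    # pinhole: Z = f * W / w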

15.
Tracking multiple objects in a scene is one of the most active research topics in computer vision. Identifying each target within the scene throughout a video sequence raises multiple issues, with collision and occlusion events among the most challenging. Because of this, when dealing with human detection it is often very difficult to obtain a full-body image, which complicates the process. The task becomes even more difficult with unpredictable trajectories, as in sports environments. The head-shoulder omega shape thus becomes a powerful tool for human detection. Most contributions to this field involve a detection technique followed by a tracking system based on omega-shape features. Building on these works, we present a novel methodology providing a full tracking system. Different techniques are combined to detect, track, and recover target identities under unpredictable trajectories such as those in sporting events. Experimental results on challenging sport scenes show the performance and accuracy of the technique. The system's speed also opens the door to a real-time implementation using GPU programming on standard desktop machines, able to serve higher-level human-behavior analysis systems with multiple applications.

16.
《Advanced Robotics》2013,27(10):1043-1058
Many applications in computer vision are based on a single static camera observing a scene which is static except for one or more figures (people, vehicles, etc.) moving through it. In these applications it is useful to understand whether the moving figure is partially occluded by some static element of the scene. Such partial occlusions, when undetected, confuse the analysis of the figure's pose and activity. We present an algorithm that uses only the information provided by moving figures to simply and efficiently derive the position of static occluding bodies. Once these occlusions are obtained, we demonstrate successful reasoning about the occlusion status of future figures within the same scene. The occlusion positions from multiple views of the same scene are used to extract an estimate of the three-dimensional position and shape of the occlusion. Experimental results validating the method are included.
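A loose illustration of the core inference: wherever a moving figure's predicted silhouette repeatedly fails to appear in the observed foreground, a static occluder is likely present; accumulating that evidence over many figures yields an occlusion map. The prediction source, support threshold, and 0.5 cutoff are assumptions, not the paper's algorithm.

    import numpy as np

    def occlusion_map(predicted_masks, observed_masks, min_support=10):
        """predicted_masks / observed_masks: per-frame boolean H x W arrays;
        predicted = where the figure should be (e.g. extrapolated from its
        track), observed = where it was actually detected."""
        expected = np.zeros(predicted_masks[0].shape, dtype=float)
        missing = np.zeros_like(expected)
        for pred, obs in zip(predicted_masks, observed_masks):
            expected += pred
            missing += pred & ~obs        # figure should be here but is not
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(expected >= min_support,
                             missing / expected, 0.0)
        return ratio > 0.5                # likely static-occluder pixels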

17.
Nonlocal means (NM) is an efficient method for many low-level image processing tasks. However, it is challenging to utilize NM directly for saliency detection, because the conventional NM method can only extract the structure of the image itself and is based on a regular pixel-level graph, whereas saliency detection usually requires human perception and more complex connectivity among image elements. In this paper, we propose a novel generalized nonlocal means (GNM) framework with an object-level cue, which fuses low-level and high-level cues to generate saliency maps. For a given image, we first use uniqueness to describe the low-level cue. Second, we adopt an objectness algorithm to find potential object candidates, then pool the object measures onto patches to generate two high-level cues. Finally, by fusing these three cues into an object-level cue for GNM, we obtain the saliency map of the image. Extensive experiments show that our GNM saliency detector produces more precise and reliable results compared to state-of-the-art algorithms.
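A minimal sketch of the fusion step only: a low-level uniqueness map and patch-pooled objectness maps combined into a single object-level cue. The fusion rule (a product of min-max-normalized cues) is an assumption for illustration; the paper's GNM graph construction and nonlocal averaging are omitted.

    import numpy as np

    def normalize(m):
        m = np.asarray(m, dtype=float)
        return (m - m.min()) / (m.max() - m.min() + 1e-9)

    def object_level_cue(uniqueness, objectness_maps):
        """Fuse one low-level cue with high-level objectness cues, per
        patch; agreement between cues sharpens the resulting saliency."""
        fused = normalize(uniqueness)
        for obj in objectness_maps:       # pooled object-candidate measures
            fused *= normalize(obj)
        return normalize(fused)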

18.
19.
Vision-based road detection is an important research topic in different areas of computer vision such as the autonomous navigation of mobile robots. In outdoor unstructured environments such as villages and deserts, the roads are usually not well-paved and have varying colors or texture distributions. Traditional region- or edge-based approaches, however, are effective only in specific environments, and most of them have weak adaptability to varying road types and appearances. In this paper we describe a novel top-down hybrid algorithm which properly combines both region and edge cues from the images. The main difference between our proposed algorithm and previous ones is that, before road detection, an off-line scene classifier is efficiently learned from both low- and high-level image cues to predict the unstructured road model. This scene classification can be considered a decision process which guides the selection of the optimal solution from region- or edge-based approaches to detect the road. Moreover, a temporal smoothing mechanism is incorporated, which further makes both model prediction and region classification more stable. Experimental results demonstrate that compared with traditional region- and edge-based algorithms, our algorithm is more robust in detecting the road areas with diverse road types and varying appearances in unstructured conditions.
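A minimal sketch of the top-down control flow described above: a scene classifier first predicts the road type for the frame, the matching detector (region- or edge-based) is then selected, and a temporal majority vote stabilizes the prediction across frames. All names are placeholders, and the majority-vote smoothing is an assumption.

    from collections import deque

    class RoadDetector:
        def __init__(self, classify_scene, detectors, history=5):
            self.classify = classify_scene       # frame -> road-type label
            self.detectors = detectors           # label -> detection function
            self.recent = deque(maxlen=history)  # temporal smoothing buffer

        def detect(self, frame):
            self.recent.append(self.classify(frame))
            # Majority vote over recent frames keeps the predicted road
            # model from flickering between types.
            label = max(set(self.recent), key=self.recent.count)
            return self.detectors[label](frame)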

20.
Computer vision is concerned with extracting information about a scene by analyzing images of that scene. Performing any computer vision task requires an enormous amount of computation, and exploiting parallelism appears to be a promising way to improve the performance of computer vision systems. Past work in this area has focused on applying parallel processing techniques at the image-operator level. In this article, we discuss the parallelism of computer vision at the control level and present a distributed image understanding system (DIUS). In DIUS, control-level parallelism is exploited by a dynamic scheduler. Furthermore, two levels of rules are used in the control mechanism: meta-rules are concerned mainly with which strategy should be invoked and with the system's execution sequence, while control rules determine which task needs to be done next. A prototype system has been implemented within the parallel programming environment Strand, which provides various virtual architectures mapping either to a shared-memory machine (Sequent) or to a Sun network.
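A minimal sketch of the two-level control mechanism described above: meta-rules select the active strategy, control rules pick the next runnable tasks, and a scheduler dispatches them. All names are placeholders, and the loop is sequential here; the actual DIUS dispatches tasks in parallel on Strand.

    def run_control(blackboard, meta_rules, control_rules):
        """Two-level rule-based control loop.
        meta_rules: list of (condition, strategy) pairs; the first rule
        whose condition holds on the blackboard selects the strategy.
        control_rules: strategy -> list of (condition, task) pairs; a task
        is a function that updates the blackboard dict."""
        while not blackboard.get("done"):
            strategy = next(s for cond, s in meta_rules if cond(blackboard))
            runnable = [t for cond, t in control_rules[strategy]
                        if cond(blackboard)]
            if not runnable:
                break                     # nothing left to do
            # A real scheduler would dispatch these in parallel; here we
            # simply run them in sequence.
            for task in runnable:
                task(blackboard)
        return blackboard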
