Related Articles (20 results)
1.
Visual context provides cues about an object’s presence, position and size within the observed scene, which should be used to increase the performance of object detection techniques. However, in computer vision, object detectors typically ignore this information. We therefore present a framework for visual-context-aware object detection. Methods for extracting visual contextual information from still images are proposed, which are then used to calculate a prior for object detection. The concept builds on a sparse coding of contextual features derived from geometry and texture. In addition, bottom-up saliency and object co-occurrences are exploited to define auxiliary visual context. To integrate the individual contextual cues with a local appearance-based object detector, a fully probabilistic framework is established. In contrast to other methods, our integration models the underlying conditional probabilities between the different cues via kernel density estimation. This integration is a crucial part of the framework, as demonstrated in the detailed evaluation. Our method is evaluated on a novel, demanding image data set and compared to a state-of-the-art method for context-aware object detection. An in-depth analysis discusses the contributions of the individual contextual cues and the limitations of visual context for object detection.
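As a concrete illustration of the cue-integration step described above, the following is a minimal sketch (not the authors' implementation) of fusing a KDE-based contextual prior with a local appearance score; the two contextual features, their training distributions and the detector score are hypothetical placeholders.

```python
# Minimal sketch: KDE-based contextual prior fused with an appearance score.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Contextual features observed at known object / background locations (toy data):
# rows = [vertical position in image, local texture energy]
ctx_pos = rng.normal(loc=[0.7, 0.4], scale=0.1, size=(500, 2)).T    # object present
ctx_neg = rng.uniform(0.0, 1.0, size=(5000, 2)).T                   # background

kde_pos = gaussian_kde(ctx_pos)   # p(context | object)
kde_neg = gaussian_kde(ctx_neg)   # p(context | background)

def context_prior(ctx, p_obj=0.05):
    """Posterior probability of an object given only contextual features."""
    num = kde_pos(ctx) * p_obj
    den = num + kde_neg(ctx) * (1.0 - p_obj)
    return num / den

def fuse(appearance_score, ctx):
    """Combine a local appearance score (treated as P(object | appearance))
    with the contextual prior, assuming conditional independence."""
    prior = context_prior(ctx)
    return appearance_score * prior / (
        appearance_score * prior + (1 - appearance_score) * (1 - prior))

print(fuse(0.6, np.array([[0.72], [0.38]])))  # context consistent with an object
```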

2.
This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
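The mask construction can be pictured with a much-reduced sketch: estimate an ITD per frame by cross-correlation and mark frames whose ITD matches the target direction as reliable. A real CASA front end operates per gammatone channel with learned ITD/ILD distributions; the single band, hard threshold and function names below are assumptions for illustration.

```python
import numpy as np

def itd_frames(left, right, frame_len=512, max_lag=16):
    """Return the lag (in samples) maximising the cross-correlation per frame."""
    n_frames = len(left) // frame_len
    itds = np.zeros(n_frames, dtype=int)
    for f in range(n_frames):
        l = left[f * frame_len:(f + 1) * frame_len]
        r = right[f * frame_len:(f + 1) * frame_len]
        lags = np.arange(-max_lag, max_lag + 1)
        xcorr = [np.dot(l[max(0, -k):frame_len - max(0, k)],
                        r[max(0, k):frame_len - max(0, -k)]) for k in lags]
        itds[f] = lags[int(np.argmax(xcorr))]
    return itds

def reliability_mask(itds, target_itd=0, tol=2):
    """Frames whose ITD is close to the target's expected ITD are marked
    reliable (1); everything else is treated as missing (0)."""
    return (np.abs(itds - target_itd) <= tol).astype(int)
```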

3.
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images include occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes up to a certain extent. Context information, based on the interaction among objects in the scene or global scene statistics, can help successfully disambiguate appearance inputs in recognition tasks. In this work we address the problem of incorporating different types of contextual information for robust object categorization in computer vision. We review different ways of using contextual information in the field of object categorization, considering the most common levels of extraction of context and the different levels of contextual interactions. We also examine common machine learning models that integrate context information into object recognition frameworks and discuss scalability, optimizations and possible future approaches.

4.
5.
Real-time structured light coding for adaptive patterns
Coded structured light is a technique that allows the 3D reconstruction of poorly textured or untextured scene areas. Because codes are uniquely associated with the visual primitives of the projected pattern, the correspondence problem is quickly solved using local information only, with robustness against disturbances such as high surface curvature, partial occlusion, and out-of-field-of-view or out-of-focus regions. Real-time 3D reconstruction from a single shot is possible with pseudo-random arrays, where the encoding is carried out in a single pattern using spatial neighbourhoods. To correct more mismatched visual primitives and obtain globally more robust patterns, a higher Hamming distance between all the codewords used is desirable. Recent work in structured light has shown growing interest in adaptive patterns, which can account for geometric or spectral specificities of the scene to provide better feature matching and reconstruction. Until now, such patterns could not benefit from the robustness offered by spatial-neighbourhood coding with a minimum Hamming distance constraint, because the existing algorithms for this class of coding are designed for offline coding only. In this article, we show that, thanks to two new contributions, a mixed exploration/exploitation search behaviour and a reduction in complexity from O(n²) to approximately O(n) using the epipolar constraint, patterns with properties similar to those coded offline can be coded in real time. This enables the design of a complete closed-loop processing pipeline for adaptive patterns.
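A toy sketch of the Hamming-distance constraint the authors want to preserve during online coding: greedily build a codebook whose codewords all differ in at least `min_dist` symbol positions. The alphabet, code length and greedy strategy are arbitrary example choices, not the paper's algorithm.

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select_codewords(alphabet=(0, 1, 2), length=4, min_dist=2):
    """Greedy construction of a codebook with pairwise Hamming distance >= min_dist."""
    codebook = []
    for candidate in product(alphabet, repeat=length):
        if all(hamming(candidate, c) >= min_dist for c in codebook):
            codebook.append(candidate)
    return codebook

codes = select_codewords()
print(len(codes), codes[:5])
```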

6.
We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis. The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., Adaboost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.

7.
Understanding human behaviour is a high-level perceptual problem, one which is often dominated by contextual knowledge of the environment, and where concerns such as occlusion, scene clutter and high within-class variation are commonplace. Nonetheless, such understanding is highly desirable for automated visual surveillance. We consider this problem in the context of workflow analysis within an industrial environment. The hierarchical nature of the workflow is exploited to split the problem into ‘activity’ and ‘task’ recognition, in which sequences of low-level activities are examined for instances of a task while the remainder are labelled as background. An initial prediction of activity is obtained using shape- and motion-based features of the moving blob of interest. A sequence of these activities is further adjusted by a probabilistic analysis of transitions between activities using hidden Markov models (HMMs). In task detection, HMMs are arranged to handle the activities within each task; two separate HMMs, for task and background, compete for an incoming sequence of activities. Imagery from a camera mounted overhead of the target scene has been chosen over the more conventional oblique (side) views, as this view suffers less from occlusion and poses a manageable detection and tracking problem while still retaining powerful cues about the workflow patterns. We evaluate our approach on both activity and task detection on a challenging surveillance dataset of human operators in a car manufacturing plant. The experimental results show that our hierarchical approach can automatically segment the timeline and spatially localize a series of predefined tasks performed to complete a workflow.
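The HMM-based smoothing of the per-frame activity labels can be illustrated with a standard Viterbi decoder; the toy transition and emission matrices below are illustrative values, not the ones learned in the paper.

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """obs: observed activity indices; returns the most likely hidden sequence."""
    n_states = log_A.shape[0]
    T = len(obs)
    delta = np.zeros((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A            # (from, to)
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(n_states)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))   # activity transition probabilities
B = np.log(np.array([[0.8, 0.2], [0.3, 0.7]]))   # emission: P(predicted | true activity)
pi = np.log(np.array([0.5, 0.5]))
print(viterbi([0, 0, 1, 0, 1, 1], A, B, pi))
```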

8.
Achieving fast and stable adaptive tracking in complex scenes is one of the pressing problems in the vision community, and efficient fusion of a target's multiple features is an important way to improve the robustness of tracking algorithms. This paper first designs a new combination strategy based on DST (Dempster-Shafer Theory) and PCR5 (Proportional Conflict Redistribution rule No. 5) to fuse the colour and texture features of moving targets; it then builds an adaptive multi-target tracking model for complex scenes within a particle filter framework, and finally realizes adaptive visual tracking with multi-feature fusion in complex scenes. Experimental results and performance analysis show that, under adverse tracking conditions, the method clearly improves the adaptive handling of highly conflicting evidence, raises particle efficiency and tracking robustness, and achieves accurate and stable multi-target tracking in complex scenes.
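As a simplified illustration of the belief-fusion step, the sketch below combines colour- and texture-based evidence with the classical Dempster rule on a two-hypothesis frame; the paper's PCR5 rule redistributes the conflicting mass differently, and the mass values here are made up.

```python
def dempster(m1, m2):
    """m1, m2: dicts mapping frozenset hypotheses to masses (each summing to 1)."""
    combined = {}
    conflict = 0.0
    for h1, v1 in m1.items():
        for h2, v2 in m2.items():
            inter = h1 & h2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

T, B = frozenset({"target"}), frozenset({"background"})
colour  = {T: 0.6, B: 0.3, T | B: 0.1}   # evidence from the colour cue
texture = {T: 0.5, B: 0.4, T | B: 0.1}   # evidence from the texture cue
print(dempster(colour, texture))
```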

9.
Semantic scene classification based only on low-level vision cues has had limited success on unconstrained image sets. On the other hand, camera metadata related to capture conditions provide cues independent of the captured scene content that can be used to improve classification performance. We consider three problems, indoor-outdoor classification, sunset detection, and manmade-natural classification. Analysis of camera metadata statistics for images of each class revealed that metadata fields, such as exposure time, flash fired, and subject distance, are most discriminative for each problem. A Bayesian network is employed to fuse content-based and metadata cues in the probability domain and degrades gracefully even when specific metadata inputs are missing (a practical concern). Finally, we provide extensive experimental results on the three problems using content-based and metadata cues to demonstrate the efficacy of the proposed integrated scene classification scheme.
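A hedged, naive-Bayes style sketch of the metadata fusion idea, in which missing metadata fields are simply skipped, mirroring the graceful degradation obtained with the Bayesian network; the likelihood tables and field names are illustrative, not the learned ones.

```python
import math

# P(cue value | class) for class in {"indoor", "outdoor"} -- toy values
LIKELIHOODS = {
    "flash_fired":   {True:  {"indoor": 0.70, "outdoor": 0.15},
                      False: {"indoor": 0.30, "outdoor": 0.85}},
    "long_exposure": {True:  {"indoor": 0.60, "outdoor": 0.20},
                      False: {"indoor": 0.40, "outdoor": 0.80}},
}
PRIOR = {"indoor": 0.5, "outdoor": 0.5}

def classify(cues):
    """cues: dict of metadata fields; missing (None) fields are simply skipped."""
    log_post = {c: math.log(p) for c, p in PRIOR.items()}
    for name, value in cues.items():
        if name in LIKELIHOODS and value is not None:
            for c in log_post:
                log_post[c] += math.log(LIKELIHOODS[name][value][c])
    return max(log_post, key=log_post.get)

print(classify({"flash_fired": True, "long_exposure": None}))   # -> "indoor"
```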

10.
Most successful approaches to scene recognition efficiently combine global image features with spatially local appearance and shape cues. Less attention, on the other hand, has been devoted to studying spatial texture features within scenes. Our method is based on the insight that scenes can be seen as a composition of micro-texture patterns, and this paper analyzes the role of texture, together with its spatial layout, for scene recognition. One main drawback of the resulting spatial representation, however, is its huge dimensionality. We therefore propose a technique that addresses this problem with a compact Spatial Pyramid (SP) representation. The basis of our compact representation, the Compact Adaptive Spatial Pyramid (CASP), is a two-stage compression strategy based on the Agglomerative Information Bottleneck (AIB) theory for (i) compressing the least informative SP features, and (ii) automatically learning the most appropriate shape for each category. Our method exceeds the state-of-the-art results on several challenging scene recognition data sets.
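The AIB-based compression can be pictured with the following rough sketch of the greedy merge step: repeatedly merge the two spatial-pyramid bins whose merger loses the least mutual information with the class label. The probability tables are toy inputs and the implementation is a simplification, not the CASP code.

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(p_c1, p_c2, py_c1, py_c2):
    """Information loss of merging clusters c1, c2 (Jensen-Shannon form)."""
    p = p_c1 + p_c2
    pi1, pi2 = p_c1 / p, p_c2 / p
    m = pi1 * py_c1 + pi2 * py_c2
    return p * (pi1 * kl(py_c1, m) + pi2 * kl(py_c2, m))

def aib(p_c, p_y_given_c, n_final):
    """Greedy AIB: p_c[i] is the prior of bin i, p_y_given_c[i] its class posterior."""
    clusters = [([i], p_c[i], p_y_given_c[i]) for i in range(len(p_c))]
    while len(clusters) > n_final:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: merge_cost(clusters[ab[0]][1], clusters[ab[1]][1],
                                             clusters[ab[0]][2], clusters[ab[1]][2]))
        ids_i, pi, yi = clusters[i]
        ids_j, pj, yj = clusters[j]
        merged = (ids_i + ids_j, pi + pj, (pi * yi + pj * yj) / (pi + pj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [c[0] for c in clusters]

p_c = np.array([0.3, 0.3, 0.2, 0.2])
p_y_given_c = np.array([[0.9, 0.1], [0.85, 0.15], [0.2, 0.8], [0.25, 0.75]])
print(aib(p_c, p_y_given_c, n_final=2))   # groups the two similar pairs of bins
```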

11.
Robust camera pose and scene structure analysis for service robotics
Successful path planning and object manipulation in service robotics applications rely both on a good estimate of the robot’s position and orientation (pose) in the environment and on a reliable understanding of the visualized scene. In this paper a robust real-time camera pose and scene structure estimation system is proposed. First, the pose of the camera is estimated through the analysis of so-called tracks, which include key features from the imaged scene and geometric constraints used to solve the pose estimation problem. Second, based on the calculated pose of the camera, i.e. the robot, the scene is analyzed via a robust depth segmentation and object classification approach. In order to segment the objects’ depth reliably, a feedback control technique at the image-processing level is used to improve the robustness of the robotic vision system with respect to external influences such as cluttered scenes and variable illumination conditions. The control strategy detailed in this paper is based on the traditional open-loop mathematical model of the depth estimation process. To control the robotic system, the obtained visual information is classified into objects of interest and obstacles. The proposed scene analysis architecture is evaluated through experimental results within a robotic collision avoidance system.

12.
Video provides strong cues for automatic road extraction that are not available in static aerial images. In video from a static camera, or stabilized (or geo-referenced) aerial video data, motion patterns within a scene enable function attribution of scene regions. A “road”, for example, may be defined as a path of consistent motion — a definition which is valid in a large and diverse set of environments. The spatio-temporal structure tensor field is an ideal representation of the image derivative distribution at each pixel because it can be updated in real time as video is acquired. An eigen-decomposition of the structure tensor encodes both the local scene motion and the variability in the motion. Additionally, the structure tensor field can be factored into motion components, allowing explicit determination of traffic patterns in intersections. Example results of a real time system are shown for an urban scene with both well-traveled and infrequently traveled roads, indicating that both can be discovered simultaneously. The method is ideal in urban traffic scenes, which are the most difficult to analyze using static imagery.
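A minimal numpy sketch of the spatio-temporal structure tensor and its eigen-decomposition, which is the quantity the road-extraction method builds on; the input `clip` is a hypothetical (T, H, W) grey-level array and the windowed accumulation is deliberately naive.

```python
import numpy as np

def structure_tensor(clip, y, x, t, win=5):
    """Accumulate outer products of the gradient (Ix, Iy, It) over a window."""
    It, Iy, Ix = np.gradient(clip.astype(float))     # axis order: T, Y, X
    J = np.zeros((3, 3))
    h = win // 2
    for dt in range(-h, h + 1):
        for dy in range(-h, h + 1):
            for dx in range(-h, h + 1):
                g = np.array([Ix[t + dt, y + dy, x + dx],
                              Iy[t + dt, y + dy, x + dx],
                              It[t + dt, y + dy, x + dx]])
                J += np.outer(g, g)
    return J

def motion_consistency(J):
    """Small smallest eigenvalue relative to the largest => a single dominant
    spatio-temporal orientation, i.e. locally consistent motion."""
    w = np.linalg.eigvalsh(J)          # ascending eigenvalues
    return 1.0 - w[0] / (w[2] + 1e-9)
```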

13.
Falls have been reported as the leading cause of injury-related visits to emergency departments and the primary cause of accidental death in the elderly, so the development of robust home surveillance systems is of great importance. In this article such a system is presented, which addresses the fall detection problem through visual cues. The proposed methodology uses a fast, real-time background subtraction algorithm, based on motion information in the scene and pixel intensity, capable of operating properly under dynamically changing visual conditions, in order to detect the foreground object. At the same time, it exploits 3D measurements, obtained through automatic camera calibration, to increase the robustness of the fall detection algorithm, which is based on a semi-supervised learning approach. The system uses a single monocular camera and is characterized by minimal computational cost and memory requirements, making it suitable for real-time large-scale deployment.
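The background-subtraction front end can be sketched, under strong simplifying assumptions, as a running-average model with per-pixel thresholding, followed by a crude fall cue from the foreground blob's aspect ratio; the paper's algorithm additionally uses motion information and calibrated 3D measures, which are omitted here.

```python
import numpy as np

class RunningAverageBG:
    def __init__(self, first_frame, alpha=0.02, thresh=25):
        self.bg = first_frame.astype(float)
        self.alpha, self.thresh = alpha, thresh

    def apply(self, frame):
        frame = frame.astype(float)
        fg = np.abs(frame - self.bg) > self.thresh       # foreground mask
        # update the model only where the scene is judged static
        self.bg = np.where(fg, self.bg,
                           (1 - self.alpha) * self.bg + self.alpha * frame)
        return fg

def wide_and_low(fg_mask, ratio=1.5):
    """Crude fall indicator: the foreground blob is much wider than it is tall."""
    ys, xs = np.nonzero(fg_mask)
    if len(ys) == 0:
        return False
    h, w = ys.ptp() + 1, xs.ptp() + 1
    return w / h > ratio
```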

14.
In this paper, we propose an efficient and robust method for multiple targets tracking in cluttered scenes using multiple cues. Our approach combines the use of Monte Carlo sequential filtering for tracking and Dezert-Smarandache theory (DSmT) to integrate the information provided by the different cues. The use of DSmT provides the necessary framework to quantify and overcome the conflict that might appear between the cues due to the occlusion. Our tracking approach is tested with color and location cues on a cluttered scene where multiple targets are involved in partial or total occlusion.

15.
This paper presents a new method of grouping and matching line segments to recognize objects. We propose a dynamic-programming-based formulation for extracting salient line patterns, defining a robust and stable geometric representation that is based on perceptual organization. Using endpoint proximity, we detect junctions from the image lines, and then search for junction groups by using the collinearity constraint between junctions. Junction groups similar to the model are sought in the scene through local comparison, and a DP-based search algorithm reduces the time complexity of searching for the model lines in the scene. The system is able to find reasonable line groups in a short time.
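The endpoint-proximity step can be pictured with a short sketch: any two segments whose endpoints lie within a distance threshold are reported as a candidate junction. The segment representation and threshold are assumptions for the example.

```python
import numpy as np

def junctions(segments, max_dist=5.0):
    """segments: list of (x1, y1, x2, y2) tuples; returns (i, j, junction point)."""
    found = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            ends_i = np.array(segments[i]).reshape(2, 2)
            ends_j = np.array(segments[j]).reshape(2, 2)
            for p in ends_i:
                for q in ends_j:
                    if np.linalg.norm(p - q) <= max_dist:
                        found.append((i, j, tuple((p + q) / 2)))
    return found

segs = [(0, 0, 10, 0), (11, 1, 11, 12), (50, 50, 60, 60)]
print(junctions(segs))   # segments 0 and 1 meet near (10.5, 0.5)
```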

16.
With technology scaling, crosstalk faults have become a serious problem for reliable data transfer over Network-on-Chip (NoC) channels. The effect of a crosstalk fault depends on the transition patterns appearing on the wires of the channel; among these patterns, the Triplet Opposite Direction (TOD) imposes the worst crosstalk effects. Crosstalk Avoidance Codes (CACs) are overhead-efficient mechanisms to tackle TODs, but their main problem is the high overhead they impose on NoC routers. To solve this problem, this paper proposes an overhead-efficient coding mechanism called Penultimate-Subtracted Fibonacci (PS-Fibo) to alleviate crosstalk faults on NoC wires. The PS-Fibo coding mechanism benefits from a novel numeral system that not only completely removes TODs but is also applicable to a wide range of NoC channel widths. The PS-Fibo coding mechanism is evaluated using BookSim-2 and VHDL-based simulations in terms of codec efficiency for crosstalk fault reduction, codec power consumption, codec area occupation and network performance. Evaluation results, obtained for a wide range of NoC channel widths, indicate that PS-Fibo improves codec power consumption, codec area occupation and NoC performance with respect to other state-of-the-art coding mechanisms.
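The TOD pattern that PS-Fibo is designed to eliminate is easy to state in code; the helper below (not the PS-Fibo codec itself) detects Triplet Opposite Direction transitions between two consecutive words on a channel.

```python
def has_tod(prev_word, next_word):
    """prev_word / next_word: equal-length bit strings, one bit per wire."""
    # transition per wire: +1 rising, -1 falling, 0 unchanged
    d = [int(b) - int(a) for a, b in zip(prev_word, next_word)]
    for j in range(1, len(d) - 1):
        if d[j - 1] == d[j + 1] != 0 and d[j] == -d[j - 1]:
            return True          # e.g. (rise, fall, rise) on three adjacent wires
    return False

print(has_tod("010", "101"))     # True: adjacent wires switch in opposite directions
print(has_tod("000", "111"))     # False: all wires rise together
```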

17.
This paper addresses the problem of wireless transmission of a captured scene from multiple cameras, which do not communicate among each other, to a joint decoder. Correlation among different camera views calls for distributed source coding for efficient multiview image compression, while the fact that the cameras are placed within a short range of each other results in a high level of interference, multipath fading, and noise during communication. We develop a novel two-camera system that employs multiterminal source coding and complete complementary data spreading: the former exploits the statistical correlation between camera views and performs joint compression to reduce transmission rates, while the spreading protects the transmitted data by mitigating the effects of wireless fading channels. Our results indicate that the proposed system is competitive compared with two independently JPEG-encoded streams at low to medium transmission rates.
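A minimal, hedged illustration of the complementary-correlation idea behind complete complementary spreading: for a Golay complementary pair, the aperiodic autocorrelations sum to an impulse, which is what makes such spreading resilient to multipath. The pair below is a textbook example, not the codes used in the paper.

```python
import numpy as np

def acorr(x):
    """Aperiodic autocorrelation for non-negative lags."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(n)])

a = np.array([1, 1, 1, -1])
b = np.array([1, 1, -1, 1])
print(acorr(a) + acorr(b))   # [8 0 0 0] -- zero sidelobes at every non-zero lag
```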

18.
In the 3D lifting video coding framework, the temporal correlation between adjacent frames weakens significantly at scene changes, so the quality of the decoded frames drops sharply at scene cuts. To address this problem, a new scene-change detection method based on the luminance component of the video is proposed, and the size of the group of pictures (GOP) is allocated adaptively according to the detected scene changes. Experimental results show that this adaptive GOP allocation strategy effectively improves the coding and decoding quality of 3D lifted wavelet video and reduces the impact of scene changes on video coding.
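A sketch under stated assumptions of the luminance-based cut detection and adaptive GOP allocation: a GOP is closed early whenever the luma-histogram difference between consecutive frames exceeds a threshold. The threshold, bin count and maximum GOP size are arbitrary example values.

```python
import numpy as np

def luma_hist_diff(prev_y, curr_y, bins=64):
    h1, _ = np.histogram(prev_y, bins=bins, range=(0, 256))
    h2, _ = np.histogram(curr_y, bins=bins, range=(0, 256))
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return 0.5 * np.abs(h1 - h2).sum()          # 0 = identical, 1 = disjoint

def adaptive_gops(luma_frames, thresh=0.4, max_gop=16):
    """Return GOP sizes: a GOP is closed at a scene cut or after max_gop frames."""
    sizes, current = [], 1
    for prev_y, curr_y in zip(luma_frames, luma_frames[1:]):
        if luma_hist_diff(prev_y, curr_y) > thresh or current == max_gop:
            sizes.append(current)
            current = 1
        else:
            current += 1
    sizes.append(current)
    return sizes
```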

19.
In fields such as virtual reality, one often needs to build computer descriptions of virtual objects from real scenes, i.e. to construct 3D object models. To this end, a 3D object modelling approach combining computer vision with CAD geometric modelling is proposed. First, a depth image of the 3D object is acquired with a coded-grating (structured light) method and segmented using mathematical morphology. The segmented 3D surface patches are then reconstructed by algebraic surface fitting, and a CAD geometric modelling tool assembles the reconstructed patches into a geometric model of the object. Preliminary experimental results are given, showing that the proposed approach is basically feasible.
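The algebraic surface fitting step can be illustrated with a least-squares fit of a quadric z = f(x, y) to the points of one segmented depth patch; the synthetic data below stand in for a real patch.

```python
import numpy as np

def fit_quadric(x, y, z):
    """Fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f by linear least squares."""
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def eval_quadric(coeffs, x, y):
    a, b, c, d, e, f = coeffs
    return a * x**2 + b * y**2 + c * x * y + d * x + e * y + f

# synthetic patch: z = 0.1 x^2 + 0.05 y^2 + noise
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
z = 0.1 * x**2 + 0.05 * y**2 + rng.normal(0, 0.01, 200)
print(np.round(fit_quadric(x, y, z), 3))
```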

20.
Natural scene features stabilize and extend the tracking range of augmented reality (AR) pose-tracking systems. We develop robust computer vision methods to detect and track natural features in video images. Point and region features are automatically and adaptively selected for properties that lead to robust tracking. A multistage tracking algorithm produces accurate motion estimates, and the entire system operates in a closed loop that stabilizes its performance and accuracy. We present demonstrations of the benefits of using tracked natural features for AR applications that illustrate direct scene annotation, pose stabilization, and extendible tracking range. Our system represents a step toward integrating vision with graphics to produce robust wide-area augmented realities.
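A hedged OpenCV sketch (not the original system) of the kind of front end such a tracker builds on: select stable natural point features and track them between frames with pyramidal Lucas-Kanade; `prev_gray` and `curr_gray` are hypothetical consecutive greyscale frames.

```python
import cv2
import numpy as np

def track_natural_features(prev_gray, curr_gray, max_corners=200):
    # pick corners whose local gradient structure makes them easy to track
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=8)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # pyramidal Lucas-Kanade tracking into the next frame
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
```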
