Segmenting unknown objects of arbitrary shape in cluttered scenes is an ambitious goal in computer vision, one that has received a strong impetus from the introduction of cheap and powerful RGB-D sensors. We introduce a framework for segmenting RGB-D images in which the data is processed hierarchically. After pre-clustering at the pixel level, parametric surface patches are estimated. Relations between patch pairs, derived from perceptual grouping principles, are then computed, and a support vector machine is trained to learn perceptual grouping. Finally, we show that generating object hypotheses with Graph-Cut finds a globally optimal solution and prevents incorrect grouping. Our framework is able to segment objects even when they are stacked or jumbled in cluttered scenes, and it also tackles the segmentation of partially occluded objects. The work is evaluated on publicly available object segmentation databases and compared with state-of-the-art object segmentation methods.
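As a rough illustration of the pairwise-relation stage, the sketch below scores patch pairs with a linear classifier over three perceptual-grouping features (proximity, color similarity, co-planarity) and merges positively classified pairs with union-find. The feature set, weights, and greedy merge are simplifications standing in for the paper's learned SVM relations and Graph-Cut hypothesis generation:

```python
import numpy as np

def pairwise_features(p1, p2):
    """Perceptual-grouping relations between two surface patches:
    proximity, color similarity, and normal (co-planarity) similarity.
    These feature names are illustrative, not the paper's exact relation set."""
    prox = np.linalg.norm(p1['centroid'] - p2['centroid'])
    col = np.linalg.norm(p1['color'] - p2['color'])
    copl = 1.0 - abs(np.dot(p1['normal'], p2['normal']))
    return np.array([prox, col, copl])

def group_patches(patches, weights, bias):
    """Greedy union-find grouping driven by a learned linear score,
    a simplified stand-in for the paper's SVM + Graph-Cut stage."""
    parent = list(range(len(patches)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(patches)):
        for j in range(i + 1, len(patches)):
            # Higher feature distances lower the score; positive = "same object".
            score = bias - weights @ pairwise_features(patches[i], patches[j])
            if score > 0:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(patches))]
```

Two nearby, similar patches end up with the same group label, while a distant, differently colored and oriented patch stays separate.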
The authors used lexical decision in a dichotic listening situation and measured identity priming across channels to explore whether unattended stimuli can be processed lexically. In 6 experiments, temporal synchronization of prime and target words was manipulated, and acoustic saliency of the unattended prime was varied by embedding it in a carrier sentence or in babble speech. When the prime was acoustically salient, a cross-channel priming effect emerged, and participants were aware of the prime. When the prime was less salient, no identity priming was found, and participants failed to notice the prime. Saliency was manipulated in ways that did not degrade the prime. Results are inconsistent with models of late filtering, which predict equal priming irrespective of prime saliency.
In this paper, a novel face segmentation algorithm based on a facial saliency map (FSM) is proposed for head-and-shoulder video applications. The method consists of three stages. The first stage generates the saliency map of the input video frame using our proposed facial attention model. In the second stage, a geometric model and an eye map built from the chrominance components are employed to localize the face region according to the saliency map. The third stage performs adaptive boundary correction and extracts the final face contour. Based on the segmented result, an effective boundary saliency map (BSM) is then constructed and applied to the tracking-based segmentation of successive frames. Experimental evaluation on test sequences shows that the proposed method segments the face area effectively.
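For the chrominance eye map, a common construction (in the spirit of Hsu et al.'s well-known face detection work; the abstract does not give the paper's exact formula, so this combination is an assumption) emphasizes pixels with high Cb and low Cr, which is characteristic of the eye region:

```python
import numpy as np

def eye_map_chroma(cb, cr):
    """Chrominance-based eye map: eyes tend to show high Cb and low Cr.
    Inputs are uint8 Cb/Cr planes; output is normalized to [0, 1].
    The exact combination used by the paper is an assumption here."""
    cb = cb.astype(float)
    cr = cr.astype(float)
    # Average three cues that all peak on high-Cb / low-Cr pixels.
    m = (cb ** 2 + (255.0 - cr) ** 2 + cb / (cr + 1e-6)) / 3.0
    m -= m.min()
    return m / (m.max() + 1e-12)
```

Such a map can then be intersected with the geometric face model to localize candidate eye positions inside the salient face region.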
In recent advances in image and video analysis, detecting salient regions in an image is the initial step, and it plays a crucial role in the performance of subsequent algorithms. In this work, a Multi-Resolution Feature Extraction (MRFE) technique that uses a Discrete Wavelet Convolutional Neural Network (DWCNN) to generate features is employed. An Enhanced Feature Extraction (EFE) module extracts additional features from the high-level features of the DWCNN, which are used to build both channel and spatial attention models that yield contextual attention maps. A new hybrid loss function is also proposed: a combination of Balanced Cross Entropy (BCE) loss and Edge-based Structural Similarity (ESSIM) loss that effectively identifies and segments salient regions with clear boundaries. The method is tested exhaustively on five benchmark datasets and shown to outperform existing state-of-the-art methods, with a minimum Mean Absolute Error (MAE) of 0.03 and an F-measure of 0.956.
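The hybrid loss can be sketched as follows. The class-balancing scheme, the Sobel edge operator, and the 50/50 combination weight are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def balanced_bce(pred, target, eps=1e-7):
    """Balanced cross-entropy: weight positives by the background fraction
    (and vice versa) so sparse salient regions are not swamped."""
    pred = np.clip(pred, eps, 1 - eps)
    beta = 1.0 - target.mean()          # fraction of background pixels
    loss = -(beta * target * np.log(pred)
             + (1 - beta) * (1 - target) * np.log(1 - pred))
    return loss.mean()

def sobel_edges(img):
    """Gradient magnitude via 3x3 Sobel filters (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def edge_ssim(pred, target, c1=1e-4, c2=9e-4):
    """Global SSIM computed between the edge maps of prediction and truth."""
    ep, et = sobel_edges(pred), sobel_edges(target)
    mp, mt = ep.mean(), et.mean()
    vp, vt = ep.var(), et.var()
    cov = ((ep - mp) * (et - mt)).mean()
    return ((2 * mp * mt + c1) * (2 * cov + c2)) / \
           ((mp ** 2 + mt ** 2 + c1) * (vp + vt + c2))

def hybrid_loss(pred, target, alpha=0.5):
    """Hybrid loss: balanced BCE plus an edge-SSIM penalty (1 - ESSIM)."""
    return alpha * balanced_bce(pred, target) + \
           (1 - alpha) * (1 - edge_ssim(pred, target))
```

The edge-SSIM term penalizes blurry or misplaced boundaries that plain cross-entropy barely notices, which is the motivation for combining the two.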
With the emerging development of three-dimensional (3D) technologies, 3D visual saliency modeling is becoming particularly important and challenging. This paper presents a new depth-perception and visual-comfort guided saliency computational model for stereoscopic 3D images. The prominent advantage of the proposed model is that it incorporates the influence of depth perception and visual comfort on 3D visual saliency computation. The model is composed of three components: 2D image saliency, depth saliency, and visual-comfort-based saliency. Color saliency, texture saliency, and spatial compactness are computed separately and fused to derive the 2D image saliency, while global disparity contrast is used to compute the depth saliency. In particular, we train a visual comfort prediction function to classify a stereoscopic image pair as high-comfort stereo viewing (HCSV) or low-comfort stereo viewing (LCSV), and devise different computational rules for each case to generate a visual-comfort-based saliency map. The final 3D saliency map is obtained by a linear combination of the three components and enhanced by a "saliency-center bias" model. Experimental results show that the proposed 3D saliency model outperforms state-of-the-art models in predicting human eye fixations and in visual comfort assessment.
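The final fusion step might look like the minimal numpy sketch below; the component weights and the Gaussian form of the center-bias model are assumptions for illustration:

```python
import numpy as np

def center_bias(h, w, sigma=0.3):
    """Gaussian 'saliency-center bias' map, peaked at the image centre
    (the Gaussian form is an assumption)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def fuse_3d_saliency(s2d, sdepth, scomfort, w=(0.5, 0.3, 0.2)):
    """Linear combination of the 2D, depth, and visual-comfort saliency
    maps, then modulated by the center-bias model. Weights are illustrative."""
    fused = w[0] * s2d + w[1] * sdepth + w[2] * scomfort
    return fused * center_bias(*fused.shape)
```

In practice the weights would be tuned on a fixation dataset, and the comfort component would switch between the HCSV and LCSV computational rules before fusion.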
Detecting a stairway and recognizing it as upstairs, downstairs, or negative (e.g., a ladder or level ground) is fundamental to helping the visually impaired travel independently in unfamiliar environments. Previous studies have focused on using massive amounts of RGB-D scene data to train traditional machine learning (ML) models that detect and recognize stationary stairways and escalator stairways separately; none of them consider jointly training on these two similar but distinct datasets to achieve better performance. This paper applies an adversarial learning algorithm to this unsupervised domain adaptation scenario, transferring knowledge learned from a labeled RGB-D escalator stairway dataset to an unlabeled RGB-D stationary stairway dataset. With the developed method, a feedforward convolutional neural network (CNN) feature extractor with five convolution layers achieves 100% classification accuracy on the labeled escalator stairway test distribution and 80.6% on the unlabeled stationary test distribution, demonstrating that the approach can classify stairways across both domains with a limited amount of data. To further demonstrate its effectiveness, the same CNN model is evaluated without domain adaptation and the results are compared with those of the presented architecture.
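Adversarial domain adaptation of this kind is typically implemented with a gradient reversal layer, as in DANN; whether this paper uses exactly that mechanism is an assumption. A minimal sketch of the layer itself:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lambda in
    the backward pass. Placed between the feature extractor and the
    domain classifier, it makes the extractor maximize the domain loss
    the classifier minimizes, pushing features toward domain invariance."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                    # features pass through unchanged

    def backward(self, grad):
        return -self.lam * grad     # flip the sign for the extractor
```

During training, the label classifier sees only source (escalator) labels, while the domain classifier sees both domains; the reversed gradient is what lets the unlabeled stationary stairway data shape the shared features.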
The Iterative Closest Point (ICP) scheme has been widely used for the registration of surfaces and point clouds. However, when working on depth image sequences containing large geometric planes with few (or even no) details, existing ICP algorithms are prone to tangential drift and erroneous rotation estimates due to input device errors. In this paper, we propose a novel ICP algorithm that overcomes these drawbacks and provides significantly more stable registration estimates for simultaneous localization and mapping (SLAM) tasks on RGB-D camera inputs. In our approach, the tangential drift and the rotation estimation error are reduced by: 1) augmenting the conventional Euclidean distance term with local geometry information, and 2) introducing a new camera stabilization term that prevents improper camera movement in the calculation. Our approach is simple, fast, effective, and readily integrated into previous ICP algorithms. We test the new method on the TUM RGB-D SLAM dataset using state-of-the-art real-time 3D dense reconstruction platforms, namely ElasticFusion and Kintinuous. Experiments show that our strategy outperforms previous ones on various RGB-D data sequences under different combinations of registration systems and solutions.
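A single damped point-to-plane ICP step illustrates both ideas in miniature. Here the signed plane-distance residual stands in for the geometry-aware distance term, and the Tikhonov damping on the 6-DoF twist stands in for the camera stabilization term; both substitutions are assumptions, not the paper's exact formulation:

```python
import numpy as np

def icp_step(src, dst, normals, mu=1e-3):
    """One damped point-to-plane ICP step.

    src, dst : (N, 3) corresponding points; normals : (N, 3) unit normals
    at dst. The mu * I term on the 6-DoF twist penalizes large camera
    motion, so planar, low-detail scenes cannot drift tangentially
    (a stand-in for the paper's camera stabilization term).
    Returns the small rotation vector w and translation t.
    """
    # Signed distances from transformed points to the destination planes.
    r = np.einsum('ij,ij->i', src - dst, normals)
    # Linearized Jacobian: [p x n | n] per correspondence.
    J = np.hstack([np.cross(src, normals), normals])
    # Damped normal equations (Gauss-Newton with Tikhonov regularization).
    H = J.T @ J + mu * np.eye(6)
    x = np.linalg.solve(H, -J.T @ r)
    return x[:3], x[3:]
```

On a flat patch offset along its normal, the step recovers the translation while the damping keeps the unobservable in-plane and rotational components at zero, which is exactly the failure mode plain ICP exhibits on large planes.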