Similar Literature
20 similar documents were retrieved (search time: 31 ms).
1.
6D object pose (3D rotation and translation) estimation from RGB-D images is an important and challenging task in computer vision that has been widely applied in robotic manipulation, autonomous driving, augmented reality, and other applications. Prior works extract global features or reason about local appearance from an individual frame, neglecting the spatial geometric relevance between two frames and limiting their performance on occluded or truncated objects in heavily cluttered scenes. In this paper, we present a dual-stream network for estimating the 6D pose of a set of known objects from RGB-D images. In contrast to prior work, our method learns latent geometric consistency in pairwise dense feature representations from multiple observations of the same objects in a self-supervised manner. Experiments show that our method outperforms state-of-the-art approaches to 6D object pose estimation on two challenging datasets, YCB-Video and LineMOD.
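To make the idea of pairwise dense-feature consistency concrete, here is a minimal NumPy sketch that scores how consistent dense feature maps of two frames are at known corresponding pixels. It is an illustration only: the function name, the cosine-style normalization, and the toy data are assumptions, not the paper's network or loss.

```python
import numpy as np

def dense_consistency_loss(feat_a, feat_b, corr_a, corr_b):
    """Mean squared distance between dense features sampled at corresponding
    pixels of two frames (H x W x C feature maps).

    corr_a, corr_b: (N, 2) integer arrays of (row, col) pixel pairs that
    observe the same 3-D surface point in frame A and frame B.
    """
    fa = feat_a[corr_a[:, 0], corr_a[:, 1]]   # (N, C) features in frame A
    fb = feat_b[corr_b[:, 0], corr_b[:, 1]]   # (N, C) features in frame B
    # Normalise so the score measures feature direction rather than magnitude.
    fa = fa / (np.linalg.norm(fa, axis=1, keepdims=True) + 1e-8)
    fb = fb / (np.linalg.norm(fb, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(np.sum((fa - fb) ** 2, axis=1)))

# Toy usage: two random 32x32x16 feature maps, 100 correspondences.
rng = np.random.default_rng(0)
fA, fB = rng.normal(size=(32, 32, 16)), rng.normal(size=(32, 32, 16))
pts = rng.integers(0, 32, size=(100, 2))
print(dense_consistency_loss(fA, fB, pts, pts))
```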

2.
3.
The human visual system analyzes complex scenes rapidly. It devotes its limited perceptual resources to the most salient subsets and/or objects of a scene while ignoring its less salient parts. Gaze prediction models try to predict human eye fixations (human gaze) under free-viewing conditions while imitating this attentive mechanism. Previous studies on saliency benchmark datasets have shown that visual attention is affected by the salient objects of a scene and their features, including the identity, location, and visual features of the objects, as well as the context of the input image. Moreover, human eye fixations often converge on specific parts of the salient objects in a scene. In this paper, we propose a deep gaze prediction model using object detection via image segmentation. It uses several deep neural modules to find the identity, location, and visual features of the salient objects in a scene. In addition, we introduce a deep module to capture the prior bias of human eye fixations. To evaluate our model, several challenging saliency benchmark datasets are used in the experiments. We also conduct an ablation study to show the effectiveness of the proposed modules and the overall architecture. Despite having fewer parameters, our model performs comparably to, or even better than, state-of-the-art saliency models on some datasets.

4.
Low-cost RGB-D cameras such as Microsoft's Kinect or Asus's Xtion Pro are changing the computer vision world, as they are being successfully used in several applications and research areas. Depth data are particularly attractive and suitable for applications that detect moving objects through foreground/background segmentation; however, the RGB-D applications proposed in the literature generally employ state-of-the-art foreground/background segmentation techniques based on depth information alone, without taking the color information into account. The novel approach that we propose is based on a combination of classifiers that improves background subtraction accuracy with respect to state-of-the-art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average that adaptively modifies the support of each classifier in the ensemble by considering foreground detections in previous frames and the depth and color edges. In this way, it is possible to reduce false detections due to critical issues that cannot be tackled by the individual classifiers, such as shadows and illumination changes, color and depth camouflage, moved background objects, and noisy depth measurements. Moreover, we propose, to the best of the authors' knowledge, the first publicly available RGB-D benchmark dataset with hand-labeled ground truth for several challenging scenarios on which to test background/foreground segmentation algorithms.
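As an illustration of the weighted-average combination of a color and a depth classifier, the following sketch fuses two per-pixel foreground probability maps with per-pixel weights. The weight-update rule shown is a placeholder assumption, not the paper's exact adaptation formula; the function names are illustrative.

```python
import numpy as np

def fuse_foreground(p_color, p_depth, w_color, threshold=0.5):
    """Weighted-average fusion of two per-pixel foreground probabilities.

    p_color, p_depth : (H, W) arrays in [0, 1] from the two classifiers.
    w_color          : (H, W) per-pixel weight of the color classifier in [0, 1];
                       the depth classifier receives (1 - w_color).
    """
    p = w_color * p_color + (1.0 - w_color) * p_depth
    return (p > threshold).astype(np.uint8)

def update_weights(w_color, depth_valid, lr=0.05):
    """Illustrative adaptation: increase the color weight where the depth
    measurement is missing or noisy, and relax it back where depth is valid."""
    target = np.where(depth_valid, 0.5, 1.0)   # trust color alone when depth is invalid
    return (1 - lr) * w_color + lr * target
```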

5.
This paper integrates fully automatic video object segmentation and tracking, including detection and assignment of uncovered regions, in a 2-D mesh-based framework. Particular contributions of this work are (i) a novel video object segmentation method that is posed as a constrained maximum contrast path search problem along the edges of a 2-D triangular mesh, and (ii) a 2-D mesh-based uncovered region detection method along the object boundary as well as within the object. In the first frame, an optimal number of feature points are selected as nodes of a 2-D content-based mesh. These points are classified as moving (foreground) and stationary nodes based on multi-frame node motion analysis, yielding a coarse estimate of the foreground object boundary. Color differences across triangles near the coarse boundary are employed for a maximum contrast path search along the edges of the 2-D mesh to refine the boundary of the video object. Next, we propagate the refined boundary to the subsequent frame by using motion vectors of the node points to form the coarse boundary at the next frame. We detect occluded regions by using motion-compensated frame differences and range-filtered edge maps. The boundaries of detected uncovered regions are then refined by using the search procedure. These regions are either appended to the foreground object or tracked as new objects. The segmentation procedure is re-initialized when the number of unreliable motion vectors exceeds a preset limit. The proposed scheme is demonstrated on several video sequences.

6.
Video object segmentation based on template matching (total citations: 6; self-citations: 1; citations by others: 6)
宋立锋  韦岗  王群生 《电子学报》2002,30(7):1075-1078
Video object segmentation is a key technology in the MPEG-4 standard. Combining template matching with object tracking based on motion estimation and compensation, this paper proposes a new method that can segment MPEG-4 video objects from complex scenes. After a segmentation mask is obtained by motion estimation and compensation, the object color of the initial frame is used as a template, and template matching is performed in the contour boundary region of the current frame to detect the object and refine its contour. Within certain limits, the proposed method effectively handles occlusion and can track objects through sequences of arbitrary length starting from the initial frame.
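A minimal OpenCV sketch of the template-matching step, assuming the coarse mask from motion estimation/compensation and the initial-frame color template are already available; the band width and function names are illustrative, not taken from the paper.

```python
import cv2
import numpy as np

def refine_by_template(curr_bgr, template_bgr, coarse_mask, band=15):
    """Search for the initial-frame color template only inside a band around
    the motion-compensated contour, and return the best match location.

    coarse_mask : uint8 {0, 255} mask obtained from motion estimation/compensation.
    """
    # Limit the search to a dilated band around the coarse object boundary.
    contour_band = cv2.dilate(coarse_mask, np.ones((band, band), np.uint8))
    search = cv2.bitwise_and(curr_bgr, curr_bgr, mask=contour_band)
    # Normalised cross-correlation between the color template and the band.
    score = cv2.matchTemplate(search, template_bgr, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(score)
    return max_loc, max_val   # top-left corner of the best match and its score
```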

7.
Building consistent models of objects and scenes from moving sensors is an important prerequisite for many recognition, manipulation, and navigation tasks. Our approach integrates color and depth measurements seamlessly in a multi-resolution map representation. We process image sequences from RGB-D cameras and consider their typical noise properties. In order to align the images, we register view-based maps efficiently on a CPU using multi-resolution strategies. For simultaneous localization and mapping (SLAM), we determine the motion of the camera by registering maps of key views and optimize the trajectory in a probabilistic framework. We create object models and map indoor scenes using our SLAM approach, which includes randomized loop closing to avoid drift. Camera motion relative to the acquired models is then tracked in real time based on our registration method. We benchmark our method on publicly available RGB-D datasets, demonstrate its accuracy, efficiency, and robustness, and compare it with state-of-the-art approaches. We also report on several successful public demonstrations where the system was used in mobile manipulation tasks.

8.
Two new region-based methods for video object tracking using active contours are presented. The first method is based on the assumption that the color histogram of the tracked object is nearly stationary from frame to frame. The proposed method is based on minimizing the color histogram difference between the estimated objects at a reference frame and the current frame using a dynamic programming framework. The second method is defined for scenes where there is an out-of-focus blur difference between the object of interest and the background. In such scenes, the proposed “defocus energy” can be utilized for automatic segmentation of the object boundary, and it can be combined with the histogram method to track the object more efficiently. Experiments demonstrate that the proposed methods are successful in difficult scenes with significant background clutter.
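A small OpenCV/NumPy sketch of the color-histogram difference that the first method minimizes, assuming binary object masks for the reference and current frames are given; the bin count and the L1 distance are illustrative choices, not the paper's exact formulation.

```python
import cv2
import numpy as np

def hist_difference(ref_bgr, ref_mask, cur_bgr, cur_mask, bins=16):
    """L1 difference between the normalised color histograms of the object
    region in a reference frame and a candidate region in the current frame."""
    hists = []
    for img, mask in ((ref_bgr, ref_mask), (cur_bgr, cur_mask)):
        # Joint B-G-R histogram restricted to the masked object pixels.
        h = cv2.calcHist([img], [0, 1, 2], mask, [bins] * 3,
                         [0, 256, 0, 256, 0, 256])
        hists.append(cv2.normalize(h, None, norm_type=cv2.NORM_L1).flatten())
    return float(np.abs(hists[0] - hists[1]).sum())
```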

9.
10.
Depth segmentation faces the challenge of separating objects from their supporting surfaces in a noisy environment. To address this issue, a novel segmentation scheme based on disparity analysis is proposed. First, we transform a depth scene into the corresponding U-V disparity maps. Then, we apply a region-based detection method to divide the object region into several targets in the processed U-disparity map. Third, since horizontal planar regions map to slanted lines in the V-disparity map, the Random Sample Consensus (RANSAC) algorithm is improved to fit multiple such lines. Moreover, noise regions are reduced by image-processing strategies during these steps. We evaluate our approach on both real-world scenes and public datasets to verify its flexibility and generalization. Extensive experimental results indicate that the algorithm can efficiently segment and label a full-view scene into a group of valid regions while removing surrounding noise regions.
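A minimal NumPy sketch of building the U- and V-disparity maps from a disparity image (the first step above). Supporting planes then appear as slanted lines in the V-disparity map, which can be fitted with a RANSAC line fitter; that fitting step is not shown here.

```python
import numpy as np

def uv_disparity(disp, max_disp=64):
    """Column-wise (U) and row-wise (V) disparity histograms of a disparity map.

    disp : (H, W) integer disparity image with values in [0, max_disp).
    Returns the U-disparity map (max_disp x W) and V-disparity map (H x max_disp).
    """
    h, w = disp.shape
    u_disp = np.zeros((max_disp, w), dtype=np.int32)
    v_disp = np.zeros((h, max_disp), dtype=np.int32)
    for d in range(max_disp):
        hits = (disp == d)
        u_disp[d, :] = hits.sum(axis=0)   # occurrences of disparity d in each column
        v_disp[:, d] = hits.sum(axis=1)   # occurrences of disparity d in each row
    return u_disp, v_disp
```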

11.
An adaptive method for target extraction and tracking in complex image sequences (total citations: 6; self-citations: 3; citations by others: 6)
张天序  戴可荣 《电子学报》1994,22(10):46-53
Based on principles of visual perception, this paper studies computational models and algorithms for extracting and tracking moving targets in complex images, and analyzes and verifies how changing the relevant model parameters affects the image segmentation threshold. Unlike some conventional algorithms, the new method jointly considers target-background conditions, visual nonlinearity, and inter-frame correlation and difference. Target extraction is formulated as a complete two-step process comprising three criteria and a fast optimization algorithm, and target tracking uses binary template matching. Experimental results on visible-light image sequences are presented.

12.
Automatic object segmentation is a fundamentally difficult problem due to issues such as shadows, lighting, and semantic gaps. Edges play a critical role in object segmentation; however, it is almost impossible for the computer to know which edges correspond to object boundaries and which are caused by internal texture discontinuities. Active 3-D cameras, which provide streams of depth and RGB frames, are poised to become inexpensive and widespread. The depth discontinuities provide useful information for identifying object boundaries, which makes automatic object segmentation possible. However, the depth frames are extremely noisy. Also, the depth and RGB information often lose synchronization when the object is moving fast, due to the different response times of the RGB and depth sensors. We show how to use the combined depth and RGB information to mitigate these problems and produce an accurate silhouette of the object. On a large dataset (24 objects with 1500 images), we provide both qualitative and quantitative evidence that our proposed techniques are effective.
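As a rough illustration of combining noisy depth with RGB to obtain a silhouette, the sketch below seeds GrabCut on the RGB frame from a median-filtered depth range. GrabCut is used here only as a generic refiner under assumed depth bounds; it is not necessarily the technique of the cited paper.

```python
import cv2
import numpy as np

def silhouette_from_rgbd(bgr, depth_mm, near_mm, far_mm, iters=3):
    """Coarse mask from a denoised depth range, refined on the RGB frame.

    bgr      : (H, W, 3) uint8 color frame.
    depth_mm : (H, W) depth image in millimetres (noisy, possibly with holes).
    """
    depth = cv2.medianBlur(depth_mm.astype(np.uint16), 5)     # suppress depth speckle
    coarse = ((depth > near_mm) & (depth < far_mm)).astype(np.uint8)
    # Seed GrabCut: in-range pixels are probable foreground, the rest probable background.
    mask = np.where(coarse == 1, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(bgr, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return (fg.astype(np.uint8)) * 255
```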

13.
We propose a principled framework for recursively segmenting deformable objects across a sequence of frames. We demonstrate the usefulness of this method on left ventricular segmentation across a cardiac cycle. The approach involves a technique for learning the system dynamics together with methods of particle-based smoothing as well as nonparametric belief propagation on a loopy graphical model capturing the temporal periodicity of the heart. The dynamic system state is a low-dimensional representation of the boundary, and the boundary estimation involves incorporating curve evolution into recursive state estimation. By formulating the problem as one of state estimation, the segmentation at each particular time is based not only on the data observed at that instant, but also on predictions based on past and future boundary estimates. Although this paper focuses on left ventricle segmentation, the method generalizes to temporally segmenting any deformable object.

14.
To segment complete and consistent moving video objects from video sequences, this paper uses a fuzzy-clustering-based segmentation algorithm to obtain the pixels that form the object boundary and thereby extract the object. The algorithm first uses the image information of the current frame and several preceding frames to compute motion features in different wavelet-domain subbands, and constructs a motion feature vector set for the low-resolution image from these features. Then, the fuzzy C-means clustering algorithm is used to separate the pixels that have changed significantly, replacing the inter-frame difference image, and a conventional change-detection method is applied to obtain the object change-detection model and extract the object. Meanwhile, the mean absolute difference between two successive frames determines how many frames are needed to compute the motion features of the current frame, ensuring accurate extraction of the video object. Experimental results show that the method is effective for segmenting video objects in various image sequences.
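A compact sketch of the fuzzy C-means step on per-pixel motion-feature vectors (the wavelet-domain feature extraction itself is omitted); the initialization, fuzzifier, and toy 1-D "motion energy" data are assumptions for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means. X: (N, D) feature vectors (e.g., per-pixel motion
    features), c clusters, fuzzifier m. Returns the membership matrix U (N, c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=X.shape[0])            # random fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]           # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-9
        U = 1.0 / (d ** (2.0 / (m - 1.0)))                      # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)                       # renormalise per sample
    return U

# Toy usage: 1-D "motion energy" values; the smaller cluster gathers the changed pixels.
rng = np.random.default_rng(1)
energy = np.concatenate([rng.normal(0.1, 0.05, 500), rng.normal(0.9, 0.05, 50)])[:, None]
labels = fuzzy_c_means(energy).argmax(axis=1)                   # hard labels from memberships
```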

15.
We implement a video object segmentation system that integrates the novel concept of Voronoi Order with existing surface optimization techniques to support the MPEG-4 functionality of object-addressable video content in the form of video objects. The major enabling technology for the MPEG-4 standard is a system that computes video object segmentation, i.e., the extraction of video objects from a given video sequence. Our surface optimization formulation describes the video object segmentation problem as an energy function that integrates many visual processing techniques. By optimizing this surface, we balance visual information against the predictions of models with a priori information and extract video objects from a video sequence. Since the global optimization of such an energy function is still an open problem, we use Voronoi Order to decompose our formulation into a tractable optimization via dynamic programming within an iterative framework. In conclusion, we show the results of the system on the MPEG-4 test sequences, introduce a novel objective measure, and compare the results against those hand-segmented by the MPEG-4 committee.

16.
In this work, we propose a panoptic segmentation model that integrates bottom-up and top-down methods. Our framework is designed to guarantee both performance and inference speed. We also focus on improving the quality of semantic and instance masks. The proposed auxiliary task with silhouette-based enhanced features helps the model improve the prediction quality of mask contours. Additionally, we introduce a new mask quality score intended to address the occlusion problem. The model is less likely to ignore small objects, which often have lower confidence scores than the larger objects behind them. The results show that the proposed mask quality score better distinguishes the priority of objects when occlusion occurs. We demonstrate our results on two datasets, the COCO dataset and the CityScapes dataset, and obtain competitive results with fast inference times.

17.
This paper presents a novel framework for detecting abandoned objects by introducing fully automatic GrabCut object segmentation. GrabCut seed initialization is treated as a background (BG) modelling problem that focuses only on unattended objects and objects that become immobile. The BG distribution is constructed with dual Gaussian mixtures comprising high- and low-learning-rate models. We propose a primitive BG model-based removed-object validation step and a Haar feature-based cascade classifier for still-people detection once a candidate released object has been detected. Our system obtains more robust and accurate results in real environments, based on evaluations of realistic scenes from the CAVIAR, PETS2006, CDnet 2014, and our own datasets.
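A minimal OpenCV sketch of the dual background models with high and low learning rates: pixels still flagged by the slow model but already absorbed by the fast model are candidates for newly static (abandoned) objects. The history lengths and learning rates are illustrative, not the paper's settings.

```python
import cv2

# Two background models: the fast one absorbs static changes quickly,
# while the slow one remembers the original empty scene much longer.
fast_bg = cv2.createBackgroundSubtractorMOG2(history=100, detectShadows=False)
slow_bg = cv2.createBackgroundSubtractorMOG2(history=3000, detectShadows=False)

def abandoned_candidates(frame, fast_lr=0.01, slow_lr=0.0005):
    """Return a mask of pixels that are foreground for the slow model but
    already background for the fast model, i.e. candidate static objects."""
    fg_fast = fast_bg.apply(frame, learningRate=fast_lr)
    fg_slow = slow_bg.apply(frame, learningRate=slow_lr)
    return cv2.bitwise_and(fg_slow, cv2.bitwise_not(fg_fast))
```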

18.
1 Introduction  Automatic segmentation of moving objects from video sequences is a difficult and challenging problem in computer vision systems. It has many applications such as video surveillance, traffic monitoring, people tracking and video communication [1–4]. It also plays an important role in supporting content-based image coding, especially after the emergence of the video coding standard MPEG-4 [5–14]. There are a lot of research works on moving object segmentation and extraction. These algorithms can be roughly classified into two categories: inter …

19.
We present an unsupervised motion-based object segmentation algorithm for video sequences captured with a moving camera, employing bidirectional inter-frame change detection. For every frame, two error frames are generated using motion compensation; they are combined, and a thresholding-based segmentation algorithm is applied. We employ a simple and effective error fusion scheme and consider spatial error localization in the thresholding step. We find the optimal weights for the weighted-mean thresholding algorithm, which enables unsupervised, robust moving object segmentation. Furthermore, a post-processing step that improves the temporal consistency of the segmentation masks is incorporated, yielding improved performance compared with previously proposed methods. The experimental evaluation and comparison with other methods demonstrate the validity of the proposed method.
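A small NumPy sketch of fusing the two motion-compensated error frames and thresholding the result; the min-fusion and the weight values are placeholders, not the optimal weights derived in the paper.

```python
import numpy as np

def segment_from_errors(err_fwd, err_bwd, w=(1.0, 2.0)):
    """Fuse bidirectional motion-compensation error frames and threshold them.

    err_fwd, err_bwd : (H, W) absolute error images (float).
    The threshold is a weighted combination of the global mean and standard
    deviation of the fused error; the weights here are illustrative placeholders.
    """
    fused = np.minimum(err_fwd, err_bwd)     # simple fusion: both directions must show change
    t = w[0] * fused.mean() + w[1] * fused.std()
    return (fused > t).astype(np.uint8)      # binary moving-object mask
```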

20.
We propose a pattern-classification-based approach for simultaneous three-dimensional (3-D) object modeling and segmentation in image volumes. The 3-D objects are described as sets of overlapping ellipsoids. The segmentation relies on the geometrical model and gray-level statistics. The characteristic parameters of the ellipsoids and of the gray-level statistics are embedded in a radial basis function (RBF) network and are found by means of unsupervised training. A new robust training algorithm for RBF networks based on alpha-trimmed mean statistics is employed in this study. An extension of the Hough transform algorithm to 3-D space, employing a spherical coordinate system, is used for ellipsoid center estimation. We study the performance of the proposed algorithm and present results for segmenting a stack of microscopy images.
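For reference, a tiny NumPy implementation of the alpha-trimmed mean, the robust statistic used in the RBF training algorithm; the RBF network itself is not reproduced here.

```python
import numpy as np

def alpha_trimmed_mean(x, alpha=0.1):
    """Mean of the samples left after discarding the alpha fraction of the
    smallest and the alpha fraction of the largest values (0 <= alpha < 0.5)."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    k = int(alpha * x.size)
    return x[k: x.size - k].mean() if x.size - 2 * k > 0 else x.mean()

print(alpha_trimmed_mean([1, 2, 3, 4, 100], alpha=0.2))  # the outlier 100 is trimmed -> 3.0
```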
