Similar literature
1.
There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on data sets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new data set with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a data set.

2.
In this paper, we present a new framework for three-dimensional (3D) reconstruction of multiple rigid objects from dynamic scenes. Conventional 3D reconstruction from multiple views is applicable to static scenes, in which the configuration of objects is fixed while the images are taken. In our framework, we aim to reconstruct the 3D models of multiple objects in a more general setting where the configuration of the objects varies among views. We solve this problem by object-centered decomposition of the dynamic scenes using an unsupervised co-recognition approach. Unlike conventional motion segmentation algorithms, which require a small-motion assumption between consecutive views, the co-recognition method provides reliable and accurate correspondences of the same object among unordered and wide-baseline views. To segment each object region, we benefit from the sparse 3D points obtained from structure-from-motion. These points are reliable and serve as automatic seed points for a seeded-segmentation algorithm. Experiments on various challenging real image sequences demonstrate the effectiveness of our approach, especially in the presence of abrupt independent motions of objects.

3.
毛凌, 解梅. 《计算机应用研究》 (Application Research of Computers), 2013, 30(11): 3514-3517
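Most image semantic segmentation methods are based on pairwise conditional random field (CRF) models; they cannot localize individual objects and have difficulty exploiting global shape features, which leads to misrecognition. To address these problems, a new higher-order CRF model is proposed that unifies object detection results based on global shape features with the pairwise CRF model in a single probabilistic framework, performing image segmentation, object detection, and recognition at the same time. An object detector and a foreground/background segmentation algorithm are used to obtain the object regions in the image, and a new higher-order energy term is defined on these regions. The new higher-order CRF model is a weighted mixture of the higher-order energy term and the pairwise CRF model, and its optimal solution is the semantic segmentation result. Experiments on the MSRC-21 database verify that the model significantly improves semantic segmentation performance and localizes individual objects.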

4.
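A new method is proposed for real-time detection and tracking of moving objects in video. The method detects multiple moving objects using a gray-level stepped-contour approach based on an improved time-sliced motion history image (tMHI), tracks them with a Kalman filter, and obtains the trajectory of each moving object, thereby tracking multiple objects in the video. The method also clearly improves the handling of occlusion among multiple objects. Experimental results show that it effectively recognizes and accurately tracks multiple objects in complex scenes, with strong real-time performance and a high recognition rate, and that it is robust to disturbances such as illumination changes, rain, and fog in complex video surveillance scenes.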
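As a rough illustration of the motion history image underlying entry 4, the sketch below implements the commonly used timestamp-based update rule: moving pixels are stamped with the current time, and stamps older than a fixed duration are cleared. This is the textbook form only, not the improved variant described in the abstract; the function name `update_tmhi` and its parameters are hypothetical.

```python
def update_tmhi(mhi, silhouette, timestamp, duration):
    """Return an updated timestamp motion history image.

    mhi        -- 2D list of floats: time at which each pixel last moved (0.0 = no motion)
    silhouette -- 2D list of 0/1 flags marking pixels that move in the current frame
    timestamp  -- current time, in the same units as the values stored in mhi
    duration   -- how long a motion trace is kept before it is cleared
    """
    updated = []
    for mhi_row, sil_row in zip(mhi, silhouette):
        new_row = []
        for last_seen, moving in zip(mhi_row, sil_row):
            if moving:
                # Pixel moves now: stamp it with the current time.
                new_row.append(timestamp)
            elif last_seen < timestamp - duration:
                # Trace is too old: forget it.
                new_row.append(0.0)
            else:
                # Recent trace: keep the old stamp.
                new_row.append(last_seen)
        updated.append(new_row)
    return updated
```

The stored timestamps form a gradient whose direction indicates recent motion, which is the kind of stepped structure a contour-based detector can exploit.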

5.
In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We consider segmentation as the assignment of image observations to object hypotheses and phrase it as the E-step, while the M-step amounts to fitting the object models to the observations. These two tasks are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs) as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image is used to decide about assigning observations to the object. For this, we propose two top-down segmentation algorithms. The first starts with an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and introduces AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing for the automated treatment of occlusions. Apart from top-down segmentation results, we provide systematic experiments on object detection that validate the merits of our joint segmentation and recognition approach.
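The alternation described in entry 5 (soft assignment of observations to hypotheses, then refitting the models) can be sketched with a generic one-dimensional Gaussian mixture. This is a deliberately simplified stand-in: the scalar means play the role of the object models, and nothing here reproduces the AAM fitting of the paper. The names `e_step` and `m_step` and the fixed `sigma` are assumptions made for the illustration.

```python
import math

def e_step(observations, means, sigma=1.0, priors=None):
    """Softly assign each observation to each hypothesis (responsibilities)."""
    k = len(means)
    if priors is None:
        priors = [1.0 / k] * k  # assumption: equal prior weight per hypothesis
    responsibilities = []
    for x in observations:
        # Unnormalized posterior: prior times Gaussian likelihood of the residual.
        weights = [p * math.exp(-0.5 * ((x - m) / sigma) ** 2)
                   for p, m in zip(priors, means)]
        total = sum(weights)
        if total == 0.0:
            # Observation is far from every hypothesis: fall back to uniform.
            responsibilities.append([1.0 / k] * k)
        else:
            responsibilities.append([w / total for w in weights])
    return responsibilities

def m_step(observations, responsibilities):
    """Refit each hypothesis mean as a responsibility-weighted average."""
    k = len(responsibilities[0])
    means = []
    for j in range(k):
        weight = sum(r[j] for r in responsibilities)
        weighted_sum = sum(r[j] * x for r, x in zip(responsibilities, observations))
        means.append(weighted_sum / weight)
    return means

# One EM iteration on toy data with two hypotheses.
data = [0.0, 0.2, 5.0, 5.2]
resp = e_step(data, means=[0.0, 5.0])
new_means = m_step(data, resp)
```

Repeating the two steps until the means stop changing gives the usual EM loop; in the paper's setting the M-step would be replaced by model fitting that takes the soft segmentation into account.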

6.
We propose incorporating semantic topic information into a hierarchical conditional random fields (CRFs) framework to promote object recognition and retrieval accuracy. Specifically, we devise convenient yet effective methods based on multiple segmentations to perform accurate image retrieval tasks for rigid and amorphous man-made objects. Through a robust topic consistency potential (RTCP) modelling approach, we perform accurate multi-class segmentation on high-resolution remote-sensing images. The generated segments can be readily used for object recognition and discovery. We report satisfactory performance on two sets of high-resolution remote-sensing images that cover a highly populated urban area and a rural area, respectively. Experimental results demonstrate that our approach outperforms the state-of-the-art CRF models, due to its ability to capture inherent semantic information for efficient object recognition and boundary discovery.

7.
In this work we present an improvement to the popular Active Appearance Model (AAM) algorithm, which we call the Multiple-Levelset AAM (MLA). The MLA can simultaneously segment multiple objects, and makes use of multiple levelsets, rather than anatomical landmarks, to define the shapes. AAMs traditionally define the shape of each object using a set of anatomical landmarks. However, landmarks can be difficult to identify, and AAMs traditionally only allow for segmentation of a single object of interest. The MLA, which is a landmark-independent AAM, allows for levelsets of multiple objects to be determined and allows for them to be coupled with image intensities. This gives the MLA the flexibility to simultaneously segment multiple objects of interest in a new image. In this work we apply the MLA to segment the prostate capsule, the prostate peripheral zone (PZ), and the prostate central gland (CG) from a set of 40 endorectal, T2-weighted MRI images. The MLA system we employ in this work leverages a hierarchical segmentation framework, constructed to exploit domain-specific attributes by utilizing a given prostate segmentation to help drive the segmentations of the CG and PZ, which are embedded within the prostate. Our coupled MLA scheme yielded mean Dice similarity coefficient (DSC) values of 0.81, 0.79, and 0.68 for the prostate, CG, and PZ, respectively, using a leave-one-out cross-validation scheme over 40 patient studies. When only considering the midgland of the prostate, the mean DSC values were 0.89, 0.84, and 0.76 for the prostate, CG, and PZ, respectively.
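The Dice scores reported in entry 7 measure the overlap between a predicted segmentation and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. A minimal sketch of that computation follows, assuming each mask is given as a flat sequence of 0/1 values; the function name `dice_coefficient` is chosen for this illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled as foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Total foreground pixels counted over both masks.
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total

score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])  # 2 * 1 / (2 + 1)
```

A value of 1.0 means the two masks coincide and 0.0 means they share no foreground pixel, so a mean of 0.81 indicates substantial but imperfect overlap.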

8.
In this paper, we propose a general framework for fusing bottom-up segmentation with top-down object behavior inference over an image sequence. This approach is beneficial for both tasks, since it enables them to cooperate so that knowledge relevant to each can aid in the resolution of the other, thus enhancing the final result. In particular, the behavior inference process offers dynamic probabilistic priors to guide segmentation. At the same time, segmentation supplies its results to the inference process, ensuring that they are consistent both with prior knowledge and with new image information. The prior models are learned from training data and they adapt dynamically, based on newly analyzed images. We demonstrate the effectiveness of our framework via particular implementations that we have employed in the resolution of two hand gesture recognition applications. Our experimental results illustrate the robustness of our joint approach to segmentation and behavior inference in challenging conditions involving complex backgrounds and occlusions of the target object.

9.
Segmentation is an important problem in various applications. There exist many effective models designed to locate all features and their boundaries in an image. However, such global models are not suitable for automatically detecting a single object among many objects in an image, because nearby objects are often selected as well. Several recent works provide a selective segmentation capability, but when generalized to three dimensions they are not yet effective or efficient. This paper presents a selective segmentation model that is inherently suited to efficient implementation. With an added solver based on a fast nonlinear multigrid method for the inside domain of a zero level set function, the overall methodology leads to an effective and efficient algorithm for 3D selective segmentation. Numerical experiments show that our model can produce efficient results in terms of segmentation quality and reliability for a large class of 3D images.

10.
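Objective: Image co-segmentation separates foreground objects from background regions using multiple reference images, and has been widely applied in areas such as image classification and object recognition. However, most existing co-segmentation algorithms are suited only to settings where the background varies considerably while the foreground remains nearly unchanged. To address this, a new unsupervised co-segmentation algorithm is proposed. Method: The method is unsupervised. Building on hierarchical image segmentation, it updates the estimates of the foreground and background models through a progressive optimization framework, and further strengthens the robustness of these estimates by incorporating hierarchical region similarity both within each image and across images. It requires no prior learning from samples, can process two or more images at once, applies when multiple foreground objects are present, and adapts well to changes in the foreground object class. Results: Experiments on the iCoseg and MSRC image sets show that the algorithm does not require a pronounced foreground-background difference across images and, compared with existing classical methods, is better suited to more general scenes, such as those with drastic foreground variation or several simultaneous foreground objects. Conclusion: The method separates foreground from background by recursively estimating the appearance distributions of the superpixels obtained from hierarchical segmentation, and fuses region correlations within and across images to make the estimated foreground and background distributions more consistent. Experiments show that the method is more robust than existing methods when the foreground varies significantly.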

11.
We present a framework for segmentation of multiple objects whose shapes are similar but whose image qualities differ. Our framework is based on the snake or active contour method, in which a new kind of energy called “group energy” is introduced. The group energy is used to handle the sharing of properties across multiple objects and also to allow contours of objects with good image qualities to be used as reference contours for the remaining objects during optimization. In this framework, we also deal with rotations among similar objects by applying group energy after removing the rotation offset. Comprehensive testing has been performed on synthetic and real images, demonstrating that our framework achieves significantly better segmentation performance than the original (individual) snake.

12.

This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.


13.
This paper proposes a dynamic conditional random field (DCRF) model for foreground object and moving shadow segmentation in indoor video scenes. Given an image sequence, temporal dependencies of consecutive segmentation fields and spatial dependencies within each segmentation field are unified by a dynamic probabilistic framework based on the conditional random field (CRF). An efficient approximate filtering algorithm is derived for the DCRF model to recursively estimate the segmentation field from the history of observed images. The foreground and shadow segmentation method integrates both intensity and gradient features. Moreover, models of background, shadow, and gradient information are updated adaptively for nonstationary background processes. Experimental results show that the proposed approach can accurately detect moving objects and their cast shadows even in monocular grayscale video sequences.

14.
15.
16.
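Transformer models have achieved strong results in natural language processing, and because they connect vision and language more effectively, they have also attracted great interest from the computer vision community. This paper surveys more than one hundred representative methods in which vision Transformers handle a variety of recognition tasks, compares model performance within each task, and on that basis summarizes the strengths, weaknesses, and open challenges of the models for each type of task. According to recognition granularity, it considers methods based on global recognition, such as image classification and video classification, and methods based on local recognition, such as object detection and visual segmentation. Given the wide popularity of existing methods in three specific recognition tasks, it also summarizes methods for face recognition, action recognition, and pose estimation. It further reviews the state of research on general-purpose methods that apply to multiple vision tasks or are domain-agnostic. Transformer-based models have enabled many end-to-end methods and continue to pursue a balance between accuracy and computational cost. For global recognition tasks, Transformer models have explored patch sequence partitioning and token feature representation; for local recognition tasks, they have performed well because they capture global information more effectively. In face recognition and action recognition, the attention mechanism reduces errors in feature representation and can handle rich and diverse features. Transformers can address feature misalignment in pose estimation, help improve the performance of regression-based methods, and reduce the ambiguity introduced by depth mapping in 3D estimation. Extensive studies demonstrate the effectiveness of vision Transformers in recognition tasks, and improvements in feature representation or network structure help raise performance.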

17.
18.
Object segmentation is essential for systems that acquire object models online for robotic grasping. However, it remains a major technical challenge in visually complex and uncontrolled environments. Segmentation algorithms that rely on image features alone can perform poorly under certain lighting conditions, or if the object and the background have similar appearance. In parallel, known object segmentation algorithms that rely exclusively on three dimensional (3D) geometric data are derived under strong assumptions about the geometry of the scene. A promising approach to performing object segmentation is to use a combination of appearance and 3D features. In this paper, an object segmentation algorithm is presented that combines multiple appearance and geometric cues. The segmentation is formulated as a binary labeling problem. The Conditional Random Fields (CRF) framework is used to model the conditional probability of the labeling given the appearance and geometric data. The maximum a posteriori estimation of the labeling is obtained by minimizing the energy function corresponding to the CRF using graph cuts. A simple and efficient method for initializing the proposed algorithm is also presented. Experimental results have demonstrated the effectiveness of the proposed algorithm.

19.
We propose a method for converting a single image of a transparent object into a multi-view photo that enables users to observe the object from multiple new angles, without inputting any 3D shape. The complex light paths formed by refraction and reflection make it challenging to compute the lighting effects of transparent objects from a new angle. We construct an encoder–decoder network for normal reconstruction and texture extraction, which enables synthesizing novel views of a transparent object from a set of new views and new environment maps using only one RGB image. By simultaneously considering the optical transmission and perspective variation, our network learns the characteristics of optical transmission and the change of perspective as guidance for the conversion from RGB colours to surface normals. A texture extraction subnetwork is proposed to alleviate the contour loss phenomenon during normal map generation. We test our method using 3D objects inside and outside our training data, including real 3D objects that exist in our lab, and completely new environment maps that we take using our phones. The results show that our method performs better on view synthesis of transparent objects in complex scenes using only a single-view image.

20.
Improved K-means active contour model   Cited by: 1 (self-citations: 0; citations by others: 1)
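Objective: By transforming the Euler-Lagrange equation of the energy functional of the C-V model, an equivalence between the C-V model and the K-means method is established, and a new improved K-means active contour model based on a level set function is proposed. Method: The model includes a locally adaptive weight matrix function that determines the segmentation threshold of each pixel from the local statistics of its neighborhood, eliminating the effect of intensity inhomogeneity on the segmentation target and thereby segmenting intensity-inhomogeneous images accurately. Results: Analysis of segmentation results on synthetic and natural images shows that, compared with traditional and recent classical active contour models, the new model segments intensity-inhomogeneous images more accurately and is less sensitive to the choice of initial curve. Conclusion: A new active contour model containing a weight matrix function is proposed; different weight functions can be designed according to the segmentation goal and the properties of the image, so the model is widely applicable. The weight function with local statistical properties given in the paper works well on intensity-inhomogeneous images and is stable with respect to the position of the initial curve.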
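To make the link between the C-V model and K-means in entry 20 concrete: if the curve-length regularization of the C-V energy is dropped, each pixel is simply assigned to whichever of two region means is closer to its intensity, which is two-class K-means on gray levels. The sketch below shows only that reduced case, under the stated assumption; it omits the level set function and the locally adaptive weight matrix that the abstract describes, and the name `two_means_segmentation` is hypothetical.

```python
def two_means_segmentation(pixels, iterations=20):
    """Two-class K-means on gray levels (data term of C-V, no length penalty).

    pixels -- non-empty flat list of intensities
    Returns (labels, c1, c2), where labels[i] is 0 for the region with mean c1
    and 1 for the region with mean c2.
    """
    # Assumed initialization: start the two means at the intensity extremes.
    c1, c2 = float(min(pixels)), float(max(pixels))
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: compare the two squared fitting errors per pixel.
        labels = [0 if (p - c1) ** 2 <= (p - c2) ** 2 else 1 for p in pixels]
        region1 = [p for p, lab in zip(pixels, labels) if lab == 0]
        region2 = [p for p, lab in zip(pixels, labels) if lab == 1]
        # Update step: each constant becomes the mean intensity of its region.
        new_c1 = sum(region1) / len(region1) if region1 else c1
        new_c2 = sum(region2) / len(region2) if region2 else c2
        if new_c1 == c1 and new_c2 == c2:
            break  # converged: the means no longer change
        c1, c2 = new_c1, new_c2
    return labels, c1, c2

labels, c1, c2 = two_means_segmentation([10, 12, 11, 200, 205, 198])
```

Because the decision uses one pair of global means, this reduced form struggles on intensity-inhomogeneous images, which is the limitation the per-pixel adaptive threshold in the abstract is meant to address.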
