Similar literature
 20 similar documents found (search time: 31 ms)
1.
This paper proposes a new approach for multi-object 3D scene modeling. Scenes with multiple objects are characterized by object occlusions under several views, complex illumination conditions due to multiple reflections and shadows, as well as a variety of object shapes and surface properties. These factors raise huge challenges when attempting to model real 3D multi-object scenes using existing approaches, which are designed mainly for single-object modeling. The proposed method relies on the initialization provided by a rough 3D model of the scene estimated from the given set of multi-view images. The contributions described in this paper consist of two new methods for identifying and correcting errors in the reconstructed 3D scene. The first approach corrects the location of 3D patches from the scene after detecting the disparity between pairs of their projections into images. The second approach is called shape-from-contours and identifies discrepancies between projections of 3D objects and their corresponding contours, segmented from images. Both unsupervised and supervised segmentations are used to define the contours of objects.

2.
This paper presents an efficient image-based approach to navigate a scene based on only three wide-baseline uncalibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images, an accurate trifocal plane is extracted from the trifocal tensor of these three images. Next, based on a small number of feature marks placed through a user-friendly GUI, the correct dense disparity maps are obtained by using our trinocular-stereo algorithm. Employing the barycentric warping scheme with the computed disparity, we can generate an arbitrary novel view within a triangle spanned by three camera centers. Furthermore, after self-calibration of the cameras, 3D objects can be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. Three applications of the tri-view morphing algorithm are demonstrated. The first one is 4D video synthesis, which can be used to fill in the gap between a few sparsely located video cameras to synthetically generate a video from a virtual moving camera. This synthetic camera can be used to view the dynamic scene from a novel view instead of the original static camera views. The second application is multiple view morphing, where we can seamlessly fly through the scene over a 2D space constructed by more than three cameras. The last one is dynamic scene synthesis using three still images, where several rigid objects may move in any orientation or direction. After segmenting three reference frames into several layers, the novel views in the dynamic scene can be generated by applying our algorithm. Finally, experiments are presented to illustrate that a series of photo-realistic virtual views can be generated to fly through a virtual environment covered by several static cameras.
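The barycentric warping step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that corresponding 2D points are already known in all three views and simply blends them with barycentric weights; the function name `barycentric_warp` and its arguments are hypothetical, and the paper's actual scheme (which works with dense disparity maps) is not reproduced here.

```python
import numpy as np


def barycentric_warp(points_a, points_b, points_c, weights):
    """Blend corresponding 2D points from three views (illustrative only).

    points_a, points_b, points_c: arrays of shape (N, 2) holding the same N
    scene points as seen in views A, B and C (assumed correspondences).
    weights: three non-negative barycentric weights that sum to 1; they place
    the virtual camera inside the triangle spanned by the three camera centers.
    """
    w = np.asarray(weights, dtype=float)
    if w.shape != (3,) or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be three non-negative numbers summing to 1")
    stacked = np.stack(
        [np.asarray(p, dtype=float) for p in (points_a, points_b, points_c)]
    )  # shape (3, N, 2)
    # Weighted sum over the view axis gives the point positions in the novel view.
    return np.tensordot(w, stacked, axes=1)
```

With weights `(1, 0, 0)` the sketch returns the points of view A unchanged, and with equal weights it returns their centroid across the three views.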

3.
This paper presents a novel method for virtual view synthesis that allows viewers to virtually fly through real soccer scenes, which are captured by multiple cameras in a stadium. The proposed method generates images of arbitrary viewpoints by view interpolation of real camera images near the chosen viewpoints. In this method, cameras do not need to be strongly calibrated since projective geometry between cameras is employed for the interpolation. To avoid the complex and unreliable process of 3-D recovery, object scenes are segmented into several regions according to the geometric property of the scene. Dense correspondence between real views, which is necessary for intermediate view generation, is automatically obtained by applying projective geometry to each region. By superimposing intermediate images for all regions, virtual views for the entire soccer scene are generated. The effort required for camera calibration is reduced and correspondence matching requires no manual operation; hence, the proposed method can be easily applied to dynamic events in a large space. An application for fly-through observations of soccer match replays is introduced along with the algorithm of view synthesis and experimental results. This is a new approach for providing arbitrary views of an entire dynamic event.

4.
Multi-view object class recognition can be achieved using existing approaches for single-view object class recognition, by treating different views as entirely independent classes. This strategy requires a large amount of training data for many viewpoints, which can be costly to obtain. We describe a method for constructing a weak three-dimensional model from as few as two views of an object of the target class, and using that model to transform images of objects from one view to several other views, effectively multiplying their value for class recognition. Our approach can be coupled with any 2D image-based recognition system. We show that automatically transformed images dramatically decrease the data requirements for multi-view object class recognition.

5.
This paper presents an approach to free-form object modeling from multiple range images. In most conventional approaches, successive views are registered sequentially. In contrast to the sequential approaches, we propose an integral approach which reconstructs statistically optimal object models by simultaneously aggregating all data from multiple views into a weighted least-squares (WLS) formulation. The integral approach has two components. First, a global resampling algorithm constructs partial representations of the object from individual views, so that correspondence can be established among different views. Second, a weighted least-squares algorithm integrates resampled partial representations of multiple views, using the techniques of principal component analysis with missing data (PCAMD). Experiments show that our approach is robust against noise and mismatch.

6.
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate the vast amount of visual information. Semantic image understanding plays a vital role towards solving this problem. One important task in image understanding is object recognition, in particular, generic object categorization. Critical to this problem are the issues of learning and dataset. Abundant data helps to train a robust recognition system, while a good object classifier can help to collect a large number of images. This paper presents a novel object recognition algorithm that performs automatic dataset collecting and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed. Our framework emulates this human learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm is capable of automatically collecting much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL is capable of collecting image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.

7.
Point Signatures: A New Representation for 3D Object Recognition   (Cited by: 12; self-citations: 1; citations by others: 11)
Few systems capable of recognizing complex objects with free-form (sculptured) surfaces have been developed. The apparent lack of success is mainly due to the lack of a competent modelling scheme for representing such complex objects. In this paper, a new form of point representation for describing 3D free-form surfaces is proposed. This representation, which we call the point signature, serves to describe the structural neighbourhood of a point in a more complete manner than just using the 3D coordinates of the point. Being invariant to rotation and translation, the point signature can be used directly to hypothesize the correspondence to model points with similar signatures. Recognition is achieved by matching the signatures of data points representing the sensed surface to the signatures of data points representing the model surface. The use of point signatures is not restricted to the recognition of a single-object scene against a small library of models. Instead, it can be extended naturally to the recognition of scenes containing multiple partially-overlapping objects (which may also be juxtaposed with each other) against a large model library. No preliminary phase of segmenting the scene into the component objects is required. In searching for the appropriate candidate model, recognition need not proceed in a linear order, which can become prohibitive for a large model library. For a given scene, signatures are extracted at arbitrarily spaced seed points. Each of these signatures is used to vote for models that contain points having similar signatures. Inappropriate models with low votes can be rejected while the remaining candidate models are ordered according to the votes they received. In this way, efficient verification of the hypothesized candidates can proceed by testing the most likely model first.
Experiments using real data obtained from a range finder have shown fast recognition from a library of fifteen models whose complexities vary from that of simple piecewise quadric shapes to complicated face masks. Results from the recognition of both single-object and multiple-object scenes are presented.
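The voting stage described in this abstract (each scene signature votes for models containing a similar signature, low-vote models are rejected, and the rest are ordered by votes) can be sketched as follows. This is only an illustration under stated assumptions: signatures are modelled as equal-length tuples of floats, similarity is a simple per-component tolerance check, and the names `rank_models_by_votes`, `tolerance` and `min_votes` are hypothetical rather than taken from the paper.

```python
from collections import Counter


def rank_models_by_votes(scene_signatures, model_library, tolerance, min_votes=1):
    """Order candidate models by the votes cast by scene signatures.

    scene_signatures: list of signatures (equal-length tuples of floats)
        extracted at seed points in the scene.
    model_library: dict mapping a model name to a list of its signatures.
    tolerance: maximum per-component difference for two signatures to be
        treated as similar (an assumed similarity test).
    min_votes: models with fewer votes are rejected.
    """
    votes = Counter()
    for sig in scene_signatures:
        for name, model_sigs in model_library.items():
            # A scene signature casts at most one vote per model.
            if any(
                max(abs(a - b) for a, b in zip(sig, m)) <= tolerance
                for m in model_sigs
            ):
                votes[name] += 1
    # most_common() sorts by descending vote count, so verification can
    # start with the most likely model.
    return [(name, n) for name, n in votes.most_common() if n >= min_votes]
```

The returned list gives the order in which candidate models would be handed to a verification step, which is not shown here.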

8.
Xinbo, Chunna. Neurocomputing, 2009, 72(16-18): 3742
This paper aims to address the face recognition problem with a wide variety of views. We propose a multi-view face recognition algorithm based on tensor subspace analysis and view manifold modeling, which improves on the TensorFace-based algorithm. Tensor subspace analysis is applied to separate the identity and view information of multi-view face images. To model the nonlinearity in the view subspace, a novel view manifold is introduced to TensorFace. Thus, a uniform multi-view face model is achieved to deal with the linearity in the identity subspace as well as the nonlinearity in the view subspace. Meanwhile, a parameter estimation algorithm is developed to solve for the view and identity factors automatically. The new face model yields improved facial recognition rates compared with the traditional TensorFace-based method.

9.
Convolutional neural networks (CNNs) have had great success with regard to the object classification problem. For character classification, we found that training and testing using accurately segmented character regions with CNNs resulted in higher accuracy than when roughly segmented regions were used. Therefore, we expect to extract complete character regions from scene images. Text in natural scene images has an obvious contrast with its attachments. Many methods attempt to extract characters through different segmentation techniques. However, for blurred, occluded, and complex background cases, those methods may result in adjoined or over-segmented characters. In this paper, we propose a scene word recognition model that assembles words from small pieces up to entire words after cluster-based segmentation. The segmented connected components are classified into four types: background, individual character proposals, adjoined characters, and stroke proposals. Individual character proposals are directly input to a CNN that is trained using accurately segmented character images. The sliding window strategy is applied to adjoined character regions. Stroke proposals are considered as fragments of entire characters whose locations are estimated by a stroke spatial distribution system. Then, the estimated characters from adjoined characters and stroke proposals are classified by a CNN that is trained on roughly segmented character images. Finally, a lexicon-driven integration method is performed to obtain the final word recognition results. Compared to other word recognition methods, our method achieves a comparable performance on Street View Text and the ICDAR 2003 and ICDAR 2013 benchmark databases. Moreover, our method can recognize occluded text images and improperly segmented text images.

10.
We present an appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene. The scene is captured by multiple synchronized cameras. Arbitrary views are generated by interpolating two original camera views near the given viewpoint. The quality of the generated synthetic view is determined by the precision, consistency and density of correspondences between the two images. Most previous work that uses interpolation extracts the correspondences from these two images. However, not only is it difficult to do so reliably (the task requires a good stereo algorithm), but also the two images alone sometimes do not have enough information, due to problems such as occlusion. Instead, we take advantage of the fact that we have many views, from which we can extract much more reliable and comprehensive 3D geometry of the scene as a 3D model. Dense and precise correspondences between the two images, to be used for interpolation, are obtained using this constructed 3D model.

11.
This paper proposes two approaches for utilizing the information in multiple entity groups and multiple views to reduce the number of hypotheses passed to the verification stage in a model-based object recognition system employing invariant feature indexing (P. J. Flynn and A. K. Jain, CVGIP: Image Understand. 55(2), 1992, 119-129). The first approach is based on a majority voting scheme that keeps track of the number of consistent votes cast by prototype hypotheses for particular object models. The second approach examines the consistency of estimated object pose from multiple groups of entities (surfaces) in one or more views. A salient feature of our system and experiment design compared to most existing 3D object recognition systems is our use of a large object database and a large number of test images. Monte Carlo experiments employing 585 single-view synthetic range images and 117 pairs of synthetic range images with a large CAD-based 3D object database (P. J. Flynn and A. K. Jain, IEEE Trans. Pattern Anal. Mach. Intell. 13(2), 1991, 114-132) show that a large number of hypotheses (about 60% for single views and 90% for multiple views on average) can be eliminated through use of these approaches. The techniques have also been tested on several real 3D objects sensed by a Technical Arts 100X range scanner to demonstrate a substantial improvement in recognition time.

12.
Evidence-based recognition of 3-D objects   (Cited by: 1; self-citations: 0; citations by others: 1)
An evidence-based recognition technique is defined that identifies 3-D objects by looking for their notable features. This technique makes use of an evidence rule base, which is a set of salient or evidence conditions with corresponding evidence weights for various objects in the database. A measure of similarity between the set of observed features and the set of evidence conditions for a given object in the database is used to determine the identity of an object in the scene or reject the object(s) in the scene as unknown. This procedure has polynomial time complexity and correctly identifies a variety of objects in both synthetic and real range images. A technique for automatically deriving the evidence rule base from training views of objects is shown to generate evidence conditions that successfully identify new views of those objects.

13.
Distinctive Image Features from Scale-Invariant Keypoints   (Cited by: 517; self-citations: 6; citations by others: 517)
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
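The first recognition step in this abstract, matching individual features against a database by nearest neighbor, can be sketched as below. Two things here are assumptions for illustration and are not stated in the abstract: the search is brute force rather than a fast approximate method, and a match is accepted only when the nearest neighbor is clearly closer than the second nearest (the `max_ratio` threshold of 0.8 is an illustrative default). The function name `match_descriptors` is hypothetical.

```python
import numpy as np


def match_descriptors(query, database, max_ratio=0.8):
    """Brute-force nearest-neighbor matching with a distance-ratio check.

    query: array of shape (Q, D), one descriptor per row.
    database: array of shape (M, D) with M >= 2 descriptors of known objects.
    max_ratio: accept a match only if the nearest distance is below
        max_ratio times the second-nearest distance (assumed criterion).
    Returns a list of (query_index, database_index) pairs.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    if database.shape[0] < 2:
        raise ValueError("database needs at least two descriptors")
    # Pairwise Euclidean distances, shape (Q, M).
    dist = np.linalg.norm(query[:, None, :] - database[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    matches = []
    for i, (best, second) in enumerate(order[:, :2]):
        if dist[i, best] < max_ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches
```

The Hough-transform clustering and least-squares pose verification that follow in the abstract are not covered by this sketch.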

14.
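A Real-Time Underwater Target Recognition System Based on Region Segmentation   (Cited by: 2; self-citations: 2; citations by others: 0)
王猛, 杨杰, 白洪亮. 《计算机仿真》 (Computer Simulation), 2005, 22(8): 101-105
Detecting and recognizing underwater targets in real underwater environments has long been a research focus. This paper introduces an automatic real-time underwater target recognition system based on an optimal threshold segmentation algorithm. First, the underwater images captured in real time are preprocessed with methods such as denoising and image equalization. Next, an Otsu optimal threshold segmentation algorithm optimized by a genetic algorithm is applied to segment the resulting images into regions, and feature vectors are extracted from the image. Finally, a BP (back-propagation) neural network automatically classifies the extracted feature vectors to determine the type of the underwater target. Water-tank simulation experiments show that the system can automatically detect underwater targets in harsh environments, and that the method is highly robust to lighting interference and achieves high accuracy.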
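The Otsu criterion used in the segmentation step picks the gray level that maximizes the between-class variance of the two resulting pixel classes. The sketch below evaluates that criterion exhaustively over all gray levels; it does not reproduce the genetic-algorithm search described in the abstract, and the function name `otsu_threshold` and the assumption of integer gray levels in `[0, levels)` are illustrative.

```python
import numpy as np


def otsu_threshold(image, levels=256):
    """Return the threshold t maximizing Otsu's between-class variance.

    image: array of integer gray levels in [0, levels) (assumed input format).
    Pixels with value <= t form one class, pixels with value > t the other.
    """
    pixels = np.asarray(image).astype(np.int64).ravel()
    hist = np.bincount(pixels, minlength=levels).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the class <= t
    mu = np.cumsum(prob * np.arange(levels))   # cumulative first moment up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    # Thresholds that leave one class empty carry no separation information.
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))
```

A binary region mask then follows as `image > otsu_threshold(image)`. An exhaustive scan is cheap for 256 gray levels; a heuristic search such as a genetic algorithm becomes more relevant when several thresholds are optimized jointly.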

15.
A novel method for representing 3D objects that unifies viewer and model centered object representations is presented. A unified 3D frequency-domain representation, called volumetric frequency representation (VFR), encapsulates both the spatial structure of the object and a continuum of its views in the same data structure. The frequency-domain image of an object viewed from any direction can be directly extracted employing an extension of the projection slice theorem, where each Fourier-transformed view is a planar slice of the volumetric frequency representation. The VFR is employed for pose-invariant recognition of complex objects, such as faces. Recognition and pose estimation are based on an efficient matching algorithm in a four-dimensional Fourier space. Experimental examples of pose estimation and recognition of faces in various poses are also presented.
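The projection slice theorem that this abstract builds on can be checked numerically in a lower-dimensional setting. The sketch below uses the 2D discrete analogue: the 1D Fourier transform of an image's projection along one axis equals the central slice of the image's 2D Fourier transform. It is only an illustration of the underlying identity, not of the VFR itself, and the function names are hypothetical.

```python
import numpy as np


def projection_spectrum(image):
    """1D DFT of the projection obtained by summing the image over axis 0."""
    return np.fft.fft(np.asarray(image, dtype=float).sum(axis=0))


def central_slice(image):
    """Slice of the 2D DFT at zero frequency along axis 0."""
    return np.fft.fft2(np.asarray(image, dtype=float))[0, :]
```

For any real-valued 2D array `img`, `np.allclose(projection_spectrum(img), central_slice(img))` holds, because setting the axis-0 frequency to zero in the 2D DFT reduces the sum over that axis to a plain projection.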

16.
A spherical representation for recognition of free-form surfaces   (Cited by: 3; self-citations: 0; citations by others: 3)
Introduces a new surface representation for recognizing curved objects. The authors' approach begins by representing an object by a discrete mesh of points built from range data or from a geometric model of the object. The mesh is computed from the data by deforming a standard shaped mesh, for example, an ellipsoid, until it fits the surface of the object. The authors define local regularity constraints that the mesh must satisfy. The authors then define a canonical mapping between the mesh describing the object and a standard spherical mesh. A surface curvature index that is pose-invariant is stored at every node of the mesh. The authors use this object representation for recognition by comparing the spherical model of a reference object with the model extracted from a new observed scene. The authors show how the similarity between reference model and observed data can be evaluated and they show how the pose of the reference object in the observed scene can be easily computed using this representation. The authors present results on real range images which show that this approach to modelling and recognizing 3D objects has three main advantages: (1) it is applicable to complex curved surfaces that cannot be handled by conventional techniques; (2) it reduces the recognition problem to the computation of similarity between spherical distributions; in particular, the recognition algorithm does not require any combinatorial search; and (3) even though it is based on a spherical mapping, the approach can handle occlusions and partial views.

17.
18.
We refer to the task of recovering the 3D structure of an object or a scene using 2D images as image-based modeling. In this paper, we formulate the task of recovering the 3D structure as a discrete optimization problem solved via energy minimization. In this standard framework of a Markov random field (MRF) defined over the image, we present algorithms that allow the user to intuitively interact with the algorithm. We introduce an algorithm where the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. We develop end-user applications using this algorithm that allow 3D modeling of the object of interest on a mobile device and 3D printing of the object of interest. We also propose an alternate active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given the reconstruction, an active learning algorithm uses intuitive cues to quantify the uncertainty of the algorithm and suggest regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and the pairwise energies that, when solved, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries the users for constraints, and users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces lacking strong textural or structural cues that algorithms typically require.

19.
The focus of this paper is to design and implement a system capable of automatically reconstructing a prototype 3D model from a minimum number of range images of an object. Given an ideal 3D object model, the system iteratively renders range and intensity images of the model from a specified position, assimilates the range information into a prototype model, and determines the sensor pose (position and orientation) from which an optimal amount of previously unrecorded information may be acquired. Reconstruction is terminated when the model meets a given threshold of accuracy. Such a system has applications in the context of robot navigation, manufacturing, or hazardous materials handling. The system has been tested successfully on several synthetic data models, and each set of results was found to be reasonably consistent with an intuitive human search. The number of views necessary to reconstruct an adequate 3D prototype depends on the complexity of the object or scene and the initial data collected. The prototype models which the system recovers compare well with the ideal models.

20.
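A Group Target Recognition Method for Remote Sensing Images Combining Texture and Distribution Features   (Cited by: 4; self-citations: 0; citations by others: 4)
A target recognition method combining texture and distribution features is proposed for group targets in aerial images. The method has two steps: first, a set of texture features is selected and a maximum likelihood classification algorithm is used to segment the sub-target regions; then, group targets are quickly located and recognized within the sub-target regions based on distribution features. Experiments show that the selected texture features effectively distinguish protective shelters from various natural backgrounds, and that the proposed pruning algorithm, which locates and recognizes targets based on distribution features, is considerably faster than comparable algorithms. Recognition experiments on multiple aerial images all gave satisfactory results, indicating that the method can quickly and effectively recognize group targets of interest in complex natural scenes.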
