Similar Documents (20 results)
1.
The role of perceptual organization in motion analysis has heretofore been minimal. In this work we present a simple but powerful computational model and associated algorithms based on the use of perceptual organizational principles, such as temporal coherence (or common fate) and spatial proximity, for motion segmentation. The computational model does not use the traditional frame by frame motion analysis; rather it treats an image sequence as a single 3D spatio-temporal volume. It endeavors to find organizations in this volume of data over three levels—signal, primitive, and structural. The signal level is concerned with detecting individual image pixels that are probably part of a moving object. The primitive level groups these individual pixels into planar patches, which we call the temporal envelopes. Compositions of these temporal envelopes describe the spatio-temporal surfaces that result from object motion. At the structural level, we detect these compositions of temporal envelopes by utilizing the structure and organization among them. The algorithms employed to realize the computational model include 3D edge detection, Hough transformation, and graph based methods to group the temporal envelopes based on Gestalt principles. The significance of the Gestalt relationships between any two temporal envelopes is expressed in probabilistic terms. One of the attractive features of the adopted algorithm is that it does not require the detection of special 2D features or the tracking of these features across frames. We demonstrate that even with simple grouping strategies, we can easily handle drastic illumination changes, occlusion events, and multiple moving objects, without the use of training and specific object or illumination models. We present results on a large variety of motion sequences to demonstrate this robustness.
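
As a rough illustration of the signal level described above, the sketch below flags pixels whose temporal gradient in the (T, H, W) spatio-temporal volume is large; the paper's 3D edge detection, temporal envelopes, and Gestalt-based grouping are not reproduced, and the threshold rule, function name, and synthetic sequence are assumptions.

```python
import numpy as np

def signal_level_motion_mask(frames, k=3.0):
    """Flag pixels that are probably part of a moving object.

    `frames` is a (T, H, W) grey-scale spatio-temporal volume.  A pixel is
    flagged when its temporal gradient is large relative to the noise level,
    a crude stand-in for detecting motion evidence in the 3D volume.
    """
    volume = np.asarray(frames, dtype=float)
    dt = np.abs(np.diff(volume, axis=0))     # temporal derivative of the volume
    noise = np.median(dt) + 1e-6             # robust noise-level estimate
    return dt > k * noise                    # (T-1, H, W) boolean mask

# Example on a synthetic sequence: a bright square moving one pixel per frame.
frames = np.zeros((5, 32, 32))
for t in range(5):
    frames[t, 10:15, 5 + t:10 + t] = 1.0
mask = signal_level_motion_mask(frames)
print(mask.sum(axis=(1, 2)))   # number of "moving" pixels flagged per frame pair
```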

2.
In this paper, we derive new geometric invariants for structured 3D points and lines from a single image under projective transform, and we propose a novel model-based 3D object recognition algorithm using them. Based on the matrix representation of the transformation between space features (points and lines) and the corresponding projected image features, new geometric invariants are derived via the determinant ratio technique. First, an invariant for six points on two adjacent planes is derived, which is shown to be equivalent to Zhu's result [1], but in a simpler formulation. Then, two new geometric invariants for structured lines are investigated: one for five lines on two adjacent planes and the other for six lines on four planes. By using the derived invariants, a novel 3D object recognition algorithm is developed, in which a hashing technique with thresholds and multiple invariants for a model are employed to overcome the over-invariant and false alarm problems. Simulation results on real images show that the derived invariants remain stable even in a noisy environment, and the proposed 3D object recognition algorithm is quite robust and accurate.
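
The determinant ratio technique can be illustrated on a simpler, classical case than the paper's six-point and line invariants: the projective invariant of five coplanar points, written as a ratio of 3×3 determinants so that the unknown homogeneous scales and the homography determinant cancel. The sketch below is that textbook case, not the paper's derivation; the indices and test data are illustrative.

```python
import numpy as np

def det3(p, i, j, k):
    """Determinant of the 3x3 matrix with homogeneous points i, j, k as columns."""
    return np.linalg.det(np.stack([p[i], p[j], p[k]], axis=1))

def five_point_invariant(p):
    """Classical projective invariant of five coplanar points: a ratio of products
    of 3x3 determinants in which the per-point scales and det(H) cancel out."""
    return (det3(p, 3, 0, 1) * det3(p, 4, 1, 2)) / (det3(p, 3, 1, 2) * det3(p, 4, 0, 1))

rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, (5, 2)), np.ones(5)])   # 5 points, w = 1
H = rng.uniform(-1, 1, (3, 3))                                    # random projective map
pts_h = (H @ pts.T).T
print(five_point_invariant(pts), five_point_invariant(pts_h))     # (nearly) equal
```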

3.
We present a new single-chip texture classifier based on the cellular neural network (CNN) architecture. Exploiting the dynamics of a locally interconnected 2D cell array of CNNs we have developed a theoretically new method for texture classification and segmentation. This technique differs from other convolution-based feature extraction methods since we utilize feedback convolution, and we use a genetic learning algorithm to determine the optimal kernel matrices of the network. The CNN operators we have found for texture recognition may combine different early vision effects. We show how the kernel matrices can be derived from the state equations of the network for convolution/deconvolution and nonlinear effects. The whole process includes histogram equalization of the textured images, filtering with the trained kernel matrices, and decision-making based on average gray-scale or texture energy of the filtered images. We present experimental results using digital CNN simulation with sensitivity analysis for noise, rotation, and scale. We also report a tested application performed on a programmable 22 × 20 CNN chip with optical inputs and an execution time of a few microseconds. We have found that this CNN chip with a simple 3 × 3 CNN kernel can reliably classify four textures. Using more templates for decision-making, we believe that more textures can be separated and adequate texture segmentation (< 1% error) can be achieved.
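
A minimal sketch of the decision chain described above (histogram equalization, filtering with a kernel matrix, average texture energy, nearest-prototype decision), with a feedforward convolution standing in for the chip's feedback CNN dynamics and a hand-picked kernel instead of a genetically learned one; all names and data are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def texture_energy(image, kernel):
    """Histogram-equalize the image (via ranking), filter it, return mean energy."""
    ranks = np.argsort(np.argsort(image.ravel())).reshape(image.shape)
    equalized = ranks / float(image.size - 1)
    response = convolve(equalized, kernel, mode='reflect')
    return float(np.mean(response ** 2))

def classify(image, kernel, class_energies):
    """Assign the texture whose stored mean energy is closest (nearest prototype)."""
    e = texture_energy(image, kernel)
    return min(class_energies, key=lambda name: abs(class_energies[name] - e))

# Hypothetical 3x3 kernel and two texture prototypes (values are illustrative only).
kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
rng = np.random.default_rng(1)
noise = rng.normal(0.5, 0.02, (64, 64))
stripes = 0.5 + 0.5 * np.sin(np.arange(64) / 2.0)[None, :] * np.ones((64, 1))
protos = {'noise': texture_energy(noise, kernel), 'stripes': texture_energy(stripes, kernel)}
print(classify(stripes + rng.normal(0, 0.01, stripes.shape), kernel, protos))
```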

4.
This article proposes a method for the tracking of human limbs from multiocular sequences of perspective images. These limbs and the associated articulations must first be modelled. During the learning stage, we model the texture linked to the limbs. The lack of characteristic points on the skin is compensated for by the wearing of tights with a nonrepetitive texture. The principle of the method is based on the interpretation of image textured patterns as the 3D perspective projections of points of the textured articulated model. An iterative Levenberg–Marquardt process is used to compute the model pose in accordance with the analyzed image. The calculated attitude is filtered (Kalman filter) to predict the model pose in the following image of the sequence. The image patterns are extracted locally according to the textured articulated model in the predicted attitude. Tracking experiments, illustrated in this paper by cycling sequences, demonstrate the validity of the approach.
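
A sketch of the Levenberg–Marquardt pose-fitting step on synthetic data, assuming known 3D model points and their 2D observations under a simple pinhole model; the textured-pattern extraction and Kalman prediction stages are omitted, and the parameterization (rotation vector plus translation) is an assumption.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(pose, model_points, focal=500.0, center=(320.0, 240.0)):
    """Pinhole projection of 3D model points under pose = (rotation vector, translation)."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    cam = model_points @ R.T + pose[3:]
    return focal * cam[:, :2] / cam[:, 2:3] + np.asarray(center)

def fit_pose(model_points, observed_2d, pose0):
    """Levenberg-Marquardt refinement of the 6-DoF pose from 2D observations."""
    residuals = lambda p: (project(p, model_points) - observed_2d).ravel()
    return least_squares(residuals, pose0, method='lm').x

# Synthetic check: recover a known pose from noiseless projections.
rng = np.random.default_rng(2)
model = rng.uniform(-1, 1, (10, 3))
true_pose = np.array([0.1, -0.2, 0.05, 0.3, -0.1, 5.0])
observations = project(true_pose, model)
print(fit_pose(model, observations, pose0=np.array([0, 0, 0, 0, 0, 4.0])))
```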

5.
The distance transform (DT) is an image computation tool which can be used to extract information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity of computing the distance transform depends entirely on the distance metric used; in particular, the more exact the distance transform, the longer the execution time. Nowadays, thousands of images often have to be processed in a limited time, and a sequential computer can hardly perform such distance transform computations in real time. In order to provide efficient distance transform computation, it is highly desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide an O(N²) time sequential algorithm to compute the chessboard distance transform (CDT) of an N×N image, which is a DT using the chessboard distance metric. Based on the proposed sequential algorithm, the CDT of a 2D binary image array of size N×N can be computed in O(log N) time on the EREW PRAM model using O(N²/log N) processors, in O(log log N) time on the CRCW PRAM model using O(N²/log log N) processors, and in O(log N) time on the hypercube computer using O(N²/log N) processors. Following the mapping proposed by Lee and Horng, an algorithm for the medial axis transform is also efficiently derived: the medial axis transform of a 2D binary image array of size N×N can be computed in O(log N) time on the EREW PRAM model using O(N²/log N) processors, in O(log log N) time on the CRCW PRAM model using O(N²/log log N) processors, and in O(log N) time on the hypercube computer using O(N²/log N) processors. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only increase (add-one) and minimum operations are employed, so the algorithms are especially efficient in practical applications.
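
The parallel PRAM and hypercube algorithms are the paper's contribution and are not reproduced here; the sketch below computes the same chessboard distance transform sequentially in O(N²) with the standard two-pass raster scan rather than the diagonal propagation formulation.

```python
import numpy as np

def chessboard_dt(binary):
    """Chessboard (L_inf) distance of every pixel to the nearest foreground pixel,
    computed with the classical two-pass (forward/backward) raster scan in O(N^2)."""
    H, W = binary.shape
    INF = H + W
    d = np.where(np.asarray(binary) > 0, 0, INF)
    # forward pass: propagate from already-visited (upper/left) neighbours
    for i in range(H):
        for j in range(W):
            for di, dj in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    d[i, j] = min(d[i, j], d[ni, nj] + 1)
    # backward pass: propagate from lower/right neighbours
    for i in range(H - 1, -1, -1):
        for j in range(W - 1, -1, -1):
            for di, dj in ((1, 1), (1, 0), (1, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    d[i, j] = min(d[i, j], d[ni, nj] + 1)
    return d

img = np.zeros((7, 7), dtype=int)
img[3, 3] = 1
print(chessboard_dt(img))   # rings of 1, 2, 3 around the centre foreground pixel
```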

6.
This paper proposes a compression scheme for face profile images based on three stages: modelling, transformation, and partially predictive classified vector quantization (CVQ). The modelling stage employs deformable templates in the localisation of salient features of face images and in the normalization of the image content. The second stage uses a dictionary of feature-bases trained for profile face images to diagonalize the image blocks. At this stage, all normalized training and test images are spatially clustered (objectively) into four subregions according to their energy content, and the residuals of the most important clusters are further clustered (subjectively) in the spectral domain, to exploit spectral redundancies. The feature-basis functions are established with the region-based Karhunen–Loève transform (RKLT) of clustered image blocks. Each image block is matched with a representative of near-best basis functions. A predictive approach is employed for mid-energy clusters, in both stages of the search for a basis and for a codeword from the range of its cluster. The proposed scheme employs one stage of a cascaded region-based KLT-SVD and CVQ complex, followed by residual VQ stages for subjectively important regions. The first dictionary of feature-bases is dedicated to the main content of the image and the second to the residuals. The proposed scheme is evaluated on a set of human face images.
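
A sketch of the KLT-then-VQ core on fixed 8×8 blocks, using a PCA-style eigendecomposition for the Karhunen–Loève basis and k-means as the vector quantizer; the deformable-template modelling, region clustering, prediction, and residual stages are not included, and block size, basis count, and codebook size are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def to_blocks(img, b=8):
    """Split a grey-level image into non-overlapping b x b blocks, one row per block."""
    H, W = img.shape
    return (img[:H - H % b, :W - W % b]
            .reshape(H // b, b, W // b, b).swapaxes(1, 2).reshape(-1, b * b))

def klt_vq_codec(train_blocks, test_blocks, n_basis=8, n_codewords=64):
    """Project blocks onto the leading KLT basis vectors, then vector-quantize."""
    mean = train_blocks.mean(axis=0)
    cov = np.cov((train_blocks - mean).T)
    eigval, eigvec = np.linalg.eigh(cov)
    basis = eigvec[:, np.argsort(eigval)[::-1][:n_basis]]      # KLT basis (PCA)
    vq = KMeans(n_clusters=n_codewords, n_init=4, random_state=0).fit(
        (train_blocks - mean) @ basis)
    codes = vq.predict((test_blocks - mean) @ basis)            # indices to transmit
    recon = vq.cluster_centers_[codes] @ basis.T + mean         # decoded blocks
    return codes, recon

rng = np.random.default_rng(3)
train = rng.normal(128, 20, (128, 128))
test = rng.normal(128, 20, (64, 64))
codes, recon = klt_vq_codec(to_blocks(train), to_blocks(test))
print(codes.shape, recon.shape)    # block indices and their reconstructions
```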

7.
The use of hypothesis verification is recurrent in the model-based recognition literature. Verification consists in measuring how many model features transformed by a pose coincide with some image features. When the data involved in the computation of the pose are noisy, the pose is inaccurate and difficult to verify, especially when the objects are partially occluded. To address this problem, the noise in image features is modeled by a Gaussian distribution. A probabilistic framework allows the evaluation of the probability of a matching, knowing that the pose belongs to a rectangular volume of the pose space. It involves quadratic programming if the transformation is affine. This matching probability is used in an algorithm computing the best pose, which consists in a recursive multiresolution exploration of the pose space, discarding outliers in the match data while the search is progressing. Numerous experimental results are described; they consist of 2D and 3D recognition experiments using the proposed algorithm.
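
A sketch of the basic verification score for a single fixed affine pose under an isotropic Gaussian noise model: count the model features whose projection falls within a chi-square gate of some image feature. The paper's bound over a rectangular pose volume (via quadratic programming) and the multiresolution search are not reproduced.

```python
import numpy as np

def verification_score(model_pts, image_pts, A, t, sigma=2.0):
    """Count model features whose affine projection A @ x + t falls within a
    chi-square gate of some image feature, assuming isotropic Gaussian noise."""
    gate2 = 5.99 * sigma ** 2           # 95% gate for a 2-DoF chi-square
    proj = model_pts @ A.T + t
    d2 = ((proj[:, None, :] - image_pts[None, :, :]) ** 2).sum(axis=2)
    return int((d2.min(axis=1) <= gate2).sum())

rng = np.random.default_rng(4)
model = rng.uniform(0, 100, (20, 2))
A, t = np.array([[0.9, 0.1], [-0.1, 0.9]]), np.array([5.0, -3.0])
image = model @ A.T + t + rng.normal(0, 1.0, (20, 2))   # noisy, fully visible object
print(verification_score(model, image, A, t))           # close to 20 matched features
```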

8.
It is often difficult to come up with a well-principled approach to the selection of low-level features for characterizing images for content-based retrieval. This is particularly true for medical imagery, where gross characterizations on the basis of color and other global properties do not work. An alternative for medical imagery consists of the “scattershot” approach that first extracts a large number of features from an image and then reduces the dimensionality of the feature space by applying a feature selection algorithm such as the Sequential Forward Selection method. This contribution presents a better alternative to initial feature extraction for medical imagery. The proposed new approach consists of (i) eliciting from the domain experts (physicians, in our case) the perceptual categories they use to recognize diseases in images; (ii) applying a suite of operators to the images to detect the presence or the absence of these perceptual categories; (iii) ascertaining the discriminatory power of the perceptual categories through statistical testing; and, finally, (iv) devising a retrieval algorithm using the perceptual categories. In this paper we present our proposed approach for the domain of high-resolution computed tomography (HRCT) images of the lung. Our empirical evaluation shows that feature extraction based on physicians' perceptual categories achieves significantly higher retrieval precision than the traditional scattershot approach. Moreover, the use of perceptually based features gives the system the ability to provide an explanation for its retrieval decisions, thereby instilling more confidence in its users.
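
Step (iii) above can be sketched as a test of independence between a binary perceptual category and the disease label; the category names, data, and the choice of a chi-square test are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import chi2_contingency

def discriminatory_categories(presence, labels, alpha=0.01):
    """Keep the perceptual categories whose presence/absence distribution differs
    significantly across disease labels (chi-square test of independence)."""
    kept = []
    for name, column in presence.items():
        table = np.array([[np.sum((column == v) & (labels == d)) for d in np.unique(labels)]
                          for v in (0, 1)])
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < alpha:
            kept.append((name, p_value))
    return kept

# Hypothetical data: 200 images, two disease labels, two candidate categories.
rng = np.random.default_rng(5)
labels = rng.integers(0, 2, 200)
presence = {
    'honeycombing': (labels ^ (rng.random(200) < 0.1)).astype(int),  # informative
    'random_blob':  rng.integers(0, 2, 200),                         # uninformative
}
print(discriminatory_categories(presence, labels))
```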

9.
This paper presents an original method for analyzing, in an unsupervised way, images supplied by high resolution sonar. We aim at segmenting the sonar image into three kinds of regions: echo areas (due to the reflection of the acoustic wave on the object), shadow areas (corresponding to a lack of acoustic reverberation behind an object lying on the sea-bed), and sea-bottom reverberation areas. This unsupervised method estimates the parameters of the noise distributions, modeled by a Weibull probability density function (PDF), and the label field parameters, modeled by a Markov random field (MRF). For the estimation step, we adopt a maximum likelihood technique for the noise model parameters and a least-squares method to estimate the MRF prior model. Then, in order to obtain an accurate segmentation map, we have designed a two-step process that finds the shadow and the echo regions separately, using the previously estimated parameters. First, we introduce a scale-causal and spatial model called SCM (scale causal multigrid), based on a multigrid energy minimization strategy, to find the shadow class. Second, we propose an MRF monoscale model using a priori information (at different levels of knowledge) based on physical properties of each region, which allows us to distinguish echo areas from sea-bottom reverberation. This technique has been successfully applied to real sonar images and is compatible with automatic processing of massive amounts of data.
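
A sketch of the noise-model estimation step only: a maximum-likelihood Weibull fit to the grey levels of one region, using SciPy's generic estimator with the location fixed at zero; the MRF prior estimation and the SCM segmentation are not included.

```python
import numpy as np
from scipy.stats import weibull_min

def fit_weibull(pixels):
    """Maximum-likelihood Weibull fit (shape, scale) for one region's grey levels,
    with the location parameter fixed at zero."""
    shape, _, scale = weibull_min.fit(np.asarray(pixels, dtype=float), floc=0)
    return shape, scale

# Synthetic "sea-bottom reverberation" sample drawn from a known Weibull law.
rng = np.random.default_rng(6)
sample = weibull_min.rvs(c=1.8, scale=40.0, size=5000, random_state=rng)
print(fit_weibull(sample))   # estimates should be close to (1.8, 40.0)
```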

10.
A Continuous Probabilistic Framework for Image Matching
In this paper we describe a probabilistic image matching scheme in which the image representation is continuous and the similarity measure and distance computation are also defined in the continuous domain. Each image is first represented as a Gaussian mixture distribution and images are compared and matched via a probabilistic measure of similarity between distributions. A common probabilistic and continuous framework is applied to the representation as well as the matching process, ensuring an overall system that is theoretically appealing. Matching results are investigated and the application to an image retrieval system is demonstrated.
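
A sketch of the representation and matching idea, assuming per-pixel (x, y, intensity) features, scikit-learn's GaussianMixture, and a Monte-Carlo KL-type divergence as the similarity measure; the paper's actual feature space and probabilistic measure may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def image_gmm(image, n_components=5, random_state=0):
    """Fit a Gaussian mixture to per-pixel (x, y, intensity) feature vectors."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([xs.ravel() / w, ys.ravel() / h, image.ravel()])
    return GaussianMixture(n_components, covariance_type='full',
                           random_state=random_state).fit(feats)

def gmm_divergence(gmm_a, gmm_b, n_samples=2000):
    """Monte-Carlo estimate of KL(a || b); smaller means more similar images."""
    samples, _ = gmm_a.sample(n_samples)
    return float(np.mean(gmm_a.score_samples(samples) - gmm_b.score_samples(samples)))

rng = np.random.default_rng(7)
img1 = rng.random((32, 32))
img2 = img1 + 0.01 * rng.random((32, 32))     # near-duplicate of img1
img3 = rng.random((32, 32)) ** 3              # different intensity statistics
a, b, c = (image_gmm(im) for im in (img1, img2, img3))
print(gmm_divergence(a, b), gmm_divergence(a, c))   # first value should be smaller
```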

11.
In this paper we present a novel approach for building detection from multiple aerial images in dense urban areas. The approach is based on accurate surface reconstruction, followed by extraction of building façades that are used as a main cue for building detection. For the façade detection, a simple but nevertheless flexible and robust algorithm is proposed. It is based on the observation that building façades correspond to the accumulation of 3D data, available from different views, in object space. Knowledge-driven thresholding of the 3D data accumulators followed by Hough transform-based segment detection results in the extraction of façade positions. Three-dimensional planar regions resulting from the surface reconstruction procedure and bounded by the extracted façades are detected as building hypotheses through testing a set of spatial criteria. Then, a set of verification criteria is proposed for hypothesis confirmation.
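
A sketch of the Hough step only, assuming the accumulated 3D evidence has already been thresholded into a binary mask: a plain (rho, theta) accumulator returns the parameters of well-supported lines, i.e. candidate façade traces. The surface reconstruction, knowledge-driven thresholding, and hypothesis verification are not shown.

```python
import numpy as np

def hough_peaks(mask, n_theta=180, vote_threshold=40):
    """Return (rho, theta) parameters of lines supported by at least
    `vote_threshold` foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*mask.shape)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    rho = np.round(xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)).astype(int)
    np.add.at(acc, (rho + diag, np.broadcast_to(np.arange(n_theta), rho.shape)), 1)
    rr, tt = np.nonzero(acc >= vote_threshold)
    return [(r - diag, thetas[t]) for r, t in zip(rr, tt)]

# A vertical "facade trace" at x = 20 in a 64x64 accumulation mask.
mask = np.zeros((64, 64), dtype=bool)
mask[:, 20] = True
print(hough_peaks(mask))   # peak near rho = 20, theta = 0
```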

12.
We present a method for automatically estimating the motion of an articulated object filmed by two or more fixed cameras. We focus our work on the case where the quality of the images is poor, and where only an approximation of a geometric model of the tracked object is available. Our technique uses physical forces applied to each rigid part of a kinematic 3D model of the object we are tracking. These forces guide the minimization of the differences between the pose of the 3D model and the pose of the real object in the video images. We use a fast recursive algorithm to solve the dynamical equations of motion of any 3D articulated model. We explain the key parts of our algorithms: how relevant information is extracted from the images, how the forces are created, and how the dynamical equations of motion are solved. A study of what kind of information should be extracted from the images and of when our algorithms fail is also presented. Finally, we present some results on the tracking of a person. We also show the application of our method to the tracking of a hand in sequences of images, showing that the kind of information to extract from the images depends on their quality and on the configuration of the cameras.

13.
Aiming at the use of hand gestures for human–computer interaction, this paper presents a real-time approach to the spotting, representation, and recognition of hand gestures from a video stream. The approach exploits multiple cues including skin color, hand motion, and shape. Skin color analysis and coarse image motion detection are joined to perform reliable hand gesture spotting. At a higher level, a compact spatiotemporal representation is proposed for modeling appearance changes in image sequences containing hand gestures. The representation is extracted by combining robust parameterized image motion regression and shape features of a segmented hand. For efficient recognition of gestures made at varying rates, a linear resampling technique for eliminating the temporal variation (time normalization) while maintaining the essential information of the original gesture representations is developed. The gesture is then classified according to a training set of gestures. In experiments with a library of 12 gestures, the recognition rate was over 90%. Through the development of a prototype gesture-controlled panoramic map browser, we demonstrate that a vocabulary of predefined hand gestures can be used to interact successfully with applications running on an off-the-shelf personal computer equipped with a home video camera.
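
The time-normalization step is easy to make concrete: linearly resample a variable-length feature sequence to a fixed number of frames before classification. The sketch below assumes the per-frame features are already extracted; spotting and the motion/shape representation are not included.

```python
import numpy as np

def time_normalize(sequence, n_frames=20):
    """Linearly resample a (T, D) gesture feature sequence to (n_frames, D)."""
    seq = np.asarray(sequence, dtype=float)
    src = np.linspace(0.0, 1.0, len(seq))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.column_stack([np.interp(dst, src, seq[:, d]) for d in range(seq.shape[1])])

# The same gesture performed slowly (40 frames) and quickly (12 frames).
slow = np.column_stack([np.sin(np.linspace(0, np.pi, 40)), np.linspace(0, 1, 40)])
fast = np.column_stack([np.sin(np.linspace(0, np.pi, 12)), np.linspace(0, 1, 12)])
print(np.abs(time_normalize(slow) - time_normalize(fast)).max())  # small residual
```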

14.
One of the most interesting goals of computer vision is the 3D structure recovery of scenes. Traditionally, two cues are used: structure from motion and structure from stereo, two subfields with complementary sets of assumptions and techniques. This paper introduces a new general framework of cooperation between stereo and motion. This framework combines the advantages of both cues: (i) easy correspondence from motion and (ii) accurate 3D reconstruction from stereo. First, we show how the stereo matching can be recovered from motion correspondences using only geometric constraints. Second, we propose a method of 3D reconstruction of both binocular and monocular features using all stereo pairs in the case of a calibrated stereo rig. Third, we perform an analysis of the performance of the proposed framework as well as a comparison with an affine method. Experiments involving real and synthetic stereo pairs indicate that rich and reliable information can be derived from the proposed framework. They also indicate that robust 3D reconstruction can be obtained even with short image sequences.
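
A sketch of the reconstruction step for a calibrated stereo rig: linear (DLT) triangulation of one matched point from two projection matrices. The transfer of motion correspondences to stereo matches is not shown, and the camera matrices below are synthetic.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two calibrated views.
    P1, P2 are 3x4 projection matrices; x1, x2 are the pixel coordinates."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Synthetic calibrated rig: identical intrinsics, second camera shifted along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])
X = np.array([0.3, -0.1, 4.0, 1.0])
x1, x2 = (P @ X for P in (P1, P2))
print(triangulate(P1, P2, x1[:2] / x1[2], x2[:2] / x2[2]))   # close to (0.3, -0.1, 4.0)
```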

15.
Convexity-Based Visual Camouflage Breaking
Camouflage is frequently used by animals and humans (usually for military purposes) in order to conceal objects from visual surveillance or inspection. Most camouflage methods are based on superposing multiple edges on the object that is supposed to be hidden, such that its familiar contours and texture are masked. In this work, we present an operator, Darg, that is applied directly to the intensity image in order to detect 3D smooth convex (or, equivalently, concave) objects. The operator maximally responds to a local intensity configuration that corresponds to curved 3D objects, and thus is used to detect curved objects on a relatively flat background, regardless of image edges, contours, and texture. In that regard, we show that a typical camouflage found in some animal species seems to be a “countermeasure” taken against detection that might be based on our method. Detection by Darg is shown to be very robust, from both theoretical considerations and practical examples of real-life images. As part of the camouflage breaking demonstration, Darg, which is non-edge-based, is compared with a representative edge-based operator. Better performance is maintained by Darg for both animal and military camouflage breaking.

16.
This paper introduces formative processes, composed of transitive partitions. Given a family of sets, a formative process ending in its Venn partition Σ is shown to exist. Sufficient criteria are also singled out for a transitive partition to model (via a function from set variables to unions of sets in the partition) all set-literals modeled by Σ. On the basis of such criteria, a procedure is designed that mimics a given formative process by another one in which the sets have finite rank bounded by C(|Σ|), with C a specific computable function. As a by-product, one of the core results on decidability in computable set theory is rediscovered, namely the one regarding the satisfiability of unquantified set-theoretic formulae involving Boolean operators, the singleton-former, and the powerset operator. The method described (which is able to exhibit a set-solution when the answer is affirmative) can be extended to solve the satisfiability problem for broader fragments of set theory.
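
The Venn partition that a formative process is required to end in can be computed directly for a small concrete family of finite sets, as sketched below; the formative processes themselves and the satisfiability procedure are not reproduced.

```python
def venn_partition(family):
    """Venn partition of a family of (finite) sets: the non-empty blocks obtained by
    grouping elements according to exactly which members of the family contain them."""
    universe = set().union(*family.values())
    blocks = {}
    for element in universe:
        signature = frozenset(name for name, s in family.items() if element in s)
        blocks.setdefault(signature, set()).add(element)
    return blocks

family = {'A': {1, 2, 3, 4}, 'B': {3, 4, 5}, 'C': {4, 5, 6}}
for signature, block in sorted(venn_partition(family).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(signature), sorted(block))
# blocks such as ['A'] [1, 2], ['A', 'B'] [3], ['A', 'B', 'C'] [4], ['B', 'C'] [5], ['C'] [6]
```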

17.
We present a simple algorithm for the Euclidean distance transform of a binary image that runs more efficiently than other algorithms in the literature. We show that our algorithm runs in optimal time for many architectures and has optimal cost for the RAM and EREW PRAM.
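
The paper's optimal algorithm is not reproduced here; the snippet below only shows what the Euclidean distance transform computes, using SciPy's stock implementation with the convention that each pixel receives its distance to the nearest foreground pixel.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary image with a single foreground pixel at the centre; inverting the mask makes
# SciPy report, for every pixel, the exact Euclidean distance to that foreground pixel.
img = np.zeros((5, 5), dtype=bool)
img[2, 2] = True
print(np.round(distance_transform_edt(~img), 2))
```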

18.
This paper proposes a new method for reducing the number of gray-levels in an image. The proposed approach achieves gray-level reduction using both the image gray-levels and additional local spatial features. Both the gray-level and local feature values feed a self-organizing feature map (SOFM) neural network classifier. After training, the neurons of the output competition layer of the SOFM define the gray-level classes. The final image has not only the dominant image gray-levels, but also a texture that approximates the local characteristics used. To split the initial classes further, the proposed technique can be used in an adaptive mode. To speed up the entire multithresholding algorithm and reduce memory requirements, a fractal scanning subsampling technique is adopted. The method is applicable to any type of gray-level image and can be easily modified to accommodate any type of spatial characteristic. Several experimental and comparative results, exhibiting the performance of the proposed technique, are presented.
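
A simplified sketch of the core idea, with k-means standing in for the SOFM and a 3×3 local standard deviation as the extra spatial feature; the adaptive class splitting and the fractal-scan subsampling are omitted, and all parameter choices are assumptions.

```python
import numpy as np
from scipy.ndimage import generic_filter
from sklearn.cluster import KMeans

def reduce_gray_levels(image, n_levels=4):
    """Quantize an image to n_levels classes using the pixel gray level plus a local
    spatial feature (3x3 standard deviation), then map each class to its mean gray level."""
    local_std = generic_filter(image.astype(float), np.std, size=3)
    feats = np.column_stack([image.ravel(), local_std.ravel()])
    labels = KMeans(n_clusters=n_levels, n_init=4, random_state=0).fit_predict(feats)
    class_gray = np.array([image.ravel()[labels == k].mean() for k in range(n_levels)])
    return class_gray[labels].reshape(image.shape)

rng = np.random.default_rng(8)
img = np.clip(rng.normal(128, 40, (64, 64)), 0, 255)
print(np.unique(np.round(reduce_gray_levels(img))))   # only n_levels distinct outputs
```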

19.
For decades, there has been an intensive research effort in the Computer Vision community to deal with video sequences. In this paper, we present a new method for recovering a maximum of information on displacement and projection parameters in monocular video sequences without calibration. This work follows previous studies on particular cases of displacement, scene geometry, and camera analysis and focuses on the particular forms of homographic matrices. It is already known that the number of particular cases involved in a complete study precludes an exhaustive test. To lower the algorithmic complexity, some authors propose to decompose all possible cases into a hierarchical tree data structure, but these works are still in development (T. Viéville and D. Lingrand, Internat. J. Comput. Vision 31, 1999, 5–29). In this paper, we propose a new way to deal with the huge number of particular cases: (i) we use simple rules in order to eliminate some redundant cases and some physically impossible cases, and (ii) we divide the cases into subsets corresponding to particular forms determined by simple rules, leading to a computationally efficient discrimination method. Finally, experiments were performed on image sequences acquired either with a robotic system or manually, demonstrating that when several models are valid, the model with the fewest parameters gives the best estimation of the problem's free parameters. The experiments presented in this paper show that even if the selected case is an approximation of reality, the method is still robust.
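
The case hierarchy and elimination rules are the paper's contribution and are not reproduced; the sketch below only illustrates the closing observation, comparing a 2-parameter translation model against a 6-parameter affine model with a BIC-style penalized residual so that the model with fewer parameters is preferred when both fit. The criterion and data are stand-ins, not the paper's procedure.

```python
import numpy as np

def fit_translation(src, dst):
    t = (dst - src).mean(axis=0)
    return 2, ((dst - (src + t)) ** 2).sum()

def fit_affine(src, dst):
    A = np.column_stack([src, np.ones(len(src))])
    M, _, _, _ = np.linalg.lstsq(A, dst, rcond=None)
    return 6, ((dst - A @ M) ** 2).sum()

def select_model(src, dst, sigma=1.0):
    """Pick the motion model minimizing a BIC-like score = SSE/sigma^2 + k*log(n),
    so the model with fewer parameters wins when both explain the data."""
    n_obs = 2 * len(src)
    scores = {}
    for name, fit in (('translation', fit_translation), ('affine', fit_affine)):
        k, sse = fit(src, dst)
        scores[name] = sse / sigma ** 2 + k * np.log(n_obs)
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(9)
src = rng.uniform(0, 100, (50, 2))
dst = src + np.array([3.0, -2.0]) + rng.normal(0, 1.0, (50, 2))   # pure translation + noise
print(select_model(src, dst)[0])   # the simpler, valid model is selected
```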

20.
This paper describes the mathematical basis and application of a probabilistic model for recovering the direction of camera translation (heading) from optical flow. According to the theorem that the heading cannot lie between two converging points in a stationary environment, one can compute the posterior probability distribution of heading across the image and choose the heading with the maximum a posteriori (MAP) probability. The model requires very simple computation, provides a confidence level for its judgments, applies to both linear and curved trajectories, functions in the presence of camera rotations, and exhibited high accuracy (0.1°–0.2°) in random-dot simulations.
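
The MAP computation based on the converging-points theorem is not implemented here; as a simpler stand-in, the sketch estimates the heading as the least-squares focus of expansion of a purely translational flow field, which is the quantity the model recovers.

```python
import numpy as np

def estimate_foe(points, flows):
    """Least-squares focus of expansion: under pure translation every flow vector
    is collinear with the line joining its pixel to the FOE, giving one linear
    constraint  vy*ex - vx*ey = x*vy - y*vx  per flow vector."""
    A = np.column_stack([flows[:, 1], -flows[:, 0]])
    b = points[:, 0] * flows[:, 1] - points[:, 1] * flows[:, 0]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Synthetic expanding flow field radiating from a true FOE at (160, 120).
rng = np.random.default_rng(10)
pts = rng.uniform(0, 320, (200, 2))
true_foe = np.array([160.0, 120.0])
flow = 0.05 * (pts - true_foe) + rng.normal(0, 0.2, (200, 2))
print(estimate_foe(pts, flow))   # close to (160, 120)
```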
