Similar documents
 Found 20 similar documents (search time: 15 ms)
1.
The role of perceptual organization in motion analysis has heretofore been minimal. In this work we present a simple but powerful computational model and associated algorithms based on the use of perceptual organizational principles, such as temporal coherence (or common fate) and spatial proximity, for motion segmentation. The computational model does not use the traditional frame-by-frame motion analysis; rather it treats an image sequence as a single 3D spatio-temporal volume. It endeavors to find organizations in this volume of data over three levels: signal, primitive, and structural. The signal level is concerned with detecting individual image pixels that are probably part of a moving object. The primitive level groups these individual pixels into planar patches, which we call the temporal envelopes. Compositions of these temporal envelopes describe the spatio-temporal surfaces that result from object motion. At the structural level, we detect these compositions of temporal envelopes by utilizing the structure and organization among them. The algorithms employed to realize the computational model include 3D edge detection, Hough transformation, and graph-based methods to group the temporal envelopes based on Gestalt principles. The significance of the Gestalt relationships between any two temporal envelopes is expressed in probabilistic terms. One of the attractive features of the adopted algorithm is that it does not require the detection of special 2D features or the tracking of these features across frames. We demonstrate that even with simple grouping strategies, we can easily handle drastic illumination changes, occlusion events, and multiple moving objects, without the use of training and specific object or illumination models. We present results on a large variety of motion sequences to demonstrate this robustness.

2.
The use of hypothesis verification is recurrent in the model-based recognition literature. Verification consists of measuring how many model features transformed by a pose coincide with some image features. When data involved in the computation of the pose are noisy, the pose is inaccurate and difficult to verify, especially when the objects are partially occluded. To address this problem, the noise in image features is modeled by a Gaussian distribution. A probabilistic framework allows the evaluation of the probability of a matching, knowing that the pose belongs to a rectangular volume of the pose space. It involves quadratic programming if the transformation is affine. This matching probability is used in an algorithm computing the best pose. It consists of a recursive multiresolution exploration of the pose space, discarding outliers in the match data while the search is progressing. Numerous experimental results are described. They consist of 2D and 3D recognition experiments using the proposed algorithm.

3.
In this paper, we derive new geometric invariants for structured 3D points and lines from a single image under projective transformation, and we propose a novel model-based 3D object recognition algorithm using them. Based on the matrix representation of the transformation between space features (points and lines) and the corresponding projected image features, new geometric invariants are derived via the determinant ratio technique. First, an invariant for six points on two adjacent planes is derived, which is shown to be equivalent to Zhu's result [1], but in a simpler formulation. Then, two new geometric invariants for structured lines are investigated: one for five lines on two adjacent planes and the other for six lines on four planes. By using the derived invariants, a novel 3D object recognition algorithm is developed, in which a hashing technique with thresholds and multiple invariants for a model are employed to overcome the over-invariant and false alarm problems. Simulation results on real images show that the derived invariants remain stable even in a noisy environment, and the proposed 3D object recognition algorithm is quite robust and accurate.

4.
In computer vision, motion analysis is a fundamental problem. Applying the concepts of congruence checking in computational geometry and geometric hashing, which is a technique used for the recognition of partially occluded objects from noisy data, we present a new random sampling approach for the estimation of the motion parameters in two- and three-dimensional Euclidean spaces of both a completely measured rigid object and a partially occluded rigid object. We assume that the two- and three-dimensional positions of the vertices of the object in each image frame are determined using appropriate methods such as a range sensor or stereo techniques. We also analyze the relationships between the quantization errors and the errors in the estimation of the motion parameters by random sampling, and we show that the solutions obtained using our algorithm converge to the true solutions if the resolution of the digitization is increased.

5.
In recent years there has been an increased interest in the modeling and recognition of human activities involving highly structured and semantically rich behavior such as dance, aerobics, and sign language. A novel approach for automatically acquiring stochastic models of the high-level structure of an activity without the assumption of any prior knowledge is presented. The process involves temporal segmentation into plausible atomic behavior components and the use of variable-length Markov models for the efficient representation of behaviors. Experimental results that demonstrate the synthesis of realistic sample behaviors and the performance of models for long-term temporal prediction are presented.

6.
Aiming at the use of hand gestures for human–computer interaction, this paper presents a real-time approach to the spotting, representation, and recognition of hand gestures from a video stream. The approach exploits multiple cues including skin color, hand motion, and shape. Skin color analysis and coarse image motion detection are joined to perform reliable hand gesture spotting. At a higher level, a compact spatiotemporal representation is proposed for modeling appearance changes in image sequences containing hand gestures. The representation is extracted by combining robust parameterized image motion regression and shape features of a segmented hand. For efficient recognition of gestures made at varying rates, a linear resampling technique for eliminating the temporal variation (time normalization) while maintaining the essential information of the original gesture representations is developed. The gesture is then classified according to a training set of gestures. In experiments with a library of 12 gestures, the recognition rate was over 90%. Through the development of a prototype gesture-controlled panoramic map browser, we demonstrate that a vocabulary of predefined hand gestures can be used to interact successfully with applications running on an off-the-shelf personal computer equipped with a home video camera.
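
The time normalization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each gesture is a list of per-frame feature vectors and resamples it to a fixed number of frames by linear interpolation; the function name `time_normalize` and the data layout are hypothetical, and the abstract does not specify the authors' exact procedure.

```python
def time_normalize(frames, target_len):
    """Linearly resample a gesture (list of feature vectors) to target_len frames.

    Illustrative assumption: frames are evenly spaced in time, so resampling
    on the frame index removes differences in execution speed.
    """
    if not frames or target_len < 1:
        raise ValueError("need at least one frame and target_len >= 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        # Fractional position of output frame k on the input frame axis.
        pos = k * (n - 1) / (target_len - 1)
        i = min(int(pos), n - 2)   # left neighbour index
        w = pos - i                # weight of the right neighbour
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[i], frames[i + 1])])
    return out


# The same motion performed slowly (5 frames) and quickly (2 frames)
# maps to the same 3-frame representation.
slow = [[0.0], [1.0], [2.0], [3.0], [4.0]]
fast = [[0.0], [4.0]]
print(time_normalize(slow, 3))
print(time_normalize(fast, 3))
```

After normalization, sequences of different durations have the same length and can be compared frame by frame against the training set.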

7.
8.
The major challenge that faces American Sign Language (ASL) recognition now is developing methods that will scale well with increasing vocabulary size. Unlike in spoken languages, phonemes can occur simultaneously in ASL. The number of possible combinations of phonemes is approximately 1.5×10⁹, which cannot be tackled by conventional hidden Markov model-based methods. Gesture recognition, which is less constrained than ASL recognition, suffers from the same problem. In this paper we present a novel framework for ASL recognition that aspires to being a solution to the scalability problems. It is based on breaking down the signs into their phonemes and modeling them with parallel hidden Markov models. These model the simultaneous aspects of ASL independently. Thus, they can be trained independently, and do not require consideration of the different combinations at training time. We show in experiments with a 22-sign vocabulary how to apply this framework in practice. We also show that parallel hidden Markov models outperform conventional hidden Markov models.

9.
A generic integrated line detection algorithm (GILDA) is presented and demonstrated. GILDA is based on the generic graphics recognition approach, which abstracts the graphics recognition as a stepwise recovery of the multiple components of the graphic objects and is specified by the object–process methodology. We define 12 classes of lines which appear in engineering drawings and use them to construct a class inheritance hierarchy. The hierarchy highly abstracts the line features that are relevant to the line detection process. Based on the “Hypothesis and Test” paradigm, lines are detected by a stepwise extension to both ends of a selected first key component. In each extension cycle, one new component which best meets the current line's shape and style constraints is appended to the line. Different line classes are detected by controlling the line attribute values. As we show in the experiments, the algorithm demonstrates high performance on clear synthetic drawings as well as on noisy, complex, real-world drawings.

10.
It is often difficult to come up with a well-principled approach to the selection of low-level features for characterizing images for content-based retrieval. This is particularly true for medical imagery, where gross characterizations on the basis of color and other global properties do not work. An alternative for medical imagery consists of the “scattershot” approach that first extracts a large number of features from an image and then reduces the dimensionality of the feature space by applying a feature selection algorithm such as the Sequential Forward Selection method. This contribution presents a better alternative to initial feature extraction for medical imagery. The proposed new approach consists of (i) eliciting from the domain experts (physicians, in our case) the perceptual categories they use to recognize diseases in images; (ii) applying a suite of operators to the images to detect the presence or the absence of these perceptual categories; (iii) ascertaining the discriminatory power of the perceptual categories through statistical testing; and, finally, (iv) devising a retrieval algorithm using the perceptual categories. In this paper we present our proposed approach for the domain of high-resolution computed tomography (HRCT) images of the lung. Our empirical evaluation shows that feature extraction based on physicians' perceptual categories achieves significantly higher retrieval precision than the traditional scattershot approach. Moreover, the use of perceptually based features gives the system the ability to provide an explanation for its retrieval decisions, thereby instilling more confidence in its users.

11.
A Continuous Probabilistic Framework for Image Matching   Total citations: 1, self-citations: 0, citations by others: 1
In this paper we describe a probabilistic image matching scheme in which the image representation is continuous and the similarity measure and distance computation are also defined in the continuous domain. Each image is first represented as a Gaussian mixture distribution and images are compared and matched via a probabilistic measure of similarity between distributions. A common probabilistic and continuous framework is applied to the representation as well as the matching process, ensuring an overall system that is theoretically appealing. Matching results are investigated and the application to an image retrieval system is demonstrated.

12.
Constructive Hypervolume Modeling   Total citations: 1, self-citations: 0, citations by others: 1
This paper deals with modeling point sets with attributes. A point set in a geometric space of an arbitrary dimension is a geometric model of a real/abstract object or process under consideration. An attribute is a mathematical model of an object property of arbitrary nature (material, photometric, physical, statistical, etc.) defined at any point of the point set. We provide a brief survey of different modeling techniques related to point sets with attributes. It spans such different areas as solid modeling, heterogeneous objects modeling, scalar fields or “implicit surface” modeling and volume graphics. Then, on the basis of this survey we formulate requirements for a general model of hypervolumes (multidimensional point sets with multiple attributes). A general hypervolume model and its components such as objects, operations, and relations are introduced and discussed. A function representation (FRep) is used as the basic model for the point set geometry and attributes represented independently using real-valued scalar functions of several variables. Each function defining the geometry or an attribute is evaluated at the given point by a procedure traversing a constructive tree structure with primitives in the leaves and operations in the nodes of the tree. This reflects the constructive nature of the symmetric approach to modeling geometry and associated attributes in multidimensional space. To demonstrate a particular application of the proposed general model, we consider in detail the problem of texturing, introduce a model of constructive hypervolume texture, and then discuss its implementation, as well as the special modeling language we used for modeling hypervolume objects.
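
The constructive tree evaluation described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' modeling language: the classes `Leaf` and `Node`, the `ball` primitive, and the use of plain `min`/`max` for set operations are all hypothetical choices, and the convention that a value of zero or more means "inside" is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Point = Sequence[float]


@dataclass
class Leaf:
    """A primitive: a real-valued function of a point (assumed >= 0 inside)."""
    func: Callable[[Point], float]

    def eval(self, p: Point) -> float:
        return self.func(p)


@dataclass
class Node:
    """An operation applied to the values of two subtrees."""
    op: Callable[[float, float], float]
    left: object
    right: object

    def eval(self, p: Point) -> float:
        # Traverse the tree: evaluate both children, then combine.
        return self.op(self.left.eval(p), self.right.eval(p))


def ball(center: Point, radius: float) -> Leaf:
    """Ball primitive in any dimension: r^2 - |p - c|^2."""
    return Leaf(lambda p: radius ** 2
                - sum((x - c) ** 2 for x, c in zip(p, center)))


# Set operations via min/max (a simple stand-in chosen for this sketch).
union = max
intersection = min


def subtraction(a: float, b: float) -> float:
    return min(a, -b)


# Geometry: a hollow shell, i.e. a ball of radius 2 minus a ball of radius 1.
shell = Node(subtraction, ball((0, 0, 0), 2.0), ball((0, 0, 0), 1.0))
# Attribute: a hypothetical "density" defined independently of the geometry.
density = Leaf(lambda p: 1.0 + p[0])

point = (1.5, 0.0, 0.0)
print(shell.eval(point) >= 0, density.eval(point))
```

Because geometry and attributes are both ordinary functions of a point, the same tree machinery evaluates either one, and nothing in the sketch is tied to three dimensions.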

13.
Assuming planar 4-connectivity and spatial 6-connectivity, we first introduce the curvature indices of the boundary of a discrete object, and, using these indices of points, we define the vertex angles of discrete surfaces as an extension of the chain codes of digital curves. Second, we prove the relation between the number of point indices and the numbers of holes, genus, and cavities of an object. This is the angular Euler characteristic of a discrete object. Third, we define quasi-objects as the connected simplexes. Geometric relations between discrete quasi-objects and discrete objects permit us to define the Euler characteristic for the planar 8-connected, and the spatial 18- and 26-connected objects using these for the planar 4-connected and the spatial 6-connected objects. Our results show that the planar 4-connectivity and the spatial 6-connectivity define the Euler characteristics of point sets in a discrete space. Finally, we develop an algorithm for the computation of these characteristics of discrete objects.
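
For the planar case, the dependence of the Euler characteristic on the chosen connectivity can be made concrete with a short sketch. It uses classical bit-quad counting over 2×2 windows, which is a different technique from the curvature-index method of the abstract above; the function name `euler_characteristic` and the list-of-lists image layout are assumptions for illustration.

```python
def euler_characteristic(image, connectivity=4):
    """Euler characteristic (components minus holes) of a 2D binary image.

    Counts 2x2 pixel patterns ("bit quads") over the zero-padded image:
      q1 = windows with exactly one foreground pixel,
      q3 = windows with exactly three foreground pixels,
      qd = windows with two diagonally opposite foreground pixels.
    Then chi_4 = (q1 - q3 + 2*qd) / 4 and chi_8 = (q1 - q3 - 2*qd) / 4.
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    rows, cols = len(image), len(image[0])

    def px(r, c):
        # Pixels outside the image count as background.
        return 1 if 0 <= r < rows and 0 <= c < cols and image[r][c] else 0

    q1 = q3 = qd = 0
    for r in range(-1, rows):
        for c in range(-1, cols):
            a, b = px(r, c), px(r, c + 1)
            d, e = px(r + 1, c), px(r + 1, c + 1)
            total = a + b + d + e
            if total == 1:
                q1 += 1
            elif total == 3:
                q3 += 1
            elif total == 2 and a == e:
                # a == e with two pixels set means the pair is diagonal.
                qd += 1
    sign = 1 if connectivity == 4 else -1
    return (q1 - q3 + sign * 2 * qd) // 4


diagonal = [[1, 0],
            [0, 1]]
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(euler_characteristic(diagonal, 4), euler_characteristic(diagonal, 8))
print(euler_characteristic(ring, 4), euler_characteristic(ring, 8))
```

The two diagonal pixels form two objects under 4-connectivity but a single object under 8-connectivity, while the ring has one component and one hole under either choice.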

14.
This paper describes the mathematical basis and application of a probabilistic model for recovering the direction of camera translation (heading) from optical flow. According to the theorem that heading cannot lie between two converging points in a stationary environment, one can compute the posterior probability distribution of heading across the image and choose the maximum a posteriori (MAP) heading. The model requires very simple computation, provides a confidence level for its judgments, applies to both linear and curved trajectories, functions in the presence of camera rotations, and exhibited high accuracy, up to 0.1°–0.2°, in random dot simulations.

15.
This paper describes the theory and algorithms of distance transform for fuzzy subsets, called fuzzy distance transform (FDT). The notion of fuzzy distance is formulated by first defining the length of a path on a fuzzy subset and then finding the infimum of the lengths of all paths between two points. The length of a path π in a fuzzy subset of the n-dimensional continuous space ℝⁿ is defined as the integral of fuzzy membership values along π. Generally, there are infinitely many paths between any two points in a fuzzy subset and it is shown that the shortest one may not exist. The fuzzy distance between two points is defined as the infimum of the lengths of all paths between them. It is demonstrated that, unlike in hard convex sets, the shortest path (when it exists) between two points in a fuzzy convex subset is not necessarily a straight line segment. For any positive number θ≤1, the θ-support of a fuzzy subset is the set of all points in ℝⁿ with membership values greater than or equal to θ. It is shown that, for any fuzzy subset, for any nonzero θ≤1, fuzzy distance is a metric for the interior of its θ-support. It is also shown that, for any smooth fuzzy subset, fuzzy distance is a metric for the interior of its 0-support (referred to as support). FDT is defined as a process on a fuzzy subset that assigns to a point its fuzzy distance from the complement of the support. The theoretical framework of FDT in continuous space is extended to digital cubic spaces and it is shown that for any fuzzy digital object, fuzzy distance is a metric for the support of the object. A dynamic programming-based algorithm is presented for computing FDT of a fuzzy digital object. It is shown that the algorithm terminates in a finite number of steps and when it does so, it correctly computes FDT. Several potential applications of fuzzy distance transform in medical imaging are presented. Among these are the quantification of blood vessels and trabecular bone thickness in the regime of limited spatial resolution where these objects become fuzzy.
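
A rough digital sketch of the fuzzy distance transform defined in the abstract above follows. It makes two assumptions that the abstract does not state: the path integral is approximated per step by the mean membership of the two pixels times their Euclidean distance, and the minimum over paths is found with Dijkstra's algorithm on the 8-neighbour grid instead of the authors' dynamic programming scheme. The function name `fuzzy_distance_transform` is hypothetical.

```python
import heapq
import math


def fuzzy_distance_transform(mu):
    """Fuzzy distance of every pixel from the complement of the support.

    mu is a 2D list of membership values in [0, 1]; pixels with mu == 0 form
    the complement of the support and act as zero-distance sources. Pixels
    outside the image are ignored (an assumption of this sketch).
    """
    rows, cols = len(mu), len(mu[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if mu[r][c] == 0:
                dist[r][c] = 0.0
                heap.append((0.0, r, c))
    heapq.heapify(heap)

    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Step length weighted by the mean membership of both pixels.
                cost = 0.5 * (mu[r][c] + mu[nr][nc]) * math.hypot(dr, dc)
                if d + cost < dist[nr][nc]:
                    dist[nr][nc] = d + cost
                    heapq.heappush(heap, (d + cost, nr, nc))
    return dist


# A crisp object and a fuzzy one of the same width: lower membership
# shortens the fuzzy distances, so the fuzzy object measures as thinner.
print(fuzzy_distance_transform([[0, 1, 1, 1, 0]]))
print(fuzzy_distance_transform([[0, 0.5, 0.5, 0.5, 0]]))
```

With all memberships equal to 1 the result reduces to an ordinary distance transform measured between pixel centres, which is one way to sanity-check the weighting.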

16.
Face Detection: A Survey   Total citations: 5, self-citations: 0, citations by others: 5
In this paper we present a comprehensive and critical survey of face detection algorithms. Face detection is a necessary first step in face recognition systems, with the purpose of localizing and extracting the face region from the background. It also has several applications in areas such as content-based image retrieval, video coding, video conferencing, crowd surveillance, and intelligent human–computer interfaces. However, it was not until recently that the face detection problem received considerable attention among researchers. The human face is a dynamic object and has a high degree of variability in its appearance, which makes face detection a difficult problem in computer vision. A wide variety of techniques have been proposed, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods. The algorithms presented in this paper are classified as either feature-based or image-based and are discussed in terms of their technical approach and performance. Due to the lack of standardized tests, we do not provide a comprehensive comparative evaluation, but in cases where results are reported on common datasets, comparisons are presented. We also give a presentation of some proposed applications and possible application areas.

17.
In this paper we present a novel approach for building detection from multiple aerial images in dense urban areas. The approach is based on accurate surface reconstruction, followed by extraction of building façades that are used as a main cue for building detection. For the façade detection, a simple but nevertheless flexible and robust algorithm is proposed. It is based on the observation that building façades correspond to the accumulation of 3D data, available from different views, in object space. Knowledge-driven thresholding of 3D data accumulators followed by Hough transform-based segment detection results in the extraction of façade positions. Three-dimensional planar regions resulting from the surface reconstruction procedure and bounded by the extracted façades are detected as building hypotheses through testing a set of spatial criteria. Then, a set of verification criteria is proposed for the hypothesis confirmation.

18.
This paper presents a general information-theoretic approach for obtaining lower bounds on the number of examples required for Probably Approximately Correct (PAC) learning in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to several different models, illustrating its generality and power. The resulting bounds add logarithmic factors to (or improve the constants in) previously known lower bounds.

19.
We present a method for automatically estimating the motion of an articulated object filmed by two or more fixed cameras. We focus our work on the case where the quality of the images is poor, and where only an approximation of a geometric model of the tracked object is available. Our technique uses physical forces applied to each rigid part of a kinematic 3D model of the object we are tracking. These forces guide the minimization of the differences between the pose of the 3D model and the pose of the real object in the video images. We use a fast recursive algorithm to solve the dynamical equations of motion of any 3D articulated model. We explain the key parts of our algorithms: how relevant information is extracted from the images, how the forces are created, and how the dynamical equations of motion are solved. A study of what kind of information should be extracted from the images and of when our algorithms fail is also presented. Finally we present some results on the tracking of a person. We also show the application of our method to the tracking of a hand in sequences of images, showing that the kind of information to extract from the images depends on their quality and on the configuration of the cameras.

20.
We generalize here the use of the 1D Boolean model for the analysis of grey level textures. Each grey image is first split into eight binary images using different criteria. Each of these binary images is separately analysed with the help of the 1D Boolean model and features are extracted from it. The final grey texture recognition is performed on the basis of these features using several classification criteria. Experiments have been carried out using an image database of 30 grey level textures, each 512×512 pixels in size, obtaining correct classification rates between 95% and 100%, depending on the classification criterion used.
