Similar Documents
20 similar documents found (search time: 15 ms).
1.
The role of perceptual organization in motion analysis has heretofore been minimal. In this work we present a simple but powerful computational model, and associated algorithms, based on perceptual organizational principles such as temporal coherence (or common fate) and spatial proximity for motion segmentation. The computational model does not use traditional frame-by-frame motion analysis; rather, it treats an image sequence as a single 3D spatio-temporal volume. It endeavors to find organizations in this volume of data at three levels: signal, primitive, and structural. The signal level is concerned with detecting individual image pixels that are probably part of a moving object. The primitive level groups these individual pixels into planar patches, which we call temporal envelopes. Compositions of these temporal envelopes describe the spatio-temporal surfaces that result from object motion. At the structural level, we detect these compositions of temporal envelopes by exploiting the structure and organization among them. The algorithms employed to realize the computational model include 3D edge detection, the Hough transform, and graph-based methods that group the temporal envelopes according to Gestalt principles. The significance of the Gestalt relationships between any two temporal envelopes is expressed in probabilistic terms. One attractive feature of the adopted algorithm is that it requires neither the detection of special 2D features nor the tracking of such features across frames. We demonstrate that even with simple grouping strategies we can easily handle drastic illumination changes, occlusion events, and multiple moving objects, without training or specific object or illumination models. We present results on a large variety of motion sequences to demonstrate this robustness.
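To illustrate the structural-level grouping, here is a minimal Python sketch (not the paper's implementation) of how a pairwise Gestalt affinity combining spatial proximity and temporal coherence could drive the grouping of temporal envelopes; the envelope representation, scale parameters, and threshold are all invented for this example.

```python
import numpy as np

def envelope_affinity(e1, e2, sigma_xy=20.0, sigma_t=5.0):
    """Probabilistic Gestalt affinity between two temporal envelopes.

    e1, e2: dicts with 'centroid' (x, y) and 'span' (t_start, t_end).
    Spatial proximity and temporal coherence (overlapping frame spans)
    both contribute; all parameters are illustrative only.
    """
    d_xy = np.linalg.norm(np.subtract(e1["centroid"], e2["centroid"]))
    # The temporal gap is zero when the frame spans overlap (common fate).
    gap = max(0.0, max(e1["span"][0], e2["span"][0])
                   - min(e1["span"][1], e2["span"][1]))
    return np.exp(-(d_xy / sigma_xy) ** 2) * np.exp(-(gap / sigma_t) ** 2)

def group_envelopes(envelopes, threshold=0.5):
    """Union-find grouping of envelopes whose affinity exceeds a threshold."""
    parent = list(range(len(envelopes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(envelopes)):
        for j in range(i + 1, len(envelopes)):
            if envelope_affinity(envelopes[i], envelopes[j]) > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(envelopes))]  # group label per envelope
```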

2.
3.
4.
An atomic representation of a Herbrand model (ARM) is a finite set of (not necessarily ground) atoms over a given Herbrand universe. Each ARM represents a possibly infinite Herbrand interpretation. This concept has emerged independently in different branches of computer science as a natural and useful generalization of the concept of a finite Herbrand interpretation. It was shown that several recursively decidable problems on finite Herbrand models (or interpretations) remain decidable on ARMs. The following problems are essential when working with ARMs: deciding the equivalence of two ARMs, deciding subsumption between ARMs, and evaluating clauses over ARMs. These problems were known to be decidable, but their computational complexity has remained obscure so far; the previously published decision algorithms require exponential space. In this paper, we prove that all of the mentioned problems are coNP-complete.
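To make the objects concrete, here is a minimal sketch, assuming atoms are nested tuples and variables are capitalized identifiers (a convention of this example, not of the paper), of the basic matching step that underlies evaluation over ARMs: deciding whether a ground atom is an instance of some possibly non-ground atom in the ARM. The equivalence and subsumption tests whose complexity the paper settles reduce to many such checks.

```python
def is_variable(t):
    # Convention (assumed here): variables are capitalized identifiers.
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, ground, subst=None):
    """One-sided matching: does `ground` instantiate `pattern`?"""
    subst = {} if subst is None else subst
    if is_variable(pattern):
        if pattern in subst:
            return subst if subst[pattern] == ground else None
        subst[pattern] = ground
        return subst
    if isinstance(pattern, tuple) and isinstance(ground, tuple):
        if len(pattern) != len(ground) or pattern[0] != ground[0]:
            return None
        for p, g in zip(pattern[1:], ground[1:]):
            subst = match(p, g, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == ground else None

def arm_covers(arm, ground_atom):
    """True iff some atom of the ARM has the ground atom as an instance."""
    return any(match(a, ground_atom) is not None for a in arm)

# Example: the ARM {p(X, X), p(a, b)} covers p(c, c) and p(a, b).
arm = [("p", "X", "X"), ("p", "a", "b")]
print(arm_covers(arm, ("p", "c", "c")))  # True
```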

5.
We present a method for automatically estimating the motion of an articulated object filmed by two or more fixed cameras. We focus on the case where the quality of the images is poor and where only an approximation of a geometric model of the tracked object is available. Our technique uses physical forces applied to each rigid part of a kinematic 3D model of the object we are tracking. These forces guide the minimization of the differences between the pose of the 3D model and the pose of the real object in the video images. We use a fast recursive algorithm to solve the dynamical equations of motion of any 3D articulated model. We explain the key parts of our algorithms: how relevant information is extracted from the images, how the forces are created, and how the dynamical equations of motion are solved. We also study what kind of information should be extracted from the images and when our algorithms fail. Finally, we present results on tracking a person, and we show the application of our method to tracking a hand in image sequences, showing that the kind of information to extract from the images depends on their quality and on the configuration of the cameras.
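The following is a schematic sketch, not the paper's recursive dynamics solver: it stands in for the force-guided minimization by treating the image discrepancy as a potential and applying damped gradient steps to the joint parameters. `pose_error` is a hypothetical callback that renders the 3D model at a given pose and measures its mismatch against the current frames.

```python
import numpy as np

def track_step(q, pose_error, gain=0.5, damping=0.8, h=1e-3, velocity=None):
    """One damped 'force-driven' update of the joint parameter vector q.

    pose_error(q) -> float: hypothetical mismatch between the projected
    3D model at pose q and the images (e.g., distance to extracted edges).
    The forces are the negative numerical gradient of that potential.
    """
    q = np.asarray(q, dtype=float)
    v = np.zeros_like(q) if velocity is None else velocity
    grad = np.array([
        (pose_error(q + h * e) - pose_error(q - h * e)) / (2 * h)
        for e in np.eye(len(q))
    ])
    v = damping * v - gain * grad   # damped dynamics; force = -gradient
    return q + v, v                 # new pose and velocity for the next frame
```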

6.
The major challenge facing American Sign Language (ASL) recognition now is developing methods that will scale well with increasing vocabulary size. Unlike in spoken languages, phonemes can occur simultaneously in ASL. The number of possible combinations of phonemes is approximately 1.5×10^9, which cannot be tackled by conventional hidden Markov model-based methods. Gesture recognition, which is less constrained than ASL recognition, suffers from the same problem. In this paper we present a novel framework for ASL recognition that aspires to be a solution to the scalability problems. It is based on breaking down the signs into their phonemes and modeling them with parallel hidden Markov models, which model the simultaneous aspects of ASL independently. Thus, they can be trained independently and do not require consideration of the different combinations at training time. We show in experiments with a 22-sign vocabulary how to apply this framework in practice. We also show that parallel hidden Markov models outperform conventional hidden Markov models.
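To make the independence idea concrete, here is a minimal scoring sketch under assumptions not in the abstract (discrete per-channel observations; one HMM per phoneme channel given as a (log_pi, log_A, log_B) triple of log-space parameters): a sign's joint log-likelihood is simply the sum of per-channel forward log-likelihoods, so the channel combinations never have to be enumerated at training time.

```python
import numpy as np

def _log_matvec(log_A, alpha):
    # alpha'_j = log sum_i exp(alpha_i + log_A[i, j])
    return np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)

def forward_loglik(obs, log_pi, log_A, log_B):
    """Forward-algorithm log-likelihood of one discrete observation
    sequence under one HMM with log-space parameters."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = log_B[:, o] + _log_matvec(log_A, alpha)
    return np.logaddexp.reduce(alpha)

def parallel_hmm_score(channel_obs, channel_models):
    """Parallel HMMs treat the phoneme channels as independent, so a
    sign's joint log-likelihood is the sum of the per-channel scores."""
    return sum(forward_loglik(obs, *model)
               for obs, model in zip(channel_obs, channel_models))

# Recognition picks the sign whose per-channel models maximize
# parallel_hmm_score; each channel's HMMs can be trained independently.
```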

7.
For decades, there has been an intensive research effort in the computer vision community to deal with video sequences. In this paper, we present a new method for recovering a maximum of information on displacement and projection parameters in monocular video sequences without calibration. This work follows previous studies on particular cases of displacement, scene geometry, and camera analysis and focuses on the particular forms of homographic matrices. It is already known that the number of particular cases involved in a complete study precludes an exhaustive test. To lower the algorithmic complexity, some authors propose to decompose all possible cases into a hierarchical tree data structure, but these works are still in development (T. Viéville and D. Lingrand, Internat. J. Comput. Vision 31, 1999, 5–29). In this paper, we propose a new way to deal with the huge number of particular cases: (i) we use simple rules to eliminate some redundant cases and some physically impossible cases, and (ii) we divide the cases into subsets corresponding to particular forms determined by simple rules, leading to a computationally efficient discrimination method. Finally, experiments were performed on image sequences acquired either with a robotic system or manually in order to demonstrate that when several models are valid, the model with the fewest parameters gives the best estimation with respect to the free parameters of the problem. The experiments presented in this paper show that even if the selected case is an approximation of reality, the method is still robust.
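The parsimony conclusion can be phrased as penalized model selection. The sketch below uses an AIC-style score as a stand-in for the paper's rule-based discrimination: among candidate motion models that fit comparably, the one with the fewest free parameters wins. `residual` is a hypothetical callback returning the summed squared reprojection error of a fitted model.

```python
def select_motion_model(candidates, residual, noise_var=1.0):
    """Pick the most parsimonious model that explains the data.

    candidates: list of (name, n_params, fitted_model) tuples.
    residual(model) -> float: sum of squared reprojection errors over
    the matched points (hypothetical callback). The complexity penalty
    is illustrative, not the paper's discrimination rules.
    """
    def score(candidate):
        name, n_params, model = candidate
        return residual(model) / noise_var + 2 * n_params  # fit + complexity
    return min(candidates, key=score)
```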

8.
Vector-City Vector Distance Transform
This paper examines current chamfer and vector distance transforms for encoding objects as distance fields. A new vector distance transform is introduced which uses the city-block chamfer distance transform as a basis. A detailed error analysis using real CT data is presented, demonstrating the improved accuracy of the new approach over existing methods. The production of a subvoxel-accurate distance field is also demonstrated by employing an improved classification. Distance fields are shown for skull and chess-piece datasets.
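For orientation, here is a 2D sketch in the spirit of chamfer-based vector distance transforms (it uses the city-block neighborhood the paper builds on, but not its exact kernels): each pixel stores a vector to an approximately nearest feature pixel, propagated by a forward and a backward raster pass.

```python
import numpy as np

def vector_distance_transform(mask):
    """City-block vector distance transform (2D illustrative sketch).

    mask: boolean array, True on feature pixels (distance zero). Each
    pixel ends up with a vector to an approximately nearest feature
    pixel; np.abs(vec).sum(-1) is then the city-block distance field.
    """
    h, w = mask.shape
    big = np.int64(1) << 40
    vec = np.zeros((h, w, 2), dtype=np.int64)
    vec[~mask] = (big, big)          # effectively infinite distance

    def relax(y, x, dy, dx):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            cand = vec[ny, nx] + (dy, dx)   # neighbor's vector plus the step
            if np.abs(cand).sum() < np.abs(vec[y, x]).sum():
                vec[y, x] = cand

    for y in range(h):                       # forward pass: up, left
        for x in range(w):
            relax(y, x, -1, 0)
            relax(y, x, 0, -1)
    for y in reversed(range(h)):             # backward pass: down, right
        for x in reversed(range(w)):
            relax(y, x, 1, 0)
            relax(y, x, 0, 1)
    return vec
```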

9.
This paper proposes a new method for reducing the number of gray levels in an image. The proposed approach achieves gray-level reduction using both the image gray levels and additional local spatial features. Both the gray-level and local feature values feed a self-organizing feature map (SOFM) neural network classifier. After training, the neurons of the output competition layer of the SOFM define the gray-level classes. The final image has not only the dominant image gray levels but also a texture that approaches the local image characteristics used. To split the initial classes further, the proposed technique can be used in an adaptive mode. To speed up the entire multithresholding algorithm and reduce memory requirements, a fractal scanning subsampling technique is adopted. The method is applicable to any type of gray-level image and can easily be modified to accommodate any type of spatial characteristic. Several experimental and comparative results exhibiting the performance of the proposed technique are presented.
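Below is a minimal sketch of the core idea with a tiny hand-rolled 1-D SOFM; the feature choice (gray value plus a 3x3 local mean), training schedule, and class count are all illustrative assumptions, and the paper's fractal subsampling and adaptive mode are omitted.

```python
import numpy as np

def reduce_gray_levels(img, n_classes=8, iters=20000, seed=0):
    """Gray-level reduction with a tiny 1-D self-organizing map (sketch)."""
    rng = np.random.default_rng(seed)
    padded = np.pad(img.astype(float), 1, mode="edge")
    local = sum(padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                for dy in range(3) for dx in range(3)) / 9.0
    feats = np.stack([img.astype(float).ravel(), local.ravel()], axis=1)

    w = rng.uniform(feats.min(), feats.max(), (n_classes, 2))
    for t in range(iters):
        x = feats[rng.integers(len(feats))]
        bmu = np.argmin(((w - x) ** 2).sum(1))        # best-matching unit
        lr = 0.5 * (1 - t / iters)                    # decaying learning rate
        radius = max(1.0, n_classes / 2 * (1 - t / iters))
        dist = np.abs(np.arange(n_classes) - bmu)     # 1-D neighborhood
        w += lr * np.exp(-(dist / radius) ** 2)[:, None] * (x - w)

    # Map each pixel to the gray value of its winning class prototype.
    classes = ((feats[:, None, :] - w) ** 2).sum(-1).argmin(1)
    return w[classes, 0].reshape(img.shape)
```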

10.
We propose a sculpture metaphor based on a multiresolution volumetric representation. It allows the user to model both precise and coarse features while maintaining interactive update and display rates. The modelled surface is an iso-surface of a scalar field, which is sampled on an adaptive hierarchical grid that dynamically subdivides or undivides itself. Field modifications are transparent to the user: the user feels as if he were directly interacting with the surface via a tool that either adds or removes “material,” while the tool modifies the scalar field around the surface, its size and shape automatically guiding the underlying grid subdivision. In order to give interactive feedback regardless of the tool's size, tools are applied in an adaptive way, the grid always being updated from coarse to fine levels. This maintains interactive rates even for large tool sizes, and it enables the user to apply a tool continuously, with immediate coarse-scale feedback on the multiple actions being provided. A dynamic level-of-detail (LOD) mechanism ensures that the iso-surface is displayed at interactive rates regardless of the zoom value; surface elements, generated and stored at each level of resolution, are displayed depending on their size on the screen. The system may switch to a coarser surface display during user actions, thus always ensuring interactive visual feedback. Two applications illustrate the use of this system: first, complex shapes with both coarse and fine features can be sculpted from scratch; second, we show that the system can be used to edit models that have been converted from a mesh representation.
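A much-simplified sketch of the coarse-to-fine tool application follows: a fixed stack of cubic resolution levels stands in for the paper's dynamically subdividing grid, and an additive spherical tool is splatted from coarsest to finest, which is what makes an immediate coarse preview possible. The layout and falloff are assumptions of this example.

```python
import numpy as np

def apply_tool(levels, center, radius, strength):
    """Splat an additive spherical tool into a stack of field levels.

    levels: list of cubic 3-D scalar arrays, coarsest first; level l is
    assumed to sample the unit cube at spacing 1 / levels[l].shape[0].
    Updating coarse levels first mimics immediate coarse-scale feedback.
    """
    center = np.reshape(center, (3, 1, 1, 1))
    for field in levels:                              # coarse to fine
        n = field.shape[0]
        coords = (np.indices(field.shape) + 0.5) / n  # cell centers in [0,1]^3
        d = np.sqrt(((coords - center) ** 2).sum(0))
        falloff = np.clip(1.0 - d / radius, 0.0, None) ** 2
        field += strength * falloff                   # 'add material' near tool
    return levels
```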

11.
Aiming at the use of hand gestures for human–computer interaction, this paper presents a real-time approach to the spotting, representation, and recognition of hand gestures from a video stream. The approach exploits multiple cues including skin color, hand motion, and shape. Skin color analysis and coarse image motion detection are joined to perform reliable hand gesture spotting. At a higher level, a compact spatiotemporal representation is proposed for modeling appearance changes in image sequences containing hand gestures. The representation is extracted by combining robust parameterized image motion regression and shape features of a segmented hand. For efficient recognition of gestures made at varying rates, a linear resampling technique for eliminating the temporal variation (time normalization) while maintaining the essential information of the original gesture representations is developed. The gesture is then classified according to a training set of gestures. In experiments with a library of 12 gestures, the recognition rate was over 90%. Through the development of a prototype gesture-controlled panoramic map browser, we demonstrate that a vocabulary of predefined hand gestures can be used to interact successfully with applications running on an off-the-shelf personal computer equipped with a home video camera.
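The linear time normalization is simple enough to show directly. A minimal sketch, assuming a (T, D) array of per-frame features and a fixed target length chosen here for illustration:

```python
import numpy as np

def time_normalize(seq, n_frames=32):
    """Linearly resample a gesture feature sequence to a fixed length.

    seq: (T, D) array of per-frame features. Each feature dimension is
    interpolated independently onto n_frames uniformly spaced instants,
    removing speed variation while keeping the trajectory's shape.
    """
    seq = np.asarray(seq, dtype=float)
    src = np.linspace(0.0, 1.0, len(seq))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(dst, src, seq[:, d])
                     for d in range(seq.shape[1])], axis=1)
```

After this normalization, gestures performed at different speeds become directly comparable, e.g. by nearest-neighbor matching against the training set.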

12.
We present an approach to attention in active computer vision. The notion of attention plays an important role in biological vision. In recent years, and especially with the emerging interest in active vision, computer vision researchers have been increasingly concerned with attentional mechanisms as well. The basic principles behind these efforts are greatly influenced by psychophysical research. That is also the case in the work presented here, which adapts the model of Treisman (1985, Comput. Vision Graphics Image Process. 31, 156–177), with an early parallel stage with preattentive cues followed by a later serial stage where the cues are integrated. The contributions of our approach are (i) the incorporation of depth information from stereopsis, (ii) the simple implementation of low-level modules such as disparity and flow by local phase, and (iii) cue integration along pursuit and saccade modes that allows proper target selection based on nearness and motion. We demonstrate the technique by experiments in which a moving observer selectively masks out different moving objects in real scenes.
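As a toy version of the cue-integration stage (not the paper's pursuit/saccade machinery), the sketch below combines a nearness cue from disparity with a motion cue from optical-flow magnitude into a saliency map and picks the most salient location; the linear weighting and normalization are assumptions of this example.

```python
import numpy as np

def select_target(disparity, flow_mag, w_near=0.6, w_motion=0.4):
    """Integrate nearness and motion cues into a saliency map (sketch).

    disparity: per-pixel stereo disparity (larger = nearer the camera);
    flow_mag: per-pixel optical-flow magnitude. Both maps are rescaled
    to [0, 1] and combined linearly; the argmax is the attended target.
    """
    def rescale(a):
        a = np.asarray(a, dtype=float)
        span = a.max() - a.min()
        return (a - a.min()) / span if span > 0 else np.zeros_like(a)

    saliency = w_near * rescale(disparity) + w_motion * rescale(flow_mag)
    return np.unravel_index(np.argmax(saliency), saliency.shape), saliency
```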

13.
This paper presents a general information-theoretic approach for obtaining lower bounds on the number of examples required for Probably Approximately Correct (PAC) learning in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to several different models, illustrating its generality and power. The resulting bounds add logarithmic factors to (or improve the constants in) previously known lower bounds.
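As a point of reference (a representative shape for such bounds, not the paper's exact statement), information-theoretic arguments of this kind typically yield sample-size lower bounds of the following form for PAC learning a class of VC dimension d to accuracy ε with confidence 1−δ under classification noise of rate η < 1/2:

```latex
m \;=\; \Omega\!\left( \frac{d + \log(1/\delta)}{\epsilon \, (1 - 2\eta)^{2}} \right)
```

The (1−2η)² factor captures how the noise rate degrades the information each labeled example carries.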

14.
This paper describes the theory and algorithms of the distance transform for fuzzy subsets, called the fuzzy distance transform (FDT). The notion of fuzzy distance is formulated by first defining the length of a path on a fuzzy subset and then finding the infimum of the lengths of all paths between two points. The length of a path π in a fuzzy subset of the n-dimensional continuous space ℝⁿ is defined as the integral of the fuzzy membership values along π. Generally, there are infinitely many paths between any two points in a fuzzy subset, and it is shown that the shortest one may not exist. The fuzzy distance between two points is defined as the infimum of the lengths of all paths between them. It is demonstrated that, unlike in hard convex sets, the shortest path (when it exists) between two points in a fuzzy convex subset is not necessarily a straight line segment. For any positive number θ ≤ 1, the θ-support of a fuzzy subset is the set of all points in ℝⁿ with membership values greater than or equal to θ. It is shown that, for any fuzzy subset and any nonzero θ ≤ 1, fuzzy distance is a metric for the interior of its θ-support. It is also shown that, for any smooth fuzzy subset, fuzzy distance is a metric for the interior of its 0-support (referred to as the support). FDT is defined as a process on a fuzzy subset that assigns to a point its fuzzy distance from the complement of the support. The theoretical framework of FDT in continuous space is extended to digital cubic spaces, and it is shown that for any fuzzy digital object, fuzzy distance is a metric for the support of the object. A dynamic-programming-based algorithm is presented for computing the FDT of a fuzzy digital object. It is shown that the algorithm terminates in a finite number of steps and that, when it does so, it correctly computes the FDT. Several potential applications of the fuzzy distance transform in medical imaging are presented, among them the quantification of blood vessels and of trabecular bone thickness in the regime of limited spatial resolution, where these objects become fuzzy.
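A minimal 2D sketch of the computation follows. It discretizes the path-length integral by charging each grid step the step length times the average membership of its endpoints, and computes the infimal path cost from the complement of the support with a priority queue; the paper's algorithm is dynamic-programming based, but Dijkstra-style propagation reaches the same fixed point and is simpler to sketch.

```python
import heapq
import numpy as np

def fuzzy_distance_transform(mu):
    """Fuzzy distance transform on a 2D grid (illustrative sketch).

    mu: array of membership values in [0, 1]. A step between adjacent
    grid points p, q costs ||p - q|| * (mu[p] + mu[q]) / 2; FDT(p) is
    the infimal path cost from the complement of the support (mu == 0).
    """
    h, w = mu.shape
    fdt = np.full((h, w), np.inf)
    heap = []
    for y in range(h):
        for x in range(w):
            if mu[y, x] == 0:                 # complement of the support
                fdt[y, x] = 0.0
                heapq.heappush(heap, (0.0, y, x))
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > fdt[y, x]:
            continue                          # stale queue entry
        for dy, dx in steps:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                cost = np.hypot(dy, dx) * (mu[y, x] + mu[ny, nx]) / 2
                if d + cost < fdt[ny, nx]:
                    fdt[ny, nx] = d + cost
                    heapq.heappush(heap, (d + cost, ny, nx))
    return fdt
```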

15.
This paper describes a technique to animate three-dimensional sampled volumes. The technique gives the animator the ability to treat volumes as if they were standard polygonal models and to use all of the standard animation/motion-capture tools on volumetric data. A volumetric skeleton is computed from a volumetric model using a multi-resolution thinning procedure. The volumetric skeleton is centered in the object and accurately represents the shape of the object. The thinning process is reversible, in that the volumetric model can be reconstructed from the volumetric skeleton. The volumetric skeleton is then connected and imported into a standard graphics animation package for animation. The animated skeleton is used for reconstruction, which essentially recreates a deformed volume around the deformed skeleton. Polygons are never computed, and the entire process remains in the volumetric domain. This technique is demonstrated on one of the most complex 3D datasets, the Visible Male, resulting in actual “human animation”.
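For a feel of the first step only, here is a tiny stand-in using scikit-image's topology-preserving thinning (recent versions of skimage.morphology.skeletonize accept 3-D volumes; older releases expose a separate skeletonize_3d). Unlike the paper's reversible multi-resolution procedure, this plain skeleton cannot reconstruct the original volume; it only illustrates the idea of a centered stick figure extracted from voxel data.

```python
import numpy as np
from skimage.morphology import skeletonize  # 3-D input in recent versions

def volumetric_skeleton(volume, threshold=0.5):
    """Thin a sampled volume to a voxel skeleton (illustrative stand-in).

    volume: 3-D scalar array; voxels above `threshold` count as object.
    Returns a boolean voxel skeleton centered in the object.
    """
    binary = np.asarray(volume) > threshold
    return skeletonize(binary)
```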

16.
While deterministic finite automata seem to be well understood, surprisingly many important problems concerning nondeterministic finite automata (nfa's) remain open. One such problem area is the study of different measures of nondeterminism in finite automata and the estimation of the sizes of minimal nondeterministic finite automata. In this paper the concept of communication complexity is applied in order to achieve progress in this problem area. The main results are as follows:
1. Deterministic communication complexity provides lower bounds on the size of nfa's with bounded unambiguity. Applying this fact, the proofs of several results about nfa's with limited ambiguity can be simplified and presented in a uniform way.
2. There is a family of languages KONk2 with an exponential size gap between nfa's with polynomial leaf number/ambiguity and nfa's with ambiguity k. This partially answers the open problem posed by B. Ravikumar and O. Ibarra (1989, SIAM J. Comput. 18, 1263–1282) and H. Leung (1998, SIAM J. Comput. 27, 1073–1082).
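For intuition, the basic bridge between automata size and communication (a standard argument, weaker than the paper's bounded-ambiguity results) goes as follows: if an nfa with s states accepts L, then for any split of the input into a prefix held by Alice and a suffix held by Bob, Alice can nondeterministically guess a run on her part and announce the crossing state with ⌈log₂ s⌉ bits, yielding a one-way nondeterministic protocol. Hence

```latex
s \;\ge\; 2^{\,\mathrm{ncc}(L)}
```

where ncc(L) denotes the one-way nondeterministic communication complexity of membership in L under that split (the notation here is ours).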

17.
Recently, the author introduced a nonprobabilistic mathematical model of discrete channels, the BEE channels, which involve the error types substitution, insertion, and deletion. This paper defines an important class of BEE channels, the SID channels, which include channels that permit a bounded number of scattered errors and, possibly at the same time, a bounded burst of errors in any segment of predefined length of a message. A formal syntax is defined for generating channel expressions, and appropriate semantics is provided for interpreting a given channel expression as a communication channel (SID channel) that permits combinations of substitutions, insertions, and deletions of symbols. Our framework permits one to generalize notions such as error correction and unique decodability, and to express statements of the form “the code K can correct all errors of type ξ” and “it is decidable whether the code K is uniquely decodable for the channel described by ξ,” where ξ is any SID channel expression.
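As a toy instance of such semantics, one SID-style constraint, "at most m errors in any segment of length l," can be checked directly on an error pattern. This checker is our illustration only and does not implement the paper's full expression language.

```python
def permits(error_positions, max_errors, segment_len):
    """True iff every window of `segment_len` consecutive positions
    contains at most `max_errors` of the given error positions.

    It suffices to check windows starting at an error position: any
    violating window can be shifted right to its first error without
    losing errors.
    """
    pos = sorted(error_positions)
    for i, p in enumerate(pos):
        count = sum(1 for q in pos[i:] if q < p + segment_len)
        if count > max_errors:
            return False
    return True

# Example: errors at 0, 2, 3 violate 'at most 2 errors per 5 symbols'.
print(permits([0, 2, 3], max_errors=2, segment_len=5))  # False
```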

18.
Face Detection: A Survey
In this paper we present a comprehensive and critical survey of face detection algorithms. Face detection is a necessary first step in face recognition systems, with the purpose of localizing and extracting the face region from the background. It also has several applications in areas such as content-based image retrieval, video coding, video conferencing, crowd surveillance, and intelligent human–computer interfaces. However, it was not until recently that the face detection problem received considerable attention among researchers. The human face is a dynamic object with a high degree of variability in its appearance, which makes face detection a difficult problem in computer vision. A wide variety of techniques have been proposed, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods. The algorithms presented in this paper are classified as either feature-based or image-based and are discussed in terms of their technical approach and performance. Owing to the lack of standardized tests, we do not provide a comprehensive comparative evaluation, but in cases where results are reported on common datasets, comparisons are presented. We also present some proposed applications and possible application areas.

19.
The first general decomposition theorem for the k-server problem is presented. Whereas all previous theorems are for the case of a finite metric with k+1 points, the theorem given here allows an arbitrary number of points in the underlying metric space. This theorem implies O(polylog(k))-competitive randomized algorithms for certain metric spaces consisting of a polylogarithmic number of widely separated subspaces, and it takes a first step toward a general O(polylog(k))-competitive algorithm. The only other cases for which polylogarithmic competitive randomized algorithms are known are the uniform metric space and the weighted-cache metric space with two weights.

20.
A contribution to the automatic 3-D reconstruction of complex urban scenes from aerial stereo pairs is proposed. It consists of segmenting the scene into two different kinds of components: the ground and the above-ground objects. The above-ground objects are classified either as buildings or as vegetation. The idea is to define appropriate regions of interest in order to achieve a relevant 3-D reconstruction. For that purpose, a digital elevation model of the scene is first computed and segmented into above-ground regions using a Markov random field model. Then a radiometric analysis is used to classify above-ground regions as buildings or vegetation, leading to the determination of the final above-ground objects. The originality of the method is its ability to cope with extended above-ground areas, even in the case of a sloping ground surface, a characteristic that is necessary in an urban environment. The results are very robust to image and scene variability, and they enable the use of appropriate local 3-D reconstruction algorithms.
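A rough sketch of the ground/above-ground split follows, with a morphological opening standing in for the paper's Markov random field segmentation (and therefore not handling sloping ground as gracefully) and a naive RGB greenness share standing in for its radiometric analysis; window sizes and thresholds are illustrative.

```python
import numpy as np
from scipy.ndimage import grey_opening

def above_ground_objects(dem, rgb, height_thresh=2.5, window=25,
                         green_thresh=0.4):
    """Ground / buildings / vegetation split (illustrative sketch).

    dem: 2-D digital elevation model; rgb: (H, W, 3) image registered to
    it. A large grey-scale opening crudely estimates the ground surface;
    cells more than height_thresh above it are 'above-ground', and a
    greenness share then separates vegetation from buildings.
    """
    ground = grey_opening(dem, size=(window, window))
    above = (dem - ground) > height_thresh
    greenness = rgb[..., 1] / (rgb.sum(axis=-1) + 1e-9)
    vegetation = above & (greenness > green_thresh)
    buildings = above & ~vegetation
    return buildings, vegetation
```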
