Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Dynamic human shape in video contains rich perceptual information, such as the body posture, identity, and even the emotional state of a person. Human locomotion activities, such as walking and running, have familiar spatiotemporal patterns that can easily be detected in arbitrary views. We present a framework for detecting shape outliers for human locomotion using a dynamic shape model that factorizes the body posture, the viewpoint, and the individual’s shape style. The model uses a common embedding of the kinematic manifold of the motion and factorizes the shape variability with respect to different viewpoints and shape styles in the space of the coefficients of the nonlinear mapping functions that are used to generate the shapes from the kinematic manifold representation. Given a corrupted input silhouette, an iterative procedure is used to recover the body posture, viewpoint, and shape style. We use the proposed outlier detection approach to fill in the holes in the input silhouettes, and detect carried objects, shadows, and abnormal motions.

2.
Data points with small variations between them are assumed to lie close to each other on a smoothly varying manifold in the feature space. Such data are hard to classify into separate classes. A sequence of face pose images with closely varying pose angles can be considered as such data. When the pose angles are large enough, they create images that differ greatly from each other, and thus the sequence of face images can be assumed to lie on or near a nonlinear manifold. In this paper, we propose an unsupervised pose estimation method for face images based on clustered locally linear manifolds using discriminant analysis. We divide the data into multiple disjoint, locally linear, and separable clusters. The problem of identifying which cluster to use is solved by dividing the entire process into two steps. The first step, a projection using the entire smooth manifold, identifies a rough region of interest. We use clustering techniques on the entire data to form the pose-dependent classes, which are then used to find the first set of discriminant functions. The second step, a second projection, uses trained cluster(s) from this neighbourhood to obtain a second set of discriminant functions. The idea behind such an approach is that the local neighbourhood would be linear and provide better between-class separation, and hence the classification problem would become simpler.

3.
In this paper, we present the concept of a “shape manifold” designed for reduced order representation of complex “shapes” encountered in mechanical problems, such as design optimization, springback or image correlation. The overall idea is to define the shape space within which the boundary of the structure evolves. The reduced representation is obtained by determining the intrinsic dimensionality of the problem, independently of the original design parameters, and by approximating a hypersurface, i.e. a shape manifold, connecting all admissible shapes represented using level set functions. Also, an optimal parameterization may be obtained for arbitrary shapes, where the parameters have to be defined a posteriori. We also develop predictor-corrector manifold-walking optimization algorithms in a reduced shape space that guarantee the admissibility of the solution with no additional constraints. We illustrate the approach on three diverse examples drawn from the field of computational and applied mechanics.

4.
We are given a set of points in a space of high dimension. For instance, this set may represent many visual appearances of an object, a face, or a hand. We address the problem of approximating this set by a manifold in order to have a compact representation of the object appearance. When the scattering of this set is approximately an ellipsoid, then the problem has a well-known solution given by principal components analysis (PCA). However, in some situations like object displacement learning or face learning, this linear technique may be ill-adapted and nonlinear approximation has to be introduced. The method we propose can be seen as a nonlinear PCA (NLPCA), the main difficulty being that the data are not ordered. We propose an index which favors the choice of axes preserving the closest point neighborhoods. These axes determine an order for visiting all the points when smoothing. Finally, a new criterion, called “generalization error”, is introduced to determine the smoothing rate, that is, the knot number for the spline fitting. Experimental results conclude this paper: the method is tested on artificial data and on two databases used in visual learning.
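
The linear baseline named in this abstract, PCA, can be illustrated with a minimal sketch. It is not the nonlinear method the abstract proposes: it only finds the leading principal axis of a point cloud, and it uses power iteration on the sample covariance matrix in place of a full eigendecomposition. The function names `first_principal_axis` and `project` and the sample points are illustrative assumptions.

```python
import math

def first_principal_axis(points, iters=200):
    """Return (mean, unit axis) of the leading principal component."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalise.
    # Assumes the start vector is not orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(point, mean, axis):
    """1-D coordinate of a point along the principal axis."""
    return sum((point[j] - mean[j]) * axis[j] for j in range(len(axis)))

# Hypothetical data: points scattered along the direction (1, 1).
pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.1)]
mean, axis = first_principal_axis(pts)
coords = [project(p, mean, axis) for p in pts]
```

Sorting the points by `coords` gives an ordering along a straight axis; the abstract's concern is the case where no single straight axis orders the data well.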

5.
In this paper, we consider the problem of optimization of a cost function on a Grassmann manifold. This problem appears in system identification in the behavioral setting, which is a structured low-rank approximation problem. We develop an optimization approach based on switching coordinate charts. This method reduces the optimization problem on the manifold to an optimization problem in a bounded domain of a Euclidean space. We compare the proposed approach with state-of-the-art methods based on data-driven local coordinates and Riemannian geometry, and show the connections between the methods. Compared to the methods based on the local coordinates, the proposed approach allows arbitrary optimization methods to be used for solving the corresponding subproblems in the Euclidean space.

6.
This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm, widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e., TRECVID-05), indexed by visual, audio and text features, is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The obtained results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.

7.
Transformation invariance is an important property in pattern recognition, where different observations of the same object typically receive the same label. This paper focuses on a transformation-invariant distance measure that represents the minimum distance between the transformation manifolds spanned by patterns of interest. Since these manifolds are typically nonlinear, the computation of the manifold distance (MD) becomes a nonconvex optimization problem. We propose representing a pattern of interest as a linear combination of a few geometric functions extracted from a structured and redundant basis. Transforming the pattern results in the transformation of its constituent parts. We show that, when the transformation is restricted to a synthesis of translations, rotations, and isotropic scalings, such a pattern representation results in a closed-form expression of the manifold equation with respect to the transformation parameters. The MD computation can then be formulated as a minimization problem whose objective function is expressed as the difference of convex functions (DC). This interesting property permits optimally solving the optimization problem with DC programming solvers that are globally convergent. We present experimental evidence which shows that our method is able to find the globally optimal solution, outperforming existing methods that yield suboptimal solutions.

8.
The problem of representing the visual signal in the harmonic space guaranteeing a complete characterization of its 2D local structure is investigated. Specifically, the efficacy of anisotropic versus isotropic filtering is analyzed with respect to general phase-based metrics for early vision attributes. We verified that the spectral information content gathered through channeled oriented frequency bands is characterized by high compactness and flexibility, since a wide range of visual attributes emerge from different hierarchical combinations of the same channels. We observed that constructing a multichannel, multiorientation representation is preferable to using a more compact one based on an isotropic generalization of the analytic signal. Maintaining a channeled (i.e., distributed) representation of the harmonic content results in a more complete structural analysis of the visual signal, and allows us to enable a set of “constraints” that are often essential to disambiguate the perception of the different features. The complete harmonic content is then combined in the phase-orientation space only at the final stage, to come up with the ultimate perceptual decisions, thus avoiding an “early condensation” of basic features. The resulting algorithmic solutions reach high performance in real-world situations at an affordable computational cost.

9.
In this paper, the problem of robust relative positioning between a 6-DOF robot camera and an object of interest is considered. Assuming a weak-perspective camera model and a local linear approximation of the visible object's surface, an image-based state space representation of the robot camera–object interaction model is derived, based on the matrix of 2-D affine transformations. A dynamic extension of the visual model permits estimating 3-D parameters directly as functions of state variables. The proposed nonlinear robust control law ensures asymptotic stability except at image singularities, assuming exact model and state measurements. In the presence of bounded uncertainties, under an appropriate choice of control gains, ultimate boundedness of the state error is also formally proved. Simulation results validate the theoretical framework both in terms of system convergence and control robustness.

10.
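Nonlinear control systems and the geometric structure of the state space   Total citations: 10; self-citations: 0; citations by others: 10
First, from a global viewpoint, a class of nonlinear control systems defined on a Riemannian manifold is introduced. The expression of the system's state equation in local coordinates of the Riemannian manifold is given, the influence of the geometric structure of the Riemannian manifold on the nonlinear system is discussed, and the controllability and observability of the nonlinear system are studied. Second, using the properties of involutive distributions and totally geodesic submanifolds, the local controllability structure decomposition, the local observability structure decomposition, and the Kalman decomposition of nonlinear systems on a Riemannian manifold are given. Third, using the properties of families of mutually orthogonal involutive distributions and of increasing families of involutive distributions, together with families of totally geodesic submanifolds, the parallel decoupling and cascade decoupling problems for nonlinear control systems on a Riemannian manifold are studied, as well as the local disturbance decoupling problem for affine nonlinear control systems.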

11.
We consider the problem of converting boundary representations of isothetic polyhedra into constructive solid geometry (CSG) representations. The CSG representation is a boolean formula based on the half-spaces supporting the faces of the polyhedron. This boolean formula exhibits two important features: no term is complemented (it is monotone) and each supporting half-space appears in the formula once and only once. It is known that such formulas do not always exist for general polyhedra in three-dimensional space. In this work, we first give a procedure that extends the domain of polyhedra for which such a nice representation can be computed. Then we prove that not all cyclic isothetic polyhedra have a CSG representation of the style given above.
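
The two properties of the boolean formula (monotone, each supporting half-space used exactly once) can be illustrated with a small sketch. It is a 2-D analogue chosen for brevity, not the 3-D procedure the abstract describes: an L-shaped isothetic polygon written as a formula over its six supporting half-planes. The shape and the name `in_l_shape` are assumptions made for illustration.

```python
def in_l_shape(x, y):
    """Membership test for an L-shaped isothetic polygon.

    The region is the square [0, 2] x [0, 2] with the upper-right
    corner (x > 1 and y > 1) removed.
    """
    # One predicate per supporting half-plane; none is negated.
    left, bottom = x >= 0, y >= 0
    right, top = x <= 2, y <= 2
    notch_x, notch_y = x <= 1, y <= 1
    # Monotone formula: only AND / OR, each half-plane appears exactly once.
    return left and bottom and right and top and (notch_x or notch_y)

inside = [in_l_shape(0.5, 1.5), in_l_shape(1.5, 0.5)]
outside = [in_l_shape(1.5, 1.5), in_l_shape(3.0, 0.5)]
```

The reflex corner is handled by the single disjunction `notch_x or notch_y`, which is why no complement is needed for this shape.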

12.
Tracking People on a Torus   Total citations: 1; self-citations: 0; citations by others: 1
We present a framework for monocular 3D kinematic pose tracking and viewpoint estimation of periodic and quasi-periodic human motions from an uncalibrated camera. The approach we introduce here is based on learning both the visual observation manifold and the kinematic manifold of the motion using a joint representation. We show that the visual manifold of the observed shape of a human performing a periodic motion, observed from different viewpoints, is topologically equivalent to a torus manifold. The approach we introduce here is based on the supervised learning of both the visual and kinematic manifolds. Instead of learning an embedding of the manifold, we learn the geometric deformation between an ideal manifold (conceptual equivalent topological structure) and a twisted version of the manifold (the data). Experimental results show accurate estimation of the 3D body posture and the viewpoint from a single uncalibrated camera.

13.
Motor-skill learning for complex robotic tasks is a challenging problem due to the high task variability. Robotic clothing assistance is one such challenging problem that can greatly improve the quality of life for the elderly and disabled. In this study, we propose a data-efficient representation to encode task-specific motor skills of the robot using Bayesian nonparametric latent variable models. The effectiveness of the proposed motor-skill representation is demonstrated in two ways: (1) through a real-time controller that can be used as a tool for learning from demonstration to impart novel skills to the robot and (2) by demonstrating that policy search reinforcement learning in such a task-specific latent space outperforms learning in the high-dimensional joint configuration space of the robot. We implement our proposed framework in a practical setting with a dual-arm robot performing clothing assistance tasks.

14.
Motion estimation via dynamic vision   Total citations: 2; self-citations: 0; citations by others: 2
Estimating the three-dimensional motion of an object from a sequence of projections is of paramount importance in a variety of applications in control and robotics, such as autonomous navigation, manipulation, servo, tracking, docking, planning, and surveillance. Although “visual motion estimation” is an old problem (the first formulations date back to the beginning of the century), only recently have tools from nonlinear systems estimation theory hinted at acceptable solutions. In this paper the authors formulate the visual motion estimation problem in terms of identification of nonlinear implicit systems with parameters on a topological manifold and propose a dynamic solution either in the local coordinates or in the embedding space of the parameter manifold. Such a formulation has structural advantages over previous recursive schemes, since the estimation of motion is decoupled from the estimation of the structure of the object being viewed, and therefore it is possible to handle occlusions in a principled way.

15.
In blind source separation, there are M sources that produce sounds independently and continuously over time. These sounds are then recorded by m receivers. The sound recorded by each receiver at each time point is a linear superposition of the sounds produced by the M sources at the same time point. The problem of blind source separation is to recover the sounds of the sources from the sounds recorded by the receivers, without knowledge of the m×M mixing matrix that transforms the sounds of the sources to the sounds of the receivers at each time point. Over-complete separation refers to the situation where the number of sources M is greater than the number of receivers m, so that the source sounds cannot be uniquely solved from the receiver sounds even if the mixing matrix is known. In this paper, we propose a null space representation for the over-complete blind source separation problem. This representation explicitly identifies the solution space of the source sounds in terms of the null space of the mixing matrix using singular value decomposition. Under this representation, the problem can be posed in the framework of a Bayesian latent variable model, where the mixing matrix and the source sounds can be inferred based on their posterior distributions. We then propose a null space algorithm for Markov chain Monte Carlo posterior sampling. We illustrate the algorithm using several examples under two different statistical assumptions about the independent source sounds. The blind source separation problem is mathematically equivalent to the independent component analysis problem. So our method can be equally applied to over-complete independent component analysis for unsupervised learning of high-dimensional data.
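
The null space representation can be made concrete with a minimal sketch for a known mixing matrix. It assumes the smallest over-complete case, m = 2 receivers and M = 3 sources, where the null space is one-dimensional and is spanned by the cross product of the two rows of the mixing matrix. This swaps in a cross product for the singular value decomposition the abstract uses for general m×M, and it does no Bayesian inference. All names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def mix(A, s):
    """Receiver signal x = A s at one time point."""
    return [dot(row, s) for row in A]

def min_norm_solution(A, x):
    """Minimum-norm s with A s = x, for a 2 x 3 matrix A of full row rank."""
    r0, r1 = A
    # Gram matrix G = A A^T (2 x 2), inverted in closed form and applied to x.
    g00, g01, g11 = dot(r0, r0), dot(r0, r1), dot(r1, r1)
    det = g00 * g11 - g01 * g01
    y0 = (g11 * x[0] - g01 * x[1]) / det
    y1 = (-g01 * x[0] + g00 * x[1]) / det
    return [y0 * a + y1 * b for a, b in zip(r0, r1)]  # A^T y

def source_candidate(A, x, z):
    """One point of the solution space: minimum-norm part plus z times the null vector."""
    null_vec = cross(A[0], A[1])  # orthogonal to both rows, so A @ null_vec = 0
    s0 = min_norm_solution(A, x)
    return [a + z * b for a, b in zip(s0, null_vec)]

A = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.7]]   # hypothetical mixing matrix
true_s = [1.0, -2.0, 0.5]                # hypothetical source values
x = mix(A, true_s)
candidates = [source_candidate(A, x, z) for z in (-1.0, 0.0, 2.5)]
```

Every candidate reproduces the same receiver signal `x`, which is the non-uniqueness the abstract refers to; in a sampling scheme, the free coordinate `z` would be the quantity to infer.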

16.
In this paper, the visual servoing problem is addressed by coupling nonlinear control theory with a convenient representation of the visual information used by the robot. The visual representation, which is based on a linear camera model, is extremely compact to comply with active vision requirements. The devised control law is proven to ensure global asymptotic stability in the Lyapunov sense, assuming exact model and state measurements. It is also shown that, in the presence of bounded uncertainties, the closed-loop behavior is characterized by a global attractor. The well known pose ambiguity arising from the use of linear camera models is solved at the control level by choosing a hybrid visual state vector including both image space (2D) information and 3D object parameters. A method is expounded for on-line visual state estimation that avoids camera calibration. Simulation and real-time experiments validate the theoretical framework in terms of both system convergence and control robustness.

17.
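Image retrieval algorithm based on sparse coding on Riemannian manifolds   Total citations: 1; self-citations: 0; citations by others: 1
To address the large histogram quantization error of the bag-of-visual-words (BOVW) model, an image retrieval algorithm based on sparse coding is proposed. Because most image features have a nonlinear manifold structure, measuring them in a vector space, as traditional sparse coding does, inevitably leads to inaccurate sparse representations. Taking the manifold structure of the image feature space into account, symmetric positive definite matrices are chosen as feature descriptors to construct a Riemannian manifold space. Kernel techniques are used to map the Riemannian manifold structure into a reproducing kernel Hilbert space, converting the nonlinear manifold problem into linear sparse coding and yielding a more accurate sparse representation of images. Experiments on the Corel1000 and Caltech101 datasets show that, compared with existing image retrieval algorithms, the proposed algorithm not only improves retrieval accuracy but also achieves better retrieval performance.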

18.
19.
Interaction and integration of multimodality media types such as visual, audio, and textual data in video are the essence of video semantic analysis. Contextual information propagation is useful for both intra- and inter-shot correlations. However, the traditional concatenated vector representation of videos weakens the power of the propagation and compensation among the multiple modalities. In this paper, we introduce a higher-order tensor framework for video analysis. We represent image frame, audio, and text in video shots as data points by the 3rd-order tensor. Then we propose a novel dimension reduction algorithm which explicitly considers the manifold structure of the tensor space from contextual temporal associated cooccurring multimodal media data. Our algorithm inherently preserves the intrinsic structure of the submanifold where tensorshots are sampled and is also able to map out-of-sample data points directly. We propose a new transductive support tensor machines algorithm to train an effective classifier using a large amount of unlabeled data together with the labeled data. Experimental results on the TRECVID 2005 data set show that our method improves the performance of video semantic concept detection.

20.
We study spatial learning and navigation for autonomous agents. A state space representation is constructed by unsupervised Hebbian learning during exploration. As a result of learning, a representation of the continuous two-dimensional (2-D) manifold in the high-dimensional input space is found. The representation consists of a population of localized overlapping place fields covering the 2-D space densely and uniformly. This space coding is comparable to the representation provided by hippocampal place cells in rats. Place fields are learned by extracting spatio-temporal properties of the environment from sensory inputs. The visual scene is modeled using the responses of modified Gabor filters placed at the nodes of a sparse log-polar graph. Visual sensory aliasing is eliminated by taking into account self-motion signals via path integration. This solves the hidden state problem and provides a suitable representation for applying reinforcement learning in continuous space for action selection. A temporal-difference prediction scheme is used to learn sensorimotor mappings to perform goal-oriented navigation. Population vector coding is employed to interpret ensemble neural activity. The model is validated on a mobile Khepera miniature robot.
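
The temporal-difference prediction step can be illustrated with a minimal sketch. It is a strong simplification of the abstract's setting: a tabular TD(0) update on a short discrete corridor with a fixed "move right" policy, rather than learning over place-field activity in continuous space. The function name `td0_chain` and all parameter values are assumptions.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.1, episodes=500):
    """TD(0) value prediction for an agent walking right along a corridor.

    States are 0 .. n_states-1. Entering the last state yields reward 1
    and ends the episode; every other transition yields reward 0.
    """
    values = [0.0] * n_states  # the terminal state's value stays 0
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            s_next = s + 1
            reward = 1.0 if s_next == n_states - 1 else 0.0
            # TD error: reward plus discounted next value minus current estimate.
            delta = reward + gamma * values[s_next] - values[s]
            values[s] += alpha * delta
            s = s_next
    return values

values = td0_chain()
```

The learned values decay geometrically with distance from the goal (a factor of `gamma` per step), which is the gradient a navigation policy can climb.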

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号