Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
The present study employs deep learning methods to recognize repetitive assembly actions and to count how many times each action is performed. The aim is to monitor workers during the assembly process and to prevent assembly quality problems caused by missing key operational steps or irregular worker operations. Because assembly actions are repetitive and tool-dependent, action recognition is cast as tool detection in the present study. The YOLOv3 algorithm is first applied to locate and classify the assembly tools and thereby recognize the worker's assembly action; the resulting action-recognition accuracy is 92.8%. Then, the deep-learning-based pose estimation algorithm CPM is used to recognize human joints. Finally, the joint coordinates are extracted to count the repetitions of each assembly action, with a counting accuracy of 82.1%.
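Counting repetitions from extracted joint coordinates, as described above, can be reduced to counting threshold crossings of a 1D joint trajectory. The following sketch is illustrative only (the function name, threshold, and synthetic trace are assumptions, not from the paper):

```python
def count_repetitions(signal, threshold):
    """Count repetitions as upward threshold crossings of a 1D joint
    coordinate: a repetition is counted each time the signal rises above
    `threshold` after having been at or below it."""
    count = 0
    below = True
    for v in signal:
        if below and v > threshold:
            count += 1
            below = False
        elif v <= threshold:
            below = True
    return count

# Synthetic wrist-height trace containing three lift-lower cycles.
trace = [0, 1, 3, 5, 3, 1, 0, 2, 5, 6, 2, 0, 1, 4, 5, 2, 0]
print(count_repetitions(trace, threshold=2.5))  # → 3
```

A real system would first smooth the joint trajectory and pick the threshold per action class, but the counting step itself stays this simple.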

2.
Reliable manipulation of everyday household objects is essential to the success of service robots. In order to manipulate these objects accurately, robots need to know their full 6-DOF pose, which is challenging due to sensor noise, clutter, and occlusions. In this paper, we present a new approach for effectively estimating the object pose from an observation of just a small patch of the object, by leveraging the fact that many household objects can only remain stable on a planar surface in a small set of poses. In particular, for each stable pose of an object, we slice the object with horizontal planes and extract multiple cross-section 2D contours. Pose estimation is then reduced to finding the stable pose whose contour best matches that of the sensor data, which can be solved efficiently by cross-correlation. Experiments on the manipulation tasks in the DARPA Robotics Challenge validate our approach. In addition, we investigate our method's performance on object recognition tasks arising in the challenge.
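The contour-matching step can be sketched as a brute-force circular cross-correlation between two contour signatures (e.g. radii sampled at equal angular steps around the centroid). This is a minimal illustration, not the paper's implementation:

```python
def circular_xcorr_best(a, b):
    """Return (best_shift, best_score): the circular shift of `b` that
    maximizes the dot product with `a`.  Both signatures must have the
    same length, e.g. contour radii sampled at equal angular steps."""
    n = len(a)
    assert len(b) == n
    best_shift, best_score = 0, float("-inf")
    for s in range(n):
        score = sum(a[i] * b[(i + s) % n] for i in range(n))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift, best_score

template = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
observed = [3.0, 2.0, 1.0, 0.0, 1.0, 2.0]  # rotated copy of template (start index 2)
shift, _ = circular_xcorr_best(template, observed)
print(shift)  # → 4 (shifting `observed` by 4 steps realigns it with `template`)
```

In practice one signature per stable pose is precomputed offline, so matching a sensor patch costs one cross-correlation per candidate pose.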

3.
Obtaining the 6D pose of a target object from images has wide application in areas such as robotic manipulation and virtual reality. However, deep-learning-based pose estimation methods usually require large training datasets to improve model generalization, and common data collection approaches suffer from high collection cost and a lack of 3D spatial position information. In view of this, a 6D object pose estimation network framework based on low-quality rendered images is proposed. In this network, the feature extraction part takes a single RGB image as input and extracts features with a residual network; in the pose estimation part, an object classification stream predicts the category of the target object, while a pose regression stream regresses the object's rotation angles and translation vector in 3D space. In addition, domain randomization is used to construct, at low collection cost, Pose6DDR, a large-scale dataset of low-quality rendered images annotated with 3D spatial position information. Test results on the Pose6DDR dataset and the public LineMod dataset demonstrate the superiority of the proposed pose estimation method and the effectiveness of generating large-scale datasets via domain randomization.
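The domain-randomization idea used to build Pose6DDR — varying rendering parameters at random so that a model trained on cheap synthetic images generalizes to real ones — can be sketched as a parameter sampler. All parameter names and ranges below are illustrative assumptions, not the paper's actual configuration:

```python
import random

def sample_render_params(rng):
    """Sample one randomized rendering configuration (illustrative ranges)."""
    return {
        "light_intensity": rng.uniform(0.2, 2.0),
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "background_id": rng.randrange(1000),      # random distractor background
        "object_yaw_deg": rng.uniform(0.0, 360.0),
        "object_pitch_deg": rng.uniform(-45.0, 45.0),
        "camera_distance_m": rng.uniform(0.4, 1.5),
        "texture_noise_std": rng.uniform(0.0, 0.1),
    }

rng = random.Random(0)  # fixed seed for reproducible dataset generation
dataset = [sample_render_params(rng) for _ in range(3)]
for params in dataset:
    assert 0.4 <= params["camera_distance_m"] <= 1.5
```

Each sampled configuration would drive one render; because the object's pose is set by the sampler, every image comes with exact 6D annotations for free.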

4.
In this paper, we present a method called MODEEP (Motion-based Object DEtection and Estimation of Pose) to detect independently moving objects (IMOs) in forward-looking infrared (FLIR) image sequences taken from an airborne, moving platform. Ego-motion effects are removed through a robust multi-scale affine image registration process. Thereafter, areas with residual motion indicate potential object activity. These areas are detected, refined and selected using a Bayesian classifier. The resulting regions are clustered into pairs such that each pair represents one object's front and rear end. Using motion and scene knowledge, we estimate object pose and establish a region of interest (ROI) for each pair. Edge elements within each ROI are used to segment the convex cover containing the IMO. We show detailed results on real, complex, cluttered and noisy sequences. Moreover, we outline the integration of our fast and robust system into a comprehensive automatic target recognition (ATR) and action classification system.

5.
Geometric hashing (GH) and partial pose clustering are well-known algorithms for pattern recognition. However, the performance of both algorithms degrades rapidly with increasing scene clutter and measurement uncertainty in the detected features. The primary contribution of this paper is a framework that unifies the GH and partial pose clustering paradigms for pattern recognition in cluttered scenes. The proposed scheme has better discrimination capability than the GH algorithm, thus improving recognition accuracy. The scheme is incorporated in a Bayesian MLE framework to make it robust to sensor noise. It is able to handle partial occlusions and is robust to measurement uncertainty in the data features and to spurious scene features (scene clutter). An efficient hash table representation of 3D features extracted from range images is also proposed. Simulations with real and synthetic 2D/3D objects show that the scheme performs better than the GH algorithm in scenes with a large amount of clutter.
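A minimal sketch of the classical geometric hashing pipeline the paper builds on: offline, every model point is expressed in frames defined by ordered basis pairs and stored in a hash table; online, a scene basis pair votes for (model, basis) entries. This toy 2D version is similarity-invariant and has no noise model; it is illustrative only:

```python
from collections import defaultdict

def quantize(x, step=0.25):
    return round(x / step)

def basis_coords(p, o, b):
    """Express p in the frame whose origin is o and whose x-axis is b - o.
    The coordinates are invariant to translation, rotation, and scale."""
    ux, uy = b[0] - o[0], b[1] - o[1]
    n2 = ux * ux + uy * uy
    dx, dy = p[0] - o[0], p[1] - o[1]
    return ((dx * ux + dy * uy) / n2, (dy * ux - dx * uy) / n2)

def build_table(models):
    """Offline stage: for every model and ordered basis pair, hash the
    quantized basis coordinates of all remaining points."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, o in enumerate(pts):
            for j, b in enumerate(pts):
                if i == j:
                    continue
                for k, p in enumerate(pts):
                    if k in (i, j):
                        continue
                    key = tuple(quantize(c) for c in basis_coords(p, o, b))
                    table[key].append((name, (i, j)))
    return table

def recognize(table, scene_pts):
    """Online stage with a single scene basis pair (a full implementation
    would try many); returns the (model, basis) entry with the most votes."""
    votes = defaultdict(int)
    o, b = scene_pts[0], scene_pts[1]
    for p in scene_pts[2:]:
        key = tuple(quantize(c) for c in basis_coords(p, o, b))
        for entry in table.get(key, ()):
            votes[entry] += 1
    return max(votes, key=votes.get) if votes else None

models = {"L": [(0, 0), (2, 0), (0, 1)], "bar": [(0, 0), (3, 0), (1.5, 0)]}
table = build_table(models)
scene = [(5, 5), (7, 5), (5, 6)]      # the "L" model translated by (5, 5)
print(recognize(table, scene)[0])     # → L
```

The paper's contribution replaces this hard voting with Bayesian weighting so that quantization and sensor noise degrade the votes gracefully instead of missing bins outright.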

6.
Recently, we developed a technique that allows semi-automatic estimation of anthropometry and pose from a single image. However, estimation was limited to the class of images in which an adequate number of human body segments were almost parallel to the image plane. In this paper, we present a generalization of that estimation algorithm that exploits pairwise geometric relationships between body segments to allow estimation from a broader class of images. In addition, we refine our search space by constructing a fully populated discrete hyper-ellipsoid of stick human body models in order to capture the variance of the statistical anthropometric information. As a result, our algorithm computes a better initial estimate, and the number of iterations needed during minimization is reduced tenfold. We present results over a variety of images to demonstrate the broad coverage of our algorithm. (Published online: 1 September 2003)

7.
The 3D Morphable Model (3DMM) and Structure from Motion (SfM) methods are widely used for 3D facial reconstruction from 2D single-view or multiple-view images. However, model-based methods suffer from disadvantages such as high computational cost and vulnerability to local minima and head pose variations, while SfM-based methods require multiple facial images in various poses. To overcome these disadvantages, we propose a single-view-based 3D facial reconstruction method that is person-specific and robust to pose variations. Our proposed method combines the simplified 3DMM and SfM methods. First, 2D initial frontal Facial Feature Points (FFPs) are estimated from a preliminary 3D facial image reconstructed by the simplified 3DMM. Second, a bilaterally symmetric facial image and its corresponding FFPs are obtained from the original side-view image and its FFPs by using a mirroring technique. Finally, a more accurate 3D facial shape is reconstructed by SfM using the frontal, original, and bilaterally symmetric FFPs. We evaluated the proposed method using facial images in 35 different poses, comparing the reconstructed facial images with ground-truth 3D facial shapes obtained from a scanner. The proposed method proved more robust to pose variations than 3DMM. The average 3D Root Mean Square Error (RMSE) between the reconstructed and ground-truth 3D faces was less than 2.6 mm when the 2D FFPs were manually annotated, and less than 3.5 mm when they were automatically annotated.
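The mirroring step — generating a bilaterally symmetric set of feature points from a side view — amounts to reflecting the 2D FFPs about a vertical symmetry axis. A minimal sketch (the axis value and point list are illustrative, not from the paper):

```python
def mirror_landmarks(points, axis_x):
    """Reflect 2D facial feature points about the vertical line x = axis_x,
    producing the bilaterally symmetric counterpart of a side-view face."""
    return [(2 * axis_x - x, y) for (x, y) in points]

ffps = [(30, 40), (50, 42), (45, 60)]   # illustrative side-view FFPs
mirrored = mirror_landmarks(ffps, axis_x=40)
print(mirrored)  # → [(50, 40), (30, 42), (35, 60)]
```

Reflecting twice returns the original points, which is a handy sanity check when wiring this into a pipeline.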

8.
The problem of relative pose estimation between a camera and a rigid object, given an object model with feature points and one or more images with the respective image points (hence known correspondences), has been extensively studied in the literature. We propose a "correspondenceless" method called gravitational pose estimation (GPE), inspired by classical mechanics. GPE can handle occlusion and uses only one image (i.e., a perspective projection of the object). GPE creates a simulated gravitational field from the image and lets the object model move and rotate in that force field, starting from an initial pose. Experiments were carried out with both real and synthetic images. Results show that GPE is robust, consistent, and fast (it runs in less than a minute). On average (including cases with up to 30% occlusion), it finds the orientation within 6° and the position within 17% of the object's diameter. SoftPOSIT was previously the best correspondenceless method in the literature that works with a single image and a point-based object model like GPE. However, SoftPOSIT's convergence is sensitive to the choice of initial pose; even "random start SoftPOSIT," which performs multiple runs of SoftPOSIT with different initial poses, can often fail, although SoftPOSIT finds the pose with great precision when it does converge. We have therefore integrated GPE and SoftPOSIT into a single method called GPEsoftPOSIT, which finds the orientation within 3° and the position within 10% of the object's diameter even under occlusion: GPE finds a pose that is very close to the true pose, and SoftPOSIT is then used to refine the result. Unlike SoftPOSIT, GPE can also work with as few as three points and with planar object models.

9.
We introduce a generic structure-from-motion approach based on a previously introduced, highly general imaging model in which cameras are modeled as possibly unconstrained sets of projection rays. This model can describe most existing camera types, including pinhole cameras, sensors with radial or more general distortions, and catadioptric cameras (central or non-central). We introduce a structure-from-motion approach for this general imaging model that reconstructs scenes from calibrated images, possibly taken by cameras of different types (cross-camera scenarios). Structure from motion is naturally handled via camera-independent ray intersection problems, solved via linear or simple polynomial equations. We also propose two approaches for obtaining optimal solutions using bundle adjustment, in which camera motion, calibration, and 3D point coordinates are refined simultaneously. The proposed methods are evaluated via experiments on two cross-camera scenarios: a pinhole camera used together with an omnidirectional camera, and a stereo system used with an omnidirectional camera.
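The camera-independent ray intersection at the core of this approach can be illustrated by triangulating the 3D point that is closest, in the least-squares sense, to a set of projection rays; this reduces to a 3×3 linear system. The sketch below is not the authors' code and solves the system with Cramer's rule:

```python
def closest_point_to_rays(origins, dirs):
    """Least-squares 3D point minimizing the sum of squared distances to a
    set of rays (origin o_i, unit direction d_i).  Solves the normal system
        sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) o_i."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, dirs):
        for r in range(3):
            for c in range(3):
                m = (1.0 if r == c else 0.0) - d[r] * d[c]
                A[r][c] += m
                b[r] += m * o[c]

    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

    D = det3(A)
    x = []
    for col in range(3):            # Cramer's rule, one column at a time
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        x.append(det3(M) / D)
    return x

# Two rays meeting at (1, 1, 0): one along +x from (0, 1, 0),
# one along +y from (1, 0, 0).
pt = closest_point_to_rays([(0, 1, 0), (1, 0, 0)],
                           [(1, 0, 0), (0, 1, 0)])
print([round(v, 6) for v in pt])  # → [1.0, 1.0, 0.0]
```

Because only ray origins and directions enter the system, the same routine serves a pinhole, a distorted sensor, or a non-central catadioptric camera alike, which is exactly the appeal of the general imaging model.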

10.
A fast registration method based on implicit polynomial (IP) models is helpful for real-time pose estimation from a single clinical free-hand ultrasound (US) image, because it is robust against image noise, registers quickly without requiring correspondences, and allows fast IP coefficient transformation. However, it may suffer from poor accuracy or registration failure. In this paper, we present a novel registration method based on a coarse-to-fine IP representation. The approach starts with a high-speed and reliable registration using a coarse (low-degree) IP model and stops when the desired accuracy is achieved by a fine (high-degree) IP model. Over previous IP-to-point based methods, our contributions are: (i) maintaining efficiency without requiring pairwise correspondences, (ii) enhancing robustness, and (iii) improving accuracy. The experimental results demonstrate the good performance of our registration method and its ability to overcome the limitations of unconstrained freehand ultrasound data, resulting in fast, robust, and accurate registration.

11.
Detecting objects, estimating their pose, and recovering their 3D shape are critical problems in many vision and robotics applications. This paper addresses the above needs using a two-stage approach. In the first stage, we propose a new method called DEHV – Depth-Encoded Hough Voting. DEHV jointly detects objects, infers their categories, estimates their pose, and infers/decodes object depth maps from either a single image (when no depth maps are available in testing) or a single image augmented with a depth map (when this is available in testing). Inspired by the Hough voting scheme introduced in [1], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and its distance (depth) from the corresponding physical patch attached to the 3D object. Once the depth map is given, a full reconstruction is achieved in a second (3D modelling) stage, where modified or state-of-the-art 3D shape and texture completion techniques are used to recover the complete 3D model. Extensive quantitative and qualitative experimental analysis on existing datasets [2], [3], [4] and a newly proposed 3D table-top object category dataset shows that our DEHV scheme obtains competitive detection and pose estimation results. Finally, the quality of 3D modelling, in terms of both shape and texture completion, is evaluated on a 3D modelling dataset containing both indoor and outdoor object categories. We demonstrate that our overall algorithm can obtain convincing 3D shape reconstruction from just one single uncalibrated image.

12.
Eigendecomposition-based techniques are popular for a number of computer vision problems, e.g., object and pose estimation, because they are purely appearance based and they require few on-line computations. Unfortunately, they also typically require an unobstructed view of the object whose pose is being detected. The presence of occlusion and background clutter precludes the use of the normalizations that are typically applied and significantly alters the appearance of the object under detection. This work presents an algorithm that is based on applying eigendecomposition to a quadtree representation of the image dataset used to describe the appearance of an object. This allows decisions concerning the pose of an object to be based on only those portions of the image in which the algorithm has determined that the object is not occluded. The accuracy and computational efficiency of the proposed approach are evaluated on 16 different objects with up to 50% of the object being occluded and on images of ships in a dockyard.

Chu-Yin Chang received the B.S. degree in mechanical engineering from National Central University, Chung-Li, Taiwan, ROC, in 1988, the M.S. degree in electrical engineering from the University of California, Davis, in 1993, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, in 1999. From 1999 to 2002, he was a Machine Vision Systems Engineer with Semiconductor Technologies and Instruments, Inc., Plano, TX. He is currently the Vice President of Energid Technologies, Cambridge, MA, USA. His research interests include computer vision, computer graphics, and robotics. Anthony A. Maciejewski received the BSEE, M.S., and Ph.D. degrees from Ohio State University in 1982, 1984, and 1987. From 1988 to 2001, he was a professor of Electrical and Computer Engineering at Purdue University, West Lafayette. He is currently the Department Head of Electrical and Computer Engineering at Colorado State University. He is a Fellow of the IEEE. A complete vita is available at: Venkataramanan Balakrishnan is Professor and Associate Head of Electrical and Computer Engineering at Purdue University, West Lafayette, Indiana. He received the B.Tech degree in electronics and communication and the President of India Gold Medal from the Indian Institute of Technology, Madras, in 1985. He then attended Stanford University, where he received the M.S. degree in statistics and the Ph.D. degree in electrical engineering in 1992. He joined Purdue University in 1994 after post-doctoral research at Stanford, CalTech and the University of Maryland. His primary research interests are in convex optimization and large-scale numerical algebra, applied to engineering problems. Rodney G. Roberts received B.S. degrees in Electrical Engineering and Mathematics from Rose-Hulman Institute of Technology in 1987 and an MSEE and Ph.D. in Electrical Engineering from Purdue University in 1988 and 1992, respectively.
From 1992 until 1994, he was a National Research Council Fellow at Wright Patterson Air Force Base in Dayton, Ohio. Since 1994 he has been at the Florida A&M University–Florida State University College of Engineering, where he is currently a Professor of Electrical and Computer Engineering. His research interests are in the areas of robotics and image processing. Kishor Saitwal received the Bachelor of Engineering (B.E.) degree in Instrumentation and Controls from Vishwakarma Institute of Technology, Pune, India, in 1998. He was ranked third in Pune University and was a recipient of the National Talent Search scholarship. He received the M.S. and Ph.D. degrees from the Electrical and Computer Engineering department, Colorado State University, Fort Collins, in 2001 and 2006, respectively. He is currently with Behavioral Recognition Systems, Inc., performing research in computer-aided video surveillance systems. His research interests include image/video processing, computer vision, and robotics.

13.
Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this pre-visualization can be rendered from the expected future viewpoints of the driver, the required maneuver will be more easily understandable. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The vastness of image material gathered by these vehicles, however, puts extreme demands on vision algorithms to ensure their practical usability. Algorithms need to be as fast as possible and should result in compact, memory efficient 3D city models for future ease of distribution and visualization. For the considered application, these are not contradictory demands. Simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework which builds upon this philosophy to create 3D content at high speed. Objects in the environment, such as cars and pedestrians, may however disturb the reconstruction, as they violate the simplified geometry assumptions, leading to visually unpleasant artifacts and degrading the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other’s continuous input. 3D reconstruction delivers geometric scene context, which greatly helps improve detection precision. The detected car locations, on the other hand, are used to instantiate virtual placeholder models which augment the visual realism of the reconstructed city model.

14.
A system capable of performing robust live ego-motion estimation for perspective cameras is presented. The system is powered by random sample consensus with preemptive scoring of the motion hypotheses. A general statement of the problem of efficient preemptive scoring is given, followed by a theoretical investigation of preemptive scoring under a simple inlier–outlier model. A practical preemption scheme is proposed and shown to be powerful enough to enable robust live structure and motion estimation. Prepared through collaborative participation in the Robotics Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0012. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. David Nistér received the PhD degree in computer vision, numerical analysis and computing science from the Royal Institute of Technology (KTH), Stockholm, Sweden, with the thesis ‘Automatic Dense Reconstruction from Uncalibrated Video Sequences’. He is currently an assistant professor at the Computer Science Department and the Center for Visualization and Virtual Environments, University of Kentucky, Lexington. Before joining UK, he was a researcher in the Vision Technologies Laboratory, Sarnoff Corporation, Princeton, and at Visual Technology, Ericsson Research, Stockholm, Sweden. His research interests include computer vision, computer graphics, structure from motion, multiple view geometry, Bayesian formulations, tracking, recognition, and image and video compression. He is a member of the IEEE and American Mensa.
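Breadth-first preemptive scoring can be sketched as follows: score every surviving hypothesis on one block of observations, keep the better-scoring half, and repeat until one hypothesis remains or the data is exhausted. The toy example below uses 2D line fitting as a stand-in for motion hypotheses; the data, hypothesis set, and halving schedule are illustrative, not the paper's:

```python
def preemptive_ransac(data, hypotheses, residual, block=20, inlier_tol=1.0):
    """Breadth-first preemptive scoring: after each block of observations,
    keep only the better-scoring half of the surviving hypotheses."""
    scores = {i: 0 for i in range(len(hypotheses))}
    survivors = list(scores)
    pos = 0
    while len(survivors) > 1 and pos < len(data):
        chunk = data[pos:pos + block]
        pos += block
        for i in survivors:
            scores[i] += sum(1 for d in chunk
                             if abs(residual(hypotheses[i], d)) < inlier_tol)
        survivors.sort(key=lambda i: -scores[i])  # stable: ties keep order
        survivors = survivors[:max(1, len(survivors) // 2)]
    return hypotheses[survivors[0]]

def line_residual(h, d):
    return d[1] - (h[0] * d[0] + h[1])

# Pick the line y = a*x + b that best explains data contaminated by a
# second, spurious structure (deterministically interleaved).
inliers = [(x, 2.0 * x + 1.0) for x in range(60)]
clutter = [(x, 100.0 - 3.0 * x) for x in range(20)]
data = []
for i in range(60):
    data.append(inliers[i])
    if i < 20:
        data.append(clutter[i])

hypotheses = [(2.0, 1.0), (-3.0, 100.0), (0.0, 0.0), (1.0, 0.0),
              (0.5, 10.0), (-1.0, 50.0), (3.0, -5.0), (0.0, 30.0)]
best = preemptive_ransac(data, hypotheses, line_residual)
print(best)  # → (2.0, 1.0)
```

The point of preemption is the fixed computational budget: every observation is touched at most once per surviving hypothesis, which is what makes live (real-time) operation feasible.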

15.
Detecting objects in complex scenes while recovering the scene layout is a critical functionality in many vision-based applications. In this work, we advocate the importance of geometric contextual reasoning for object recognition. We start from the intuition that objects' locations and poses in 3D space are not arbitrarily distributed but rather constrained by the fact that objects must lie on one or more supporting surfaces. We model such supporting surfaces by means of hidden parameters (i.e. not explicitly observed) and formulate the problem of joint scene reconstruction and object recognition as that of finding the set of parameters that maximizes the joint probability of having a number of detected objects on K supporting planes given the observations. As a key ingredient for solving this optimization problem, we demonstrate a novel relationship between object location and pose in the image and the scene layout parameters (i.e. the normals of one or more supporting planes in 3D and the camera pose, location, and focal length). Using a novel probabilistic formulation and the above relationship, our method has the unique ability to jointly: i) reduce the false alarm and false negative object detection rates; ii) recover object locations and supporting planes within the 3D camera reference system; iii) infer camera parameters (viewpoint and focal length) from just one single uncalibrated image. Quantitative and qualitative experimental evaluation on two datasets (desk-top dataset [1] and LabelMe [2]) demonstrates our theoretical claims.

16.
Omnidirectional video enables direct surround immersive viewing of a scene by warping the original image into the correct perspective given a viewing direction. However, novel views from viewpoints off the camera path can only be obtained if we solve the three-dimensional motion and calibration problem. In this paper we address the case of a parabolic catadioptric camera – a paraboloidal mirror in front of an orthographic lens – and we introduce a new representation, called the circle space, for points and lines in such images. In this circle space, we formulate an epipolar constraint involving a 4×4 fundamental matrix. We prove that the intrinsic parameters can be inferred in closed form from the two-dimensional subspace of the new fundamental matrix from two views if they are constant or from three views if they vary. Three-dimensional motion and structure can then be estimated from the decomposition of the fundamental matrix.

17.
With the advancement of MEMS technologies, sensor networks have opened up broad application prospects. An important issue in wireless sensor networks is object detection and tracking, which typically involves two basic components: collaborative data processing and object location reporting. The former aims to have sensors collaborate in determining a concise digest of object location information, while the latter aims to transport this digest to the sink in a timely manner. This issue has been intensively studied for individual objects, such as intruders. However, continuous objects pose new challenges: a continuous object, such as a noxious gas, can diffuse, grow in size, or split into multiple continuous objects. In this paper, a scalable, topology-control-based approach for continuous object detection and tracking is proposed. Extensive simulations are conducted, which show a significant improvement over existing solutions.
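The boundary-reporting idea behind continuous-object tracking — only sensors on the rim of the detected region need to report, since interior nodes add no shape information — can be sketched on a grid of boolean sensor readings (the grid, neighborhood, and reporting rule below are illustrative, not the paper's protocol):

```python
def boundary_nodes(readings):
    """Given a dict {(x, y): detected} of grid sensor readings, return the
    set of detecting nodes that have at least one non-detecting 4-neighbor.
    Only these boundary nodes need to report the object's shape."""
    boundary = set()
    for (x, y), det in readings.items():
        if not det:
            continue
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not readings.get((nx, ny), False):  # off-grid counts as clear
                boundary.add((x, y))
                break
    return boundary

# 5x5 grid with a 3x3 gas cloud in the middle: only the cloud's rim reports.
readings = {(x, y): (1 <= x <= 3 and 1 <= y <= 3)
            for x in range(5) for y in range(5)}
rim = boundary_nodes(readings)
print((2, 2) in rim, len(rim))  # → False 8
```

As the cloud grows, the number of reporting nodes scales with its perimeter rather than its area, which is the source of the traffic savings.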

18.
Eduardo, Refugio. Pattern Recognition, 2003, 36(12): 2909-2926
This paper presents the analysis and design of feed-forward neural networks using the coordinate-free system of Clifford or geometric algebra. It is shown that real-, complex- and quaternion-valued neural networks are simply particular cases of geometric algebra multidimensional neural networks and that they can be generated using Support Multi-Vector Machines (SMVMs). In particular, generating RBF networks for neurocomputing in geometric algebra is easier using the SMVM, which finds the optimal parameters automatically. The use of SVMs in the geometric algebra framework expands their sphere of applicability to multidimensional learning.

We introduce a novel method of geometric preprocessing utilizing hypercomplex or Clifford moments. This method is applied together with geometric MLPs to 2D pattern recognition tasks. Interesting examples of nonlinear problems, such as grasping an object along a nonlinear curve and 3D pose recognition, show the effect of using adequate Clifford or geometric algebras, which ease the training of neural networks and of Support Multi-Vector Machines.


19.
Learning-based 3D human motion analysis in visual media is a very challenging topic in computer vision. Building on the Gaussian dynamical latent variable model and shared latent structure, this paper presents a new shared dynamical latent variable model for 3D human motion tracking. For high-dimensional nonlinear dynamical systems, the model computes a shared dynamic low-dimensional latent variable for the high-dimensional state vector and the high-dimensional observation vector, together with the bidirectional mappings between the latent variable and both high-dimensional vectors, as well as the dynamics of the latent variable itself. With this model, conventional high-dimensional human motion estimation can be decomposed into first estimating the low-dimensional latent state and then reconstructing the high-dimensional human motion. Experiments on both synthetic and real image sequences demonstrate the effectiveness of the method.

20.
Putting Objects in Perspective
Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface orientations, and camera viewpoint. Most object detection methods consider all scales and locations in the image as equally likely. We show that with probabilistic estimates of 3D geometry, both in terms of surfaces and world coordinates, we can put objects into perspective and model the scale and location variance in the image. Our approach reflects the cyclical nature of the problem by allowing probabilistic object hypotheses to refine geometry and vice versa. Our framework allows painless substitution of almost any object detector and is easily extended to include other aspects of image understanding. Our results confirm the benefits of our integrated approach.
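The core geometric link between camera viewpoint and object scale can be made concrete: for an upright object standing on the ground plane, seen by a pinhole camera with a horizontal optical axis, the focal length cancels and the object's image height depends only on its world height, the camera height, and how far below the horizon its ground contact falls. A minimal sketch (the numbers are illustrative):

```python
def expected_image_height(H_world, cam_height, v_bottom, v_horizon):
    """Image height (pixels) of an upright object on the ground plane.

    With a pinhole camera at height `cam_height` (m) and a horizontal
    optical axis, a ground contact point at image row `v_bottom` (rows
    grow downward) lies (v_bottom - v_horizon) pixels below the horizon,
    which fixes its depth; the object's image height then follows as
        h = H_world * (v_bottom - v_horizon) / cam_height,
    with the focal length cancelling out.
    """
    return H_world * (v_bottom - v_horizon) / cam_height

# A 1.7 m pedestrian seen by a camera mounted 1.7 m above the ground:
# the image height always equals the distance below the horizon.
print(expected_image_height(1.7, 1.7, v_bottom=400, v_horizon=300))  # → 100.0
```

This is why a probabilistic estimate of the horizon and camera height immediately constrains which detection scales are plausible at each image row.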

