Similar literature
 20 similar documents found (search time: 31 ms)
1.
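Human action recognition based on manifold learning   Total citations: 5 (self-citations: 2, citations by others: 3)
Objective: An action recognition framework based on manifold learning is proposed for recognizing human behavior in depth image sequences. Methods: Human joint positions are estimated from the depth information captured by a Kinect device, and the relative differences between joint positions serve as the human feature representation. In the training stage, Laplacian eigenmaps (LE) manifold learning reduces the dimensionality of the high-dimensional training set, yielding a motion model in a low-dimensional latent space. In the recognition stage, a nearest-neighbor interpolation method maps the test sequence into the low-dimensional manifold space, where matching is then computed. During matching, an improved Hausdorff distance measures the agreement and similarity between the test sequence and the training motion sets in the low-dimensional space. Results: Experiments on data captured with a Kinect device gave good results. The method was also tested on the MSR Action3D database, where it outperforms previous methods when many training samples are available. Conclusion: The experimental results show that the method is suitable for human action recognition based on depth image sequences.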
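
The matching step described above can be sketched as follows. The abstract does not say which improvement to the Hausdorff distance is used, so this sketch assumes the common mean-of-minimum-distances variant; the function names, the label dictionary, and the toy 2-D embeddings are all hypothetical.

```python
import math

def directed_mhd(a, b):
    """Mean, over the points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def modified_hausdorff(a, b):
    """Symmetric variant: the larger of the two directed mean distances."""
    return max(directed_mhd(a, b), directed_mhd(b, a))

def classify(test_seq, training_seqs):
    """Return the label whose embedded training sequence is closest to test_seq."""
    return min(
        training_seqs,
        key=lambda label: modified_hausdorff(test_seq, training_seqs[label]),
    )

# Toy low-dimensional embeddings (made-up values, not from the paper).
train = {
    "wave": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "kick": [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],
}
test = [(0.1, 0.2), (1.1, 0.1), (1.9, -0.1)]
print(classify(test, train))
```

Averaging the nearest-point distances, rather than taking their maximum as the classical Hausdorff distance does, makes the measure less sensitive to a single outlying frame.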

2.
Several attempts have been made to recover three-dimensional (3D) ground shape from 3D point clouds generated by aerial vehicles, which supports fast situation recognition. However, identifying objects on the ground from a 3D point cloud, which consists of 3D coordinates and color information, is not straightforward because of the gap between low-level point information (coordinates and colors) and high-level context information (objects). In this paper, we propose a ground object recognition and segmentation method for geo-referenced point clouds. We rely on existing tools to generate such a point cloud from aerial images, and our method assigns semantics to each set of clustered points. First, points that correspond to the ground surface are removed using elevation data from the Geographical Survey Institute. Next, we apply interpoint distance-based clustering and color-based clustering. Then, clusters that share some regions are merged so that each resulting cluster corresponds to a single object. We evaluated our method in several experiments in real fields and confirmed that it can remove the ground surface to within 20 cm error and can recognize most of the objects.
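
A minimal sketch of the first two steps, ground removal followed by interpoint distance-based clustering, is given below. It is an illustration under assumptions, not the authors' implementation: `ground_height` is a hypothetical callable standing in for the elevation data, and the tolerance and radius values are made up.

```python
import math
from collections import deque

def remove_ground(points, ground_height, tolerance):
    """Keep only points more than `tolerance` above the reference ground elevation."""
    return [p for p in points if p[2] - ground_height(p[0], p[1]) > tolerance]

def euclidean_clusters(points, radius):
    """Group point indices so each point lies within `radius` of another in its cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search: O(n^2) overall, fine for a small sketch.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)

# Toy cloud of (x, y, z) points over flat ground at elevation 0 (assumed values).
cloud = [(0.0, 0.0, 0.05), (0.0, 0.0, 1.0), (0.3, 0.0, 1.1),
         (10.0, 10.0, 2.0), (10.0, 10.0, 0.0)]
objects = remove_ground(cloud, lambda x, y: 0.0, tolerance=0.2)
print(euclidean_clusters(objects, radius=0.5))
```

A real point cloud would need a spatial index such as a k-d tree for the neighbor search; the brute-force loop here only keeps the sketch dependency-free.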

3.
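This paper proposes an action recognition framework based on manifold learning for recognizing human behavior in depth image sequences. Human joint positions are estimated from the depth information captured by a Kinect device, and the relative differences between joint positions serve as the human feature representation. In the training stage, Laplacian eigenmaps (LE) manifold learning reduces the dimensionality of the high-dimensional training set, yielding a motion model in a low-dimensional latent space. In the recognition stage, a nearest-neighbor interpolation method maps the test sequence into the low-dimensional manifold space, where matching is then computed. During matching, an improved Hausdorff distance measures the agreement and similarity between the test sequence and the training motion sets in the low-dimensional space. Experiments on data captured with a Kinect device gave good results. The method was also tested on the MSR Action3D database, where it outperforms previous methods when many training samples are available. The experimental results show that the proposed method is suitable for human action recognition based on depth image sequences.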

4.
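Objective: Human action recognition shows extremely broad application prospects in video surveillance, ambient assisted living, human-computer interaction, intelligent driving, and other fields. Occlusion of target objects, background shadows, illumination changes, viewpoint changes, multi-scale variation, and changes in people's clothing and appearance make video processing and analysis very difficult. To address this, the paper uses forward and reverse time series to build a tensor-based linear dynamical model, estimates the model parameters as the descriptor of an action sequence, and constructs a more complete observation matrix. Methods: Human joints are first extracted from depth images, and forward and reverse human skeleton sequences are built in tensor form. A tensor-based linear dynamical system and Tucker decomposition are then used to learn the parameter tuple (AF, AI, C), where C represents the spatial information of the human skeleton, and AF and AI describe the dynamics of the forward and reverse time series, respectively. The observation matrix is constructed from the parameter tuple, so an action can be represented as a subspace of the observation matrix, which corresponds to a point on the Grassmann manifold. Finally, action recognition is performed through dictionary learning and sparse coding on the Grassmann manifold. Results: On the MSR-Action 3D dataset, the algorithm scores 13.55% higher than the Eigenjoints algorithm, 2.79% higher than the local tangent bundle support vector machine (LTBSVM) algorithm, and 1% higher than the tensor-based linear dynamical system (tLDS) algorithm. On the UT-Kinect dataset, its recognition rate is 5.8% higher than that of LTBSVM and 1.3% higher than that of tLDS. Conclusion: Extensive experimental evaluation verifies that the tLDS model built from forward and reverse time series handles the problems above well and improves the human action recognition rate.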

5.
With the development of computer vision technologies, 3D reconstruction has become a hotspot. At present, 3D reconstruction relies heavily on expensive equipment and has poor real-time performance. In this paper, we address the 3D reconstruction of indoor scenes with a large vertical span and propose a novel approach that uses only a Kinect. First, a Kinect sensor captures color images and depth images of the indoor scene. Second, scale-invariant feature transform features combined with the random sample consensus algorithm determine the transformation matrix between adjacent frames, which serves as the initial value for iterative closest point (ICP). Third, ICP establishes the relative coordinate relation between pairwise frames, which form the initial point cloud data. Finally, we obtain the 3D visual reconstruction model of the indoor scene by top-down registration of the point cloud data. This approach not only mitigates the sensor's viewing-angle restriction and reconstructs indoor scenes with a large vertical span, but also provides a fast algorithm for indoor scene reconstruction with large amounts of point cloud data. The experimental results show that the proposed algorithm achieves better accuracy, a better reconstruction effect, and less running time for point cloud registration. In addition, the proposed method has great potential for application to 3D simultaneous localization and mapping.

6.
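To address the high cost of conventional action recognition algorithms for color video and the poor recognition caused by insufficient two-dimensional information, a new action recognition method based on 3D depth image sequences is proposed. In the temporal dimension, the algorithm introduces a temporal depth model (TDM) to describe actions. On three orthogonal Cartesian planes, the depth image sequence is divided into several sub-actions; inter-frame differences are computed for every sub-action and their energy is accumulated, forming depth motion maps that describe the dynamic features of the action. In the spatial dimension, a spatial pyramid histogram of oriented gradients (SPHOG) encodes the temporal depth model to obtain the final descriptor. Finally, a support vector machine (SVM) classifies the actions. In experiments on two authoritative databases, MSR Action3D and MSRGesture3D, the method reached recognition rates of 94.90% (cross-test group) and 94.86%, respectively. The experimental results show that the method computes quickly on depth image sequences, achieves a high recognition rate, and largely meets the real-time requirements of depth video sequences.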
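
The accumulation of inter-frame differences into a depth motion map can be sketched as below for a single projected view. This is a simplified illustration under assumptions: the frames are plain nested lists, the equal-length split into sub-actions and the `threshold` parameter are hypothetical choices, and the projection onto the three orthogonal planes is omitted.

```python
def split_sub_actions(frames, parts):
    """Split a frame sequence into `parts` contiguous sub-actions of near-equal length."""
    n = len(frames)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(parts)]

def depth_motion_map(frames, threshold=0):
    """Accumulate absolute differences between consecutive frames, pixel by pixel."""
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                diff = abs(curr[r][c] - prev[r][c])
                if diff > threshold:  # ignore changes at or below the noise threshold
                    dmm[r][c] += diff
    return dmm

# Three toy 2x2 depth frames (made-up values).
frames = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[3, 0], [0, 2]],
]
print(depth_motion_map(frames))
```

In the pipeline the abstract describes, one such map per sub-action and per projection plane would then be encoded with SPHOG before classification.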

7.
When constructing a dense 3D model of an indoor static scene from a sequence of RGB-D images, the choice of the 3D representation (e.g. 3D mesh, cloud of points or implicit function) is of crucial importance. In the last few years, the volumetric truncated signed distance function (TSDF) and its extensions have become popular in the community and are widely used for dense 3D modelling with RGB-D sensors. However, as this representation is voxel based, it offers few possibilities for manipulating and/or editing the constructed 3D model, which limits its applicability. In particular, the amount of data required to maintain the volumetric TSDF rapidly becomes huge, which limits portability. Moreover, simplifications (such as mesh extraction and surface simplification) significantly reduce the accuracy of the 3D model (especially in the color space), and editing the 3D model is difficult. We propose a novel compact, flexible and accurate 3D surface representation based on parametric surface patches augmented by geometric and color texture images. Simple parametric shapes such as planes are roughly fitted to the input depth images, and the deviations of the 3D measurements from the fitted parametric surfaces are fused into a geometric texture image (called the Bump image). A confidence image and a color texture image are also built. Our 3D scene representation is accurate yet memory efficient. Moreover, updating or editing the 3D model becomes trivial since it is reduced to manipulating 2D images. Our experimental results demonstrate the advantages of our proposed 3D representation through a concrete indoor scene reconstruction application.

8.
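Action recognition is an important research topic in video understanding within computer vision. Accurately extracting features of human actions from video and recognizing those actions can provide important information for fields such as healthcare and security, which makes this a very promising direction. From a data-driven perspective, this paper comprehensively reviews the development of action recognition technology and systematically describes representative action recognition methods and models. Action recognition data are divided into RGB, depth, skeleton, and fused modalities. The paper first introduces the main pipeline of action recognition and the public datasets available for each data modality in human action recognition. Then, organized by data modality, it reviews action recognition methods based on traditional handcrafted features and on deep learning for the RGB, depth, and skeleton modalities, as well as multimodal fusion methods that combine RGB with depth and methods that fuse other modalities. Traditional handcrafted-feature methods include those based on spatio-temporal volumes and spatio-temporal interest points (RGB modality), those based on motion change and appearance (depth modality), and those based on skeleton features (skeleton modality). Deep learning methods mainly involve convolutional networks, graph convolutional networks, and hybrid networks; the review focuses on their improvements, characteristics, and model innovations. The different action recognition techniques are compared on datasets grouped by modality. Comparing both within and across categories yields the strengths, weaknesses, and suitable scenarios of each modality, the differences between handcrafted-feature and deep learning methods, and the advantages of fusing multiple modalities. Finally, the paper summarizes the problems and challenges currently facing action recognition and, from the perspective of data modality, proposes feasible future research directions and priorities.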

9.
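Compared with conventional optical cameras, sensors that capture RGB images and depth images synchronously offer a new solution for human action recognition. Improved spatio-temporal interest point features are therefore extracted separately from the RGB and depth image sequences and fused according to a set of rules. Because the fused features are redundant, they are optimized with a spatio-temporal clustering method, and an SVM classifier is used for training and testing. Experimental results show that the proposed joint RGB and depth feature method achieves an average action recognition accuracy of 91%, a better result than the other methods compared.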

10.
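Objective: Action recognition based on 3D skeletons has long been a very active topic in computer vision and has produced many results in surveillance, video games, robotics, human-computer interaction, healthcare, and other fields. Most current action recognition algorithms choose a fixed joint as the coordinate center, which leads to a low recognition rate. To address this low accuracy, a human action recognition algorithm with an adaptive skeleton center is proposed. Methods: The algorithm first obtains 3D skeleton sequences from a skeleton dataset and preprocesses them to obtain the original coordinate matrix of an action. It then extracts features from the original coordinate matrix, adaptively selects the coordinate center according to how the feature values change, and renormalizes the original coordinate matrix. Finally, dynamic time warping is applied to denoise the action coordinate matrix, a Fourier temporal pyramid representation reduces its temporal misalignment and noise, and a support vector machine classifies the action coordinate matrix. The algorithm is validated on the widely used UTKinect-Action and MSRAction3D datasets. Results: On the UTKinect-Action dataset, the algorithm's recognition rate is 4.28% higher than that of the HO3D J2 algorithm and 3.48% higher than that of the CRF algorithm. On the MSRAction3D dataset, it is 9.57% higher than the HOJ3D algorithm, 2.07% higher than the Profile HMM algorithm, and 6.17% higher than the Eigenjoints algorithm. Conclusion: Addressing the low recognition rate of current action recognition algorithms, the paper identifies the use of a fixed joint as the coordinate center as the cause and proposes an action recognition algorithm with an adaptive skeleton center. Simulation results verify that the algorithm effectively improves the accuracy of human action recognition.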

11.
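This paper provides a relatively large-scale database of complex human actions captured with RGB-D cameras, DMV (Dynamic and multi-view) action3D, in which human actions are recorded from two fixed viewpoints and from the dynamic viewpoint of a mobile robot. The database currently has 31 action classes in three broad categories (daily actions, interactive actions, and abnormal actions) and comprises more than 620 action videos with about 600,000 frames of color and depth images, giving a database on which a robot's search for the best viewpoint can be validated. To verify the reliability and practicality of the dataset, four methods are used for human action recognition. Two of them, joint-information features and HOG3D features of color images extracted with the CRFasRNN method, which combines convolutional neural networks (CNN) and conditional random fields (CRF), are classified with a support vector machine (SVM). The other two extract spatio-temporal features with a 3D convolutional network (C3D) and with a 3D densely connected residual network and predict action labels through a softmax layer. Experimental results show that the varied scenes and complex actions of the DMV action3D database make recognition considerably harder. The DMV action3D dataset offers clear advantages for studying human actions in real environments and provides a good resource for service robots that must recognize human actions in such environments.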

12.
This paper presents an interactive system for quickly designing and previewing colored snapshots of indoor scenes. Different from high-quality 3D indoor scene rendering, which often takes several minutes to render a moderately complicated scene under a specific color theme with high-performance computing devices, our system aims at improving the effectiveness of color theme design of indoor scenes and employs an image colorization approach to efficiently obtain high-resolution snapshots with editable colors. Given several pre-rendered, multi-layer, gray images of the same indoor scene snapshot, our system is designed to colorize and merge them into a single colored snapshot. Our system also assists users in assigning colors to certain objects/components and infers more harmonious colors for the unassigned objects based on pre-collected priors to guide the colorization. The quickly generated snapshots of indoor scenes provide previews of interior design schemes with different color themes, making it easy to determine the personalized design of indoor scenes. To demonstrate the usability and effectiveness of this system, we present a series of experimental results on indoor scenes of different types, and compare our method with a state-of-the-art method for indoor scene material and color suggestion and offline/online rendering software packages.

13.
Chen Yanfang, Wang Liwei, Li Chuankun, Hou Yonghong, Li Wanqing. Multimedia Tools and Applications, 2020, 79(3-4): 1707-1725

With the advance of deep learning, deep learning based action recognition has become an important research topic in computer vision. To make better use of Convolutional Neural Networks (ConvNets), the skeleton sequence is often encoded into an image, for example as Joint Trajectory Maps (JTM). However, this encoding method cannot effectively capture long temporal information. To solve this problem, this paper presents an effective method to encode spatial-temporal information from skeleton sequences into color texture images, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs), and applies ConvNets to capture the discriminative features from TPSMMs for human action recognition. The TPSMMs not only capture short temporal information, but also embed the long dynamic information over the period of an action. The proposed method has been verified and achieves state-of-the-art results on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets.

14.
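Objective: The incomplete and noisy data caused by the density, complexity, and heavy occlusion of objects in indoor point cloud scenes greatly limit indoor scene reconstruction and make its accuracy hard to guarantee. To better recover complete scenes from unordered point clouds, an indoor scene reconstruction method based on semantic segmentation is proposed. Methods: The raw data are downsampled with a voxel filter, 3D scale-invariant feature transform (3D SIFT) keypoints of the scene are computed, and the downsampled result is fused with the keypoints to obtain an optimized downsampled scene. The random sample consensus (RANSAC) algorithm extracts planar features from the fused sample, and these features are fed into a PointNet network for training so that coplanar points share the same local features, yielding for each point a confidence for every class in the dataset. On this basis, a projection-based region growing optimization is proposed that aggregates points belonging to the same object in the semantic segmentation result, giving a finer segmentation. The segmented scene objects are divided into interior-environment elements and exterior-environment elements, which are reconstructed by model matching and by plane fitting, respectively. Results: In experiments on the S3DIS (Stanford large-scale 3D indoor space dataset) dataset, the fused sampling algorithm improves the efficiency and effectiveness of the subsequent steps to varying degrees: plane extraction after sampling takes only 15% of the running time required before sampling, and the semantic segmentation method improves on the PointNet network by 2.3% in overall accuracy (OA) and by 4.2% in mean intersection over union (mIoU). Conclusion: The method improves computational efficiency while retaining keypoints, clearly improves segmentation accuracy, and produces high-quality reconstruction results.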
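
The fused sampling step (voxel-filter downsampling, then merging the result with scene keypoints) can be sketched as below. This is an assumption-laden illustration: the centroid-per-voxel rule is one common form of voxel filtering, `fuse` is a hypothetical helper, and the keypoints are supplied by hand rather than computed with 3D SIFT.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel with their centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]

def fuse(sampled, keypoints):
    """Append keypoints that the downsampled cloud does not already contain."""
    seen = set(sampled)
    return sampled + [k for k in keypoints if k not in seen]

# Toy cloud (made-up values); the keypoints would come from a 3D SIFT detector.
cloud = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.75), (1.5, 0.0, 0.0)]
sampled = voxel_downsample(cloud, voxel_size=1.0)
fused = fuse(sampled, keypoints=[(1.5, 0.0, 0.0), (0.9, 0.9, 0.9)])
print(sampled)
print(fused)
```

Keeping the keypoints alongside the voxel centroids is what lets the downsampled cloud shrink without losing the distinctive points that later stages rely on.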

15.
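Objective: Human action recognition from depth map sequences is an important research area in machine vision and artificial intelligence. Existing work suffers from excessive redundant information in depth map sequences and from the loss of temporal information in the generated feature maps. To address the redundancy, a key frame algorithm is proposed that improves the computational efficiency of human action recognition algorithms. To address the missing temporal information, a new feature representation for depth map sequences is proposed, the depth spatial-temporal energy map (DSTEM), which emphasizes the temporal order of human action features. Methods: The key frame algorithm removes redundant frames from the depth map sequence according to the redundancy coefficient of the difference image sequence, yielding a key frame sequence sufficient to describe the human action. The DSTEM algorithm builds an energy field from the shape and motion characteristics of the human body to obtain human energy information, then projects that energy information onto three orthogonal axes to obtain the DSTEM. Results: Experiments on the MSR_Action3D dataset show that the key frame algorithm reduces redundancy and that the computational efficiency of each algorithm improves by 20% to 30% after key frame processing. Histogram of oriented gradient (HOG) features extracted from the DSTEM reach a recognition accuracy of 95.54% on a database containing only forward-order actions and still maintain 82.14% on a database containing both forward-order and reverse-order actions. Conclusion: The key frame algorithm reduces redundant information in depth map sequences and speeds up feature map extraction. The DSTEM not only retains the spatial information of human actions highlighted by the energy field but also fully records their temporal information, and it maintains high recognition accuracy on action data that carry temporal information.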

16.
We present an algorithm to model a 3D workspace and to understand a test scene for mobile robot navigation or human-computer interaction. This is done with a line-based modeling and recognition algorithm. Line-based recognition using 3D lines has been tried by many researchers; however, its reliability still needs improvement because the 3D line features derived from the original images are ambiguous. To improve the outcome, we first find real planes using the given 3D lines and then carry out the recognition process. The methods we use are principal component analysis (PCA), plane sweep, occlusion query, and iterative closest point (ICP). During the implementation, we also use 3D map information for localization. We apply this algorithm to real test scene images and find that our result can be used to identify doors or walls in indoor environments with better efficiency.

17.
This paper presents a 2D to 3D conversion scheme to generate a 3D human model using a single depth image with several color images. In building a complete 3D model, no prior knowledge such as a pre-computed scene structure or photometric and geometric calibrations is required, since the depth camera can directly acquire the calibrated geometric and color information in real time. The proposed method deals with the self-occlusion problem that often occurs in images captured by a monocular camera. When an image is obtained from a fixed view, it may not have data for a certain part of an object due to occlusion. The proposed method consists of the following steps to resolve this problem. First, the noise in a depth image is reduced by using a series of image processing techniques. Second, a 3D mesh surface is constructed using the proposed depth image-based modeling method. Third, the occlusion problem is resolved by removing the unwanted triangles in the occlusion region and filling the corresponding hole. Finally, textures are extracted and mapped to the 3D surface of the model to provide a photo-realistic appearance. Comparison with related work demonstrates the efficiency of our method in terms of visual quality and computation time. It can be utilized in creating 3D human models in many 3D applications.

18.
We present an approach which exploits the coupling between human actions and scene geometry to use human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve single-view 3D scene understanding approaches. The proposed method is validated on monocular time-lapse sequences from YouTube and still images of indoor scenes gathered from the Internet. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.

19.
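Real-time 3D human motion tracking and modeling system   Total citations: 1 (self-citations: 0, citations by others: 1)
A new design method for a real-time 3D human motion tracking and modeling system is proposed, and a robust reference application system is implemented on this basis. Targeting applications such as human-computer interaction that do not demand very high tracking accuracy, the system strikes a good compromise between tracking accuracy on the one hand and simplicity and generalizability on the other. The system captures images with multiple cameras, computes scene depth in real time, and then combines depth and color information to track the human body. A simple 3D upper-body model is used, and a particle filter based on color histograms tracks the head and hands, from which the model parameters are recovered. Face detection and hand skin-color clustering serve as the initialization method. Extensive experiments show that the system can track the upper body and recover the 3D model against complex backgrounds, initializes fully automatically, and has strong resistance to interference and strong automatic error recovery. The system runs at 25 frames per second on a 2.4 GHz PC.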

20.
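In recent years, application scenarios based on human action recognition have become increasingly widespread. To improve recognition, an action recognition method based on 3D human skeleton joints is proposed. Devices such as Kinect acquire 3D data for the human skeleton joints, and a body coordinate system is re-established with the hip as the origin. Data for the key bones are then extracted and an action feature vector is defined. Finally, action sequences are constructed with behavior trees from action expressions to perform recognition. Comparative experiments against other algorithms on five defined actions show that the proposed method achieves a high recognition rate and generalizes well.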
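
Re-establishing the body coordinate system with the hip as the origin can be sketched as follows. The joint names, the choice of key joints, and the flattened feature vector are hypothetical; the abstract does not define the actual feature vector.

```python
def recenter(skeleton, origin_joint="hip"):
    """Express every joint position relative to the chosen origin joint."""
    ox, oy, oz = skeleton[origin_joint]
    return {
        name: (x - ox, y - oy, z - oz)
        for name, (x, y, z) in skeleton.items()
    }

def feature_vector(skeleton, key_joints):
    """Flatten the hip-relative coordinates of the selected joints into one list."""
    centered = recenter(skeleton)
    return [coord for name in key_joints for coord in centered[name]]

# One toy frame in sensor coordinates (made-up values).
frame = {
    "hip": (1.0, 1.0, 2.0),
    "head": (1.0, 2.0, 2.0),
    "left_hand": (0.5, 1.5, 2.0),
}
print(recenter(frame))
print(feature_vector(frame, ["head", "left_hand"]))
```

Measuring joints relative to the hip removes the dependence on where the person stands in front of the sensor, so the same pose yields the same features at different positions.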
