Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
Detecting and recognizing human action in natural scenarios, such as indoor and outdoor scenes, is a significant technique in computer vision and intelligent systems, and is widely applied in video surveillance, pedestrian tracking and human-computer interaction. Conventional approaches based on various features have achieved impressive performance; however, these methods fail to cope with partial occlusion and changes in posture. In order to address these limitations, we propose a novel human action recognition method. More specifically, in order to capture the spatial composition of an image, we leverage a three-level spatial pyramid feature extraction scheme, where each pyramid level is encoded by local features. Thereafter, regions generated by a proposal algorithm are fed into a dual-aggregation net for deep representation extraction. Both local features and deep features are then fused to describe each image. To describe human action categories, we design a metric, CXQDA, based on the cosine measure and Cross-view Quadratic Discriminant Analysis (XQDA) to calculate the similarity among different action categories. Experimental results demonstrate that our proposed method effectively copes with object scale variations and partial occlusion, and achieves competitive performance.
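A minimal sketch of the three-level spatial pyramid encoding mentioned above, assuming local descriptors are already quantized against a visual codebook; the 1×1/2×2/4×4 grid layout and the codeword-histogram pooling are assumptions for illustration, not the authors' exact scheme.

```python
import numpy as np

def spatial_pyramid_histogram(coords, words, vocab_size, levels=(1, 2, 4)):
    """Pool quantized local descriptors into a 3-level spatial pyramid.

    coords : (N, 2) array of descriptor locations, normalized to [0, 1).
    words  : (N,) array of codebook indices for each descriptor.
    """
    blocks = []
    for g in levels:                       # 1x1, 2x2, 4x4 grids
        cells = np.clip((coords * g).astype(int), 0, g - 1)
        for cy in range(g):
            for cx in range(g):
                mask = (cells[:, 0] == cx) & (cells[:, 1] == cy)
                hist = np.bincount(words[mask], minlength=vocab_size)
                blocks.append(hist.astype(float))
    feat = np.concatenate(blocks)
    return feat / (np.linalg.norm(feat) + 1e-8)   # L2-normalized pyramid feature

# toy usage: 500 random local features quantized into a 64-word codebook
rng = np.random.default_rng(0)
f = spatial_pyramid_histogram(rng.random((500, 2)), rng.integers(0, 64, 500), 64)
print(f.shape)   # (64 * (1 + 4 + 16),) = (1344,)
```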

2.
Image information may be distorted during acquisition, processing, compression, and transmission. It is therefore necessary to propose an intelligent image quality assessment model for big data environments to quantify the degree of distortion of an image. This paper proposes a quality assessment model for human motion images. In complex scenes, the human body's action posture can be taken as an important feature point. Usually, the body parts whose posture affects image quality differ across scenes; in other words, the weights of the quality-related feature points differ in different scenarios. However, because human movements fall into categories, we can learn quality assessment methods for different types of movements through sample training. Inspired by feature learning in the field of machine learning, we propose a hierarchical quality learning approach that casts quality assessment as quality feature learning carried out layer by layer. The hierarchical quality learning method is based on deep reinforcement learning. Its key part is that the method focuses on the region containing more quality-related feature information and enlarges that region layer by layer. Finally, we can determine the part of the body that affects the quality assessment. We compare this method with the subjective quality assessment results of human observers and find that the proposed method achieves effective performance for evaluating human motion quality in a big data environment.
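A rough sketch of the layer-by-layer focusing idea described above. The paper learns the focusing policy with deep reinforcement learning; here a simple local-variance score stands in for that learned quality-feature scorer, and the window size and grid stride are assumptions.

```python
import numpy as np

def informativeness(patch):
    # Placeholder for the learned quality-feature scorer (the paper uses a
    # deep RL policy); local variance is a crude stand-in here.
    return patch.var()

def hierarchical_focus(image, n_layers=3, shrink=0.5):
    """Layer by layer, keep the half-size window whose content is most
    informative for quality, mimicking the coarse-to-fine focusing."""
    region = image
    for _ in range(n_layers):
        h, w = region.shape[:2]
        wh, ww = max(1, int(h * shrink)), max(1, int(w * shrink))
        best, best_score = None, -np.inf
        # slide the smaller window over a coarse grid of candidate positions
        for y in range(0, h - wh + 1, max(1, wh // 2)):
            for x in range(0, w - ww + 1, max(1, ww // 2)):
                cand = region[y:y + wh, x:x + ww]
                s = informativeness(cand)
                if s > best_score:
                    best, best_score = cand, s
        region = best
    return region   # the body region that drives the quality score

focused = hierarchical_focus(np.random.rand(240, 320))
print(focused.shape)
```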

3.
4.
5.
Human action recognition plays an important role in modern intelligent systems, such as human–computer interaction (HCI), sport analysis, and somatosensory games. Compared with conventional 2-D based human action analysis, using a Kinect sensor provides depth information about human actions, which is significant for human action recognition. In this paper, we propose a joint angle sequence model for recognizing human actions, where depth images are acquired with a Kinect sensor. We design an improved DTW method to improve the matching accuracy. Comprehensive experiments show the effectiveness and robustness of our proposed method.
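For context, a baseline dynamic time warping (DTW) sketch for matching two joint-angle sequences; the paper's improved DTW variant is not reproduced here, only the standard recurrence it builds on.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Baseline dynamic time warping between two joint-angle sequences,
    each of shape (frames, n_angles)."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # per-frame angle distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy usage: two sequences of 8 joint angles with different lengths
a = np.random.rand(40, 8)
b = np.random.rand(55, 8)
print(dtw_distance(a, b))
```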

6.
Accurately recognizing human hand gestures is a useful component in many modern intelligent systems, such as identity authentication, human–computer interaction, and sign language recognition. Conventional approaches are typically based on shallow visual features and relatively simple backgrounds, and cannot readily recognize partially occluded hand gestures against sophisticated backgrounds. In this work, we propose a unified hand gesture recognition framework that optimally fuses a set of shallow/deep finger-level image attributes, based on which a weakly-supervised ranking algorithm is designed to select semantically salient regions for gesture understanding. More specifically, given a large number of hand gesture images, we employ the well-known BING object proposal generator to extract hundreds of object patches that potentially draw human visual attention. Since hundreds of object patches are still too many for building an effective recognition system, a weakly-supervised metric is proposed to rank them based on multiple shallow/deep features, and visual semantics are encoded at the region level by transferring image-level semantic tags into the various hand gesture image regions through a weakly-supervised learning paradigm. The top-ranking, highly salient object patches are highly indicative of human visual perception of hand gestures, so we extract their ImageNet-CNN features and concatenate them. Finally, the concatenated deep feature is fed into a multi-class SVM to classify each hand gesture image into a particular type. Comprehensive experimental validations have demonstrated the effectiveness and robustness of our proposed hybrid-feature-based hand gesture categorization.
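A compact sketch of the rank-select-concatenate-classify pipeline described above. The BING proposal generator, the weakly-supervised ranking metric, and the ImageNet-CNN feature extractor are stubbed out with random arrays; the multi-class SVM is scikit-learn's LinearSVC, chosen here for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def classify_gesture(patch_features, patch_scores, svm, top_k=5):
    """Select the top-k salient object patches (scores stand in for the
    weakly-supervised ranking metric), concatenate their deep features
    and classify with a multi-class linear SVM."""
    order = np.argsort(patch_scores)[::-1][:top_k]
    feat = np.concatenate([patch_features[i] for i in order])
    return svm.predict(feat.reshape(1, -1))[0]

# toy setup: 200 training images, each summarized by a 5 * 128-dim concatenated feature
rng = np.random.default_rng(0)
X_train = rng.random((200, 5 * 128))
y_train = rng.integers(0, 10, 200)          # 10 gesture classes
svm = LinearSVC().fit(X_train, y_train)

# one test image with 300 candidate patches, a 128-dim feature per patch
feats = rng.random((300, 128))
scores = rng.random(300)                    # stand-in for the ranking metric
print(classify_gesture(feats, scores, svm))
```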

7.
Detecting and understanding human actions under sophisticated lighting conditions and backgrounds, also known as human action recognition in real-world contexts, is an indispensable component of modern intelligent systems and has become a hot research topic. Human action recognition remains a tough challenge due to the intra-class, inter-class, environmental, and temporal-level differences of the same action, and algorithms based on a single visual channel cannot achieve satisfactory performance. Thus, in this paper, we propose a novel action recognition framework for sophisticated activity understanding, focusing on intelligently combining multimodal quality-related action features. Specifically, we first design a multi-channel feature fusion (MCFF) algorithm to capture visual appearance, motion and acoustic patterns from each video frame, where image-level labels are characterized by choosing high-quality multimodal features. Subsequently, we design an adaptive key frame selection algorithm that characterizes human actions from a human action video stream. Thereafter, we engineer a multimodal feature based on an auxiliary human action retrieval system to achieve sophisticated activity understanding. Extensive experimental evaluations have demonstrated the effectiveness and robustness of our proposed method.
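A small sketch of what an adaptive key frame selection step might look like, given per-frame fused multi-channel features. The adaptive mean-plus-standard-deviation threshold used below is an illustrative assumption, not the paper's actual criterion.

```python
import numpy as np

def select_key_frames(frame_features, alpha=1.0):
    """Pick key frames whose feature change w.r.t. the last kept frame
    exceeds an adaptive threshold (mean + alpha * std of all inter-frame
    differences). The criterion is a stand-in for the paper's algorithm."""
    diffs = np.linalg.norm(np.diff(frame_features, axis=0), axis=1)
    thresh = diffs.mean() + alpha * diffs.std()
    keys, last = [0], frame_features[0]
    for t in range(1, len(frame_features)):
        if np.linalg.norm(frame_features[t] - last) > thresh:
            keys.append(t)
            last = frame_features[t]
    return keys

# toy usage: 120 frames, each described by a fused 256-dim multi-channel feature
video = np.random.rand(120, 256)
print(select_key_frames(video))
```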

8.
Taking into full consideration the fact that a human action can be intuitively viewed as a sequence of key poses and atomic motions in a particular order, a human action recognition method using multi-layer codebooks of key poses and atomic motions is proposed in this paper. Inspired by the dynamics models of human joints, normalized relative orientations are computed as features for each limb of the human body. In order to extract key poses and atomic motions precisely, feature sequences are segmented dynamically into pose feature segments and motion feature segments, based on the potential differences of the feature sequences. Multi-layer codebooks of each human action are constructed from the key poses extracted from pose feature segments and the atomic motions extracted from the motion feature segments associated with each pair of key poses. The multi-layer codebooks represent the action patterns of each human action, which can be used to recognize human actions with the proposed pattern-matching method. Three classification methods are employed for action recognition based on the multi-layer codebooks. Two public action datasets, i.e., the CAD-60 and MSRC-12 datasets, are used to demonstrate the advantages of the proposed method. The experimental results show that the proposed method obtains a comparable or better performance compared with the state-of-the-art methods.
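A toy sketch of one layer of the codebook idea: cluster per-frame pose features into key poses and re-express a sequence as its ordered key-pose indices. The k-means clustering, the number of key poses, and the collapsing of consecutive repeats are assumptions; the paper's segmentation by potential differences and its atomic-motion layer are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_pose_codebook(pose_features, n_key_poses=8):
    """Cluster per-frame pose features (e.g. normalized relative limb
    orientations) into key poses; one layer of the multi-layer codebook."""
    return KMeans(n_clusters=n_key_poses, n_init=10, random_state=0).fit(pose_features)

def encode_action(pose_sequence, codebook):
    """Represent an action as the sequence of its nearest key-pose indices,
    collapsing consecutive repeats (an assumption, to expose pose order)."""
    idx = codebook.predict(pose_sequence)
    return [int(k) for i, k in enumerate(idx) if i == 0 or k != idx[i - 1]]

# toy usage: 500 training frames with 24-dim orientation features
rng = np.random.default_rng(0)
cb = build_pose_codebook(rng.random((500, 24)))
print(encode_action(rng.random((60, 24)), cb))
```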

9.
Given a single RGB image, human pose estimation locates human body keypoints to estimate the position of the body and its joints. Ball games are fast-paced sports, and judging the legality of an athlete's technique by subjective observation cannot avoid errors. This paper therefore applies athlete posture analysis based on human pose estimation to assist training and officiating, effectively avoiding the mislocalization of athlete poses caused by subjective human judgment in traditional systems. Current research on human pose estimation falls into two main approaches, those based on traditional algorithms and those based on deep learning, and the deep learning based approaches are further divided into single-person and multi-person pose detection. Deep learning based human pose estimation builds neural networks and uses machine learning methods to extract image features and read image information, and its performance is compared and analyzed on the mainstream datasets used for human pose estimation. Applying human pose estimation to ball sports provides a scientific reference for athletes' daily training and, to the greatest extent possible, safeguards fairness and impartiality in competition.

10.
康书宁, 张良. Journal of Signal Processing (《信号处理》), 2020, 36(11): 1897-1905
Deep learning based human action recognition has achieved good results in recent years; in particular, 2D convolutional neural networks can learn the spatial features of human actions fairly well, but they still struggle to capture long-range motion information. To address this problem, a human action recognition model based on slicing the semantic feature cube is proposed to jointly learn the appearance and motion features of actions. Built on Temporal Segment Networks (TSN), the model adopts InceptionV4 as the backbone to extract the appearance features of human actions and splits the resulting 3D feature-map cube into 2D spatial and temporal feature-map slices. A spatio-temporal feature fusion module is further designed to collaboratively learn the weight assignment of the multi-dimensional slices, yielding the spatio-temporal features of human actions and enabling end-to-end training of the network. Compared with the TSN model, the proposed model improves accuracy on both the UCF101 and HMDB51 datasets. Experimental results show that, without significantly increasing the number of network parameters, the model captures richer motion information and improves human action recognition results.
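A toy PyTorch sketch of the cube-slicing intuition: a per-frame feature cube is reduced to an appearance view and a motion view, and a learned softmax weight fuses the two. The motion view via temporal differences is an assumption rather than the paper's slicing scheme, and InceptionV4 feature extraction and the full multi-slice fusion module are omitted.

```python
import torch
import torch.nn as nn

class SliceFusion(nn.Module):
    """Toy fusion of an appearance view and a motion view of a feature cube,
    weighted by a learned softmax over two scalars."""
    def __init__(self, channels, n_classes):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(2))          # learned fusion weights
        self.spatial_head = nn.Linear(channels, n_classes)
        self.temporal_head = nn.Linear(channels, n_classes)

    def forward(self, cube):                            # cube: (B, T, C, H, W)
        # appearance view: average over time and space -> (B, C)
        spatial = cube.mean(dim=(1, 3, 4))
        # motion view: mean absolute temporal difference -> (B, C)
        motion = (cube[:, 1:] - cube[:, :-1]).abs().mean(dim=(1, 3, 4))
        a = torch.softmax(self.w, dim=0)
        return a[0] * self.spatial_head(spatial) + a[1] * self.temporal_head(motion)

# toy usage: batch of 2 clips, 8 frames, 1536-channel 7x7 feature maps, 101 classes
logits = SliceFusion(channels=1536, n_classes=101)(torch.randn(2, 8, 1536, 7, 7))
print(logits.shape)   # torch.Size([2, 101])
```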

11.
Traditional image coding is mainly designed for human vision, whereas in collaborative intelligence, deep feature coding, which includes feature extraction and compression, is specific to machine vision. Deep features can, in fact, build a bridge between human and machine vision. We therefore focus on generalized deep feature extraction and compression for multiple tasks, including an image reconstruction task for human vision and computer vision tasks for machine vision. After analyzing the correlations among these tasks, a reconstruction-guided feature extraction strategy and a feature-fusion based network are proposed to obtain a more generalized intermediate deep feature that contains sufficient information for both human and machine vision. In addition, a non-uniform quantization method based on importance and a compact representation method that protects feature distribution information are proposed for high-efficiency feature coding. Finally, we present a complete intermediate deep feature coding framework covering feature extraction and compression. Experimental results demonstrate the performance gains of our framework.
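A simplified sketch of importance-based non-uniform quantization: channels judged more important receive more quantization levels. The per-channel bit-allocation rule and the uniform within-channel quantizer are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def quantize_by_importance(features, importance, min_bits=2, max_bits=8):
    """Non-uniform quantization sketch: channels with higher importance get
    more bits (finer step sizes). `features` is (C, H, W); `importance` is a
    (C,) score in [0, 1]."""
    bits = np.round(min_bits + importance * (max_bits - min_bits)).astype(int)
    q = np.empty_like(features)
    for c in range(features.shape[0]):
        levels = 2 ** bits[c]
        lo, hi = features[c].min(), features[c].max()
        step = (hi - lo) / max(levels - 1, 1) or 1.0   # guard against flat channels
        q[c] = np.round((features[c] - lo) / step) * step + lo
    return q, bits

# toy usage: an 8-channel feature map with linearly increasing importance
feat = np.random.randn(8, 16, 16).astype(np.float32)
imp = np.linspace(0, 1, 8)
quantized, bit_alloc = quantize_by_importance(feat, imp)
print(bit_alloc)   # more bits for the more important channels
```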

12.
Human activity recognition is one of the most studied topics in the field of computer vision. In recent years, with the availability of RGB-D sensors and powerful deep learning techniques, research on human activity recognition has gained momentum. From simple human atomic actions, the research has advanced towards recognizing more complex human activities using RGB-D data. This paper presents a comprehensive survey of advanced deep learning based recognition methods and categorizes them into human atomic actions, human–human interactions, and human–object interactions. The reviewed methods are further classified by the individual modality used for recognition, i.e., RGB-based, depth-based, skeleton-based, and hybrid. We also review and categorize recent challenging RGB-D datasets for these tasks. In addition, the paper briefly reviews RGB-D datasets and methods for online activity recognition. The paper concludes with a discussion of limitations, challenges, and recent trends pointing to promising future directions.

13.
In visual tracking systems, efficient feature representation is the key to tracking robustness, and multi-cue fusion is an effective means of handling complex tracking problems. This paper first proposes a perceptual deep neural network based on multiple parallel networks with adaptive triggering, and then builds a deep learning based, multi-cue block-wise target model. Dividing the target into blocks reduces the dimensionality of the network input severalfold, greatly lowering the computational complexity of network training. During tracking, the model dynamically adjusts the weights according to the confidence of each sub-block, improving its adaptability to complex situations such as target pose changes, illumination changes, and occlusion. Experiments on a large amount of test data, with qualitative and quantitative analysis of the tracking results, show that the proposed algorithm is highly robust and can track targets stably.
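A minimal sketch of confidence-weighted fusion of per-block matching scores during tracking; the softmax weighting and the toy numbers are assumptions, and the parallel perception networks that produce the per-block similarities and confidences are not reproduced.

```python
import numpy as np

def fuse_block_scores(block_similarities, block_confidences, temperature=1.0):
    """Combine per-block matching scores with weights derived from each
    block's current confidence, so occluded or low-confidence blocks
    contribute less. Softmax weighting is an illustrative choice."""
    w = np.exp(np.asarray(block_confidences) / temperature)
    w /= w.sum()
    return float(np.dot(w, block_similarities))

# toy usage: 4 target blocks; block 3 is occluded (low confidence)
sims = [0.9, 0.85, 0.2, 0.8]
confs = [0.8, 0.7, 0.1, 0.9]
print(fuse_block_scores(sims, confs))
```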

14.
Accurate retinal vessel segmentation is a challenging problem in color fundus image analysis. An automatic retinal vessel segmentation system can effectively facilitate clinical diagnosis and ophthalmological research. In general, this problem involves technical challenges such as varying vessel thickness, perception of fine details, and contextual feature fusion. To address these challenges, a deep learning based method is proposed in which several customized modules are integrated into the well-known encoder–decoder U-net architecture, which is widely employed in medical image segmentation. In the network structure, cascaded dilated convolutional modules are integrated into the intermediate layers to obtain a larger receptive field and generate denser encoded feature maps. In addition, a pyramid module with spatial continuity is exploited for multi-thickness perception, detail refinement, and contextual feature fusion. The effectiveness of different normalization approaches is also discussed on datasets with specific properties. Finally, extensive comparative experiments have been conducted on three retinal vessel segmentation datasets: DRIVE, CHASE_DB1, and the STARE dataset with unhealthy samples. The proposed method outperforms prior work and achieves state-of-the-art performance.
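A PyTorch sketch of a cascaded dilated convolution module of the kind described above; the dilation rates (1, 2, 4), batch normalization, and residual connection are assumptions about details the abstract does not specify.

```python
import torch
import torch.nn as nn

class CascadedDilatedBlock(nn.Module):
    """Stacked 3x3 convolutions with growing dilation enlarge the receptive
    field while keeping the feature map at full resolution (a candidate
    bottleneck module for an encoder-decoder U-net)."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )

    def forward(self, x):
        out = x
        for conv in self.convs:
            out = conv(out)
        return out + x            # residual connection keeps fine details

# toy usage: a 64-channel feature map keeps its spatial size
y = CascadedDilatedBlock(64)(torch.randn(1, 64, 32, 32))
print(y.shape)   # torch.Size([1, 64, 32, 32])
```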

15.
Human-computer interaction is the way in which humans and machines communicate information. With the rapid development of deep learning technology, human-computer interaction has also made corresponding breakthroughs. In the past, human-computer interaction relied mostly on hardware devices: through the coordinated work of multiple sensors, people and machines could exchange information. As the underlying theory and technology mature, however, algorithms for human-computer interaction are also being enriched. The popularity of convolutional neural networks has made image processing problems easier to solve, so real-time, intelligent human-computer interaction can be realized through image processing. The main idea of this paper is to capture face images and video information in real time and process the facial image information. We locate feature points on the face image and recognize expressions based on the located feature points, and we also perform ray tracing for the identified eye region. The facial feature points, together with the corresponding expressions and movements, represent the user's intent, so we can analyze that intent by locating the facial feature regions. We define corresponding action information for specific facial features, extract the user's information according to these features, and perform human-computer interaction based on that information.

16.
刘礼才, 李锐光, 殷丽华, 郭云川, 项菲. Acta Electronica Sinica (《电子学报》), 2016, 44(11): 2713-2719
Implicit authentication plays an important and unique role in resolving the conflict between security and usability on mobile smart devices. However, existing work usually performs implicit authentication based on a single feature or action, which only suits specific actions, scenarios, and scopes. To address this problem, exploiting the location, environment, state, biometric, and behavioral characteristics present when a user operates the device, this paper proposes an implicit authentication scheme based on multi-feature fusion. The scheme collects data from the device's built-in sensors together with biometric and behavioral data, trains and extracts features with support vector machines, designs a multi-feature fusion model and builds an implicit authentication framework, computes the user's identity trust level, and devises differentiated security policies to authenticate the user continuously and transparently. Experiments verify the effectiveness of the scheme and show that it balances security with usability and resource consumption.
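An illustrative scikit-learn sketch of the trust-level idea: an SVM trained on fused sensor/behavioral features outputs a probability that the current user is the owner, and thresholds on that trust level select differentiated security policies. The feature dimensions, thresholds, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic "owner" vs "other" samples stand in for fused sensor, biometric
# and behavioral feature vectors collected on the device.
rng = np.random.default_rng(0)
X_owner = rng.normal(0.0, 1.0, size=(200, 32))
X_other = rng.normal(1.5, 1.0, size=(200, 32))
X = np.vstack([X_owner, X_other])
y = np.array([1] * 200 + [0] * 200)                # 1 = legitimate user

clf = SVC(probability=True).fit(X, y)

def security_policy(sample):
    """Map the SVM's probability for the 'owner' class to a trust level
    and pick a differentiated policy (thresholds are illustrative)."""
    trust = clf.predict_proba(sample.reshape(1, -1))[0, 1]
    if trust > 0.8:
        return trust, "full access"
    if trust > 0.5:
        return trust, "re-authenticate for sensitive actions"
    return trust, "lock device"

print(security_policy(rng.normal(0.0, 1.0, size=32)))
```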

17.
Video may be subject to various distortions during acquisition, processing, compression, storage, transmission, and reproduction, which results in reduced visual quality. In complex sports scenes in a big data environment, this is especially true of human body movements, and the quality of human motion directly affects the human visual experience. It is therefore necessary to build an intelligent quality assessment model to evaluate human motion in complex motion scenarios in a big data environment. Such a model can be used to dynamically monitor and adjust video quality and to guide algorithm and parameter settings in motion image processing systems. With the popularity of deep learning, convolutional neural networks have become a very important tool in computer vision research. Building on 2D CNNs, we propose a 3D convolutional neural network model for human motion quality assessment in complex motion scenarios. The model captures pose characteristics, motion trajectories, and video brightness and contrast in both time and space. Reference and distorted video pairs are fed into the network, and each output layer serves as a feature map. The local similarity between the feature maps obtained from the reference video and the distorted video is then calculated and combined to obtain a global image quality score. Experiments show that the model achieves competitive performance for video quality assessment in a big data environment.
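A numpy sketch of pooling local feature-map similarities into a global quality score. The SSIM-style similarity and simple averaging below are assumptions standing in for the similarity measure and pooling actually used in the paper, and the 3D-CNN feature maps are replaced with random arrays.

```python
import numpy as np

def local_similarity(ref_map, dist_map, c=1e-3):
    """SSIM-style local similarity between a reference and a distorted
    feature map (close to 1 where the maps agree)."""
    return (2 * ref_map * dist_map + c) / (ref_map ** 2 + dist_map ** 2 + c)

def quality_score(ref_feature_maps, dist_feature_maps):
    """Average the local similarities of every layer's feature maps into a
    single global quality score, as a stand-in for the paper's pooling."""
    scores = [local_similarity(r, d).mean()
              for r, d in zip(ref_feature_maps, dist_feature_maps)]
    return float(np.mean(scores))

# toy usage: three layers' feature maps for a reference/distorted video pair
ref = [np.random.rand(16, 8, 28, 28) for _ in range(3)]
dist = [r + 0.05 * np.random.randn(*r.shape) for r in ref]
print(quality_score(ref, dist))
```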

18.
李嘉懿, 戴声奎, 定志锋. Communications Technology (《通信技术》), 2012, 45(4): 92-94, 101
By analyzing the properties of the human silhouette and combining the Sobel operator, HOG, and the LBP operator, a ridge-model based local saliency feature extraction method is proposed. The feature organically combines the orientation and structural characteristics of the human silhouette and constitutes a multi-attribute fused description of the human body. In the human gradient space, an improved LBP operator and a simplified gradient orientation histogram are used for feature extraction, training, and recognition. Experiments show that this feature effectively reduces computational complexity and speeds up pedestrian detection; however, compared with HOG, the detection rate of human recognition still needs to be improved because the feature describing the human body has fewer dimensions.
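A rough sketch of fusing gradient-orientation and LBP cues into a single descriptor. Plain central differences stand in for the Sobel operator, and a basic 8-neighbour LBP with coarse histograms stands in for the paper's improved LBP and ridge model; the bin counts are assumptions.

```python
import numpy as np

def lbp_gradient_feature(gray):
    """Concatenate a gradient-orientation histogram and an LBP texture
    histogram computed on a grayscale image (a simplified multi-attribute
    descriptor in the spirit of the abstract)."""
    gray = gray.astype(float)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]          # simple horizontal gradient
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]          # simple vertical gradient
    ang = np.arctan2(gy, gx)
    ori_hist, _ = np.histogram(ang, bins=9, range=(-np.pi, np.pi),
                               weights=np.hypot(gx, gy))
    # basic 8-neighbour LBP code for every interior pixel
    center = gray[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy, 1 + dx:gray.shape[1] - 1 + dx]
        lbp += (nb >= center) * (1 << bit)
    lbp_hist, _ = np.histogram(lbp, bins=32, range=(0, 256))
    feat = np.concatenate([ori_hist, lbp_hist]).astype(float)
    return feat / (feat.sum() + 1e-8)

print(lbp_gradient_feature(np.random.rand(64, 32) * 255).shape)   # (41,)
```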

19.
To deal with the complex and highly variable movements, strong continuity, and severe occlusion in dance video images, and taking into account both the development direction and application scenarios of advanced dance action recognition technology and the heavy computational load of color image processing, this paper designs a highly dynamic dance action recognition method based on 2D pose estimation. The method consists of two parts, template building and pose estimation, and mainly involves three processing operations: image preprocessing, template feature extraction, and template matching. Verification tests show that after the training images are converted to grayscale and thresholded, the silhouette of the foreground dancer can be obtained, and action feature information is then extracted with the Kinect human body model. Since shooting angles and other factors cause feature differences, storing the feature information of multiple training images of the same action in a single information matrix further improves the accuracy of action recognition.
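A toy sketch of the template-matching step: each action keeps several stored feature vectors (one per training image or viewpoint) in a single matrix, and a test pose is assigned to the action of its nearest template. The Euclidean distance and feature dimensionality are assumptions, and the Kinect-based feature extraction is not shown.

```python
import numpy as np

def recognize_pose(feature, action_templates):
    """Nearest-template matching over per-action template matrices of shape
    (K, D), where K templates cover different viewpoints of the same move."""
    best_action, best_dist = None, np.inf
    for action, templates in action_templates.items():
        d = np.linalg.norm(templates - feature, axis=1).min()
        if d < best_dist:
            best_action, best_dist = action, d
    return best_action, best_dist

# toy usage: two dance moves, several templates each, 20-dim pose features
rng = np.random.default_rng(0)
templates = {"spin": rng.random((5, 20)), "leap": rng.random((5, 20)) + 1.0}
print(recognize_pose(rng.random(20), templates))
```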

20.
Human action recognition from skeletal data is one of the most popular topics in computer vision which has been widely studied in the literature, occasionally with some very promising results. However, being supervised, most of the existing methods suffer from two major drawbacks: (1) too much reliance on massive labeled data and (2) high sensitivity to outliers, which in turn hinder their applications in such real-world scenarios as recognizing long-term and complex movements. In this paper, we propose a novel unsupervised 3D action recognition method called Sparseness Embedding in which the spatiotemporal representation of action sequences is nonlinearly projected into an unwarped feature representation medium, where unlike the original curved space, one can easily apply the Euclidean metrics. Our strategy can simultaneously integrate the characteristics of nonlinearity, sparsity, and space curvature of sequences into a single objective function, leading to a more robust and highly compact representation of discriminative attributes without any need for label information. Moreover, we propose a joint learning strategy for dealing with the heterogeneity of the temporal and spatial characteristics of action sequences. A set of extensive experiments on six publicly available databases, including UTKinect, TST fall, UTD-MHAD, CMU, Berkeley MHAD, and NTU RGB+D demonstrates the superiority of our method compared with the state-of-the-art algorithms.
