期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Human action segmentation and classification based on the Isomap algorithm

Yu-Ming Liang Sheng-Wen Shih Arthur Chun-Chieh Shih 《Multimedia Tools and Applications》2013,62(3):561-580

Visual analysis of human behavior has attracted a great deal of attention in the field of computer vision because of the wide variety of potential applications. Human behavior can be segmented into atomic actions, each of which indicates a single, basic movement. To reduce human intervention in the analysis of human behavior, unsupervised learning may be more suitable than supervised learning. However, the complex nature of human behavior analysis makes unsupervised learning a challenging task. In this paper, we propose a framework for the unsupervised analysis of human behavior based on manifold learning. First, a pairwise human posture distance matrix is derived from a training action sequence. Then, the isometric feature mapping (Isomap) algorithm is applied to construct a low-dimensional structure from the distance matrix. Consequently, the training action sequence is mapped into a manifold trajectory in the Isomap space. To identify the break points between the trajectories of any two successive atomic actions, we represent the manifold trajectory in the Isomap space as a time series of low-dimensional points. A temporal segmentation technique is then applied to segment the time series into sub series, each of which corresponds to an atomic action. Next, the dynamic time warping (DTW) approach is used to cluster atomic action sequences. Finally, we use the clustering results to learn and classify atomic actions according to the nearest neighbor rule. If the distance between the input sequence and the nearest mean sequence is greater than a given threshold, it is regarded as an unknown atomic action. Experiments conducted on real data demonstrate the effectiveness of the proposed method. 相似文献

2.

Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences.

Maja Pantic Ioannis Patras 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2006,36(2):433-449

Automatic analysis of human facial expression is a challenging problem with many applications. Most of the existing automated systems for facial expression analysis attempt to recognize a few prototypic emotional expressions, such as anger and happiness. Instead of representing another approach to machine analysis of prototypic facial expressions of emotion, the method presented in this paper attempts to handle a large range of human facial behavior by recognizing facial muscle actions that produce expressions. Virtually all of the existing vision systems for facial muscle action detection deal only with frontal-view face images and cannot handle temporal dynamics of facial actions. In this paper, we present a system for automatic recognition of facial action units (AUs) and their temporal models from long, profile-view face image sequences. We exploit particle filtering to track 15 facial points in an input face-profile sequence, and we introduce facial-action-dynamics recognition from continuous video input using temporal rules. The algorithm performs both automatic segmentation of an input video into facial expressions pictured and recognition of temporal segments (i.e., onset, apex, offset) of 27 AUs occurring alone or in a combination in the input face-profile video. A recognition rate of 87% is achieved. 相似文献

3.

基于姿态校正与姿态融合的2D/3D骨架动作识别方法

曾胜强李琳《计算机应用研究》2022,39(3):900-905

针对现有的人体骨架动作识别方法对肢体信息挖掘不足以及时间特征提取不足的问题,提出了一种基于姿态校正模块与姿态融合模块的模型PTF-SGN,实现了对骨架图关键时空信息的充分利用。首先,对骨架图数据进行预处理,挖掘肢体和关节点的位移信息并提取特征;然后,姿态校正模块通过无监督学习的方式获取姿态调整因子,并对人体姿态进行自适应调整,增强了模型在不同环境下的鲁棒性;其次,提出一种基于时间注意力机制的姿态融合模块,学习骨架图中的短时刻特征与长时刻特征并融合长短时刻特征,加强了对时间特征的表征能力;最后,将骨架图的全局时空特征输入到分类网络中得到动作识别结果。在NTU60 RGB+D、NTU120 RGB+D两个3D骨架数据集和Penn-Action、HARPET两个2D骨架数据集上的实验结果表明,该模型能够有效地识别骨架时序数据的动作。相似文献

4.

Human action learning via hidden Markov model

Jie Yang Yangsheng Xu Chen C.S. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》1997,27(1):34-44

To successfully interact with and learn from humans in cooperative modes, robots need a mechanism for recognizing, characterizing, and emulating human skills. In particular, it is our interest to develop the mechanism for recognizing and emulating simple human actions, i.e., a simple activity in a manual operation where no sensory feedback is available. To this end, we have developed a method to model such actions using a hidden Markov model (HMM) representation. We proposed an approach to address two critical problems in action modeling: classifying human action-intent, and learning human skill, for which we elaborated on the method, procedure, and implementation issues in this paper. This work provides a framework for modeling and learning human actions from observations. The approach can be applied to intelligent recognition of manual actions and high-level programming of control input within a supervisory control paradigm, as well as automatic transfer of human skills to robotic systems 相似文献

5.

An efficient attention module for 3d convolutional neural networks in action recognition

Jiang Guanghao Jiang Xiaoyan Fang Zhijun Chen Shanshan 《Applied Intelligence》2021,51(10):7043-7057

Due to illumination changes, varying postures, and occlusion, accurately recognizing actions in videos is still a challenging task. A three-dimensional convolutional neural network (3D CNN), which can simultaneously extract spatio-temporal features from sequences, is one of the mainstream models for action recognition. However, most of the existing 3D CNN models ignore the importance of individual frames and spatial regions when recognizing actions. To address this problem, we propose an efficient attention module (EAM) that contains two sub-modules, that is, a spatial efficient attention module (EAM-S) and a temporal efficient attention module (EAM-T). Specifically, without dimensionality reduction, EAM-S concentrates on mining category-based correlation by local cross-channel interaction and assigns high weights to important image regions, while EAM-T estimates the importance score of different frames by cross-frame interaction between each frame and its neighbors. The proposed EAM module is lightweight yet effective, and it can be easily embedded into 3D CNN-based action recognition models. Extensive experiments on the challenging HMDB-51 and UCF-101 datasets showed that our proposed module achieves state-of-the-art performance and can significantly improve the recognition accuracy of 3D CNN-based action recognition methods.

相似文献

6.

基于足底压力的人体姿态检测和行为分析方法

强家辉张为公王东《测控技术》2018,37(1):1-4

人体的运动姿态检测和行为分析具有广泛应用价值.设计了一种基于足底压力的人体姿态检测和行为分析系统,系统由压力采集模块、数据处理模块、无线通信模块和行为分析软件组成.姿态检测系统获取足底压力数据,采用蓝牙方式进行数据通信,行为分析软件使用支持向量机多分类方法实现坐、站、走、跑和爬楼5个经典人体姿态的区分.实验证明该系统对人体姿态具有较好的识别精度和可靠性,可以用于人体姿态检测和行为分析. 相似文献

7.

MUSER: A prototype musical score recognition system using mathematical morphology

Bharath R. Modayur Visvanathan Ramesh Robert M. Haralick Linda G. Shapiro 《Machine Vision and Applications》1993,6(2-3):140-150

Music representation utilizes a fairly rich repertoire of symbols. These symbols appear on a score sheet with relatively little shape distortion, differing from the prototype symbol shapes mainly by a positional translation and scale change. The prototype system we describe in this article is aimed at recognizing printed music notation from digitized music score images. The recognition system is composed of two parts: a low-level vision module that uses morphological algorithms for symbol detection and a high-level module that utilizes prior knowledge of music notation to reason about spatial positions and spatial sequences of these symbols. The high-level module also employs verification procedures to check the veracity of the output of the morphological symbol recognizer. The system produces an ASCII representation of music scores that can be input to a music-editing system. Mathematical morphology provides us the theory and the tools to analyze shapes. This characteristic of mathematical morphology lends itself well to analyzing and subsequently recognizing music scores that are rich in well-defined musical symbols. Since morphological operations can be efficiently implemented in machine vision systems that have special hardware support, the recognition task can be performed in near real-time. The system achieves accuracy in excess of 95% on the sample scores processed so far with a peak accuracy of 99.7% for the quarter and eighth notes, demonstrating the efficacy of morphological techniques for shape extraction. 相似文献

8.

基于手势特征融合的操作动作识别

下载免费PDF全文

周小静陈俊洪杨振国刘文印《计算机工程与应用》2021,57(14):169-175

针对动态复杂场景下的操作动作识别,提出一种基于手势特征融合的动作识别框架,该框架主要包含RGB视频特征提取模块、手势特征提取模块与动作分类模块。其中RGB视频特征提取模块主要使用I3D网络提取RGB视频的时间和空间特征;手势特征提取模块利用Mask R-CNN网络提取操作者手势特征;动作分类模块融合上述特征,并输入到分类器中进行分类。在EPIC-Kitchens数据集上,提出的方法识别抓取手势的准确性高达89.63%,识别综合动作的准确度达到了74.67%。相似文献

9.

Facial expression recognition using tracked facial actions: Classifier performance analysis

Fadi Dornaika Abdelmalik Moujahid Bogdan Raducanu 《Engineering Applications of Artificial Intelligence》2013,26(1):467-477

In this paper, we address the analysis and recognition of facial expressions in continuous videos. More precisely, we study classifiers performance that exploit head pose independent temporal facial action parameters. These are provided by an appearance-based 3D face tracker that simultaneously provides the 3D head pose and facial actions. The use of such tracker makes the recognition pose- and texture-independent. Two different schemes are studied. The first scheme adopts a dynamic time warping technique for recognizing expressions where training data are given by temporal signatures associated with different universal facial expressions. The second scheme models temporal signatures associated with facial actions with fixed length feature vectors (observations), and uses some machine learning algorithms in order to recognize the displayed expression. Experiments quantified the performance of different schemes. These were carried out on CMU video sequences and home-made video sequences. The results show that the use of dimension reduction techniques on the extracted time series can improve the classification performance. Moreover, these experiments show that the best recognition rate can be above 90%. 相似文献

10.

Classification and recognition of dynamical models: the role of phase, independent components, kernels and optimal transport

Bissacco A Chiuso A Soatto S 《IEEE transactions on pattern analysis and machine intelligence》2007,29(11):1958-1972

We address the problem of performing decision tasks, and in particular classification and recognition, in the space of dynamical models in order to compare time series of data. Motivated by the application of recognition of human motion in image sequences, we consider a class of models that include linear dynamics, both stable and marginally stable (periodic), both minimum and non-minimum phase, driven by non-Gaussian processes. This requires extending existing learning and system identification algorithms to handle periodic modes and nonminimum phase behavior, while taking into account higher-order statistics of the data. Once a model is identified, we define a kernel-based cord distance between models that includes their dynamics, their initial conditions as well as input distribution. This is made possible by a novel kernel defined between two arbitrary (non-Gaussian) distributions, which is computed by efficiently solving an optimal transport problem. We validate our choice of models, inference algorithm, and distance on the tasks of human motion synthesis (sample paths of the learned models), and recognition (nearest-neighbor classification in the computed distance). However, our work can be applied more broadly where one needs to compare historical data while taking into account periodic trends, non-minimum phase behavior, and non-Gaussian input distributions. 相似文献

11.

基于深度数据的人体动作识别方法

下载免费PDF全文

王鑫沃波海管秋陈胜勇《中国图象图形学报》2014,19(6)

本文提出了一个基于流形学习的动作识别框架,用来识别深度图像序列中的人体行为。本文从Kinect设备获得的深度信息中评估出人体的关节点信息,并用相对关节点位置差作为人体特征表达。在训练阶段,本文利用Lapacian eigenmaps(LE)流形学习对高维空间下的训练集进行降维,得到低维隐空间下的运动模型。在识别阶段,本文用最近邻差值方法将测试序列映射到低维流形空间中去,然后进行匹配计算。在匹配过程中,通过使用改进的Hausdorff距离对低维空间下测试序列和训练运动集的吻合度和相似度进行度量。本文用Kinect设备捕获的数据进行了实验,取得了良好的效果;同时本文也在MSR Action3D数据库上进行了测试,结果表明在训练样本较多情况下,本文识别效果优于以往方法。实验结果表明本文所提的方法适用于基于深度图像序列的人体动作识别。相似文献

12.

基于人体关节点的多人吸烟动作识别算法

下载免费PDF全文

刘婧杨旭刘董经典牛强《计算机工程与应用》2021,57(1):234-241

吸烟检测已成为公共场所禁烟的重要措施,基于视频图像的吸烟动作识别已广泛用于吸烟检测中。使用深度学习的方法进行图像处理,需要大量数据集训练模型。现有的吸烟动作识别方法的准确率和实时性不够理想,且多只针对一个人进行动作识别。为解决这些问题,提出了一种通过检测周期性动作来识别多人吸烟动作的方法。在进行了大量的实验后发现吸烟行为是有节奏和周期性的,对此具体分析了吸烟行为的周期性并制定了吸烟行为规范;利用人体关节点信息,关注关节点的运动轨迹,检测运动轨迹是否符合周期性规律从而实现吸烟动作识别;同时跟踪多人关节点的信息,以实现多个人实时吸烟行为的识别。实验结果表明,该方法可以达到91%的准确率,在各种情况下都可以保持较高准确率和鲁棒性。相似文献

13.

Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos

《Image and vision computing》2017

相似文献

14.

Video-Based Human Movement Analysis and Its Application to Surveillance Systems

Jun-Wei Hsieh Yung-Tai Hsu Liao H.-Y.M. Chih-Chiang Chen 《Multimedia, IEEE Transactions on》2008,10(3):372-384

相似文献

15.

时空域融合的骨架动作识别与交互研究

下载免费PDF全文

钟秋波郑彩明朴松昊《智能系统学报》2020,15(3):601-608

在人体骨架结构动作识别方法中,很多研究工作在提取骨架结构上的空间信息和运动信息后进行融合,没有对具有复杂时空关系的人体动作进行高效表达。本文提出了基于姿态运动时空域融合的图卷积网络模型(PM-STFGCN)。对于在时域上存在大量的干扰信息,定义了一种基于局部姿态运动的时域关注度模块(LPM-TAM),用于抑制时域上的干扰并学习运动姿态的表征。设计了基于姿态运动的时空域融合模块(PM-STF),融合时域运动和空域姿态特征并进行自适应特征增强。通过实验验证,本文提出的方法是有效性的,与其他方法相比,在识别效果上具有很好的竞争力。设计的人体动作交互系统,验证了在实时性和准确率上优于语音交互系统。相似文献

16.

Structured learning of local features for human action classification and localization

Tuan Hue Thi Li Cheng Jian Zhang Li Wang Shinichi Satoh 《Image and vision computing》2012

Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, a common practice in literature to consider action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning in human action analysis. That is, by representing human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is Dynamic Conditional Random Field from probabilistic perspective; the other is Structural Support Vector Machine from max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is proven to be highly effective and robust compared against bag-of-feature methods. 相似文献

17.

Modeling individual and group actions in meetings with layered HMMs 总被引：2，自引：0，他引：2

《Multimedia, IEEE Transactions on》2006,8(3):509-520

We address the problem of recognizing sequences of human interaction patterns in meetings, with the goal of structuring them in semantic terms. The investigated patterns are inherently group-based (defined by the individual activities of meeting participants, and their interplay), and multimodal (as captured by cameras and microphones). By defining a proper set of individual actions, group actions can be modeled as a two-layer process, one that models basic individual activities from low-level audio-visual (AV) features,and another one that models the interactions. We propose a two-layer hidden Markov model (HMM) framework that implements such concept in a principled manner, and that has advantages over previous works. First, by decomposing the problem hierarchically, learning is performed on low-dimensional observation spaces, which results in simpler models. Second, our framework is easier to interpret, as both individual and group actions have a clear meaning, and thus easier to improve. Third, different HMMs can be used in each layer, to better reflect the nature of each subproblem. Our framework is general and extensible, and we illustrate it with a set of eight group actions, using a public 5-hour meeting corpus. Experiments and comparison with a single-layer HMM baseline system show its validity. 相似文献

18.

基于流形学习的人体动作识别 总被引：5，自引：2，他引：3

下载免费PDF全文

王鑫沃波海管秋陈胜勇《中国图象图形学报》2014,19(6):914-923

目的提出了一个基于流形学习的动作识别框架,用来识别深度图像序列中的人体行为。方法从Kinect设备获得的深度信息中评估出人体的关节点信息,并用相对关节点位置差作为人体特征表达。在训练阶段,利用LE（Lalpacian eigenmaps）流形学习对高维空间下的训练集进行降维,得到低维隐空间下的运动模型。在识别阶段,用最近邻差值方法将测试序列映射到低维流形空间中去,然后进行匹配计算。在匹配过程中,通过使用改进的Hausdorff距离对低维空间下测试序列和训练运动集的吻合度和相似度进行度量。结果用Kinect设备捕获的数据进行了实验,取得了良好的效果;同时也在MSR Action3D数据库上进行了测试,结果表明在训练样本较多情况下,本文方法识别效果优于以往方法。结论实验结果表明本文方法适用于基于深度图像序列的人体动作识别。相似文献

19.

基于分块注意力机制和交互位置关系的群组活动识别

刘博卿粼波王正勇刘美姜雪《计算机应用》2022,42(7):2052-2057

复杂场景下的群体活动识别是一项具有挑战性的任务,它涉及一组人在场景中的相互作用和相对空间位置关系。针对当前复杂场景下群组行为识别方法缺乏精细化设计以及没有充分利用个体间交互式特征的问题,提出了基于分块注意力机制和交互位置关系的网络框架,进一步考虑个体肢体语义特征,同时挖掘个体间交互特征相似性与行为一致性的关系。首先,采用原始视频序列和光流图像序列作为网络的输入,并引入一种分块注意力模块来细化个体的肢体运动特征;然后,将空间位置和交互式距离作为个体的交互特征;最后,将个体运动特征和空间位置关系特征融合为群体场景无向图的节点特征,并利用图卷积网络（GCN）进一步捕获全局场景下的活动交互,从而识别群体活动。实验结果表明,此框架在两个群组行为识别数据集（CAD和CAE）上分别取得了92.8%和97.7%的识别准确率,在CAD数据集上与成员关系图（ARG）和置信度能量循环网络（CERN）相比识别准确率分别提高了1.8个百分点和5.6个百分点,同时结合消融实验结果验证了所提算法有较高的识别精度。相似文献

20.

An expert system for general symbol recognition 总被引：3，自引：0，他引：3

Maher Rabab Kreidieh 《Pattern recognition》2000,33(12):1975-1988

An expert system for analysis and recognition of general symbols is introduced. The system uses the structural pattern recognition technique for modeling symbols by a set of straight lines referred to as segments. The system rotates, scales and thins the symbol, then extracts the symbol strokes. Each stroke is transferred into segments (straight lines). The system is shown to be able to map similar styles of the symbol to the same representation. When the system had some stored models for each symbol (an average of 97 models/symbol), the rejection rate was 16.1% and the recognition rate was 83.9% of which 95% was recognized correctly. The system is tested by 5726 handwritten characters from the Center of Excellence for Document Analysis and Recognition (CEDAR) database. The system is capable of learning new symbols by simply adding their models to the system knowledge base. 相似文献