Similar Documents
20 similar documents found (search time: 15 ms)
1.
Predefined sequences of eye movements, or ‘gaze gestures’, can be consciously performed by humans and monitored non-invasively using remote video oculography. Gaze gestures hold great potential in human–computer interaction (HCI), provided they can be easily assimilated by potential users, monitored with low-cost gaze tracking equipment, and distinguished by machine learning algorithms from the typical gaze activity performed during standard HCI on the basis of their spatio-temporal structure. In this work, the performance of a bioinspired Bayesian pattern recognition algorithm known as Hierarchical Temporal Memory (HTM) on the real-time recognition of gaze gestures is evaluated through a user study. To improve the performance of traditional HTM during real-time recognition, an extension of the algorithm is proposed that adapts HTM to the temporal structure of gaze gestures. The extension consists of an additional top node in the HTM topology that stores and compares sequences of input data by sequence alignment using dynamic programming. The spatio-temporal codification of a gesture as a sequence serves to handle the temporal evolution of gaze gesture instances. The extended HTM allows for reliable discrimination of intentional gaze gestures from otherwise standard human–machine gaze interaction, reaching up to 98% recognition accuracy for a data set of 10 categories of gaze gestures, acceptable completion speeds and a low rate of false positives during standard gaze–computer interaction. These positive results, achieved despite the low-cost hardware employed, support the use of gaze gestures as a new HCI paradigm for accessibility and for interaction with smartphones, tablets, projected displays and traditional desktop computers.
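The sequence-alignment step in the HTM extension can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation; the scoring values and the assumption that gestures are quantised into symbol sequences are illustrative.

```python
def align_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch style) between two
    symbol sequences, e.g. quantised gaze positions of two gestures."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = best score aligning seq_a[:i] with seq_b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # match/mismatch
                           dp[i - 1][j] + gap,       # gap in seq_b
                           dp[i][j - 1] + gap)       # gap in seq_a
    return dp[n][m]
```

A stored gesture template scoring highly against an incoming sequence would be taken as a recognition hit; the temporal stretching tolerated by the gap terms is what accommodates variation across gesture instances.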

2.
Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Synchronising the speech with text captions can ensure deaf students are not disadvantaged and assist all learners to search for relevant specific parts of the multimedia recording by means of the synchronised text. Automatic speech recognition has been used to provide real-time captioning directly from lecturers’ speech in classrooms but it has proved difficult to obtain accuracy comparable to stenography. This paper describes the development, testing and evaluation of a system that enables editors to correct errors in the captions as they are created by automatic speech recognition and makes suggestions for future possible improvements.

3.
Conventional human action recognition algorithms cannot work well when the number of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domain to assist recognition tasks in the target domain. The TTM is well characterized by two aspects: 1) it uses the bag-of-words model trained from the auxiliary domain to represent videos in the target domain; and 2) it assumes each human action is a mixture of a set of topics and uses the topics learned from the auxiliary domain to regularize the topic estimation in the target domain, wherein the regularization is the summation of Kullback-Leibler divergences between topic pairs of the two domains. The utilization of the auxiliary domain knowledge improves the generalization ability of the learned topic model. Experiments on Weizmann and KTH human action databases suggest the effectiveness of the proposed TTM for cross-domain human action recognition.
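The KL-based regularization term described above can be sketched as follows; the per-topic word distributions and the one-to-one pairing of topics across domains are illustrative assumptions, not the authors' code.

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    distributions over the same vocabulary; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def topic_regularizer(target_topics, aux_topics):
    """Summation of KL divergences between paired topics of the
    target and auxiliary domains (the TTM regularization term)."""
    return sum(kl(p, q) for p, q in zip(target_topics, aux_topics))
```

In the model this sum would be added (with a weight) to the topic-estimation objective, pulling target-domain topics toward their auxiliary-domain counterparts.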

4.
Towards action refinement for true concurrent real time (total citations: 2, self: 0, others: 2)
Action refinement is an essential operation in the design of concurrent systems, real-time or not. In this paper we develop a theory of action refinement in a real-time non-interleaving causality based setting, a timed extension of bundle event structures that allows for urgent interactions to model timeout. The syntactic action refinement operation is presented in a timed process algebra as incorporated in the internationally standardised specification language LOTOS. We show that the behaviour of the refined system can be inferred compositionally from the behaviour of the original system and from the behaviour of the processes substituted for actions with explicitly represented start points, that the timed versions of a linear-time equivalence, termed pomset trace equivalence, and a branching-time equivalence, termed history preserving bisimulation equivalence, are both congruences under the refinement, and that the syntactic and semantic action refinements developed coincide under these equivalence relations with respect to a metric denotational semantics. Therefore, our refinement operations behave well. They meet the commonly expected properties. Received: 9 January 2003

5.
A real-time recognition system for rectangular traffic speed-limit signs (total citations: 1, self: 0, others: 1)
Traffic speed-limit sign recognition is an important component of driver-assistance systems, and this paper proposes a real-time recognition system for rectangular traffic signs. First, multi-scale, multi-region local binary pattern (LBP) features are used to train an Adaboost classifier that detects speed-limit signs, and a linear-prediction algorithm then tracks the detected signs. In the recognition pre-processing stage, projection analysis first corrects the rotation of the sign, an adaptive binarization method based on the integral image then binarizes it, and connected-component labeling extracts the smallest rectangular region containing the speed-limit digits. For recognition, principal component analysis (PCA) extracts feature vectors, which are classified by a binary tree of linear support vector machines constructed by clustering. Tested on a large volume of real road-scene video on an ordinary laptop, the system achieved a correct recognition rate of 98.3% at an average processing speed of 16 frames/s.
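The integral-image-based adaptive binarization step can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the window size and threshold factor are assumed values.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def adaptive_binarize(img, win=2, t=0.95):
    """Mark a pixel 1 if it exceeds t times the mean of its local
    window; the window sum costs O(1) via the integral image."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - win), min(h, y + win + 1)
            x0, x1 = max(0, x - win), min(w, x + win + 1)
            area = (y1 - y0) * (x1 - x0)
            s = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # compare img[y][x] > t * (s / area) without division
            out[y][x] = 1 if img[y][x] * area > s * t else 0
    return out
```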

6.
Objective: In human action recognition research, zero-shot recognition from video features is attracting growing attention, but most existing work is based on a single modality, and studies of multimodal fusion remain scarce. To investigate the effect of multiple data modalities on zero-shot human action recognition, this paper proposes a zero-shot human action recognition framework based on multimodal fusion (ZSAR-MF). Method: The framework consists of a sensor feature extraction module, a classification module and a video feature extraction module. Specifically, the sensor feature extraction module uses a convolutional neural network (CNN) to extract heart-rate and acceleration features; the classification module generates action-category classifiers from the word vectors of all concepts (sensor features, action names and object names); and the video feature extraction module maps each action's attributes, object scores and sensor features into an attribute-feature space, where the classifiers generated by the classification module evaluate the attributes and sensor features of each action. Result: Experiments on the Stanford-ECM dataset show that the ZSAR-MF model improves recognition accuracy by about 4% over zero-shot models based on a single modality. Conclusion: The proposed multimodal-fusion framework effectively fuses sensor and video features and significantly improves the accuracy of zero-shot human action recognition.

7.
Multimedia Tools and Applications - Facial emotion is a significant way of understanding or interpreting one’s inner thoughts. Real time video at any instant exhibits the emotion which serves...

8.
We present a neural network based system for the visual recognition of human hand pointing gestures from stereo pairs of video camera images. The current system estimates the pointing target to an accuracy of 2 cm within a workspace area of 50×50 cm. The system consists of several neural networks that perform the tasks of image segmentation, estimation of hand location, estimation of 3D pointing direction and the necessary coordinate transforms. Drawing heavily on learning algorithms, the functions of all network modules were created from data examples only.

9.
This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine learning based algorithm – known as bag of words – gives a first estimate of action classification from video sequences, by performing an image feature analysis. Those results are afterward passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimation yielded by the machine learning algorithm. This second stage resorts to the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where poor recognition rates by the machine learning techniques are significantly improved by the second stage in which common-sense knowledge and reasoning capabilities have been leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline.

10.
Silhouette-based human action recognition using SAX-Shapes (total citations: 1, self: 0, others: 1)
Human action recognition is an important problem in Computer Vision. Although most of the existing solutions provide good accuracy results, the methods are often overly complex and computationally expensive, hindering practical applications. In this regard, we introduce the combination of time-series representation of the silhouette and Symbolic Aggregate approXimation (SAX), which we refer to as SAX-Shapes, to address the problem of human action recognition. Given an action sequence, the extracted silhouettes of an actor from every frame are transformed into time series. Each of these time series is then efficiently converted into a symbolic SAX vector. The set of all these SAX vectors (the SAX-Shape) represents the action. We propose a rotation invariant distance function to be used by a random forest algorithm to perform the human action recognition. Requiring only silhouettes of actors, the proposed method is validated on two public datasets. Its accuracy is comparable to that of related works, and it performs well even under varying rotation.
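The conversion of a silhouette time series into a SAX word can be sketched as follows; the four-letter alphabet and its Gaussian breakpoints are illustrative assumptions, not the parameters used in the paper.

```python
import statistics

def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric time series to a SAX word: z-normalise,
    piecewise-aggregate into n_segments means (PAA), then map each
    mean to a symbol via breakpoints for N(0, 1)."""
    mu = statistics.mean(series)
    sd = statistics.pstdev(series) or 1.0  # guard against flat series
    z = [(v - mu) / sd for v in series]
    # Piecewise Aggregate Approximation
    seg = len(z) / n_segments
    paa = []
    for i in range(n_segments):
        lo, hi = round(i * seg), round((i + 1) * seg)
        paa.append(sum(z[lo:hi]) / (hi - lo))
    # Equiprobable breakpoints for a 4-letter alphabet under N(0, 1)
    breakpoints = [-0.67, 0.0, 0.67]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)
```

A steadily rising silhouette signal, for instance, maps to a monotonically increasing word, which is what makes the symbolic vectors cheap to store and compare.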

11.
Ren Ziliang, Zhang Qieshi, Gao Xiangyang, Hao Pengyi, Cheng Jun. Multimedia Tools and Applications 2021, 80(11): 16185-16203
Multimedia Tools and Applications - Multi-modality based human action recognition is a topic of increasing interest. Multi-modality can provide more abundant and complementary information than single...

12.
Yi Yun, Wang Hanli, Zhang Bowen. Multimedia Tools and Applications 2017, 76(18): 18891-18913
Multimedia Tools and Applications - Human action recognition in realistic videos is an important and challenging task. Recent studies demonstrate that multi-feature fusion can significantly improve...

13.
An embedded real-time music voice recognition system is designed and implemented. The hardware structure and software workflow of the system are described, and a new endpoint-detection method based on segmenting multi-band energy curves combined with the zero-crossing rate is established. Experimental results show that the system achieves an average speaker-dependent recognition rate above 96%.
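The idea of combining short-time energy with the zero-crossing rate for endpoint detection can be sketched as follows; the thresholds and the single-band simplification are illustrative assumptions (the paper uses multi-band energy curves).

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    return sum((frame[i - 1] < 0) != (frame[i] < 0)
               for i in range(1, len(frame))) / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def is_speech(frame, e_thresh=0.01, z_thresh=0.4):
    """Endpoint decision for one frame: high energy, or moderate
    energy with a high zero-crossing rate (unvoiced consonants)."""
    e, z = short_time_energy(frame), zero_crossing_rate(frame)
    return e > e_thresh or (e > e_thresh / 10 and z > z_thresh)
```

Running this per frame and taking the first/last frames flagged as speech yields the utterance endpoints.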

14.
To address the limitation that HOG features capture only local gradient information of the human body in action recognition, a human action recognition method fusing extended HOG (ExHOG) features with CLBP features is proposed. Complete human motion sequences are extracted from video by background subtraction, and two complementary features are computed: the extended histogram of oriented gradients (ExHOG) and the complete local binary pattern (CLBP). The two complementary features are fused via the Karhunen-Loeve (K-L) transform into a single action feature with stronger discriminative power, which is then classified by a radial basis function neural network (RBFNN). Multiple experiments on the public KTH and Weizmann action databases show that the proposed method can effectively recognize classes of human motion.
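The K-L (Karhunen-Loeve) transform used to fuse the two complementary feature sets is essentially PCA applied to the concatenated feature vectors. A minimal numpy sketch under that assumption (shapes and the choice of k are illustrative):

```python
import numpy as np

def kl_transform_fuse(feat_a, feat_b, k):
    """Concatenate two per-sample feature sets and apply the
    Karhunen-Loeve transform: project onto the top-k eigenvectors
    of the covariance matrix of the concatenated features.
    feat_a: (n_samples, d_a), feat_b: (n_samples, d_b)."""
    X = np.hstack([feat_a, feat_b])
    X = X - X.mean(axis=0)                     # center the data
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k principal axes
    return X @ top                             # fused k-dim features
```

The k-dimensional output would then be fed to the RBFNN classifier in place of the raw concatenation.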

15.
16.
Skeleton-based human action recognition aims to correctly identify the classes of actions in an input skeleton sequence containing one or more actions, and is one of the research hotspots in computer vision. Compared with image-based methods, skeleton-based methods are unaffected by interference factors such as background and human appearance, and offer higher accuracy, robustness and computational efficiency. Given the importance and topicality of skeleton-based methods, a comprehensive and systematic summary and analysis of them is of great significance. This paper first reviews nine widely used skeleton action recognition datasets, dividing them into single-view and multi-view datasets according to the data-collection viewpoint, and discusses the characteristics and usage of each. Then, according to the underlying network, skeleton-based action recognition methods are divided into methods based on hand-crafted features, recurrent neural networks, convolutional neural networks, graph convolutional networks, and Transformers, and the principles, advantages and disadvantages of each category are analyzed. Among these, graph convolutional methods have become the most widely used owing to their strong ability to capture spatial relations; a new taxonomy is adopted to survey them comprehensively, aiming to provide researchers with more ideas and methods. Finally, problems with existing methods are summarized from eight aspects, and directions for future work are proposed.

17.
A survey on vision-based human action recognition (total citations: 10, self: 0, others: 10)
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.

18.
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of the cortical neurons. Sufficient experimental results in neuroscience suggest that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating the discriminative information with SFA learning and considering the spatial relationship of body parts. In particular, we consider four kinds of SFA learning strategies, including the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large amount of training cuboids which are obtained by random sampling in motion boundaries. Afterward, to represent action sequences, the squared first order temporal derivatives are accumulated over all transformed cuboids into one feature vector, which is termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition. Experimental results suggest that the SFA-based approach (1) is able to extract useful motion patterns and improves the recognition performance, (2) requires less intermediate processing steps but achieves comparable or even better performance, and (3) has good potential to recognize complex multiperson activities.
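The core of linear SFA, minimising the variance of the temporal derivative of the output subject to unit output variance, reduces to a generalised eigenvalue problem. A minimal sketch of that reduction (illustrative, not the authors' pipeline):

```python
import numpy as np

def sfa(X, k=1):
    """Linear Slow Feature Analysis: find weight vectors w such that
    y = Xw varies as slowly as possible over time, i.e. minimise the
    variance of the temporal derivative of y subject to unit variance
    of y. X has shape (n_timesteps, n_features)."""
    X = X - X.mean(axis=0)
    dX = np.diff(X, axis=0)                 # temporal derivatives
    A = dX.T @ dX / len(dX)                 # derivative covariance
    B = X.T @ X / len(X)                    # signal covariance
    # generalised eigenproblem A w = lambda B w
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)           # smallest = slowest
    return vecs.real[:, order[:k]]
```

Applied to a signal mixing a slow and a fast oscillation, the first returned direction should weight the slow component far more heavily.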

19.
Ongoing human action recognition is a challenging problem that has many applications, such as video surveillance, patient monitoring, human–computer interaction, etc. This paper presents a novel framework for recognizing streamed actions using Motion Capture (MoCap) data. Unlike the after-the-fact classification of completed activities, this work aims at achieving early recognition of ongoing activities. The proposed method is time efficient as it is based on histograms of action poses, extracted from MoCap data, that are computed according to Hausdorff distance. The histograms are then compared with the Bhattacharyya distance and warped by a dynamic time warping process to achieve their optimal alignment. This process, implemented by our dynamic programming-based solution, has the advantage of allowing some stretching flexibility to accommodate for possible action length changes. We have shown the success and effectiveness of our solution by testing it on large datasets and comparing it with several state-of-the-art methods. In particular, we were able to achieve excellent recognition rates that have outperformed many well known methods.
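The histogram comparison with the Bhattacharyya distance and its alignment by dynamic time warping can be sketched together. The cost recursion shown is standard DTW, not necessarily the authors' exact dynamic-programming solution:

```python
import math

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two normalised histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return -math.log(max(bc, 1e-12))  # clamp to avoid log(0)

def dtw(seq_a, seq_b, dist):
    """Dynamic time warping cost between two sequences of pose
    histograms, allowing temporal stretching of the action."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            dp[i][j] = d + min(dp[i - 1][j],      # stretch seq_a
                               dp[i][j - 1],      # stretch seq_b
                               dp[i - 1][j - 1])  # step both
    return dp[n][m]
```

An incoming stream would be matched against stored action templates with `dtw(stream, template, bhattacharyya)`, the lowest cost giving the recognised class.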

20.
Sign and gesture recognition offers a natural way for human–computer interaction. This paper presents a real time sign recognition architecture including both gesture and movement recognition. Among the different technologies available for sign recognition, data gloves and accelerometers were chosen for the purposes of this research. Due to the real time nature of the problem, the proposed approach works in two different tiers, the segmentation tier and the classification tier. In the first stage the glove and accelerometer signals are processed for segmentation purposes, separating the different signs performed by the system user. In the second stage the values received from the segmentation tier are classified. In an effort to emphasize the real use of the architecture, this approach deals specifically with problems like sensor noise and simplification of the training phase.
