Found 20 similar documents (search time: 15 ms)
1.
Predefined sequences of eye movements, or ‘gaze gestures’, can be consciously performed by humans and monitored non-invasively using remote video oculography. Gaze gestures hold great potential in human–computer interaction (HCI), provided they can be easily assimilated by potential users, monitored using low-cost gaze tracking equipment, and distinguished by machine learning algorithms: the spatio-temporal structure of intentional gaze gestures must be separable from typical gaze activity performed during standard HCI. In this work, the performance of a bioinspired Bayesian pattern recognition algorithm known as Hierarchical Temporal Memory (HTM) on the real-time recognition of gaze gestures is evaluated through a user study. To improve the performance of traditional HTM during real-time recognition, an extension of the algorithm is proposed that adapts HTM to the temporal structure of gaze gestures. The extension consists of an additional top node in the HTM topology that stores and compares sequences of input data by sequence alignment using dynamic programming. The spatio-temporal codification of a gesture as a sequence handles the temporal evolution of gaze gesture instances. The extended HTM reliably discriminates intentional gaze gestures from otherwise standard human–machine gaze interaction, reaching up to 98% recognition accuracy on a data set of 10 categories of gaze gestures, with acceptable completion speeds and a low rate of false positives during standard gaze–computer interaction. These positive results, achieved despite the low-cost hardware employed, support the notion of gaze gestures as a new HCI paradigm for accessibility and for interaction with smartphones, tablets, projected displays and traditional desktop computers.
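The abstract specifies that the added top node compares sequences by dynamic-programming sequence alignment, but not the exact scoring scheme. A minimal sketch using classic edit distance over symbolised gaze sequences (the function names, direction alphabet and threshold are hypothetical, not taken from the paper) might look like:

```python
def alignment_distance(seq, template):
    """Edit distance between two symbol sequences via dynamic programming.

    Each sequence is a spatio-temporal codification of a gaze gesture,
    e.g. a string of quantised gaze directions such as "NESW".
    """
    m, n = len(seq), len(template)
    # dp[i][j] = minimal cost of aligning seq[:i] with template[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq[i - 1] == template[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    return dp[m][n]


def recognise(seq, templates, max_distance=1):
    """Return the best-matching gesture category, or None for ordinary gaze."""
    best = min(templates, key=lambda name: alignment_distance(seq, templates[name]))
    return best if alignment_distance(seq, templates[best]) <= max_distance else None
```

Accepting a gesture only when its distance to a stored template falls below a threshold is one way such a node could keep false positives low during ordinary gaze activity.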
2.
Mike Wald John-Mark Bell Philip Boulain Karl Doody Jim Gerrard 《International Journal of Speech Technology》2007,10(1):1-15
Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Deaf and hard-of-hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Synchronising the speech with text captions can ensure deaf students are not disadvantaged and can assist all learners in searching for relevant parts of the multimedia recording by means of the synchronised text. Automatic speech recognition has been used to provide real-time captioning directly from lecturers’ speech in classrooms, but it has proved difficult to obtain accuracy comparable to stenography. This paper describes the development, testing and evaluation of a system that enables editors to correct errors in the captions as they are created by automatic speech recognition, and makes suggestions for possible future improvements.
3.
Bian W Tao D Rui Y 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2012,42(2):298-307
Conventional human action recognition algorithms cannot work well when the amount of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domain to assist recognition tasks in the target domain. The TTM is well characterized by two aspects: 1) it uses the bag-of-words model trained from the auxiliary domain to represent videos in the target domain; and 2) it assumes each human action is a mixture of a set of topics and uses the topics learned from the auxiliary domain to regularize the topic estimation in the target domain, wherein the regularization is the summation of Kullback-Leibler divergences between topic pairs of the two domains. The utilization of the auxiliary domain knowledge improves the generalization ability of the learned topic model. Experiments on Weizmann and KTH human action databases suggest the effectiveness of the proposed TTM for cross-domain human action recognition.
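The abstract describes the regulariser as a summation of Kullback-Leibler divergences between topic pairs of the two domains. A small sketch of that term (topics are plain probability vectors here, and the pairing of topics by index is an assumption):

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete topic-word distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def topic_regulariser(target_topics, auxiliary_topics):
    """Sum of KL divergences over topic pairs of the two domains.

    Pairing corresponding topics by index is an assumption; the paper
    may pair topics differently.
    """
    return sum(kl_divergence(p, q) for p, q in zip(target_topics, auxiliary_topics))
```

During topic estimation in the target domain, such a term penalises topics that drift far from their auxiliary-domain counterparts.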
4.
Towards action refinement for true concurrent real time (total citations: 2; self-citations: 0; citations by others: 2)
Action refinement is an essential operation in the design of concurrent systems, real-time or not. In this paper we develop a theory of action refinement in a real-time, non-interleaving, causality-based setting: a timed extension of bundle event structures that allows urgent interactions to model timeouts. The syntactic action refinement operation is presented in a timed process algebra as incorporated in the internationally standardised specification language LOTOS. We show that the behaviour of the refined system can be inferred compositionally from the behaviour of the original system and from the behaviour of the processes substituted for actions with explicitly represented start points; that the timed versions of a linear-time equivalence (pomset trace equivalence) and a branching-time equivalence (history-preserving bisimulation equivalence) are both congruences under refinement; and that the syntactic and semantic action refinements developed coincide under these equivalence relations with respect to a metric denotational semantics. Our refinement operations therefore behave well and meet the commonly expected properties.
Received: 9 January 2003
5.
Multimedia Tools and Applications - Facial emotion is a significant way of understanding or interpreting one’s inner thoughts. Real time video at any instant exhibits the emotion which serves...
6.
We present a neural-network-based system for the visual recognition of human hand pointing gestures from stereo pairs of video camera images. The current system estimates the pointing target to an accuracy of 2 cm in a workspace area of 50×50 cm. The system consists of several neural networks that perform the tasks of image segmentation, estimation of hand location, estimation of 3D pointing direction, and the necessary coordinate transforms. Drawing heavily on learning algorithms, the functions of all network modules were learned from data examples only.
7.
Silhouette-based human action recognition using SAX-Shapes (total citations: 1; self-citations: 0; citations by others: 1)
Human action recognition is an important problem in computer vision. Although most existing solutions provide good accuracy, the methods are often overly complex and computationally expensive, hindering practical applications. In this regard, we introduce the combination of a time-series representation of the silhouette with Symbolic Aggregate approXimation (SAX), which we refer to as SAX-Shapes, to address the problem of human action recognition. Given an action sequence, the silhouettes of the actor extracted from every frame are transformed into time series. Each of these time series is then efficiently converted into a symbolic SAX vector. The set of all these SAX vectors (the SAX-Shape) represents the action. We propose a rotation-invariant distance function to be used by a random forest algorithm to perform the recognition. Requiring only the silhouettes of actors, the proposed method is validated on two public datasets. Its accuracy is comparable to that of related work, and it performs well even under varying rotation.
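SAX itself follows a published recipe: z-normalise the series, reduce it with Piecewise Aggregate Approximation (PAA), then map segment means to symbols via breakpoints that cut the standard normal into equiprobable regions. A compact sketch for an alphabet of size 4 (the alphabet size and segment count are illustrative, not taken from the paper):

```python
import math

# Breakpoints cutting N(0,1) into 4 equiprobable regions (alphabet size 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series, n_segments):
    """Convert a numeric time series into a SAX symbol string."""
    # 1) z-normalise
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    z = [(x - mean) / std for x in series]
    # 2) PAA: mean of each equal-width segment
    seg = len(z) // n_segments
    paa = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]
    # 3) map each segment mean to a symbol via the Gaussian breakpoints
    def symbol(v):
        for i, b in enumerate(BREAKPOINTS):
            if v < b:
                return ALPHABET[i]
        return ALPHABET[-1]
    return "".join(symbol(v) for v in paa)
```

For a steadily rising signal such as `list(range(16))` with four segments, this yields "abcd"; a silhouette-derived time series would be converted the same way, frame series in, symbol vector out.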
8.
Jesús Martínez del Rincón Maria J. Santofimia Jean-Christophe Nebel 《Pattern recognition letters》2013
This paper presents a novel method that leverages reasoning capabilities in a computer vision system dedicated to human action recognition. The proposed methodology is decomposed into two stages. First, a machine learning algorithm – the bag-of-words model – gives a first estimate of action classification from video sequences by performing an image feature analysis. Those results are then passed to a common-sense reasoning system, which analyses, selects and corrects the initial estimate yielded by the machine learning algorithm. This second stage draws on the knowledge implicit in the rationality that motivates human behaviour. Experiments are performed in realistic conditions, where poor recognition rates from the machine learning techniques are significantly improved by the second stage, in which common-sense knowledge and reasoning capabilities are leveraged. This demonstrates the value of integrating common-sense capabilities into a computer vision pipeline.
9.
Ren Ziliang Zhang Qieshi Gao Xiangyang Hao Pengyi Cheng Jun 《Multimedia Tools and Applications》2021,80(11):16185-16203
Multimedia Tools and Applications - The multi-modality based human action recognition is an increasing topic. Multi-modality can provide more abundant and complementary information than single...
10.
11.
Multimedia Tools and Applications - Human action recognition in realistic videos is an important and challenging task. Recent studies demonstrate that multi-feature fusion can significantly improve...
12.
13.
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal. It has been successfully applied to modeling the visual receptive fields of cortical neurons. A substantial body of experimental results in neuroscience suggests that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating discriminative information into SFA learning and considering the spatial relationship of body parts. In particular, we consider four SFA learning strategies, namely the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large number of training cuboids obtained by random sampling within motion boundaries. Afterward, to represent action sequences, the squared first-order temporal derivatives are accumulated over all transformed cuboids into one feature vector, termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large-scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-Interaction databases, to demonstrate the effectiveness of SFA for human action recognition. Experimental results suggest that the SFA-based approach (1) is able to extract useful motion patterns and improves recognition performance, (2) requires fewer intermediate processing steps but achieves comparable or even better performance, and (3) has good potential for recognizing complex multi-person activities.
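The ASD feature is defined in the abstract as squared first-order temporal derivatives accumulated over all transformed cuboids into one vector. A direct sketch of that accumulation (the function name is hypothetical):

```python
def asd_feature(slow_features):
    """Accumulated Squared Derivative (ASD) over a sequence of slow-feature vectors.

    slow_features: list of equal-length vectors, one per transformed cuboid.
    Returns one vector whose d-th entry accumulates the squared first-order
    temporal derivative of slow feature d over the whole sequence.
    """
    dim = len(slow_features[0])
    acc = [0.0] * dim
    for prev, cur in zip(slow_features, slow_features[1:]):
        for d in range(dim):
            acc[d] += (cur[d] - prev[d]) ** 2
    return acc
```

The resulting fixed-length vector is what a linear SVM would then classify, one ASD vector per action sequence.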
14.
A survey on vision-based human action recognition (total citations: 10; self-citations: 0; citations by others: 10)
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.
15.
A. Ibarguren I. Maurtua B. Sierra 《Engineering Applications of Artificial Intelligence》2010,23(7):1216-1228
Sign and gesture recognition offers a natural way for human–computer interaction. This paper presents a real-time sign recognition architecture covering both gesture and movement recognition. Among the different technologies available for sign recognition, data gloves and accelerometers were chosen for the purposes of this research. Due to the real-time nature of the problem, the proposed approach works in two tiers: a segmentation tier and a classification tier. In the first stage, the glove and accelerometer signals are processed for segmentation, separating the different signs performed by the user. In the second stage, the values received from the segmentation tier are classified. To emphasise the practical use of the architecture, this approach deals specifically with problems such as sensor noise and simplification of the training phase.
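The abstract does not state the segmentation criteria, so the following is only a plausible sketch of a segmentation tier: a sensor stream is split on a rest threshold, and short runs are discarded as noise (all names and thresholds are illustrative, not from the paper).

```python
def segment(samples, rest_threshold=0.1, min_length=5):
    """Split a continuous 1-D sensor stream into candidate sign segments.

    A sample is 'active' when its magnitude exceeds rest_threshold; runs of
    active samples at least min_length long become segments, which would then
    be handed to the classification tier.
    """
    segments, current = [], []
    for s in samples:
        if abs(s) > rest_threshold:
            current.append(s)
        else:
            if len(current) >= min_length:
                segments.append(current)
            current = []
    if len(current) >= min_length:  # flush a trailing active run
        segments.append(current)
    return segments
```

Discarding runs shorter than `min_length` is one simple way to address the sensor-noise concern the abstract raises.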
16.
《Data Processing》1984,26(10):25-26
Real-time mainframe accounting systems, unlike batch systems, allow immediate access to financial information. The article discusses the benefits of real-time over batch accounting and highlights user friendliness, security and increased performance.
17.
18.
Human action recognition is a challenging task due to significant intra-class variations, occlusion and background clutter. Most existing work uses action models based on statistical learning algorithms for classification. To achieve good recognition performance, a large number of labeled samples is therefore required to train the sophisticated action models. However, collecting labeled samples is labor-intensive. To tackle this problem, we propose a boosted multi-class semi-supervised learning algorithm in which the co-EM algorithm is adopted to leverage the information in unlabeled data. Three key issues are addressed in this paper. Firstly, we formulate action recognition as a multi-class semi-supervised learning problem to deal with insufficient labeled data and high computational expense. Secondly, boosted co-EM is employed for semi-supervised model construction. To cope with the high-dimensional feature space, weighted multiple discriminant analysis (WMDA) is used to project the features into low-dimensional subspaces, in which Gaussian mixture models (GMM) are trained, and a boosting scheme is used to integrate the subspace models. Thirdly, we present an upper bound on the training error in the multi-class framework, which guides the construction of the novel classifier. In theory, the proposed solution is proved to minimize this upper error bound. Experimental results show good performance on public datasets.
19.
Human action recognition in video is important in many computer vision applications such as automated surveillance. Human actions can be compactly encoded using a sparse set of local spatio-temporal salient features at different scales. Existing bottom-up methods construct a single dictionary of action primitives from the joint features of all scales and hence a single action representation. This representation cannot fully exploit the complementary characteristics of the motions across different scales. To address this problem, we introduce the concept of learning multiple dictionaries of action primitives at different resolutions and, consequently, multiple scale-specific representations for a given video sample. Using a decoupled fusion of the multiple representations, we improved the action classification accuracy on realistic benchmark databases by about 5% compared with state-of-the-art methods.