Similar Documents (20 results)
1.
Gesture plays an important role in recognizing lecture activities for video content analysis. In this paper, we propose a real-time gesture detection algorithm that integrates visual, speech, and electronic-slide cues. In contrast to conventional "complete gesture" recognition, we emphasize detection by prediction from "incomplete gestures". Specifically, intentional gestures are predicted by a modified hidden Markov model (HMM) that can recognize incomplete gestures before the whole gesture path is observed. The multimodal correspondence between speech and gesture is exploited to increase the accuracy and responsiveness of detection. In lecture presentation, the algorithm enables on-the-fly editing of lecture slides by simulating appropriate camera motion to highlight the intention and flow of lecturing. We develop a real-time application, a simulated smartboard, and demonstrate the feasibility of our prediction algorithm with hand gestures and a laser pen, using a simple setup that involves no expensive hardware.
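For illustration, the idea of classifying a gesture from an incomplete observation sequence can be sketched with the standard HMM forward algorithm, which scores any prefix of a path against each per-gesture model. The Python sketch below is a minimal illustration under assumed discrete observations and a hypothetical `gesture_models` dictionary; the paper's modified HMM and its speech/slide cues are not reproduced here.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled HMM forward pass: log-likelihood of a (possibly incomplete)
    discrete observation sequence. pi: (S,) initial probabilities,
    A: (S,S) transitions, B: (S,V) emission probabilities."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    logp, alpha = np.log(c), alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()                        # rescale to avoid underflow
        logp, alpha = logp + np.log(c), alpha / c
    return logp

def predict_from_prefix(prefix, gesture_models):
    """Score the observed prefix against one HMM per gesture (here a
    hypothetical dict of (pi, A, B) tuples) and return the most likely
    gesture before the full path is seen."""
    return max(gesture_models,
               key=lambda g: forward_loglik(prefix, *gesture_models[g]))
```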

2.
The role of gesture recognition is significant in areas like human-computer interaction, sign language, virtual reality, and machine vision. Among the various gestures of the human body, hand gestures play the major role in communicating nonverbally with the computer. Because a hand gesture is a pattern that is continuous in time, the hidden Markov model (HMM) is found to be the most suitable pattern recognition tool, modeled on the hand gesture parameters. The HMM takes the speeded-up robust features (SURF) of the hand gesture and uses them to train and test the system. Conventionally, the Viterbi algorithm has been used in the HMM training process, discovering the shortest decoded path in the state diagram. The recursiveness of the Viterbi algorithm leads to computational complexity during execution. To reduce this complexity, a state sequence analysis approach is proposed for training the hand gesture model, which provides a better recognition rate and accuracy than the Viterbi algorithm. The performance of the proposed approach is explored in the context of pattern recognition with the Cambridge hand gesture data set.
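As context for the comparison above, here is a compact sketch of classic Viterbi decoding, the step whose recursiveness the authors seek to avoid (log-domain, discrete observations; the proposed state sequence analysis approach and the SURF feature pipeline are not shown).

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most likely state path for a discrete observation sequence.
    log_pi: (S,) initial log-probs; log_A: (S,S) transition log-probs;
    log_B: (S,V) emission log-probs."""
    T, S = len(obs), len(log_pi)
    delta = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A    # scores[i, j]: best path ending in i, moving to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):          # backtrack through stored pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```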

3.
Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots interact directly with people, finding natural and easy-to-use interfaces is of fundamental importance. This paper describes a flexible multimodal interface based on speech and gesture modalities for controlling our mobile robot, Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters devoted to the upper-body extremities to track and recognize pointing/symbolic gestures, both mono- and bi-manual. This framework constitutes our first contribution: it is shown to handle natural artifacts properly (self-occlusion, hands leaving the camera's field of view, hand deformation) when 3D gestures are performed with either hand or both. A speech recognition and understanding system based on the Julius engine is also developed and embedded to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses results from the speech and gesture components. This interpreter is shown to improve the classification rates of multimodal commands compared with either modality alone. Finally, we report on successful live experiments in human-centered settings, in the context of an interactive manipulation task where users give Jido local motion commands and perform safe object exchanges.

4.
This paper presents a practical real-time system for mapping dynamic glove-based hand gestures into Arabic speech. Arabic Glove-Talk (AGT) is a prototype intelligent system built to address communication between the vocally impaired and other people. Several factors make dynamic gesture recognition difficult; neuro-fuzzy approaches are described to overcome them, and the hard task of gesture spotting is solved with a distance-based measure. We use the 5th Glove device to capture hand gestures. The system learns to recognise a basic vocabulary of 32 gestures. The vocabulary, extended to 128 gestures, is evaluated on a test set of 640 gestures using different classifiers to assign an unknown gesture to the corresponding spoken Arabic word. The minimum distance classifier, the neuro-fuzzy perceptron, and the 1D self-organising feature map based classifier yield 96.25%, 97.82%, and 100% correct spoken words, respectively. After training, talkers successfully produced Arabic speech at roughly 75–90 words per minute.
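Of the three classifiers, the minimum distance classifier is simple enough to sketch: each class is represented by the mean of its training feature vectors, and an unknown gesture is assigned to the nearest prototype. A minimal illustration, assuming fixed-length glove feature vectors (the paper's actual feature extraction is not shown):

```python
import numpy as np

class MinimumDistanceClassifier:
    """Assigns a gesture feature vector to the class whose training
    mean (prototype) is nearest in Euclidean distance."""
    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class prototype.
        d = np.linalg.norm(np.asarray(X)[:, None, :] - self.means_, axis=2)
        return self.classes_[d.argmin(axis=1)]
```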

5.
6.
7.
An accurate estimation of sentence units (SUs) in spontaneous speech is important for (1) helping listeners better understand speech content and (2) supporting other natural language processing tasks that require sentence information. There has been much research on automatic SU detection; however, most previous studies have used only lexical and prosodic cues, not nonverbal cues such as gesture. Gestures play an important role in human conversations, providing semantic content, expressing emotional status, and regulating conversational structure. Given the close relationship between gestures and speech, gestures may contribute to automatic SU detection. In this paper, we investigate the use of gesture cues for enhancing SU detection. In particular, we focus on: (1) collecting multimodal data resources involving gestures and SU events in human conversations, (2) analyzing the collected data sets to enrich our knowledge about the co-occurrence of gestures and SUs, and (3) building statistical models for detecting SUs from speech and gestural cues. Our data analyses suggest that some gesture patterns influence a word boundary's probability of being an SU boundary. On the basis of these analyses, a set of novel gestural features is proposed for SU detection. A combination of speech and gestural features is found to provide more accurate SU predictions than speech features alone in discriminative models. These findings support the view that human conversations are processes involving multimodal cues, and so are more effectively modeled using information from both verbal and nonverbal channels.
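One plausible reading of "combining speech and gestural features in discriminative models" is to concatenate per-boundary feature vectors and train a discriminative classifier. The sketch below uses logistic regression over entirely hypothetical placeholder features; the paper's actual feature definitions and model are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature vectors per word boundary: prosodic/lexical cues
# (e.g. pause duration, pitch reset) concatenated with gestural cues
# (e.g. hands-at-rest, stroke-end proximity to the boundary).
rng = np.random.default_rng(0)
speech_feats = rng.random((1000, 6))       # placeholder data
gesture_feats = rng.random((1000, 3))
is_su = rng.integers(0, 2, 1000)           # 1 = boundary closes an SU

X = np.hstack([speech_feats, gesture_feats])
clf = LogisticRegression(max_iter=1000).fit(X, is_su)
print(clf.predict_proba(X[:5])[:, 1])      # per-boundary SU probability
```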

8.
This paper presents an experimental study of an agent system with multimodal interfaces for a smart office environment. The system is based on multimodal interfaces: recognition modules for speech and pen-mouse gestures, and identification modules for face and fingerprint. As essential modules, speech recognition and synthesis provide virtual interaction between user and system. In this study, a real-time speech recognizer based on a Hidden Markov Network (HM-Net) was incorporated into the proposed system. In addition, face and fingerprint identification were adopted to offer a specific user a customized, secure interaction in the office environment. In evaluation, results showed that the proposed system was easy to use and would prove useful in a smart office environment, even though the performance of the speech recognizer was not satisfactory, mainly due to environmental noise.

9.
10.
To obtain better speech recognition performance, a speaker-independent isolated-word speech recognition system based on a support vector machine with a linear kernel is built, achieving a high recognition rate. The experimental results are compared with HMM-based recognition results, demonstrating the advantage of support vector machines for speech recognition with limited training samples.
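A minimal sketch of this setup, a linear-kernel SVM over fixed-length utterance features, might look as follows (the features and data here are placeholders, e.g. averaged cepstral coefficients, not the paper's actual front end).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Placeholder: each utterance reduced to a fixed-length feature vector;
# labels are word identities (10 isolated words assumed).
rng = np.random.default_rng(0)
X = rng.random((200, 39))
y = rng.integers(0, 10, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)   # linear-kernel SVM classifier
print("accuracy:", clf.score(X_te, y_te))
```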

11.
In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, and recognition of the person's head orientation. Each component is described and experimental results are presented. We also present several experiments on multimodal human-robot interaction, such as interaction using speech and gestures, automatic determination of the addressee during human-human-robot interaction, and interactive learning of dialogue strategies. The work and components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human-robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.

12.
To address the low recognition rate of traditional gesture recognition algorithms against complex backgrounds, depth images are acquired with the Kinect depth camera, and the gesture region is segmented and preprocessed. Geometric features of the gesture are extracted, and a concentric-circle distribution histogram feature of the depth information is proposed; the geometric features and the concentric-circle histogram feature are fused, and a random forest classifier is trained for gesture recognition. Tests on the three common gestures "rock", "scissors", and "paper" under complex-background conditions show that the proposed method exhibits good translation, rotation, and scale invariance and adapts to changes in complex environments.
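One possible reading of the concentric-circle distribution histogram is a histogram of hand-pixel depth mass over rings centered on the hand centroid. The sketch below implements that reading (an assumption, not the paper's exact definition), with the random forest step indicated in comments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def concentric_depth_histogram(depth, mask, n_rings=8):
    """Histogram of hand-pixel depth mass over concentric rings centred
    on the hand-region centroid (a simplified reading of the paper's
    concentric-circle distribution feature). depth: 2D depth image,
    mask: boolean hand segmentation of the same shape."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    r = np.hypot(ys - cy, xs - cx)
    ring = np.minimum((r / (r.max() + 1e-9) * n_rings).astype(int), n_rings - 1)
    hist = np.bincount(ring, weights=depth[ys, xs], minlength=n_rings)
    return hist / (hist.sum() + 1e-9)          # normalise for scale invariance

# Hypothetical training step, fusing geometric features with the ring
# histogram and feeding a random forest, as the abstract describes:
#   X = [np.hstack([geom(d, m), concentric_depth_histogram(d, m)]) for d, m in data]
#   clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```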

13.
With the rapid development of virtual reality technology, a natural and user-friendly way of entering characters is urgently needed, and more and more researchers are working on dynamic gestures. This paper builds a dynamic gesture recognition system based on the hidden Markov model (HMM). The system collects dynamic gesture data with a Leap Motion controller and recognizes 36 letter and digit gestures (digits 0–9 and letters A–Z). Extensive experiments show that the system is highly robust, with a recognition rate of 93.2% for isolated gestures.
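A per-class HMM classifier of this kind can be sketched with the third-party hmmlearn package: train one Gaussian HMM per gesture and pick the model with the highest log-likelihood at test time. The feature choice (e.g. Leap Motion fingertip coordinates per frame) and state count below are assumptions.

```python
import numpy as np
from hmmlearn import hmm   # third-party: pip install hmmlearn

def train_gesture_hmms(sequences_by_class, n_states=5):
    """One GaussianHMM per gesture class, trained on all sequences of
    that class. Each sequence: (frames x feature_dim) array."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                 # stacked frames
        lengths = [len(s) for s in seqs]    # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        models[label] = m.fit(X, lengths)
    return models

def classify(seq, models):
    """Pick the class whose HMM assigns the test sequence the highest
    log-likelihood."""
    return max(models, key=lambda k: models[k].score(seq))
```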

14.
Speech/gesture interface to a visual-computing environment
We developed a speech/gesture interface that uses visual hand-gesture analysis and speech recognition to control a 3D display in VMD, a virtual environment for structural biology. We used a particular virtual-environment context to set the constraints needed to make our analysis robust and to develop a command language that optimally combines speech and gesture inputs. Our interface uses: automatic speech recognition (ASR), aided by a microphone, to recognize voice commands; two strategically positioned cameras to detect hand gestures; and automatic gesture recognition (AGR), a set of computer vision techniques to interpret those hand gestures. The computer vision algorithms can extract the user's hand from the background, detect different finger positions, and distinguish meaningful gestures from unintentional hand movements. Our main goal was to simplify model manipulation and rendering to make biomolecular modeling more playful. Researchers can explore variations of their model and concentrate on the biomolecular aspects of their task without undue distraction by computational aspects. They can view simulations of molecular dynamics, play with different combinations of molecular structures, and better understand the molecules' important properties. A potential benefit, for example, might be reducing the time to discover new compounds for new drugs.

15.
As an important mode of human-computer interaction, gesture interaction and recognition have become a research focus in computer graphics, virtual reality, and human-computer interaction because of their high degrees of freedom. Traditional methods that directly extract gesture contours or hand joint positions usually yield features that cannot accurately distinguish between gestures. To address the high degrees of freedom of different gestures and the inaccurate feature representation caused by low-resolution gesture images, cluttered backgrounds, hand occlusion, varying finger shapes and sizes, and individual differences, this paper proposes a new gesture representation and recognition method that fuses joint rotation features and fingertip distance features. First, the 3D positions of 20 hand joints are extracted from the gesture depth map using a hand template, treating the hand as a linked-segment structure. Then, quaternion joint rotation features and fingertip distance features are computed from the joint positions; together they form an intrinsic representation of the gesture. Finally, gestures are classified with a one-vs-one support vector machine. The paper not only proposes a new gesture feature representation and extraction method fusing joint rotation information and fingertip distances, but also proves theoretically that this representation uniquely characterizes the joint positions of a gesture, and adopts a one-vs-one SVM multi-class strategy for classification. Experiments on 8 Chinese number gestures and 21 American letter gestures from the ASTAR static gesture depth-map dataset yield classification accuracies of 99.71% and 85.24%, respectively. The results show that the fused joint rotation and fingertip distance features represent the geometric characteristics of different gestures well and accurately characterize static gestures for recognition.
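The two feature families can be sketched directly from joint positions: a quaternion rotating each bone direction onto the next, plus pairwise fingertip distances. The sketch below treats the 20 joints as a single chain and assumes fingertip row indices, both simplifications of the paper's hand-template model.

```python
import numpy as np
from sklearn.svm import SVC

def quat_between(u, v):
    """Unit quaternion (w, x, y, z) rotating unit vector u onto v — one
    way to encode the rotation between adjacent hand bones."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    q = np.array([1.0 + float(u @ v), *np.cross(u, v)])
    n = np.linalg.norm(q)
    # Degenerate 180-degree case: fall back to an arbitrary axis.
    return q / n if n > 1e-9 else np.array([0.0, 1.0, 0.0, 0.0])

def hand_features(joints):
    """joints: (20, 3) joint positions. Features = quaternions between
    consecutive bone directions + pairwise fingertip distances. The
    chain ordering and fingertip indices here are assumptions."""
    bones = np.diff(joints, axis=0)
    quats = [quat_between(bones[i], bones[i + 1])
             for i in range(len(bones) - 1)]
    tips = joints[[3, 7, 11, 15, 19]]        # assumed fingertip rows
    dists = [np.linalg.norm(a - b) for i, a in enumerate(tips)
             for b in tips[i + 1:]]
    return np.hstack([np.concatenate(quats), dists])

# One-vs-one SVM over the fused features, as in the abstract:
#   clf = SVC(decision_function_shape="ovo").fit(X, y)
```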

16.
We propose a new two-stage framework for joint analysis of a speaker's head gesture and speech prosody patterns, towards automatic, realistic synthesis of head gestures from speech prosody. In the first-stage analysis, we perform hidden Markov model (HMM) based unsupervised temporal segmentation of head gesture and speech prosody features separately, to determine elementary head gesture and speech prosody patterns for a particular speaker. In the second stage, joint analysis of the correlations between these elementary patterns is performed using multi-stream HMMs to determine an audio-visual mapping model. The resulting model is then employed to synthesize natural head gestures from arbitrary input test speech, given a head model for the speaker. In the synthesis stage, the audio-visual mapping model predicts a sequence of gesture patterns from the prosody pattern sequence computed for the input test speech; the Euler angles associated with each gesture pattern are then applied to animate the speaker's head model. Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech, as well as in "prosody transplant" and "gesture transplant" scenarios.

17.
With the development of virtual reality (VR) technology and rising expectations for human-computer interaction performance and user experience, gesture recognition, one of the key technologies affecting interactive operation in virtual reality, urgently needs higher accuracy. To address the poor performance of current gesture recognition methods on gestures with similar motions, a multi-feature dynamic gesture recognition method is proposed. The method first tracks dynamic gestures with a Leap Motion controller to acquire data, then adds the extraction of displacement-vector angles and inflection-point counts during feature extraction, next trains a hidden Markov model (HMM) for the dynamic gestures, and finally recognizes a test gesture according to its matching rate against the models. Experimental results show that this multi-feature method improves the recognition rate for similar gestures.

18.
Artificial Intelligence, 2007, 171(8–9): 568–585
Head pose and gesture offer several conversational grounding cues and are used extensively in face-to-face interaction among people. To accurately recognize visual feedback, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper we describe how contextual information can be used to predict visual feedback and improve recognition of head gestures in human-computer interfaces. Lexical, prosodic, timing, and gesture features can be used to predict a user's visual feedback during conversational dialog with a robotic or virtual agent. In non-conversational interfaces, context features based on user-interface system events can improve detection of head gestures for dialog box confirmation or document browsing. Our user study with prototype gesture-based components indicates quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives. Using a discriminative approach to contextual prediction and multi-modal integration, the performance of head gesture detection was improved with context features even when the topic of the test set was significantly different from that of the training set.
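A minimal sketch of fusing vision and context cues in a discriminative detector: concatenate per-frame head-motion features with interface-context features and train a single classifier. The features and data below are placeholders; the paper's actual feature set and discriminative model are not reproduced.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder frame-level vectors: head-rotation velocities from the
# tracker, plus context features (e.g. a dialog box was just shown,
# time since the agent's last utterance), as the abstract describes.
rng = np.random.default_rng(0)
vision = rng.random((500, 4))
context = rng.random((500, 3))
label = rng.integers(0, 2, 500)        # 1 = head nod

X = np.hstack([vision, context])       # multi-modal integration by concatenation
clf = SVC(probability=True).fit(X, label)
print(clf.predict_proba(X[:3])[:, 1])  # per-frame nod probability
```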

19.
To address the problem of indirect commands in multi-touch gestures, a multi-touch sand-painting gesture recognition system driven by temporal, spatial, and shape information is proposed. A gesture-graph modeling method is presented that measures the spatial and temporal relations between the strokes of a gesture. A clustering algorithm labels the shape information of strokes in the gesture graph as local shape features, and the HBF49 baseline feature set is used to extract global shape features. The system is evaluated on a dataset of 10 different multi-touch sand-painting gestures, using graph embedding and SVM classification, achieving a recognition accuracy of 94.75%. The results demonstrate that this work plays an important role in realizing a multi-touch sand-painting virtual system.

20.
In this work, we consider the recognition of dynamic gestures based on representative sub-segments of a gesture, denoted most discriminating segments (MDSs). Automatically extracting and recognizing these small representative segments, rather than the full gestures themselves, allows for a more discriminative classifier. An MDS is a sub-segment of a gesture that is most dissimilar to all other gesture sub-segments. Gestures are classified using an MDSLCS algorithm, which recognizes MDSs using a modified longest common subsequence (LCS) measure. The extraction of MDSs from a data stream uses adaptive window parameters driven by the successive results of multiple calls to the LCS classifier. In a preprocessing stage, gestures with large motion variations are replaced by several forms of lesser variation; we learn these forms by adaptive clustering of a training set of gestures, where we reemploy the LCS to determine similarity between gesture trajectories. The MDSLCS classifier achieved a gesture recognition rate of 92.6% when tested on a set of pre-cut free-hand digit (0–9) gestures, while hidden Markov models (HMMs) achieved 89.5%. When MDSLCS was tested on streamed digit gestures, an accuracy of 89.6% was obtained. HMMs are currently considered the state of the art for classifying motion trajectories; MDSLCS achieved higher accuracy for pre-cut gestures and is also better suited to streamed gestures. MDSLCS holds a significant advantage over HMMs in that it does not require data re-sampling at run time and performs well with small training sets.
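The LCS measure at the heart of MDSLCS can be sketched with the classic dynamic program over gestures quantized to direction-code strings; the authors' modifications and the MDS extraction logic are not reproduced here.

```python
def lcs_length(a, b):
    """Classic longest-common-subsequence length via dynamic programming,
    applied to gestures quantized to direction-code strings."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y \
                       else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_similarity(a, b):
    """Normalized LCS score in [0, 1]."""
    return lcs_length(a, b) / max(len(a), len(b))

# Toy direction codes for two digit trajectories (hypothetical encoding).
print(lcs_similarity("22206660", "2220666"))
```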
