Similar Literature
20 similar documents found.
1.

Two requirements should be met in order to develop a practical multimodal interface system, i.e., (1) integration of delayed arrival of data and (2) elimination of ambiguity in the recognition results of each modality. This paper presents an efficient and generic methodology for interpretation of multimodal input that satisfies these requirements. The proposed methodology can integrate delayed-arrival data satisfactorily and efficiently interpret multimodal input that contains ambiguity. In the input interpretation, the multimodal interpretation process is regarded as hypothetical reasoning, and the control mechanism of interpretation is formalized by applying the assumption-based truth maintenance system (ATMS). The proposed method is applied to an interface agent system that accepts multimodal input consisting of voice and direct indication gestures on a touch display. The system communicates with the user through a human-like interface agent's three-dimensional motion image with facial expressions, gestures, and a synthesized voice.
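The integration scheme lends itself to a compact illustration. The following Python toy is only a sketch of the general idea (timestamped, scored hypotheses per modality; late-arriving data admitted within a time window; the most consistent pairing selected) and not the authors' ATMS-based formalization; the names, the window, and the independence scoring are all assumptions.

```python
# Illustrative sketch only: a toy integrator in the spirit of
# hypothesis-based multimodal interpretation, not the paper's ATMS.
from dataclasses import dataclass
from itertools import product

@dataclass
class Hypothesis:
    modality: str      # "speech" or "gesture"
    content: str       # recognized symbol, e.g. "delete" or "icon_7"
    t: float           # arrival timestamp (s)
    score: float       # recognizer confidence in [0, 1]

def interpret(speech, gestures, window=1.5):
    """Pair ambiguous speech and gesture hypotheses whose timestamps fall
    within `window` seconds (tolerating delayed arrival), and return the
    pair with the highest joint confidence."""
    best, best_score = None, -1.0
    for s, g in product(speech, gestures):
        if abs(s.t - g.t) <= window:      # delayed data still integrates
            joint = s.score * g.score     # naive independence assumption
            if joint > best_score:
                best, best_score = (s.content, g.content), joint
    return best

speech = [Hypothesis("speech", "delete", 0.2, 0.7),
          Hypothesis("speech", "repeat", 0.2, 0.4)]     # ambiguous ASR output
gestures = [Hypothesis("gesture", "icon_7", 1.1, 0.9)]  # arrives late
print(interpret(speech, gestures))  # ('delete', 'icon_7')
```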

2.

Most of today's virtual environments are populated with some kind of autonomous, life-like agents. Such agents follow a preprogrammed sequence of behaviors that excludes the user as a participating entity in the virtual society. In order to make inhabited virtual reality an attractive place for information exchange and social interaction, we need to equip the autonomous agents with some perception and interpretation skills. In this paper we present one such skill: human action recognition. In contrast to human-computer interfaces that focus on speech or hand gestures, we propose a full-body integration of the user. We present a model of human actions along with a real-time recognition system. To cover the bilateral aspect of human-computer interfaces, we also discuss some action response issues. In particular, we describe a motion management library that solves animation continuity and mixing problems. Finally, we illustrate our system with two examples and discuss what we have learned.

3.
Objective: To address the difficulty of eye-state recognition in driver fatigue detection, a fatigue detection method based on sclera (eye-white) segmentation is proposed. Method: Face detection is first performed on the captured image. Exploiting the strong clustering of sclera pixels in the Cb-Cr plane, a Gaussian sclera segmentation model is built in the YCbCr color space; sclera segmentation is then carried out within the face region and the sclera area is computed; finally, the sclera area is used as an indicator of eye openness and combined with PERCLOS (percentage of eyelid closure over the pupil over time) to determine the driver's fatigue state. Results: Frame-sampled analysis of 10 short videos shows that the Gaussian sclera segmentation model can effectively separate the sclera and recognize the open/closed state of the eyes, with an accuracy of up to 96.77%. Conclusion: Under good lighting conditions, the method achieves good segmentation results, and the proposed use of sclera area as the indicator of eye openness can accurately determine the driver's fatigue state. The experimental results demonstrate the effectiveness of the method, which merits deeper study in the future.
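A minimal sketch of the Gaussian sclera model described above, assuming the mean and covariance have been fitted to labeled sclera pixels in the Cb-Cr plane; the Mahalanobis threshold is a tunable placeholder, not the paper's calibrated value.

```python
# Minimal sketch of a Gaussian sclera (eye-white) model in the Cb-Cr plane;
# the fitted parameters and the threshold are placeholders.
import numpy as np

def fit_sclera_model(cbcr_samples):
    """cbcr_samples: (N, 2) array of Cb-Cr values from labeled sclera pixels."""
    mu = cbcr_samples.mean(axis=0)
    cov = np.cov(cbcr_samples, rowvar=False)
    return mu, np.linalg.inv(cov)

def sclera_mask(cbcr_image, mu, inv_cov, thresh=4.0):
    """Label a pixel as sclera when its Mahalanobis distance to the
    Gaussian model is below `thresh` (a tunable assumption)."""
    d = cbcr_image - mu                               # (H, W, 2)
    maha = np.einsum('hwi,ij,hwj->hw', d, inv_cov, d) # per-pixel quadratic form
    return maha < thresh

# Eye-openness indicator: sclera area inside the detected eye region, e.g.
# openness = sclera_mask(eye_cbcr, mu, inv_cov).sum()
```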

4.
Abstract

Acquisition of a user's computer-based problem-solving skill is an important research area in human-computer interaction. This type of information can usually be derived by careful inspection of the actual dialog behavior. To implement such a knowledge reasoning agent, several crucial issues are pointed out and carefully examined. These include: (a) an appropriate knowledge representation schema able to demonstrate the causal relationship between pairs of dialog events; (b) the formulation of a valid formula for calculating the overall knowledge index from the background information; (c) determination of the minimum sufficient number of dialog events required to form a discernible pattern; and (d) generalization of categories of performance patterns that can be applied to all types of application domains. A prototype reasoning agent based on the proposed methodology is constructed and its effectiveness is verified with dialog events recorded during UNIX operations.

5.
Gestural recognition systems are important tools for leveraging movement-based interactions in multimodal learning environments, but personalizing these interactions has proven difficult. We offer an adaptable model that uses multimodal analytics, enabling students to define their physical interactions with computer-assisted learning environments. We argue that these interactions are foundational to developing stronger connections between students' physical actions and digital representations within a multimodal space. Our model uses real-time learning analytics for gesture recognition, training a hierarchical hidden Markov model with a "one-shot" construct, learning from user-defined gestures, and accessing three different modes of data: skeleton positions, kinematic features, and internal model parameters. Through an empirical comparison with a "pretrained" model, we show that our model can achieve higher recognition accuracy in repeatability and recall tasks. This suggests that our approach is a promising way to create productive experiences with gesture-based educational simulations, promoting personalized interfaces and analytics of multimodal learning scenarios.
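To make the recognition pipeline concrete, here is a heavily simplified sketch using plain (non-hierarchical) Gaussian HMMs from the third-party hmmlearn package: one model is fitted per user-defined gesture from a single example, and classification picks the model with the highest log-likelihood. The feature stream, state count, and library choice are all assumptions; the paper's hierarchical model and its three data modes are not reproduced here.

```python
# Simplified "one-shot" gesture classification with flat Gaussian HMMs
# (hmmlearn); illustrative only, not the paper's hierarchical model.
import numpy as np
from hmmlearn import hmm

def train_one_shot(example, n_states=5):
    """Fit one HMM to a single user-defined example.
    example: (T, D) array of per-frame features (e.g. skeleton positions)."""
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=25, random_state=0)
    model.fit(example)                      # one-shot: a single sequence
    return model

def classify(models, sequence):
    """Return the gesture label whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(sequence))

rng = np.random.default_rng(0)
wave = np.cumsum(rng.normal(0, 1, (40, 6)), axis=0)    # fake skeleton stream
push = np.cumsum(rng.normal(0.5, 1, (40, 6)), axis=0)
models = {"wave": train_one_shot(wave), "push": train_one_shot(push)}
print(classify(models, wave + rng.normal(0, 0.1, wave.shape)))  # "wave"
```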

6.
Objective: Emotion recognition research aims to help systems respond to user needs more appropriately during human-computer interaction, yet its performance in real-world applications remains poor, mainly because of the lack of large-scale multimodal datasets resembling real application environments. Existing in-the-wild multimodal emotion datasets are scarce, cover a limited number of subjects, and use a single language. Method: To meet the data requirements of deep learning algorithms, this paper collects, annotates, and prepares for public release a new video dataset of natural behavior, the multimodal emotion dataset (MED). Collectors first manually extracted video clips from films, TV series, and variety shows; after annotation, 1,839 clips were obtained. Valid video frames were then extracted from these clips through person detection, face detection, and related operations. The dataset covers 7 basic emotions and 3 modalities: facial expression, body posture, and emotional speech. Results: To provide an emotion recognition benchmark, the experimental section evaluates MED with machine learning and deep learning methods. A comparison with the CK+ dataset first shows that algorithms developed on data collected under laboratory conditions are difficult to apply in practice; baseline experiments are then reported for each modality. Finally, multimodal fusion improves recognition by 4.03% over unimodal facial expression recognition. Conclusion: The multimodal emotion database MED extends existing in-the-wild multimodal databases, supporting research directions such as cross-cultural (cross-language) emotion recognition and perceptual analysis of different emotion assessments, and improving the real-world performance of automatic affective computing systems.

7.
This paper describes research that addresses the problem of dialog management from a strong, context‐centric approach. We further present a quantitative method of measuring the importance of contextual cues when dealing with speech‐based human–computer interactions. It is generally accepted that using context in conjunction with a human input, such as spoken speech, enhances a machine's understanding of the user's intent as a means to pinpoint an adequate reaction. For this work, however, we present a context‐centric approach in which the use of context is the primary basis for understanding and not merely an auxiliary process. We employ an embodied conversation agent that facilitates the seamless engagement of a speech‐based information‐deployment entity by its human end user. This dialog manager emphasizes the use of context to drive its mixed‐initiative discourse model. A typical, modern automatic speech recognizer (ASR) was incorporated to handle the speech‐to‐text translations. As is the nature of these ASR systems, the recognition rate is consistently less than perfect, thus emphasizing the need for contextual assistance. The dialog system was encapsulated into a speech‐based embodied conversation agent platform for prototyping and testing purposes. Experiments were performed to evaluate the robustness of its performance, namely through measures of naturalness and usefulness, with respect to the emphasized use of context. The contribution of this work is to provide empirical evidence of the importance of conversational context in speech‐based human–computer interaction using a field‐tested context‐centric dialog manager.

8.
Objective: Facial age estimation, an emerging biometric technology, has become an important research direction in computer vision. With the rapid development of deep learning, facial age estimation based on deep convolutional neural networks has become a research hotspot. Method: Taking deep-learning-based real-age and apparent-age estimation methods as the object of study, this paper surveys the literature, analyzes the basic ideas and characteristics of deep-learning-based facial age estimation methods, reviews their research status, summarizes key techniques and their limitations, compares the performance of common facial age estimation methods, and outlines future directions. Results: Although research on deep-learning-based facial age estimation has made great progress, performance under unconstrained conditions still falls short of practical requirements, mainly because current research faces the following difficulties: 1) insufficient prior knowledge introduced into facial age estimation; 2) lack of feature representations that capture both global and local detail; 3) limitations of existing facial age estimation datasets; 4) multi-scale facial age estimation in real application environments. Conclusion: Deep-learning-based facial age estimation has made remarkable progress, but complex real-world scenarios easily degrade its performance. This comprehensive survey of current deep-learning-based facial age estimation techniques is intended to help researchers address the remaining problems.

9.
This paper presents a novel face tracking algorithm and visual state estimator for a mobile robot face tracking interaction control system. The advantage of this design is that it can track a user's face under several external uncertainties and estimate the system state without knowledge of the target's 3D motion model. This feature is helpful for the development of a real-time visual tracking control system. In order to overcome changes in skin color due to light variation, a real-time face tracking algorithm is proposed based on an adaptive skin color search method. Moreover, in order to increase robustness against colored observation noise, a new visual state estimator is designed by combining a Kalman filter with an echo state network-based self-tuning algorithm. The performance of this estimator design has been evaluated using computer simulation. Several experiments on a mobile robot validate the proposed control system.
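The Kalman-filter core of such an estimator can be sketched as follows; the echo state network-based self-tuning against colored noise is omitted, and the constant-velocity model and noise matrices are generic textbook choices rather than the authors' parameters.

```python
# Sketch of the Kalman-filter core only: constant-velocity model for the
# face's image position. All matrices below are generic textbook choices.
import numpy as np

dt = 1 / 30                                   # assume 30 fps video
F = np.array([[1, 0, dt, 0],                  # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],                   # we observe pixel position only
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-3                          # process noise (tunable)
R = np.eye(2) * 5.0                           # measurement noise (tunable)

def kalman_step(x, P, z):
    """One predict-update cycle; z is the measured face center (2,)."""
    x, P = F @ x, F @ P @ F.T + Q             # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x = x + K @ (z - H @ x)                   # update with innovation
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4) * 100.0
for z in [np.array([120., 80.]), np.array([123., 82.]), np.array([127., 83.])]:
    x, P = kalman_step(x, P, z)
print(x[:2])   # smoothed face position estimate
```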

10.
《Advanced Robotics》2013,27(6):585-604
We are attempting to introduce a 3D, realistic human-like animated face robot to human-robot communication. The face robot can recognize human facial expressions as well as produce realistic facial expressions in real time. For the animated face robot to communicate interactively, we propose a new concept of 'active human interface', and we investigate the performance of real-time recognition of facial expressions by neural networks (NN) and the expressiveness of facial messages on the face robot. We find that the NN recognition of facial expressions and the face robot's performance in generating facial expressions are at almost the same level as in humans. We also construct an artificial emotion model able to generate six basic emotions in accordance with the recognition of a given facial expression and the situational context. This implies a high potential for the animated face robot to undertake interactive communication with humans when these three component technologies are integrated into the face robot.

11.
《Ergonomics》2012,55(1):43-55
The aim of the study was to determine the influence of textual feedback on the content and outcome of spoken interaction with a natural language dialogue system. More specifically, the assumption that textual feedback could disrupt spoken interaction was tested in a human–computer dialogue situation. In total, 48 adult participants, familiar with the system, had to find restaurants based on simple or difficult scenarios using a real natural language service system in a speech-only (phone), speech plus textual dialogue history (multimodal) or text-only (web) modality. The linguistic contents of the dialogues differed as a function of modality, but were similar whether the textual feedback was included in the spoken condition or not. These results add to burgeoning research efforts on multimodal feedback by suggesting that textual feedback may have little or no detrimental effect on information searching with a real system.

Statement of Relevance: The results suggest that adding textual feedback to interfaces for human–computer dialogue could enhance spoken interaction rather than create interference. The literature currently suggests that adding textual feedback to tasks that depend on the visual sense benefits human–computer interaction. This study investigated the addition of textual output when the spoken modality is heavily taxed by the task.

12.
Abstract

This paper deals with one of several methods used to investigate the existence, nature, and factors influencing a child's (9-14 years) conception of a computer system. The use of subjects' drawings as a method of data collection is introduced and discussed. Three empirical studies are described which examine the child's model in terms of its components, conduits and causal effect. The factors of age and experience are highlighted, as is the discovery that task orientation led to the implication that no single mental model exists.

13.
《Advanced Robotics》2012,26(17):1995-2020
Abstract

In this paper, we propose a robot that acquires multimodal information, i.e. visual, auditory, and haptic information, fully autonomously using its embodiment. We also propose batch and online algorithms for multimodal categorization based on the acquired multimodal information and partial words given by human users. To obtain multimodal information, the robot detects an object on a flat surface. Then, the robot grasps and shakes it to obtain haptic and auditory information. For obtaining visual information, the robot uses a small hand-held observation table with an XBee wireless controller to control the viewpoints for observing the object. In this paper, for multimodal concept formation, multimodal latent Dirichlet allocation using Gibbs sampling is extended to an online version. This framework makes it possible for the robot to learn object concepts naturally in everyday operation in conjunction with a small amount of linguistic information from human users. The proposed algorithms are implemented on a real robot and tested using real everyday objects to show the validity of the proposed system.
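As a rough illustration of the batch flavor of such categorization, the following toy collapsed Gibbs sampler runs standard LDA over bag-of-features "documents"; the paper's multimodal LDA keeps separate word distributions per modality and adds an online update, both omitted here, and the single shared vocabulary is a simplifying assumption.

```python
# Toy collapsed Gibbs sampler for LDA over bag-of-features "documents";
# illustrative of batch multimodal categorization, not the paper's MLDA.
import numpy as np

def lda_gibbs(docs, V, K=3, iters=200, alpha=0.5, beta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))        # topic counts per document
    nkw = np.zeros((K, V))                # word counts per topic
    nk = np.zeros(K)
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):        # initialize counts
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]               # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k               # resample topic for this word
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk / ndk.sum(1, keepdims=True)  # per-object category mixtures

# Each "document" = quantized visual/auditory/haptic features plus partial
# words, mapped into one shared vocabulary of size V (an assumption).
docs = [[0, 0, 1, 2], [0, 1, 1, 2], [5, 6, 6, 7], [5, 5, 7, 7]]
print(lda_gibbs(docs, V=8, K=2).round(2))
```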

14.
ABSTRACT

The emergence of gestural interaction devices has prompted various studies on multimodal human–computer interaction to improve user experience (UX). However, there is a knowledge gap regarding the use of these devices to enhance learning. We present an exploratory study which analysed the UX with a multimodal immersive videogame prototype, based on a Portuguese historical/cultural episode. Evaluation tests took place in high school environments and public videogaming events. Two users would be present simultaneously in the same virtual reality (VR) environment: one as the helmsman aboard Vasco da Gama's fifteenth-century Portuguese ship and the other as the mythical Adamastor stone giant at the Cape of Good Hope. The helmsman player wore a VR headset to explore the environment, whereas the giant player used body motion to control the giant, and observed results on a screen, with no headset. This allowed a preliminary characterisation of UX, identifying challenges and potential use of these devices in multi-user virtual learning contexts. We also discuss the combined use of such devices, towards future development of similar systems, and its implications on learning improvement through multimodal human–computer interaction.

15.

Life-like characters are increasingly gaining the attention of researchers and commercial developers of user interfaces. A strong argument in favor of using such characters in the interface is the rich repertoire of options they offer, enabling the emulation of communication styles common in human-human dialog. This contribution presents a framework for the development of presentation agents, which can be used for a broad range of applications including personalized information delivery from the WWW.

16.
Objective: Fatigued driving is one of the main causes of traffic accidents. To address the poor eye-state recognition of existing methods when the driver's face is partially occluded, a driver eye-fatigue detection method based on the co-occurrence matrix of the self-quotient image and the gradient image is proposed. Method: An SSD (single shot multibox detector) face detector with a residual network (ResNet) backbone extracts valid face regions from video, and a facial landmark detection algorithm segments the local eye-region image. A co-occurrence matrix model of the self-quotient image and the gradient image of the driver's eyes is built; the statistical features of the co-occurrence matrix are analyzed, and the best-performing features are selected to determine the open/closed state of the eyes. Two fatigue indicators, the percentage of eyelid closure over time (PERCLOS) and the maximum closing duration (MCD), are combined to judge the driver's fatigue state. Results: Driving was simulated on a six-degree-of-freedom vehicle-performance virtual simulation platform, and the driver's facial video was collected and analyzed. The method effectively recognizes the open/closed state of the eyes when the face is occluded, with an accuracy of up to 99.12% (98.73% without occlusion), and processes video at about 32 frames/s. Comparison method 1, which combines histogram-of-oriented-gradients features with a support vector machine classifier for face detection and judges eye state by the eye aspect ratio, performs poorly under facial occlusion; comparison method 2, which uses a convolutional neural network (CNN) to judge eye state, reaches 98.02% accuracy under occlusion but performs poorly on blink detection. Conclusion: The proposed fatigue detection method based on the self-quotient image and gradient image co-occurrence matrix effectively recognizes eye open/closed states and the driver's fatigue state under facial occlusion, with high detection speed and accuracy.
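The two fatigue indicators named above are straightforward to compute from a per-frame binary eye state; in this sketch the 0.4 PERCLOS threshold, the 2 s MCD threshold, and the 30 frames/s rate are illustrative assumptions rather than the paper's calibrated values.

```python
# Sketch of the PERCLOS and MCD fatigue indicators, computed from a
# per-frame binary eye state (1 = closed); thresholds are assumptions.
import numpy as np

def perclos(closed, fps=30, window_s=60):
    """Fraction of closed-eye frames over the last `window_s` seconds."""
    w = closed[-fps * window_s:]
    return float(np.mean(w))

def max_closing_duration(closed, fps=30):
    """MCD: longest run of consecutive closed-eye frames, in seconds."""
    longest = run = 0
    for c in closed:
        run = run + 1 if c else 0
        longest = max(longest, run)
    return longest / fps

closed = np.array([0]*40 + [1]*15 + [0]*30 + [1]*65 + [0]*20)  # fake stream
fatigued = perclos(closed) > 0.4 or max_closing_duration(closed) > 2.0
print(perclos(closed), max_closing_duration(closed), fatigued)
```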

17.
Abstract

Two digital computer programs synthesizing optimal maneuvers in one-on-one air-to-air combat situations are described. The method develops intelligently interactive maneuvers without relying on human pilot experience. One program drives one of the interacting aircraft in real time, replacing one of the human pilots on the NASA Langley Research Center's Differential Maneuvering Simulator. The other program operates in a normal batch processing mode. Both programs use the same technique, which maps the physical situation of the two aircraft into a quantized, abstract situation space. The outcome in this situation space is predicted for several trial maneuvers, a value is associated with the outcome of each trial maneuver, and finally the maneuver with the highest predicted value is executed.

These programs, operating with six degrees of freedom and realistic aerodynamic representation for both aircraft, provide a means for objective evaluation of weapons systems and pilot performance.
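The selection loop described above (predict each trial maneuver's outcome in an abstract situation space, then fly the maneuver with the highest predicted value) can be outlined schematically; the state variables, maneuver set, and value function below are invented stand-ins, not the original quantized relative-geometry formulation.

```python
# Schematic of the trial-maneuver selection loop; toy dynamics and values.
from dataclasses import dataclass

@dataclass
class Situation:
    range_m: float       # distance to adversary
    angle_off: float     # radians off the adversary's tail

def predict(situation, maneuver, horizon_s=2.0):
    """Toy dynamics: each maneuver nudges range and angle-off."""
    d_range, d_angle = maneuver
    return Situation(situation.range_m + d_range * horizon_s,
                     situation.angle_off + d_angle * horizon_s)

def value(s):
    """Higher when close and near the adversary's tail (an assumption)."""
    return -abs(s.range_m - 300.0) / 100.0 - abs(s.angle_off)

trial_maneuvers = {"break_left": (-50.0, -0.3), "break_right": (-50.0, 0.3),
                   "extend": (120.0, 0.0), "pure_pursuit": (-80.0, -0.1)}

def select(situation):
    """Execute the trial maneuver with the highest predicted value."""
    return max(trial_maneuvers,
               key=lambda m: value(predict(situation, trial_maneuvers[m])))

print(select(Situation(range_m=900.0, angle_off=0.6)))  # "pure_pursuit"
```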

18.

User interface designers are challenged to design for diverse users, including those of different genders, cultures and abilities; however, little research has been directed at this problem. One factor which may inhibit such research is its cost. This paper presents an approach which offers a way to seek out important characteristics of designs in a cost-effective way and reports on the results. In the study reported here, subjects of different nationalities and of both genders evaluated three dialog boxes specifically designed for 'white American women', 'European adult male intellectuals', and 'English-speaking internationals'. The dialog boxes were evaluated with conjoint techniques of preference rankings and factor-analysed adjective ratings. The results showed that female subjects had stronger and more consistent patterns of preferences than the male subjects. All subjects preferred interfaces rated high on an accessibility factor and disliked complex layouts; this effect was even stronger for women. Nationality did not affect ratings. Gender had a stronger effect on the outcome than nationality.

19.
《Ergonomics》2012,55(10):1205-1216
Vigilance declines when exposed to highly predictable and uneventful tasks. Monotonous tasks provide little cognitive and motor stimulation and contribute to human errors. This paper aims to model and detect vigilance decline in real time through participants' reaction times during a monotonous task. A laboratory-based experiment adapting the Sustained Attention to Response Task (SART) is conducted to quantify the effect of monotony on overall performance. Relevant parameters are then used to build a model detecting hypovigilance throughout the experiment. The accuracy of different mathematical models is compared to detect in real time – minute by minute – the lapses in vigilance during the task. It is shown that monotonous tasks can lead to an average decline in performance of 45%. Furthermore, vigilance modelling enables the detection of vigilance decline through reaction times with an accuracy of 72% and a 29% false alarm rate. Bayesian models are identified as a better model to detect lapses in vigilance as compared with neural networks and generalised linear mixed models. This modelling could be used as a framework to detect vigilance decline of any human performing monotonous tasks.

Statement of Relevance: Existing research on monotony is largely entangled with endogenous factors such as sleep deprivation, fatigue and circadian rhythm. This paper uses a Bayesian model to assess the effects of a monotonous task on vigilance in real time. It is shown that the negative effects of monotony on the ability to sustain attention can be mathematically modelled and predicted in real time using surrogate measures, such as reaction times. This allows the modelling of vigilance fluctuations.
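A minimal per-minute Bayesian classifier in the spirit of this approach compares each minute's reaction times under "alert" and "hypovigilant" Gaussian models and reports the posterior probability of a lapse; all parameters below (means, standard deviations, prior) are illustrative, not the paper's fitted values.

```python
# Minimal Bayesian lapse detector over per-minute reaction times;
# all model parameters are illustrative assumptions.
import math

ALERT = (350.0, 60.0)          # mean, sd of RT in ms when vigilant
HYPO = (550.0, 120.0)          # slower, more variable when hypovigilant
PRIOR_HYPO = 0.2               # prior probability of a lapse in any minute

def log_gauss(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def p_hypovigilant(minute_rts):
    """Posterior that this minute is hypovigilant, assuming i.i.d. RTs."""
    ll_a = sum(log_gauss(rt, *ALERT) for rt in minute_rts)
    ll_h = sum(log_gauss(rt, *HYPO) for rt in minute_rts)
    log_odds = ll_h - ll_a + math.log(PRIOR_HYPO / (1 - PRIOR_HYPO))
    return 1 / (1 + math.exp(-log_odds))

print(p_hypovigilant([360, 340, 380, 390]))   # near 0: alert minute
print(p_hypovigilant([520, 610, 700, 480]))   # near 1: likely lapse
```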

20.

For users with motion impairments, the standard keyboard and mouse arrangement for computer access often presents problems. Other approaches have to be adopted to overcome this. In this paper, we will describe the development of a prototype multimodal input system based on two gestural input channels. Results from extensive user trials of this system are presented. These trials showed that the physical and cognitive loads on the user can quickly become excessive and detrimental to the interaction. Designers of multimodal input systems need to be aware of this and perform regular user trials to minimize the problem.
