Similar Documents
20 similar documents found.
1.
Applications with intelligent conversational virtual humans, called Embodied Conversational Agents (ECAs), seek to bring human-like abilities to machines and establish natural human-computer interaction. In this paper we discuss the realization of ECA multimodal behaviors, which include speech and nonverbal behaviors. We present RealActor, an open-source, multi-platform animation system for real-time multimodal behavior realization for ECAs. The system employs a novel solution for synchronizing gestures and speech using neural networks, and an adaptive face animation model based on the Facial Action Coding System (FACS) to synthesize facial expressions. Our aim is to provide a generic animation system that helps researchers create believable and expressive ECAs.
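A hedged illustration of the gesture-speech synchronization idea, not RealActor's actual network: a small neural network learns to predict when a gesture stroke should fall relative to the accompanying speech, given prosodic features. The features, toy data, and network shape below are illustrative assumptions.

```python
# Sketch: predict gesture stroke timing from speech prosody with a small
# neural network. All feature names and the target rule are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Toy features: [syllable duration (s), speech rate (syl/s), pitch peak offset (s)]
X = rng.uniform([0.1, 2.0, 0.0], [0.4, 7.0, 0.2], size=(200, 3))
# Toy target: gesture stroke offset relative to syllable onset, plus noise.
y = 0.5 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 0.01, 200)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, y)
print("predicted stroke offset:", model.predict([[0.25, 4.5, 0.1]])[0])
```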

2.
We developed and evaluated a multimodal affect detector that combines conversational cues, gross body language, and facial features. The multimodal affect detector uses feature-level fusion to combine the sensory channels and linear discriminant analyses to discriminate between naturally occurring experiences of boredom, engagement/flow, confusion, frustration, delight, and neutral. Training and validation data for the affect detector were collected in a study where 28 learners completed a 32-minute tutorial session with AutoTutor, an intelligent tutoring system with conversational dialogue. Classification results supported a channel × judgment type interaction, where the face was the most diagnostic channel for spontaneous affect judgments (i.e., at any time in the tutorial session), while conversational cues were superior for fixed judgments (i.e., every 20 s in the session). The analyses also indicated that the accuracy of the multichannel model (face, dialogue, and posture) was statistically higher than the best single-channel model for the fixed but not spontaneous affect expressions. However, multichannel models reduced the discrepancy (i.e., variance in the precision of the different emotions) of the discriminant models for both judgment types. The results also indicated that the combination of channels yielded superadditive effects for some affective states, but additive, redundant, and inhibitory effects for others. We explore the structure of the multimodal linear discriminant models and discuss the implications of some of our major findings.
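A minimal sketch of the feature-level fusion described here: features from each channel are concatenated into a single vector before a linear discriminant classifier is trained. The feature dimensions, labels, and random data below are illustrative assumptions, not the study's.

```python
# Sketch: feature-level fusion (concatenation) followed by LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 300
face = rng.normal(size=(n, 12))      # e.g., facial action unit intensities
dialogue = rng.normal(size=(n, 8))   # e.g., conversational cue features
posture = rng.normal(size=(n, 4))    # e.g., gross body language features
labels = rng.integers(0, 6, size=n)  # boredom, flow, confusion, frustration, delight, neutral

X = np.hstack([face, dialogue, posture])  # feature-level fusion: one joint vector
clf = LinearDiscriminantAnalysis().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```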

3.
In this paper, we investigate an object-oriented (OO) architecture for multimodal emotion recognition in interactive applications on mobile phones and handheld devices. Unlike desktop computers, the mobile phones in our approach do not perform the emotion-recognition processing themselves: they pass all collected data to a server, which performs the recognition. The object-oriented architecture we have created combines evidence from multiple modalities of interaction, namely the mobile device's keyboard and microphone, as well as data from emotion stereotypes, and classifies them into well-structured objects with their own properties and methods. The resulting emotion detection server can use and handle information transmitted from different mobile sources of multimodal data during human-computer interaction. As a test bed for affective mobile interaction we have used an educational application incorporated into the mobile system.
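A hedged sketch of the object-oriented decomposition the abstract describes: each modality becomes a class with its own data and methods, and a server-side detector fuses their evidence. All class names, fields, and the fusion rule are assumptions for illustration.

```python
# Sketch: modality evidence as objects, combined on an emotion server.
from dataclasses import dataclass, field

@dataclass
class KeyboardEvidence:
    typing_speed: float    # characters per second
    backspace_rate: float  # proportion of keystrokes that are corrections

@dataclass
class MicrophoneEvidence:
    pitch_mean: float      # Hz
    energy: float          # normalized loudness in [0, 1]

@dataclass
class EmotionDetectionServer:
    # Hypothetical stereotype-derived channel weights.
    weights: dict = field(default_factory=lambda: {"keyboard": 0.4, "microphone": 0.6})

    def classify(self, kb: KeyboardEvidence, mic: MicrophoneEvidence) -> str:
        # Toy fusion rule: error-prone typing plus loud speech reads as frustration.
        score = self.weights["keyboard"] * kb.backspace_rate + self.weights["microphone"] * mic.energy
        return "frustrated" if score > 0.5 else "neutral"

server = EmotionDetectionServer()
print(server.classify(KeyboardEvidence(5.0, 0.8), MicrophoneEvidence(220.0, 0.7)))
```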

4.
The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue that provides access to the services it is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. Under service-robot deployment conditions, speech recognition limitations, noisy speech input, and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. For the robot to treat the speech input as sufficiently grounded (correctly understood), four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate the probability that each grounding state has been reached. These probabilities serve as a basis for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated by comparing a conversational system for the mobile service robot RoboX that employs only speech recognition for user goal identification with a system equipped with multimodal grounding. The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation, with multimodal data collected during conversations between RoboX and users.
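A minimal numeric sketch of the underlying idea: fuse speech and non-speech evidence into a probability that a grounding state has been reached, and fall back to buttons when that probability is low. The prior, likelihoods, and threshold below are illustrative assumptions, not values from the paper.

```python
# Sketch: naive-Bayes style fusion of two conditionally independent
# observations (speech confidence, gaze) for one grounding state.
def grounding_probability(prior, p_speech_given_grounded, p_speech_given_not,
                          p_gaze_given_grounded, p_gaze_given_not):
    num = prior * p_speech_given_grounded * p_gaze_given_grounded
    den = num + (1 - prior) * p_speech_given_not * p_gaze_given_not
    return num / den

p = grounding_probability(prior=0.5,
                          p_speech_given_grounded=0.7, p_speech_given_not=0.3,
                          p_gaze_given_grounded=0.8, p_gaze_given_not=0.4)
print(f"P(grounded | evidence) = {p:.2f}")
if p < 0.6:  # hypothetical reliability threshold
    print("speech unreliable: switch to button input")
```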

5.
As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit both voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that uses natural language understanding combined with a WAP browser to access email messages on a cell phone. We present results from a laboratory trial that evaluated how users interacted with the system. The trial also compared the multimodal system with a text-only system representative of current products in the market. We discuss the observed modality issues and highlight implementation problems and usability concerns encountered in the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most of them had little or no prior experience with speech systems (though they did have prior experience with text-only access to applications on their phones). To our knowledge this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. Design implications resulting from the study findings and the usability issues encountered are presented to inform the design of future conversational multimodal mobile applications.

6.
This methodological intervention proposes that the typical conversation sets up or modifies Micro Knowledge Profiles using (partly anaphoric) discourse devices of Thick Cross-referencing, and that a certain type of translation procedure maps from such knowledge onto Macro Acquaintance Profiles. In a typical conversation, partners already acquainted with each other and with various matters renew their acquaintance; this renewal modifies their knowledge profiles and their action plans. The details that make the conversation flow have to be set aside for the translation procedure to see the consequences of a conversation. The widespread desire to accurately profile the involvement of persons in conversations, together with the impossibility of telling any conversation-detached truth, has been bringing about a mutation in the way we officially share and transmit knowledge; this mutation can usefully be called the Conversational Turn.

7.
8.
9.
In this paper, we develop techniques based on evolvability statistics of the fitness landscape surrounding sampled solutions. Averaging the measures over a sample of equal-fitness solutions allows us to build up fitness evolvability portraits of the fitness landscape, which we show can be used to compare both the ruggedness and the neutrality of a set of tunably rugged and tunably neutral landscapes. We further show that the techniques can be used with solution samples collected through both random sampling of the landscapes and online sampling during optimization. Finally, we apply the techniques to two real evolutionary electronics search spaces and highlight differences between the two, comparing these with the time taken to find good solutions through search.
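A hedged sketch of one such evolvability measure: sample solutions at a fixed fitness, mutate each many times, and record the fraction of offspring at least as fit, averaged over the sample. The bit-string landscape and one-bit mutation operator here are illustrative assumptions, not the paper's landscapes.

```python
# Sketch: one point of a fitness evolvability portrait on a toy landscape.
import random

def fitness(genome):
    return sum(genome)  # toy landscape: ONEMAX

def mutate(genome):
    child = genome[:]
    i = random.randrange(len(child))
    child[i] ^= 1  # flip one bit
    return child

def evolvability(genome, trials=200):
    f = fitness(genome)
    return sum(fitness(mutate(genome)) >= f for _ in range(trials)) / trials

random.seed(0)
# A sample of equal-fitness solutions: shuffles of ten 1s and ten 0s.
sample = [random.sample([1] * 10 + [0] * 10, 20) for _ in range(50)]
portrait_point = sum(evolvability(g) for g in sample) / len(sample)
print(f"mean evolvability at fitness 10: {portrait_point:.2f}")
```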

10.
Issues and Algorithms in Multimodal Integration (cited 2 times: 0 self-citations, 2 by others)
张宏超, 俸文, 周方, 孙亚民. 《计算机工程》, 2004, 30(13): 67-68, 171
Multimodal interfaces aim to capture the user's intent through more than one sensory and motor channel, improving the naturalness and efficiency of human-computer interaction; their core problem is modality integration. To address this problem, the paper proposes an integration algorithm based on a hierarchical task model. The algorithm involves representing the program's run-time state, designing the task structure, and representing the dependencies among interaction primitives. These three issues are discussed in turn, and the final algorithm is derived on that basis.
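A hedged sketch of the kind of hierarchical task model this abstract implies: tasks decompose into subtasks, tasks bind the interaction primitives they accept, and a primitive is integrated only when the run-time state marks its task as active. All names and the acceptance rule are illustrative assumptions, not the paper's algorithm.

```python
# Sketch: hierarchical task model gating which interaction primitives integrate.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    primitives: set = field(default_factory=set)  # primitives this task accepts
    subtasks: list = field(default_factory=list)
    active: bool = False                          # program run-time state

    def accepts(self, primitive: str) -> bool:
        # Integrate here if this task is active and either accepts the
        # primitive directly or has an active subtask that does.
        if not self.active:
            return False
        return primitive in self.primitives or any(t.accepts(primitive) for t in self.subtasks)

select = Task("select-object", primitives={"pen-point", "speech-deictic"}, active=True)
root = Task("edit-drawing", subtasks=[select], active=True)
print(root.accepts("pen-point"))       # True: an active task accepts it
print(root.accepts("speech-command"))  # False: no active task accepts it
```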

11.
12.
13.
Multimodal identification and tracking in smart environments (cited 1 time: 0 self-citations, 1 by others)
We present a model for unconstrained and unobtrusive identification and tracking of people in smart environments and answering queries about their whereabouts. Our model supports biometric recognition based upon multiple modalities such as face, gait, and voice in a uniform manner. The key technical idea underlying our approach is to abstract a smart environment by a state transition system in which each state records a set of individuals who are present in various zones of the environment. Since biometric recognition is inexact, state information is inherently probabilistic in nature. An event abstracts a biometric recognition step, and the transition function abstracts the reasoning necessary to effect state transitions. In this manner, we are able to integrate different biometric modalities uniformly and also different criteria for state transitions. Fusion of biometric modalities is also supported by our model. We define performance metrics for a smart environment in terms of the concepts of ‘precision’ and ‘recall’. We have developed a prototype implementation of our proposed concepts and provide experimental results in this paper. Our conclusion is that the state transition model is an effective abstraction of a smart environment and serves as a good basis for developing practical systems.
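A minimal sketch of the state-transition abstraction: a state records, per zone, a probability distribution over which individuals are present, and a recognition event updates that distribution. The update rule and all numbers below are illustrative assumptions, not the paper's transition function.

```python
# Sketch: probabilistic zone state updated by a biometric recognition event.
def update_zone(state, zone, person, confidence):
    """Shift probability mass toward `person` being in `zone`,
    in proportion to the recognizer's confidence."""
    dist = state[zone]
    for p in dist:
        if p != person:
            dist[p] *= (1 - confidence)
    dist[person] = dist.get(person, 0.0) + confidence
    total = sum(dist.values())
    state[zone] = {p: v / total for p, v in dist.items()}
    return state

# Initial uncertainty: either Alice or Bob may be in the lobby.
state = {"lobby": {"alice": 0.5, "bob": 0.5}}
# A face-recognition event reports Alice in the lobby with confidence 0.8.
state = update_zone(state, "lobby", "alice", 0.8)
print(state["lobby"])
```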

14.
Humans are known to use a wide range of non-verbal behaviour while speaking. Generating naturalistic embodied speech for an artificial agent is therefore an application where techniques that draw directly on recorded human motions can be helpful. We present a system that uses corpus-based selection strategies to specify the head and eyebrow motion of an animated talking head. We first describe how a domain-specific corpus of facial displays was recorded and annotated, and outline the regularities that were found in the data. We then present two different methods of selecting motions for the talking head based on the corpus data: one that chooses the majority option in all cases, and one that makes a weighted choice among all of the options. We compare these methods to each other in two ways: through cross-validation against the corpus, and by asking human judges to rate the output. The results of the two evaluation studies differ: the cross-validation study favoured the majority strategy, while the human judges preferred schedules generated using weighted choice. The judges in the second study also showed a preference for the original corpus data over the output of either of the generation strategies.
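A minimal sketch of the two selection strategies being compared: for a given context, "majority" always picks the most frequent facial display in the corpus, while "weighted" samples among all observed displays in proportion to their corpus counts. The display labels and counts are illustrative assumptions.

```python
# Sketch: majority vs. weighted corpus-based selection of facial displays.
import random
from collections import Counter

# Toy corpus counts of head/eyebrow displays observed for one context.
corpus = Counter({"nod": 14, "brow_raise": 5, "none": 3})

def majority(counts):
    return counts.most_common(1)[0][0]

def weighted(counts):
    displays, weights = zip(*counts.items())
    return random.choices(displays, weights=weights, k=1)[0]

random.seed(0)
print(majority(corpus))                      # always 'nod'
print([weighted(corpus) for _ in range(5)])  # varies with corpus frequencies
```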

15.
This paper presents an intelligence model for conversational service robots. It employs modules called experts, each of which is specialized to execute certain kinds of tasks, such as performing physical behaviors and engaging in dialogues. Some of the experts take charge of understanding human utterances and deciding the robot's utterances or actions. The model enables switching and canceling tasks based on recognized human intentions, as well as parallel execution of several tasks. The model specifies the interface that an expert must implement, and any kind of expert can be employed if it conforms to that interface, which makes the model extensible.
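A hedged sketch of the expert-interface idea: any module conforming to a common interface can be plugged in, and a coordinator hands control to the expert that matches the recognized intention. The method names and example experts are illustrative assumptions, not the paper's actual interface.

```python
# Sketch: a pluggable expert interface with intention-based dispatch.
from abc import ABC, abstractmethod

class Expert(ABC):
    @abstractmethod
    def can_handle(self, intention: str) -> bool: ...
    @abstractmethod
    def step(self, utterance: str) -> str: ...
    def cancel(self) -> None:
        pass  # default no-op cancellation hook

class GuideDialogueExpert(Expert):
    def can_handle(self, intention): return intention == "ask-directions"
    def step(self, utterance): return "The elevator is to your left."

class DeliveryExpert(Expert):
    def can_handle(self, intention): return intention == "deliver-item"
    def step(self, utterance): return "Starting delivery task."

experts = [GuideDialogueExpert(), DeliveryExpert()]
intention = "ask-directions"  # assumed output of intention recognition
active = next(e for e in experts if e.can_handle(intention))
print(active.step("Where is the elevator?"))
```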

16.
17.
In this paper, we present a fully automated multimodal (3-D and 2-D) face recognition system. For the 3-D modality, we model the facial image as a 3-D binary ridge image that contains the ridge lines on the face. We use the principal curvature $\kappa_{\max}$ to extract the locations of the ridge lines around the important facial regions of the range image (i.e., the eyes, the nose, and the mouth). For matching, we utilize a fast variant of the iterative closest point algorithm to match the ridge image of a given probe image to the archived ridge images in the database. The main advantage of this approach is that relying on the ridge lines reduces the computational complexity by two orders of magnitude. For the 2-D modality, we model the face by an attributed relational graph (ARG), where each node of the graph corresponds to a facial feature point. At each facial feature point, a set of attributes is extracted by applying Gabor wavelets to the 2-D image and assigned to the node of the graph. The edges of the graph are defined based on Delaunay triangulation, and a set of geometrical features that defines the mutual relations between the edges is extracted from the Delaunay triangles and stored in the ARG model. The similarity measure between the ARG models that represent the probe and gallery images is used for 2-D face recognition. Finally, we fuse the matching results of the 3-D and 2-D modalities at the score level to improve the overall performance of the system. Different techniques for fusion, such as the Dempster–Shafer theory of evidence and the weighted sum of scores, are employed and tested using the facial images in the third experiment dataset of the Face Recognition Grand Challenge version 2.0.
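A minimal sketch of score-level fusion by weighted sum, one of the fusion techniques the abstract names: each matcher's score is normalized to a common range, then combined with channel weights. The weights, score ranges, and example scores are illustrative assumptions, not values from the paper.

```python
# Sketch: weighted-sum fusion of normalized 3-D and 2-D match scores.
def min_max_normalize(score, lo, hi):
    return (score - lo) / (hi - lo)

def fuse(score_3d, score_2d, w_3d=0.6, w_2d=0.4):
    # Scores assumed normalized to [0, 1]; higher means a better match.
    return w_3d * score_3d + w_2d * score_2d

s3d = min_max_normalize(820, lo=500, hi=1000)  # e.g., ICP-based ridge match score
s2d = min_max_normalize(0.71, lo=0.0, hi=1.0)  # e.g., ARG similarity score
print(f"fused score: {fuse(s3d, s2d):.3f}")
```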

18.
Recent scholarship points to the rhetorical role of the aesthetic in multimodal composition and new media contexts. In this article, I examine the aesthetic as a rhetorical concept in writing studies and imagine the ways in which this concept can be useful to teachers of multimodal composition. My treatment of the concept begins with a return to the ancient Greek aisthetikos (relating to perception by the senses) in order to discuss the aesthetic as a meaningful mode of experience. I then review European conceptions of the aesthetic and finally draw from John Dewey and Bruno Latour to help shape this concept into a pragmatic and useful approach that can complement multimodal teaching and learning. The empirical approach I construct adds to an understanding of aesthetic experience with media in order to render more transparent the ways in which an audience creates knowledge—or takes and makes meaning—via the senses. Significantly, this approach to meaning making supports learning in digital environments where students are increasingly asked to both produce and consume media convergent texts that combine multiple modalities including sound, image, and user interaction.

19.
Multimodal Interfaces for Cell Phones and Mobile Technology (cited 1 time: 0 self-citations, 1 by others)
By modeling users' natural spoken and multimodal communication patterns, more powerful and highly reliable interfaces can be designed that support emerging mobile technology. In this paper, we highlight three different examples of research that is advancing state-of-the-art mobile technology. The first is the development of fusion-based multimodal systems, such as ones that combine speech and pen or touch input, which are substantially improving the robustness and stability of system recognition. The second is modeling of multimodal communication patterns to establish open-microphone engagement techniques that work in challenging multi-person mobile settings. The third is new approaches to adaptive processing, which are able to transparently guide user input to match system processing capabilities. All three research directions are contributing to the design of more reliable, usable, and commercially promising mobile systems of the future.

20.
徐琳宏, 刘鑫, 原伟, 祁瑞华. 《计算机科学》, 2021, 48(11): 312-318
Multimodal sentiment analysis for Russian is a research hotspot in the field of sentiment analysis: it automatically analyzes and recognizes emotion from rich information such as text, speech, and images, helping to track public opinion in Russian-speaking communities and countries in a timely manner. However, Russian multimodal sentiment corpora remain scarce, which constrains the further development of Russian sentiment-analysis technology. To address this problem, building on an analysis of related research on multimodal sentiment corpora and of sentiment-classification methods, we first formulate a complete and scientifically grounded annotation scheme covering eleven items of information in three parts: utterance, space-time, and emotion. Then, throughout corpus construction and quality control, following the principles of emotion subject and emotional continuity, we draw up practical annotation guidelines and build a relatively large-scale Russian multimodal sentiment corpus. Finally, we discuss applications of the corpus, including analyzing the characteristics of emotional expression, analyzing character and personality traits, and building emotion-recognition models.
