20 similar documents found (search time: 0 ms)
1.
Applications with intelligent conversational virtual humans, called Embodied Conversational Agents (ECAs), seek to bring human-like abilities to machines and to establish natural human-computer interaction. In this paper we discuss the realization of ECA multimodal behaviors, which include speech and nonverbal behaviors. We devise RealActor, an open-source, multi-platform animation system for real-time multimodal behavior realization for ECAs. The system employs a novel solution for synchronizing gestures and speech using neural networks, as well as an adaptive face animation model based on the Facial Action Coding System (FACS) to synthesize facial expressions. Our aim is to provide a generic animation system that helps researchers create believable and expressive ECAs.
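As a hedged illustration of the FACS-based idea mentioned above, the sketch below accumulates Action Unit (AU) activations into blendshape weights. The AU-to-blendshape mapping and rig names are assumptions for illustration, not RealActor's actual implementation.

```python
# A minimal sketch (not RealActor's code) of mapping FACS Action Unit
# activations onto facial-rig blendshape weights.

# Each expression is described as a set of AU activations in [0, 1].
HAPPINESS = {6: 0.8, 12: 1.0}           # cheek raiser + lip corner puller
SURPRISE  = {1: 0.9, 2: 0.9, 26: 0.7}   # brow raisers + jaw drop

# Hypothetical mapping from AUs to the rig's blendshapes.
AU_TO_BLENDSHAPES = {
    1:  {"browInnerUp": 1.0},
    2:  {"browOuterUp": 1.0},
    6:  {"cheekSquint": 1.0},
    12: {"mouthSmile": 1.0},
    26: {"jawOpen": 1.0},
}

def aus_to_blendshapes(aus):
    """Accumulate AU activations into per-blendshape weights, clamped to [0, 1]."""
    weights = {}
    for au, activation in aus.items():
        for shape, gain in AU_TO_BLENDSHAPES.get(au, {}).items():
            weights[shape] = min(1.0, weights.get(shape, 0.0) + gain * activation)
    return weights

print(aus_to_blendshapes(HAPPINESS))  # {'cheekSquint': 0.8, 'mouthSmile': 1.0}
```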
2.
We developed and evaluated a multimodal affect detector that combines conversational cues, gross body language, and facial features. The multimodal affect detector uses feature-level fusion to combine the sensory channels and linear discriminant analyses to discriminate between naturally occurring experiences of boredom, engagement/flow, confusion, frustration, delight, and neutral. Training and validation data for the affect detector were collected in a study where 28 learners completed a 32-minute tutorial session with AutoTutor, an intelligent tutoring system with conversational dialogue. Classification results supported a channel × judgment type interaction, where the face was the most diagnostic channel for spontaneous affect judgments (i.e., at any time in the tutorial session), while conversational cues were superior for fixed judgments (i.e., every 20 s in the session). The analyses also indicated that the accuracy of the multichannel model (face, dialogue, and posture) was statistically higher than the best single-channel model for the fixed but not spontaneous affect expressions. However, multichannel models reduced the discrepancy (i.e., variance in the precision of the different emotions) of the discriminant models for both judgment types. The results also indicated that the combination of channels yielded superadditive effects for some affective states, but additive, redundant, and inhibitory effects for others. We explore the structure of the multimodal linear discriminant models and discuss the implications of some of our major findings.
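To make the general recipe concrete, here is a minimal sketch of feature-level fusion feeding a linear discriminant classifier, run on synthetic data. The feature dimensionalities, class count, and use of scikit-learn are assumptions for illustration, not the authors' code.

```python
# Feature-level fusion + LDA on synthetic stand-in data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 200
face_feats    = rng.normal(size=(n, 12))  # e.g., facial feature intensities
dialog_feats  = rng.normal(size=(n, 8))   # e.g., conversational cue features
posture_feats = rng.normal(size=(n, 5))   # e.g., gross body language features
labels = rng.integers(0, 6, size=n)       # boredom, flow, confusion, ...

# Feature-level fusion: concatenate the channels into one vector per sample.
X = np.hstack([face_feats, dialog_feats, posture_feats])

clf = LinearDiscriminantAnalysis().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```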
3.
In this paper, we investigate an object-oriented (OO) architecture for multimodal emotion recognition in interactive applications on mobile phones and handheld devices. Unlike desktop computers, the mobile phones themselves perform no emotion-recognition processing; in our approach they transmit all collected data to a server, which carries out the recognition. The OO architecture we have created combines evidence from multiple modalities of interaction, namely the mobile device's keyboard and microphone, as well as data from emotion stereotypes, and organizes this evidence into well-structured objects with their own properties and methods. The resulting emotion detection server is capable of receiving and handling multimodal data transmitted from different mobile sources during human-computer interaction. As a test bed for affective mobile interaction we have used an educational application incorporated into the mobile system.
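A minimal sketch of such an object-oriented decomposition appears below: each modality becomes an object with its own properties and methods, and a server-side recognizer fuses their evidence. The class names, emotion set, and weighted-average fusion rule are assumptions for illustration, not the paper's actual design.

```python
# Hypothetical OO decomposition: per-modality evidence objects + a fusion server.
from dataclasses import dataclass

EMOTIONS = ("neutral", "happy", "angry")

@dataclass
class ModalityEvidence:
    """Evidence from one interaction modality, as scores per emotion."""
    source: str   # e.g., "keyboard" or "microphone"
    scores: dict  # emotion -> likelihood in [0, 1]

class EmotionDetectionServer:
    """Receives evidence transmitted from mobile clients and fuses it."""
    def classify(self, evidence_list, stereotype_prior):
        fused = {}
        for emo in EMOTIONS:
            channel_scores = [ev.scores.get(emo, 0.0) for ev in evidence_list]
            # Simple fusion: average channel scores, weighted by a stereotype prior.
            fused[emo] = (stereotype_prior.get(emo, 1.0)
                          * sum(channel_scores) / len(channel_scores))
        return max(fused, key=fused.get)

server = EmotionDetectionServer()
keyboard = ModalityEvidence("keyboard", {"neutral": 0.5, "happy": 0.2, "angry": 0.7})
audio = ModalityEvidence("microphone", {"neutral": 0.3, "happy": 0.1, "angry": 0.8})
print(server.classify([keyboard, audio], stereotype_prior={"angry": 0.9}))  # -> "angry"
```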
4.
Plamen Prodanov, Andrzej Drygajlo, Jonas Richiardi, Anil Alexander 《Intelligent Service Robotics》2008,1(1):3-26
The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue that provides access to the services the robot is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. Under real deployment conditions, speech recognition limitations with noisy input and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. For the speech input to be treated as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate the probability that each grounding state has been reached. These probabilities serve as a basis for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated by comparing a conversational system for the mobile service robot RoboX that employs only speech recognition for user goal identification with a system equipped with multimodal grounding. The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation on multimodal data collected during conversations between RoboX and users.
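The hand-rolled sketch below illustrates the general mechanism, not the paper's actual networks: a tiny Bayesian model estimates the probability that one grounding state has been reached from one speech cue and one non-speech cue. All conditional probability values are invented for illustration.

```python
# Posterior over a hidden grounding state G = "utterance correctly understood",
# combining a speech cue (ASR confidence) with a non-speech cue (user facing robot).

# Prior over G (illustrative).
P_G = {True: 0.6, False: 0.4}

# P(high ASR confidence | G) and P(user facing robot | G) (illustrative CPTs).
P_ASR_HI = {True: 0.85, False: 0.30}
P_FACING = {True: 0.90, False: 0.50}

def p_grounded(asr_high: bool, facing: bool) -> float:
    """P(G=True | observations), assuming conditionally independent cues."""
    def joint(g):
        p = P_ASR_HI[g] if asr_high else 1 - P_ASR_HI[g]
        p *= P_FACING[g] if facing else 1 - P_FACING[g]
        return p * P_G[g]
    return joint(True) / (joint(True) + joint(False))

# If the posterior is low, the dialogue manager could fall back to buttons.
print(f"P(grounded | low ASR conf, user facing) = {p_grounded(False, True):.2f}")
```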
5.
Jennifer Lai, Stella Mitchell, Christopher Pavlovski 《International Journal of Speech Technology》2007,10(1):17-30
As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit both voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that uses natural language understanding combined with a WAP browser to access email messages on a cell phone. We present results from a laboratory trial that evaluated how users worked with the system and that compared the multimodal system with a text-only system representative of products currently on the market. We discuss the observed modality issues and highlight implementation problems and usability concerns encountered in the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most had little or no prior experience with speech systems (though they did have prior experience with text-only access to applications on their phones). To our knowledge this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. Design implications resulting from the study findings and the usability issues encountered are presented to inform the design of future conversational multimodal mobile applications.
6.
Probal Dasgupta 《AI & Society》2007,21(1-2):7-13
This methodological intervention proposes that the typical conversation sets up or modifies Micro Knowledge Profiles by using (partly anaphoric) discourse devices of Thick Cross-referencing, and that a certain type of translation procedure maps from such knowledge onto Macro Acquaintance Profiles. In a typical conversation, partners already acquainted with each other and with various matters renew their acquaintance. This renewal has consequences that modify their knowledge profiles and their action plans. The details that make the conversation flow have to be set aside for the translation procedure to see the consequences of a conversation. The widespread desire to accurately profile the involvement of persons in conversations and the impossibility of telling any conversation-detached truth have been bringing about a mutation in the way we officially share and transmit knowledge; this mutation can usefully be called the Conversational Turn.
7.
8.
9.
In this paper, we develop techniques based on evolvability statistics of the fitness landscape surrounding sampled solutions. Averaging the measures over a sample of equal fitness solutions allows us to build up fitness evolvability portraits of the fitness landscape, which we show can be used to compare both the ruggedness and neutrality in a set of tunably rugged and tunably neutral landscapes. We further show that the techniques can be used with solution samples collected through both random sampling of the landscapes and online sampling during optimization. Finally, we apply the techniques to two real evolutionary electronics search spaces and highlight differences between the two search spaces, comparing with the time taken to find good solutions through search.
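The sketch below illustrates the general idea under simplifying assumptions: evolvability statistics for a sample of equal-fitness solutions are estimated by probing each solution's mutational neighbourhood. The toy bit-string landscape, mutation operator, and the two statistics chosen are illustrative, not the paper's exact measures.

```python
# Estimating a point of a fitness evolvability portrait on a toy landscape.
import random

def fitness(bits):   # toy landscape: number of 1-bits
    return sum(bits)

def mutate(bits):    # single-bit-flip mutation
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

def evolvability_portrait(solutions, n_mutants=200):
    """Average chance a mutant is fitter / equally fit, over the equal-fitness sample."""
    p_better = p_neutral = 0.0
    for s in solutions:
        f = fitness(s)
        mutant_fits = [fitness(mutate(s)) for _ in range(n_mutants)]
        p_better  += sum(m > f for m in mutant_fits) / n_mutants
        p_neutral += sum(m == f for m in mutant_fits) / n_mutants
    n = len(solutions)
    return p_better / n, p_neutral / n   # one point of the portrait

# Sample of equal-fitness solutions: random 20-bit strings with exactly ten 1s.
sample = [random.sample([1] * 10 + [0] * 10, 20) for _ in range(30)]
print(evolvability_portrait(sample))
```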
10.
11.
12.
13.
Multimodal identification and tracking in smart environments
We present a model for unconstrained and unobtrusive identification and tracking of people in smart environments and for answering queries about their whereabouts. Our model supports biometric recognition based upon multiple modalities, such as face, gait, and voice, in a uniform manner. The key technical idea underlying our approach is to abstract a smart environment as a state transition system in which each state records the set of individuals present in the various zones of the environment. Since biometric recognition is inexact, state information is inherently probabilistic. An event abstracts a biometric recognition step, and the transition function abstracts the reasoning necessary to effect state transitions. In this manner, we are able to integrate different biometric modalities uniformly, along with different criteria for state transitions. Fusion of biometric modalities is also supported by our model. We define performance metrics for a smart environment in terms of the concepts of ‘precision’ and ‘recall’. We have developed a prototype implementation of our proposed concepts and provide experimental results in this paper. Our conclusion is that the state transition model is an effective abstraction of a smart environment and serves as a good basis for developing practical systems.
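Below is a minimal sketch of the state transition abstraction described above: the state holds, per person, a probability distribution over zones, and a biometric recognition event triggers a probabilistic transition. The zone names and sensor reliability figure are assumptions for illustration.

```python
# Probabilistic state transitions driven by biometric recognition events.
ZONES = ("lobby", "lab", "office")
SENSOR_RELIABILITY = 0.8  # assumed chance a recognition event names the true zone

def transition(belief, observed_zone):
    """Bayesian state transition on an event placing the person in observed_zone."""
    posterior = {}
    for zone, prior in belief.items():
        like = (SENSOR_RELIABILITY if zone == observed_zone
                else (1 - SENSOR_RELIABILITY) / (len(ZONES) - 1))
        posterior[zone] = like * prior
    total = sum(posterior.values())
    return {z: p / total for z, p in posterior.items()}

# Initially uncertain where Alice is; then a face-recognition event in the lab.
state = {"alice": {z: 1 / len(ZONES) for z in ZONES}}
state["alice"] = transition(state["alice"], "lab")
print(state["alice"])  # belief mass shifts toward "lab"

# A whereabouts query returns the most probable zone, which can then be scored
# against ground truth using precision/recall.
print(max(state["alice"], key=state["alice"].get))
```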
14.
Humans are known to use a wide range of non-verbal behaviour while speaking. Generating naturalistic embodied speech for an artificial agent is therefore an application where techniques that draw directly on recorded human motions can be helpful. We present a system that uses corpus-based selection strategies to specify the head and eyebrow motion of an animated talking head. We first describe how a domain-specific corpus of facial displays was recorded and annotated, and outline the regularities that were found in the data. We then present two different methods of selecting motions for the talking head based on the corpus data: one that chooses the majority option in all cases, and one that makes a weighted choice among all of the options. We compare these methods to each other in two ways: through cross-validation against the corpus, and by asking human judges to rate the output. The results of the two evaluation studies differ: the cross-validation study favoured the majority strategy, while the human judges preferred schedules generated using weighted choice. The judges in the second study also showed a preference for the original corpus data over the output of either of the generation strategies.
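The two selection strategies compared in the abstract are easy to sketch. In the hedged example below, the context and the display counts are invented for illustration; only the majority-versus-weighted contrast reflects the paper.

```python
# Majority choice vs. frequency-weighted choice over corpus-observed displays.
import random
from collections import Counter

# Hypothetical counts of facial displays observed for one linguistic context.
corpus_counts = Counter({"nod": 14, "eyebrow_raise": 5, "none": 3})

def majority_choice(counts):
    """Always pick the most frequent display for the context."""
    return counts.most_common(1)[0][0]

def weighted_choice(counts):
    """Sample a display in proportion to its corpus frequency."""
    displays, weights = zip(*counts.items())
    return random.choices(displays, weights=weights, k=1)[0]

print(majority_choice(corpus_counts))                      # always "nod"
print([weighted_choice(corpus_counts) for _ in range(5)])  # varies like the corpus
```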
15.
Mikio Nakano, Yuji Hasegawa, Kotaro Funakoshi, Johane Takeuchi, Toyotaka Torii, Kazuhiro Nakadai, Naoyuki Kanda, Kazunori Komatani, Hiroshi G. Okuno, Hiroshi Tsujino 《Knowledge》2011,24(2):248-256
This paper presents an intelligence model for conversational service robots. It employs modules called experts, each of which is specialized to execute certain kinds of tasks, such as performing physical behaviors or engaging in dialogues. Some of the experts take charge of understanding human utterances and deciding on robot utterances or actions. The model enables switching and canceling tasks based on recognized human intentions, as well as parallel execution of several tasks. The model specifies the interface that an expert must have, and any kind of expert can be employed as long as it conforms to the interface. This feature makes the model extensible.
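A minimal sketch of that expert abstraction follows: any module implementing the common interface can be plugged in, and a dispatcher can start or cancel tasks based on understood intentions. The method names and intention strings are assumptions for illustration, not the paper's actual interface.

```python
# Hypothetical common interface for pluggable "expert" modules.
from abc import ABC, abstractmethod

class Expert(ABC):
    @abstractmethod
    def can_handle(self, intention: str) -> bool: ...
    @abstractmethod
    def start(self, intention: str) -> None: ...
    @abstractmethod
    def cancel(self) -> None: ...

class DialogueExpert(Expert):
    def can_handle(self, intention): return intention.startswith("ask_")
    def start(self, intention): print(f"dialogue expert handles {intention}")
    def cancel(self): print("dialogue cancelled")

class NavigationExpert(Expert):
    def can_handle(self, intention): return intention.startswith("go_")
    def start(self, intention): print(f"navigation expert handles {intention}")
    def cancel(self): print("navigation cancelled")

def dispatch(experts, intention):
    """Route a recognized intention to the first expert that accepts it."""
    for expert in experts:
        if expert.can_handle(intention):
            expert.start(intention)
            return expert
    raise ValueError(f"no expert for {intention}")

active = dispatch([DialogueExpert(), NavigationExpert()], "go_kitchen")
active.cancel()  # a newly recognized intention may cancel the running task
```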
16.
17.
《IEEE Transactions on Information Forensics and Security》2008,3(3):431-440
18.
Aimée Knight 《Computers and Composition》2013,30(2):146-155
Recent scholarship points to the rhetorical role of the aesthetic in multimodal composition and new media contexts. In this article, I examine the aesthetic as a rhetorical concept in writing studies and imagine the ways in which this concept can be useful to teachers of multimodal composition. My treatment of the concept begins with a return to the ancient Greek aisthetikos (relating to perception by the senses) in order to discuss the aesthetic as a meaningful mode of experience. I then review European conceptions of the aesthetic and finally draw from John Dewey and Bruno Latour to help shape this concept into a pragmatic and useful approach that can complement multimodal teaching and learning. The empirical approach I construct adds to an understanding of aesthetic experience with media in order to render more transparent the ways in which an audience creates knowledge, or takes and makes meaning, via the senses. Significantly, this approach to meaning making supports learning in digital environments where students are increasingly asked to both produce and consume media convergent texts that combine multiple modalities including sound, image, and user interaction.
19.
Multimodal Interfaces for Cell Phones and Mobile Technology
By modeling users' natural spoken and multimodal communication patterns, more powerful and highly reliable interfaces can be designed that support emerging mobile technology. In this paper, we highlight three different examples of research that is advancing state-of-the-art mobile technology. The first is the development of fusion-based multimodal systems, such as ones that combine speech and pen or touch input, which are substantially improving the robustness and stability of system recognition. The second is modeling of multimodal communication patterns to establish open-microphone engagement techniques that work in challenging multi-person mobile settings. The third is new approaches to adaptive processing, which are able to transparently guide user input to match system processing capabilities. All three research directions are contributing to the design of more reliable, usable, and commercially promising mobile systems of the future.
20.
Multimodal sentiment analysis for Russian is a research hotspot in the field of sentiment analysis: it can automatically analyze and recognize emotion from rich information such as text, speech, and images, helping to track public opinion in Russian-speaking populations and countries in a timely manner. However, multimodal sentiment corpora for Russian remain scarce, which constrains further development of Russian sentiment analysis technology. To address this problem, building on an analysis of related research on multimodal sentiment corpora and of sentiment classification methods, we first formulate a scientific and complete annotation scheme covering eleven items of information in three parts: utterance, space-time, and emotion. Then, throughout the construction and quality control of the corpus, we follow the emotional-subject principle and the emotional-continuity principle, draw up highly operable annotation guidelines, and thereby build a relatively large-scale Russian multimodal sentiment corpus. Finally, we discuss applications of the corpus in several areas, including analyzing characteristics of emotional expression, analyzing personality traits, and building emotion recognition models.
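As a hedged illustration of such an annotation scheme, the sketch below represents one corpus record with three parts, as the abstract specifies. The concrete field names are invented for illustration; the abstract names only the three parts and the total of eleven items, not the items themselves.

```python
# Hypothetical record structure for a multimodal sentiment corpus entry.
from dataclasses import dataclass

@dataclass
class UtteranceInfo:
    text: str         # transcribed Russian utterance
    speaker: str
    audio_path: str

@dataclass
class SpaceTimeInfo:
    start_sec: float  # position of the clip in the source video
    end_sec: float
    scene: str

@dataclass
class EmotionInfo:
    label: str        # e.g., "радость" (joy)
    intensity: float  # annotator-rated, in [0, 1]
    target: str       # whom or what the emotion is directed at

@dataclass
class CorpusRecord:
    utterance: UtteranceInfo
    spacetime: SpaceTimeInfo
    emotion: EmotionInfo

record = CorpusRecord(
    UtteranceInfo("Какой чудесный день!", "speaker_01", "clips/0001.wav"),
    SpaceTimeInfo(12.4, 14.1, "park"),
    EmotionInfo("радость", 0.8, "weather"),
)
print(record.emotion.label)
```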