Similar literature
1.
Artificial Intelligence, 2007, 171(8-9): 568-585
Head pose and gesture offer several conversational grounding cues and are used extensively in face-to-face interaction among people. To accurately recognize visual feedback, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper we describe how contextual information can be used to predict visual feedback and improve recognition of head gestures in human–computer interfaces. Lexical, prosodic, timing, and gesture features can be used to predict a user's visual feedback during conversational dialog with a robotic or virtual agent. In non-conversational interfaces, context features based on user–interface system events can improve detection of head gestures for dialog box confirmation or document browsing. Our user study with prototype gesture-based components indicates quantitative and qualitative benefits of gesture-based confirmation over conventional alternatives. Using a discriminative approach to contextual prediction and multi-modal integration, performance of head gesture detection was improved with context features even when the topic of the test set was significantly different from that of the training set.

2.
This paper discusses a multi-agent system and its speech act theory for supporting cooperative mechanical CAD. The multi-agent system consists of an organizer, agents, and a communication manager. The structures of the organizer and the agents are elaborated. The communication manager consists of central management and dialog management. The speech act theory supports agent communication involving more than two participants. The Pr/T net of the speech acts and its firing rules are presented. The multi-agent model has been put into practice in computer-integrated autobody development, making the distributed cooperative autobody design system more flexible and better coordinated.

3.
4.
In this paper we present a speech-to-speech (S2S) translation system called the BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. The BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with lack of pronunciation and linguistic resources and effective modeling of ambiguity in pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.

5.
The presence of disfluencies in spontaneous speech, while posing a challenge for robust automatic recognition, also offers a means of gaining additional insight into a speaker's communicative and cognitive state. This paper analyzes disfluencies in children's spontaneous speech, in the context of spoken dialog based computer game play, and addresses the automatic detection of disfluency boundaries. Although several approaches have been proposed to detect disfluencies in speech, relatively little work has been done to utilize visual information to improve the performance and robustness of the disfluency detection system. This paper describes the use of visual information along with prosodic and language information to detect the presence of disfluencies in a child's computer-directed speech and shows how these information sources can be integrated to increase the overall information available for disfluency detection. The experimental results on our children's multimodal dialog corpus indicate that disfluency detection accuracy of over 80% can be obtained by utilizing audio-visual information. Specifically, results showed that the addition of visual information to prosody and language features yields relative improvements in disfluency detection error rates of 3.6% and 6.3%, respectively, for information fusion at the feature level and decision level.
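
The two fusion strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the toy feature values, and the equal weighting are all assumptions made for illustration. Feature-level fusion concatenates per-modality feature vectors before a single classifier sees them, whereas decision-level fusion combines the scores of separate per-modality classifiers.

```python
from typing import Dict, List


def feature_level_fusion(*feature_vectors: List[float]) -> List[float]:
    """Concatenate per-modality feature vectors into one vector.

    A single downstream classifier (not shown) would be trained on the result.
    """
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def decision_level_fusion(posteriors: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Weighted average of per-modality P(disfluent) scores.

    Each modality is assumed to have its own classifier producing a score in [0, 1].
    """
    total_weight = sum(weights[m] for m in posteriors)
    return sum(weights[m] * p for m, p in posteriors.items()) / total_weight


# Hypothetical toy inputs: prosodic, language, and visual features.
fused_features = feature_level_fusion([0.1, 0.2], [1.0], [0.5, 0.5])

# Hypothetical per-modality classifier outputs, equally weighted.
scores = {"prosody": 0.6, "language": 0.4, "visual": 0.8}
fused_score = decision_level_fusion(scores, {m: 1.0 for m in scores})
```

With equal weights the decision-level score is just the mean of the three modality scores; in practice the weights would be tuned on held-out data.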

6.
The automatic recognition of a user’s communicative style within a spoken dialog system framework, including the affective aspects, has received increased attention in the past few years. For dialog systems, it is important to know not only what was said but also how something was communicated, so that the system can engage the user in a richer and more natural interaction. This paper addresses the problem of automatically detecting “frustration”, “politeness”, and “neutral” attitudes from a child’s speech communication cues, elicited in spontaneous dialog interactions with computer characters. Several information sources, such as acoustic, lexical, and contextual features, as well as their combinations, are used for this purpose. The study is based on a Wizard-of-Oz dialog corpus of 103 children, 7–14 years of age, playing a voice-activated computer game. Three-way classification experiments, as well as pairwise classification between polite vs. others and frustrated vs. others, were performed. Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for detection of politeness, whereas context and acoustic features perform best for frustration detection. Furthermore, the fusion of acoustic, lexical and contextual information provided significantly better classification results. Results also showed that classification performance varies with age and gender. Specifically, for the “politeness” detection task, higher classification accuracy was achieved for females and 10–11-year-olds, compared to males and other age groups, respectively.

7.
The design of Spoken Dialog Systems cannot be considered as the simple combination of speech processing technologies. Indeed, speech-based interface design has been an expert job for a long time. It requires good skills in speech technologies and low-level programming. Moreover, rapid development and reuse of previously designed systems remain difficult. This makes optimality and objective evaluation of design very difficult. The design process is therefore a cyclic process composed of prototype releases, user satisfaction surveys, bug reports and refinements. It is well known that human intervention for testing is time-consuming and above all very expensive. This is one of the reasons for the recent interest in dialog simulation for evaluation as well as for design automation and optimization. In this paper we present a probabilistic framework for a realistic simulation of spoken dialogs in which the major components of a dialog system are modeled and parameterized using independent data or expert knowledge. In particular, an Automatic Speech Recognition (ASR) system model and a User Model (UM) have been developed. The ASR model, based on articulatory similarities in language models, provides task-adaptive performance prediction and Confidence Level (CL) distribution estimation. The user model relies on the Bayesian Networks (BN) paradigm and is used both for user behavior modeling and Natural Language Understanding (NLU) modeling. The complete simulation framework has been used to train a reinforcement-learning agent on two different tasks. These experiments helped to point out several potentially problematic dialog scenarios.

8.
GUS is the first of a series of experimental computer systems that we intend to construct as part of a program of research on language understanding. In large measure, these systems will fill the role of periodic progress reports, summarizing what we have learned, assessing the mutual coherence of the various lines of investigation we have been following, and suggesting where more emphasis is needed in future work. GUS (Genial Understander System) is intended to engage a sympathetic and highly cooperative human in an English dialog, directed towards a specific goal within a very restricted domain of discourse. As a starting point, GUS was restricted to the role of a travel agent in a conversation with a client who wants to make a simple return trip to a single city in California. There is good reason for restricting the domain of discourse for a computer system which is to engage in an English dialog. Specializing the subject matter that the system can talk about permits it to achieve some measure of realism without encompassing all the possibilities of human knowledge or of the English language. It also provides the user with specific motivation for participating in the conversation, thus narrowing the range of expectations that GUS must have about the user's purposes. A system restricted in this way will be better able to guide the conversation within the boundaries of its competence.

9.
Modeling conversation policies using permissions and obligations
Both conversation specifications and policies are required to facilitate effective agent communication. Specifications provide the order in which speech acts can occur in a meaningful conversation, whereas policies restrict the specifications that can be used in a certain conversation based on the sender, receiver, messages exchanged thus far, content, and other context. We propose that positive/negative permissions and obligations be used to model conversation specifications and policies. We also propose the use of ontologies to categorize speech acts such that high level policies can be defined without going into specifics of the speech acts. This approach is independent of the syntax and semantics of the communication language and can be used for different agent communication languages. Our policy based framework can help in agent communication in three ways: (i) to filter inappropriate messages, (ii) to help an agent to decide which speech act to use next, and (iii) to prevent an agent from sending inappropriate messages. Our work differs from most existing research on communication policies because it is not tightly coupled to any domain information such as the mental states of agents or specific communicative acts. Contributions of this work include: (i) an extensible framework that is applicable to varied domain knowledge and different agent communication languages, and (ii) the declarative representation of conversation specifications and policies in terms of permitted and obligated speech acts.
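
The idea of declaring conversation policies as permitted, forbidden, and obligated speech acts can be sketched roughly as follows. This is a hypothetical illustration, not the framework from the paper: the `Rule` class, the default-deny behaviour, and the choice that a negative permission overrides a positive one are assumptions made here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    modality: str                      # "permit", "forbid", or "oblige"
    speech_act: str                    # e.g. "query", "reply", "inform"
    condition: Callable[[Dict], bool]  # predicate over the conversation context


def allowed(rules: List[Rule], act: str, context: Dict) -> bool:
    """Decide whether a speech act may be sent or accepted in this context.

    Assumed conflict resolution: a matching "forbid" wins over any "permit",
    and an act with no matching rule is denied.
    """
    applicable = [r for r in rules if r.speech_act == act and r.condition(context)]
    if any(r.modality == "forbid" for r in applicable):
        return False
    return any(r.modality in ("permit", "oblige") for r in applicable)


def obligations(rules: List[Rule], context: Dict) -> List[str]:
    """List the speech acts the agent is currently obliged to perform."""
    return [r.speech_act for r in rules
            if r.modality == "oblige" and r.condition(context)]


# Hypothetical policy: queries are permitted except from untrusted senders,
# and a query obliges the receiver to reply.
rules = [
    Rule("permit", "query", lambda ctx: True),
    Rule("forbid", "query", lambda ctx: ctx.get("sender") == "untrusted"),
    Rule("oblige", "reply", lambda ctx: ctx.get("last_act") == "query"),
]
```

The same `allowed` check can serve both as an incoming-message filter and as a guard before sending, while `obligations` suggests which speech act to use next.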

10.
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6–10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
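
To make the contrast between ASR 1-best output and a word confusion network concrete, here is a small sketch. The data structure (one dictionary of word posteriors per aligned slot), the toy values, the gazetteer lookup, and the threshold are assumptions for illustration; the paper's actual detection models are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# A word confusion network as a list of aligned slots; each slot maps
# competing words to posterior probabilities (toy values, assumed).
ConfusionNetwork = List[Dict[str, float]]


def one_best(wcn: ConfusionNetwork) -> List[str]:
    """Take the highest-posterior word in every slot (the ASR 1-best path)."""
    return [max(slot, key=slot.get) for slot in wcn]


def find_entities(wcn: ConfusionNetwork, gazetteer: Set[str],
                  min_posterior: float = 0.2) -> List[Tuple[int, str, float]]:
    """Return (slot index, word, posterior) for gazetteer words in any slot.

    Unlike a 1-best search, this also sees lower-ranked alternatives.
    """
    hits = []
    for i, slot in enumerate(wcn):
        for word, posterior in slot.items():
            if word in gazetteer and posterior >= min_posterior:
                hits.append((i, word, posterior))
    return hits


wcn = [
    {"fly": 0.9, "flight": 0.1},
    {"to": 0.8, "two": 0.2},
    {"lost": 0.5, "boston": 0.45, "austin": 0.05},
]
cities = {"boston", "austin"}

best_path = one_best(wcn)
entity_hits = find_entities(wcn, cities)
```

In this toy network the 1-best path ends in "lost", so a detector that only reads the 1-best string finds no city, while the scan over all slot alternatives still recovers "boston" with its confidence score.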

11.
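In natural human-computer dialog, speech recognition errors caused by environmental noise, dialects, and accents, together with insufficient semantic analysis, lead the computer to misinterpret the user's interaction intent, so that it responds on the wrong topic and the dialog breaks down. Taking coffee-themed, chat-style human-computer dialog as an example, we divide dialog breakdowns into three cases: breakdowns caused by inappropriate topic feedback and, when the topic is correct, breakdowns caused by inappropriate fuzzy feedback and by inappropriate precise feedback. Based on records of dialogs between users and the computer, we analyze and compare dialog breakdowns in these three cases. The statistics show that inappropriate topic feedback is the most prominent cause, accounting for 60.1% of dialog breakdowns; when the topic feedback is correct, inappropriate fuzzy answers and inappropriate precise answers account for 22.2% and 21.6% of breakdowns, respectively; and when speech recognition errors occur, semantic analysis produces a larger number of feedback errors. The experimental analysis shows that, in the presence of speech recognition errors, using contextual information to improve the accuracy of the computer's topic feedback can effectively reduce dialog breakdowns and improve the naturalness of human-computer dialog. This work provides data analysis and experimental evidence for the importance of intent classification in natural human-computer dialog.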

12.
Designing 3D objects from scratch is difficult, especially when the user intent is fuzzy and lacks a clear target form. We facilitate design by providing reference and inspiration from existing model contexts. We rethink model design as navigating through different possible combinations of part assemblies based on a large collection of pre-segmented 3D models. We propose an interactive sketch-to-design system, where the user sketches prominent features of parts to combine. The sketched strokes are analysed individually and, more importantly, in context with the other parts to generate relevant shape suggestions via a design gallery interface. As a modelling session progresses and more parts get selected, contextual cues become increasingly dominant, and the model quickly converges to a final form. As a key enabler, we use pre-learned part-based contextual information to allow the user to quickly explore different combinations of parts. Our experiments demonstrate the effectiveness of our approach for efficiently designing new variations from existing shape collections.

13.
Control in spoken dialog systems is challenging largely because automatic speech recognition is unreliable, and hence the state of the conversation can never be known with certainty. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for planning and control in this context; however, POMDPs face severe scalability challenges, and past work has been limited to trivially small dialog tasks. This paper presents a novel POMDP optimization technique, composite summary point-based value iteration (CSPBVI), which enables optimization to be performed on slot-filling POMDP-based dialog managers of a realistic size. Using dialog models trained on data from a tourist information domain, simulation results show that CSPBVI scales effectively, outperforms non-POMDP baselines, and is robust to estimation errors.
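
The reason the dialog state "can never be known with certainty" is that a POMDP-based manager keeps a probability distribution (a belief) over user goals and revises it after each noisy recognition result. The sketch below shows only that belief update for a single slot, under the simplifying assumption that the user's goal does not change during the dialog. It does not implement CSPBVI or any policy optimization, and the confusion probabilities are invented toy values.

```python
from typing import Dict


def update_belief(belief: Dict[str, float],
                  observation: str,
                  confusion: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Bayes update b'(s) proportional to P(o | s) * b(s) for one slot.

    `confusion[s][o]` is the assumed probability that the recognizer outputs
    `o` when the user's true goal is `s`.
    """
    unnormalized = {s: confusion[s].get(observation, 0.0) * p
                    for s, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation impossible under the model: keep the prior unchanged.
        return dict(belief)
    return {s: v / total for s, v in unnormalized.items()}


# Toy tourist-information slot with two possible user goals.
prior = {"museum": 0.5, "hotel": 0.5}
confusion = {
    "museum": {"museum": 0.7, "hotel": 0.3},
    "hotel": {"museum": 0.2, "hotel": 0.8},
}

posterior = update_belief(prior, "museum", confusion)
```

After hearing "museum" once, the belief shifts toward the museum goal but keeps some mass on "hotel", which is what lets a policy decide between confirming and acting.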

14.
Ambient Assisted Living (AAL) systems must provide adapted services easily accessible by a wide variety of users. This is only possible if the communication between the user and the system is carried out through an interface that is simple, rapid, effective, and robust. Natural language interfaces such as dialog systems fulfill these requirements, as they are based on a spoken conversation that resembles human communication. In this paper, we enhance systems interacting in AAL domains by incorporating context-aware conversational agents that consider the external context of the interaction and predict the user’s state. The user’s state is built on the basis of their emotional state and intention, and it is recognized by means of a module conceived as an intermediate phase between natural language understanding and dialog management in the architecture of the conversational agent. This prediction, carried out for each user turn in the dialog, makes it possible to adapt the system dynamically to the user’s needs. We have evaluated our proposal by developing a context-aware system adapted to patients suffering from chronic pulmonary diseases, and we provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the perceived quality.

15.
16.
Previous research has shown the importance of individual learning goal orientation for both job and task performance and consequently organizational performance. Despite its importance, knowledge on the antecedents of learning goal orientation remains scarce, especially in the context of self-managing team-based organizations. In fact, most of the research on goal orientation antecedents has focused on individual characteristics, beliefs, and ability, while the contextual factors that might influence them remain unspecified. We build on and further extend earlier studies by jointly exploring the role of individual and contextual factors affecting individual learning orientation. In particular, this study combines individual informal social network, self-efficacy, performance feedback, and team identification into a model that explains individuals' learning goal orientation within self-managing team-based organizations. The model was empirically tested on a sample of 104 individuals belonging to an R&D organization relying on self-managing teams. Results show that performance feedback has a negative direct effect, while team identification has a positive direct effect on individual learning goal orientation. In addition, we found that individual self-efficacy is a mediator of the relationships between performance feedback and brokerage in the advice network and individual learning goal orientation. Finally, we did not find a relationship between centrality in the friendship network and individual learning goal orientation.

17.
This paper presents a vision-based localization and mapping algorithm developed for an unmanned aerial vehicle (UAV) that can operate in a riverine environment. Our algorithm estimates the three-dimensional positions of point features along a river and the pose of the UAV. By detecting features surrounding a river and the corresponding reflections on the water's surface, we can exploit multiple-view geometry to enhance the observability of the estimation system. We use a robot-centric mapping framework to further improve the observability of the estimation system while reducing the computational burden. We analyze the performance of the proposed algorithm with numerical simulations and demonstrate its effectiveness through experiments with data from Crystal Lake Park in Urbana, Illinois. We also draw a comparison to existing approaches. Our experimental platform is equipped with a lightweight monocular camera, an inertial measurement unit, a magnetometer, an altimeter, and an onboard computer. To our knowledge, this is the first result that exploits the reflections of features in a riverine environment for localization and mapping.

18.
Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information such as prosody, emphasis, and discourse state in the translation process. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation as it can serve as a complementary knowledge source that can potentially aid the end users in improved understanding and disambiguation. In this work, we present a general framework for integrating rich contextual information in S2S translation. We present novel methodologies for integrating source side context in the form of dialog act (DA) tags, and target side context using prosodic word prominence. We demonstrate the integration of the DA tags in two different statistical translation frameworks, phrase-based translation and a bag-of-words lexical choice model. In addition to producing interpretable DA annotated target language translations, we also obtain significant improvements in terms of automatic evaluation metrics such as lexical selection accuracy and BLEU score. Our experiments also indicate that finer representations of dialog information, such as yes–no questions, wh-questions and open questions, are the most useful in improving translation quality. For target side enrichment, we employ factored translation models to integrate the assignment and transfer of prosodic word prominence (pitch accents) during translation. The factored translation models provide significant improvement in assignment of correct pitch accents to the target words in comparison with a post-processing approach. Our framework is suitable for integrating any word or utterance level contextual information that can be reliably detected (recognized) from speech and/or text.

19.
Oral discourse is the primary form of human–human communication, hence, computer interfaces that communicate via unstructured spoken dialogues will presumably provide a more efficient, meaningful, and naturalistic interaction experience. Within the context of learning environments, there are theoretical positions supporting a speech facilitation hypothesis that predicts that spoken tutorial dialogues will increase learning more than typed dialogues. We evaluated this hypothesis in an experiment where 24 participants learned computer literacy via a spoken and a typed conversation with AutoTutor, an intelligent tutoring system with conversational dialogues. The results indicated that (a) enhanced content coverage was achieved in the spoken condition; (b) learning gains for both modalities were on par and greater than a no-instruction control; (c) although speech recognition errors were unrelated to learning gains, they were linked to participants' evaluations of the tutor; (d) participants adjusted their conversational styles when speaking compared to typing; (e) semantic and statistical natural language understanding approaches to comprehending learners' responses were more resilient to speech recognition errors than syntactic and symbolic-based approaches; and (f) simulated speech recognition errors had differential impacts on the fidelity of different semantic algorithms. We discuss the impact of our findings on the speech facilitation hypothesis and on human–computer interfaces that support spoken dialogues.

20.
The web has become the largest repository of multimedia information and its convergence with telecommunications is now bringing the benefits of web technology to hand-held devices. To optimize data access using these devices and provide services which meet the user needs through intelligent information retrieval, the system must sense and interpret the user environment and the communication context. In addition, natural spoken conversation with handheld devices makes it possible to use these applications in environments in which graphical interfaces are not effective, provides a more natural human-computer interaction, and facilitates access to the web for people with visual or motor disabilities, allowing their integration and the elimination of barriers to Internet access. In this paper, we present an architecture for the design of context-aware systems that use speech to access web services. Our contribution focuses specifically on the use of context information to improve the effectiveness of providing web services by using a spoken dialog system for the user-system interaction. We also describe an application of our proposal to develop a context-aware railway information system, and provide a detailed evaluation of the influence of the context information on the quality of the services that are supplied.
