Similar Documents
19 similar documents found (search time: 15 ms)
1.
We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure that determines the actions to be carried out. We propose two approaches to handling the LMs related to concepts and goals. In the first, we estimate an LM for each of them; in the second, we apply several clustering strategies to group together those elements that share common properties, and estimate an LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which significantly improves speech recognition performance and, in turn, both the language understanding and the dialogue management tasks.
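
As a sketch, the turn-level adaptation described above amounts to a linear interpolation of the following form (the notation is ours, not the authors'):

    P_t(w \mid h) = \lambda_0 \, P_{\mathrm{BG}}(w \mid h) + \sum_{i \in A_t} \lambda_i \, P_i(w \mid h), \qquad \lambda_0 + \sum_{i \in A_t} \lambda_i = 1

where P_BG is the background LM, A_t is the set of concepts or goals addressed at turn t, and each weight \lambda_i would be derived from the posterior probability of the corresponding dialogue element.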

2.
Traditional dialogue systems use a fixed silence threshold to detect the end of users’ turns. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn affects user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate smooth exchange of speaking turns. However, little effort has been made towards implementing these models in dialogue systems and verifying how well they model the turn-taking behaviour in human–computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human–computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On this data, we trained various models that use automatically extractable prosodic, contextual and lexico-syntactic features for detecting response locations. Next, we implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn-transitions and more responsive system behaviour.
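
A minimal sketch of such an online response-location detector (the feature names and data are illustrative assumptions, not the study's actual feature set):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per candidate pause in the user's speech:
    # [pitch_slope, pause_s, ends_in_complete_phrase]
    X_train = np.array([
        [-0.8, 0.62, 1],   # falling pitch, long pause, syntactically complete
        [ 0.6, 0.09, 0],   # rising pitch, short pause, incomplete
        [-0.5, 0.48, 1],
        [ 0.4, 0.12, 0],
    ])
    y_train = np.array([1, 0, 1, 0])  # 1 = suitable location for a feedback response

    clf = LogisticRegression().fit(X_train, y_train)
    # Online use: score each detected pause as it occurs.
    print(clf.predict_proba(np.array([[-0.7, 0.50, 1]]))[0, 1])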

3.
Speech and hand gestures offer the most natural modalities for everyday human-to-human interaction. The availability of diverse spoken dialogue applications and the proliferation of accelerometers on consumer electronics allow the introduction of new interaction paradigms based on speech and gestures. Little attention has been paid, however, to the manipulation of spoken dialogue systems (SDS) through gestures. Situation-induced disabilities or real disabilities are determining factors that motivate this type of interaction. In this paper, six concise and intuitively meaningful gestures are proposed that can be used to trigger commands in any SDS. Using different machine learning techniques, a classification error of less than 5% is achieved for the gesture patterns, and the proposed set of gestures is compared to ones proposed by users. Examining the social acceptability of the specific interaction scheme, high levels of acceptance for public use are encountered. An experiment comparing a button-enabled and a gesture-enabled interface showed that the latter imposes little additional mental and physical effort. Finally, results are provided after recruiting a male subject with spastic cerebral palsy, a blind female user, and an elderly female user.
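
As an illustration of the kind of accelerometer-based gesture classifier involved, here is a toy sketch with invented data (the paper's features, gestures and classifiers are not reproduced here):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def features(window):
        # window: (N, 3) array of x/y/z accelerometer samples for one gesture
        return np.concatenate([window.mean(axis=0), window.std(axis=0)])

    rng = np.random.default_rng(0)
    # Two toy gesture classes: "shake" (high variance) vs. "tilt" (shifted mean)
    shakes = [rng.normal(0.0, 2.0, (50, 3)) for _ in range(10)]
    tilts = [rng.normal(1.0, 0.2, (50, 3)) for _ in range(10)]
    X = np.array([features(w) for w in shakes + tilts])
    y = np.array([0] * 10 + [1] * 10)

    clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    print(clf.predict(np.array([features(rng.normal(0.0, 2.0, (50, 3)))])))  # -> [0]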

4.
This paper proposes a new technique to increase the robustness of spoken dialogue systems, employing an automatic procedure that aims to correct frames incorrectly generated by the system’s spoken language understanding component. To do this, the technique carries out training that takes into account knowledge of previous system misunderstandings. The correction is transparent to the user, who remains unaware of some of the mistakes made by the speech recogniser, and thus interaction with the system can proceed more naturally. Experiments have been carried out using two spoken dialogue systems previously developed in our lab, Saplen and Viajero, which employ prompt-dependent and prompt-independent language models for speech recognition. The results obtained from 10,000 simulated dialogues show that the technique improves the performance of the two systems for both kinds of language modelling, especially for the prompt-independent language model. Using this type of model, the Saplen system increases sentence understanding by 19.54%, task completion by 26.25%, word accuracy by 7.53%, and implicit recovery of speech recognition errors by 20.3%, whereas for the Viajero system these figures increase by 14.93%, 18.06%, 6.98% and 15.63%, respectively.
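
One very reduced way to picture frame correction learned from past misunderstandings (the data and the majority-vote decision rule below are hypothetical, not the authors' trained model):

    from collections import Counter

    # Training: count how often a recognised frame was later corrected.
    corrections = Counter()
    training_pairs = [
        (("postal_code", "1807"), ("postal_code", "18007")),
        (("postal_code", "1807"), ("postal_code", "18007")),
        (("postal_code", "1807"), ("postal_code", "1807")),
    ]
    for recognised, corrected in training_pairs:
        corrections[(recognised, corrected)] += 1

    def correct_frame(frame):
        # Replace the frame with its majority correction, if one exists.
        candidates = [(c, n) for (r, c), n in corrections.items() if r == frame]
        if not candidates:
            return frame
        best, n = max(candidates, key=lambda t: t[1])
        return best if n > sum(m for _, m in candidates) / 2 else frame

    print(correct_frame(("postal_code", "1807")))  # -> ('postal_code', '18007')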

5.
This paper proposes a new technique to test the performance of spoken dialogue systems by artificially simulating the behaviour of three types of user (very cooperative, cooperative and not very cooperative) interacting with a system by means of spoken dialogues. Experiments using the technique were carried out to test the performance of a previously developed dialogue system designed for the fast-food domain and working with two kinds of language model for automatic speech recognition: one based on 17 prompt-dependent language models, and the other based on one prompt-independent language model. The use of the simulated user enables the identification of problems relating to the speech recognition, spoken language understanding, and dialogue management components of the system. In particular, in these experiments problems were encountered with the recognition and understanding of postal codes and addresses, and with the lengthy sequences of repetitive confirmation turns required to correct these errors. By employing a simulated user in a range of different experimental conditions, sufficient data can be generated to support a systematic analysis of potential problems and to enable fine-grained tuning of the system.
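
A toy sketch of such a simulated user (the cooperativeness probabilities and the over-answering behaviour are our assumptions for illustration):

    import random

    PROFILES = {"very_cooperative": 0.95, "cooperative": 0.75, "not_very_cooperative": 0.45}

    def simulated_answer(prompt_slot, goal, profile, rng=random.Random(1)):
        """Slot/value pairs the simulated user utters in reply to one system prompt."""
        if rng.random() < PROFILES[profile]:
            return {prompt_slot: goal[prompt_slot]}   # answers exactly what was asked
        extra = rng.choice(sorted(goal))              # otherwise over- or off-answers
        return {prompt_slot: goal[prompt_slot], extra: goal[extra]}

    goal = {"food": "pizza", "size": "large", "address": "Elm St 7"}
    print(simulated_answer("food", goal, "not_very_cooperative"))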

6.
This paper describes an empirical study of man-computer speech interaction. The goal of the experiment was to find out how people would communicate with a real-time, speaker-independent continuous speech understanding system. The experimental design compared three communication modes: natural language typing, speaking directly to a computer, and speaking to a computer through a human interpreter. The results show that speech to a computer is not as ill-formed as one would expect: people speaking to a computer are more disciplined than when speaking to each other. There are significant differences in the use of spoken language compared to typed language, and several phenomena are unique to spoken or typed input respectively. The usefulness of these findings for future work on speech understanding systems is considered.

7.
For acquiring new skills or knowledge, contemporary learners frequently rely on the help of educational technologies supplementing human teachers as a learning aid. In the interaction with such systems, speech-based communication between the human user and the technical system has gained increasing importance. Since spoken computer output can take a variety of forms depending on the method of speech generation and the use of prosodic modulations, the effects of such auditory variations on the user’s learning achievement require systematic investigation. The experiment reported here examined the specific effects of speech generation method and prosody of spoken system feedback in a computer-supported learning environment, and may serve as a validation tool for future investigations of the effects of spoken computer feedback on learning. Learning performance in a basic cognitive task was compared between users receiving pre-recorded, naturally spoken system feedback with neutral prosody, pre-recorded feedback with motivating (praising or blaming) prosody, or computer-synthesized feedback. The observed results provide empirical evidence that users of technical tutoring systems benefit from pre-recorded, naturally spoken feedback, and even more so from feedback with motivational prosodic modulations matching their performance success. Theoretical implications and considerations for future implementations of spoken feedback in computer-based educational systems are discussed.

8.
Human–human interaction consists of various nonverbal behaviors that are often emotion-related. To establish rapport, it is essential that the listener respond to reactive emotion in a way that makes sense given the speaker's emotional state. However, human–robot interactions generally fail in this regard because most spoken dialogue systems play only a question-answer role. Aiming for natural conversation, we examine an emotion processing module that consists of a user emotion recognition function and a reactive emotion expression function for a spoken dialogue system, to improve human–robot interaction. For the emotion recognition function, we propose a method that combines valence from prosody and sentiment from text by decision-level fusion, which considerably improves performance. Moreover, this method reduces fatal recognition errors, thereby improving the user experience. For the reactive emotion expression function, the system's emotion is divided into an emotion category and an emotion level, which are predicted using the parameters estimated by the recognition function, on the basis of distributions inferred from human–human dialogue data. As a result, the emotion processing module can recognize the user's emotion from his or her speech and express a matching reactive emotion. An evaluation with ten participants demonstrated that a system enhanced by this module is effective for conducting natural conversation.
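
A minimal sketch of the decision-level fusion step, i.e. combining the two recognisers' class posteriors (the 50/50 weighting is an assumption, not the paper's tuned scheme):

    def fuse_emotion(p_prosody, p_text, w_prosody=0.5):
        """Combine class posteriors from prosodic valence and text sentiment."""
        fused = {c: w_prosody * p_prosody[c] + (1 - w_prosody) * p_text[c]
                 for c in p_prosody}
        return max(fused, key=fused.get), fused

    p_prosody = {"positive": 0.2, "neutral": 0.5, "negative": 0.3}
    p_text = {"positive": 0.7, "neutral": 0.2, "negative": 0.1}
    print(fuse_emotion(p_prosody, p_text)[0])  # -> positive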

9.
Spoken dialogue system performance can vary widely for different users, as well as for the same user during different dialogues. This paper presents the design and evaluation of an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Based on rules learned from a set of training dialogues, adaptive TOOT constructs a user model representing whether the user is having speech recognition problems as a particular dialogue progresses. Adaptive TOOT then automatically adapts its dialogue strategies based on this dynamically changing user model. An empirical evaluation of the system demonstrates the utility of the approach.
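
One way to picture such a dynamically updated user model (our sketch; TOOT's actual learned rules are not reproduced here):

    def update_user_model(model, asr_confidence, user_rejected_confirmation):
        # Accumulate evidence that the user is having recognition problems.
        model["trouble"] += int(asr_confidence < 0.4) + int(user_rejected_confirmation)
        return model

    def choose_strategy(model):
        if model["trouble"] >= 2:
            return "system_initiative_with_explicit_confirmation"
        return "mixed_initiative"

    model = {"trouble": 0}
    for conf, rejected in [(0.8, False), (0.3, True), (0.35, False)]:
        model = update_user_model(model, conf, rejected)
    print(choose_strategy(model))  # -> system_initiative_with_explicit_confirmation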

10.
Oral discourse is the primary form of human–human communication; hence, computer interfaces that communicate via unstructured spoken dialogues will presumably provide a more efficient, meaningful, and naturalistic interaction experience. Within the context of learning environments, there are theoretical positions supporting a speech facilitation hypothesis, which predicts that spoken tutorial dialogues will increase learning more than typed dialogues. We evaluated this hypothesis in an experiment in which 24 participants learned computer literacy via a spoken and a typed conversation with AutoTutor, an intelligent tutoring system with conversational dialogues. The results indicated that (a) enhanced content coverage was achieved in the spoken condition; (b) learning gains for both modalities were on par and greater than a no-instruction control; (c) although speech recognition errors were unrelated to learning gains, they were linked to participants' evaluations of the tutor; (d) participants adjusted their conversational styles when speaking compared to typing; (e) semantic and statistical natural language understanding approaches to comprehending learners' responses were more resilient to speech recognition errors than syntactic and symbolic-based approaches; and (f) simulated speech recognition errors had differential impacts on the fidelity of different semantic algorithms. We discuss the impact of our findings on the speech facilitation hypothesis and on human–computer interfaces that support spoken dialogues.

11.
Conversational systems have become an element of everyday life for billions of users, who use speech-based interfaces to services and engage with personal digital assistants on smartphones, social media chatbots, and smart speakers. One of the most complex tasks in the development of these systems is to design the dialogue model: the logic that, given a user input, selects the next answer. The dialogue model must also consider mechanisms to adapt the response of the system and the interaction style to different groups and user profiles. Rule-based systems are difficult to adapt to phenomena that were not taken into consideration at design time. However, many of the commercially available systems are based on rules, and so are the most widespread tools for the development of chatbots and speech interfaces. In this article, we present a proposal to: (a) automatically generate the dialogue rules from a dialogue corpus through the use of evolving algorithms, and (b) adapt the rules according to the detected user intention. We have evaluated our proposal with several conversational systems from different application domains, for which our approach provided an efficient way to adapt a set of dialogue rules based on user utterance clusters.
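
A toy sketch of evolving a dialogue rule against a corpus-derived fitness function (the rule encoding and the fitness are invented for illustration; the article's algorithm is not reproduced):

    import random

    rng = random.Random(0)

    def fitness(threshold):
        # Stand-in for corpus evaluation of the rule "confirm if ASR confidence < threshold".
        return -(threshold - 0.42) ** 2

    population = [rng.random() for _ in range(20)]
    for _ in range(30):
        population.sort(key=fitness, reverse=True)
        parents = population[:10]                      # selection
        children = [min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0, 0.05)))
                    for _ in range(10)]                # mutation
        population = parents + children
    print(round(max(population, key=fitness), 2))      # converges near 0.42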

12.
There is a strong relationship between evaluation and methods for automatically training language processing systems: generally, the same resource and metrics are used both to train system components and to evaluate them. To date, in dialogue systems research, this general methodology has not typically been applied to the dialogue manager and the spoken language generator. However, any metric for evaluating system performance can be used as a feedback function for automatically training the system. This approach is motivated by examples of the application of reinforcement learning to dialogue manager optimization, and the use of boosting to train the spoken language generator.
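
The core idea above, in miniature (the metric and its weights are illustrative assumptions):

    def reward(dialogue):
        # Any evaluation metric can serve as the training signal, e.g. a
        # task-success score minus a dialogue-length penalty.
        return 20.0 * dialogue["task_success"] - 1.0 * dialogue["num_turns"]

    logged = [{"task_success": 1, "num_turns": 12}, {"task_success": 0, "num_turns": 7}]
    print([reward(d) for d in logged])  # returns to feed to an RL learner: [8.0, -7.0]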

13.
This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intractable, so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made, which improve the efficiency significantly compared to the original algorithm as well as to other POMDP-based dialogue state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems which uses a component-based policy with the episodic Natural Actor Critic algorithm. The framework proposed in this paper was tested both in simulation and in a user trial. Both indicated that using Bayesian updates of the dialogue state significantly outperforms traditional definitions of the dialogue state. Policy learning worked effectively and the learned policy outperformed all others in simulation. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state framework was shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems.
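
For reference, the exact POMDP belief update that such methods approximate is the standard one (standard notation, not taken from the paper):

    b'(s') = \eta \, P(o' \mid s', a) \sum_{s} P(s' \mid s, a) \, b(s)

where a is the system action, o' the observed (recognised) user act, and \eta a normalising constant; because the dialogue state space is exponential in the number of slots, computing this sum exactly is intractable, which motivates approximations such as loopy belief propagation.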

14.
Habitability refers to the match between the language people employ when using a computer system and the language that the system can accept. In this paper, the concept of “habitability” is explored in relation to the design of dialogues for speech-based systems. Two studies investigating the role of habitability in speech systems for banking applications are reported. The first study employed a speech-driven automated teller machine (ATM), using a visual display to indicate the available vocabulary. Users made several distinct types of error with this system, indicating that habitability in speech systems cannot be achieved simply by displaying the input language. The second study employed a speech input/speech output home banking application, in which system constraints were indicated by either a spoken menu of words or a “query-style” prompt (e.g. “what service do you require?”). Between-subjects comparisons of these two conditions confirmed that the “menu-style” dialogue was rated as more habitable than the “query-style” one. It also led to fewer errors and was rated as easier to use, suggesting that habitability is a key issue in speech system usability. Comparison with the results of the first study suggests that for speech input, spoken menu prompts may be more habitable than similar menus shown on a visual display. The implications of these results for system design are discussed, and some initial dialogue design recommendations are presented.

15.
We describe an evaluation of spoken dialogue strategies designed using hierarchical reinforcement learning agents. The dialogue strategies were learnt in a simulated environment and tested in a laboratory setting with 32 users. These dialogues were used to evaluate three types of machine dialogue behaviour: hand-coded, fully-learnt and semi-learnt. These experiments also served to evaluate the realism of simulated dialogues using two proposed metrics contrasted with ‘Precision-Recall’. The learnt dialogue behaviours used the Semi-Markov Decision Process (SMDP) model, and we report the first evaluation of this model in a realistic conversational environment. Experimental results in the travel planning domain provide evidence to support the following claims: (a) hierarchical semi-learnt dialogue agents are a better alternative (with higher overall performance) than deterministic or fully-learnt behaviour; (b) spoken dialogue strategies learnt with highly coherent user behaviour and conservative recognition error rates (keyword error rate of 20%) can outperform a reasonable hand-coded strategy; and (c) hierarchical reinforcement learning dialogue agents are feasible and promising for the (semi) automatic design of optimized dialogue behaviours in larger-scale systems.
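
As background, the SMDP model referred to above generalises one-step reinforcement learning by discounting over the duration of temporally extended actions; a standard form of the corresponding Q-update (not taken from the paper) is

    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma^{\tau} \max_{a'} Q(s', a') - Q(s, a) \right]

where \tau is the number of time steps the extended action took and r is the reward accumulated while it ran. In the hierarchical setting, each subdialogue can be treated as one such extended action.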

16.
This paper describes an experiment on the effects of learning, mode of interaction (written vs. spoken) and transfer mode on user performance and discourse organization during interaction with a natural language dialogue system. Forty-eight participants took part in a series of 12 dialogues with an information retrieval system, presented either in the written or the spoken mode during the first six dialogues. The next six dialogues were then presented either in the same interaction mode or in the other mode. The analysis of the results showed that performance (time, number of effective turns) improved throughout the dialogues whatever the mode of interaction. Nevertheless, performance was higher in the written mode. Moreover, mode-specific characteristics were observed, consisting of greater use of subject pronouns and articles in the spoken mode. Similarly, in the spoken mode, users found it easier to re-use the formulations presented in the system speech than in the written mode. Furthermore, the analysis also revealed a positive transfer effect on performance and discourse organization when individuals first interacted in the spoken mode and then in the written mode. Both positive and negative transfer effects were observed when individuals interacted first in the written mode and then in the spoken mode. The implications of the results are discussed in terms of direct and indirect consequences of modality effects on natural language dialogue interaction.

17.
We address the issue of appropriate user modeling for generating cooperative responses to users in spoken dialogue systems. Unlike previous studies that have focused on a user’s knowledge, we propose more generalized modeling. We specifically set up three dimensions for the user models: the skill level in use of the system, the knowledge level about the target domain, and the degree of urgency. Moreover, the models are automatically derived by decision tree learning using actual dialogue data collected by the system. We obtained reasonable classification accuracy for all dimensions. Dialogue strategies based on the user modeling were implemented on the Kyoto City Bus Information System developed at our laboratory. Experimental evaluations revealed that the cooperative responses adapted to each user type served as good guides for novices without increasing the dialogue duration for skilled users.
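
A minimal sketch of deriving one dimension of such a user model by decision tree learning (the features and data are invented for illustration):

    from sklearn.tree import DecisionTreeClassifier

    # Per-dialogue features: [asr_confidence, words_per_utterance, response_delay_s]
    X = [[0.90, 8, 0.4], [0.85, 9, 0.5], [0.50, 3, 2.0], [0.45, 2, 2.4], [0.88, 7, 0.3]]
    y = ["skilled", "skilled", "novice", "novice", "skilled"]  # skill-level dimension

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(tree.predict([[0.55, 3, 1.8]]))  # -> ['novice']
    # Analogous trees can be trained for domain knowledge and degree of urgency.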

18.
Chan FY, Khalid HM. Ergonomics, 2003, 46(13-14): 1386-1407
Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered the dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology with multiple mapping and word-spotting techniques, the speech-driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun talking to a speech ATM can be for these first-time users. Subjects performed withdrawal and balance enquiry tasks. ANOVA was performed on the usability and affective data. The results showed significant differences between the systems in the ability to complete the tasks as well as in transaction errors. Performance was measured by the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.

19.
A multimodal interactive dialogue automaton (kiosk) for self-service is presented in the paper. The multimodal user interface allows people to interact with the kiosk by natural speech and gestures, in addition to the standard input and output devices. The architecture of the kiosk contains key modules for speech processing and computer vision. An array of four microphones is used for far-field capture and recording of the user's speech commands; it allows the kiosk to detect voice activity, to localize the sources of desired speech signals, and to eliminate environmental acoustic noise. A noise-robust, speaker-independent recognition system is applied to the automatic interpretation and understanding of continuous Russian speech. The distant speech recognizer uses a grammar of voice queries as well as garbage and silence models to improve recognition accuracy. A pair of portable video cameras is used for vision-based detection and tracking of the user's head and body position inside the working area. A Russian-speaking talking head serves both for bimodal audio-visual speech synthesis and for improving communication intelligibility by turning towards an approaching client. The dialogue manager controls the flow of the dialogue and synchronizes the sub-modules for input-modality fusion and output-modality fission. The experiments made with the multimodal kiosk were directed at cognitive and usability studies of human-computer interaction by different communication means.
