Similar Documents
20 similar documents found (search time: 33 ms).
3.
Natural languages are known for their expressive richness: many sentences can be used to convey the same underlying meaning. Modelling only the observed surface word sequence can therefore result in poor context coverage and generalization, for example when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase-level paraphrase model, statistically learned from standard text data with no semantic annotation, is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing the marginal probability over these variants. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite-state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over baseline n-gram LMs on two state-of-the-art recognition tasks, English conversational telephone speech and Mandarin Chinese broadcast speech, using a paraphrastic multi-level LM modelling both word and phrase sequences. When further combined with word- and phrase-level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs, respectively.
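The marginalization step can be made concrete with a toy sketch. Everything below, the bigram probabilities and the paraphrase table alike, is invented for illustration; the paper's WFST-based paraphrase generation is replaced here by a simple lookup:

```python
from collections import defaultdict

# Toy bigram LM probabilities (hypothetical values for illustration).
bigram = defaultdict(lambda: 1e-4)
bigram.update({("<s>", "thanks"): 0.1, ("thanks", "a"): 0.2,
               ("a", "lot"): 0.3, ("lot", "</s>"): 0.4,
               ("<s>", "thank"): 0.1, ("thank", "you"): 0.5,
               ("you", "</s>"): 0.2})

def lm_prob(words):
    """Bigram probability of a word sequence."""
    seq = ["<s>"] + words + ["</s>"]
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= bigram[(a, b)]
    return p

# Hypothetical paraphrase model: variant -> P(variant | original sentence).
paraphrases = {
    "thanks a lot": [(["thanks", "a", "lot"], 0.6), (["thank", "you"], 0.4)],
}

def paraphrastic_prob(sentence):
    """Marginalize LM probability over paraphrase variants of the sentence."""
    return sum(w * lm_prob(v) for v, w in paraphrases[sentence])

print(paraphrastic_prob("thanks a lot"))
```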

4.
The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue that provides access to the services it is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. Under service-robot deployment conditions, speech recognition limitations, noisy speech input, and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. For the speech input to be treated as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate the probability that each grounding state has been reached. These probabilities serve as a basis for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated by comparing a conversational system for the mobile service robot RoboX that employs only speech recognition for user goal identification with a system equipped with multimodal grounding. The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation, with multimodal data collected during conversations between the robot RoboX and its users.
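A minimal sketch of the evidence-fusion idea, simplified here to a naive-Bayes combination; the conditional probability values, observation names, and fallback threshold are all invented for illustration and are not taken from the paper:

```python
# Hypothetical likelihoods: P(observation | grounded), P(observation | not grounded).
P_GROUNDED = 0.5                      # prior P(grounding state reached)
P_EVIDENCE = {
    "asr_conf_high": (0.85, 0.30),
    "user_facing_robot": (0.90, 0.50),
}

def grounding_posterior(observations):
    """Naive-Bayes style fusion of speech and non-speech evidence."""
    p_g, p_ng = P_GROUNDED, 1.0 - P_GROUNDED
    for obs, present in observations.items():
        l_g, l_ng = P_EVIDENCE[obs]
        if not present:               # use complementary likelihoods
            l_g, l_ng = 1.0 - l_g, 1.0 - l_ng
        p_g, p_ng = p_g * l_g, p_ng * l_ng
    return p_g / (p_g + p_ng)

p = grounding_posterior({"asr_conf_high": True, "user_facing_robot": False})
if p < 0.4:                           # hypothetical threshold
    print(f"P(grounded)={p:.2f}: fall back to button input")
else:
    print(f"P(grounded)={p:.2f}: continue with speech")
```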

5.
Language models (LMs) are important components of many applications that work with natural language, such as word prediction and completion programs, automatic speech recognition, and machine translation. In this paper, we introduce several improvements to LMs for word prediction and completion in Hebrew. Whereas previous systems for Hebrew apply known variants of existing LMs without any alteration, this study presents two types of improvement: one general and the other specific to the Hebrew language. These improvements increase the keystroke savings of all tested LMs.
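Keystroke savings, the metric these improvements target, is simple to compute. A minimal sketch (the example numbers are hypothetical):

```python
def keystroke_savings(n_chars_typed, n_chars_total):
    """KS = percentage of keystrokes saved relative to typing everything."""
    return 100.0 * (n_chars_total - n_chars_typed) / n_chars_total

# Example: the user typed 62 keystrokes to produce a 100-character text
# because the predictor completed the rest.
print(f"{keystroke_savings(62, 100):.1f}% keystrokes saved")
```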

6.
We describe the design and evaluation of two different dynamic student uncertainty adaptations in wizarded versions of a spoken dialogue tutoring system. The two adaptive systems respond to each student turn based on its uncertainty, after an unseen human "wizard" performs speech recognition and natural language understanding and annotates the turn for uncertainty. The design of the two adaptations is based on a hypothesis in the literature that uncertainty is an "opportunity to learn": both use additional substantive content to respond to uncertain turns, but they vary in the complexity of these responses. The evaluation represents one of the first controlled experiments to investigate whether substantive dynamic responses to student affect can significantly improve performance in computer tutors. To our knowledge, this is the first study to show that dynamically responding to uncertainty can significantly improve learning during computer tutoring. We also highlight our ongoing evaluation of the uncertainty-adaptive systems with respect to other important performance metrics, and we discuss how our corpus can be used by the wider computer speech and language community as a linguistic resource supporting further research on effective affect-adaptive spoken dialogue systems in general.

7.
This paper proposes a new technique for testing the performance of spoken dialogue systems by artificially simulating the behaviour of three types of user (very cooperative, cooperative, and not very cooperative) interacting with a system through spoken dialogues. Experiments using the technique were carried out on a previously developed dialogue system for the fast-food domain, working with two kinds of language model for automatic speech recognition: one based on 17 prompt-dependent language models, and the other based on a single prompt-independent language model. The simulated user enables the identification of problems in the speech recognition, spoken language understanding, and dialogue management components of the system. In these experiments in particular, problems were encountered with the recognition and understanding of postal codes and addresses, and with the lengthy sequences of repetitive confirmation turns required to correct these errors. By employing a simulated user in a range of experimental conditions, sufficient data can be generated to support a systematic analysis of potential problems and to enable fine-grained tuning of the system.
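A minimal sketch of a rule-based simulated user for a form-filling fast-food dialogue; the slot names, cooperativeness values, and response policy are invented for illustration rather than taken from the paper:

```python
import random

def simulated_user_reply(prompt_slot, goal, cooperativeness):
    """Answer a system prompt for one slot with varying cooperativeness.

    cooperativeness: probability of answering exactly what was asked;
    otherwise the user over-answers (all slots) or stays silent.
    """
    if random.random() < cooperativeness:
        return {prompt_slot: goal[prompt_slot]}          # direct answer
    if random.random() < 0.5:
        return dict(goal)                                # over-informative
    return {}                                            # uncooperative: no answer

goal = {"item": "cheeseburger", "size": "large", "drink": "cola"}
for level, c in [("very cooperative", 0.9), ("cooperative", 0.6),
                 ("not very cooperative", 0.3)]:
    print(level, simulated_user_reply("size", goal, c))
```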

8.
This paper presents a new technique to enhance the performance of the input interface of spoken dialogue systems, based on a procedure that combines, during speech recognition, the advantages of prompt-dependent language models with those of a language model independent of the prompts generated by the dialogue system. The technique creates a new speech recognizer, termed a contextual speech recognizer, that uses a prompt-independent language model to recognize any kind of sentence permitted in the application domain, while at the same time using contextual information (in the form of prompt-dependent language models) to reflect the fact that some sentences are more likely to be uttered than others at a particular moment of the dialogue. The experiments show that the technique clearly enhances the performance of the input interface of a previously developed dialogue system based exclusively on prompt-dependent language models. Most importantly, in comparison with a standard speech recognizer that uses just one prompt-independent language model without contextual information, the proposed recognizer increases word accuracy and sentence understanding rates by 4.09% and 4.19% absolute, respectively. These scores are slightly better than those obtained using linear interpolation of the prompt-independent and prompt-dependent language models used in the experiments.
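The closing comparison refers to linear interpolation of the two language models; a minimal sketch of that baseline, with a toy probability pair and interpolation weight chosen for illustration:

```python
def interpolated_lm(p_prompt_dep, p_prompt_indep, lam=0.7):
    """Linear interpolation of prompt-dependent and prompt-independent LMs.

    P(w | h, prompt) = lam * P_dep(w | h, prompt) + (1 - lam) * P_indep(w | h)
    """
    return lam * p_prompt_dep + (1.0 - lam) * p_prompt_indep

# Toy example: a word the current prompt makes likely, but that is rare overall.
print(interpolated_lm(p_prompt_dep=0.30, p_prompt_indep=0.02))  # 0.216
```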

9.
Spoken dialogue system performance can vary widely for different users, as well as for the same user during different dialogues. This paper presents the design and evaluation of an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Based on rules learned from a set of training dialogues, adaptive TOOT constructs a user model representing whether the user is having speech recognition problems as a particular dialogue progresses. Adaptive TOOT then automatically adapts its dialogue strategies based on this dynamically changing user model. An empirical evaluation of the system demonstrates the utility of the approach.
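A minimal sketch of the kind of learned adaptation rule described here; the misrecognition feature, the 30% threshold, and the strategy changes are hypothetical stand-ins for the rules TOOT learns from training dialogues:

```python
def adapt_strategy(n_rejections, n_turns, strategy):
    """Tighten the dialogue strategy when the user model flags ASR trouble.

    Hypothetical rule: if more than 30% of recent turns were misrecognized,
    move to system initiative with explicit confirmation.
    """
    if n_turns and n_rejections / n_turns > 0.3:
        strategy.update(initiative="system", confirmation="explicit")
    return strategy

print(adapt_strategy(3, 8, {"initiative": "user", "confirmation": "implicit"}))
```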

10.
Traditional dialogue systems use a fixed silence threshold to detect the end of a user's turn. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn degrades the user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate a smooth exchange of speaking turns. However, little effort has been made to implement these models in dialogue systems and to verify how well they capture turn-taking behaviour in human–computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human–computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On these data, we trained various models that use automatically extractable prosodic, contextual, and lexico-syntactic features to detect response locations. We then implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn transitions and more responsive system behaviour.
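A minimal sketch of online response-location detection from such cues; the features, weights, and decision threshold below are invented, whereas a real system would learn them from the collected corpus:

```python
def response_location_score(pause_ms, final_pitch_slope, syntactically_complete):
    """Score how suitable the current point is for a feedback response.

    Hypothetical linear model over three automatically extractable cues:
    a longer pause, falling pitch, and a syntactically complete utterance
    all make a response location more likely.
    """
    score = 0.0
    score += min(pause_ms / 500.0, 1.0) * 0.4       # silence cue, capped
    score += 0.3 if final_pitch_slope < 0 else 0.0  # falling-pitch cue
    score += 0.3 if syntactically_complete else 0.0 # syntactic completion cue
    return score

if response_location_score(400, -1.2, True) > 0.6:  # hypothetical threshold
    print("respond now")
```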

11.
The major premise of this paper is that, for a DSS to be effective in a given problem domain, it is important to model the decision-making behavior of the user over and above traditional problem-solving concerns. This is achieved by extending the traditional planning framework based on first-order logic to include modal logic. The extended framework is then used to represent the beliefs and desires of the user, communicative actions performed by the user and the system, as well as the usual goals, task-related actions, and so on. The essence of this approach is to view natural language utterances of the user of a DSS as speech acts that can be modeled using the extended framework; this view can, in turn, be used to interpret the user's natural language utterances.

13.
Conversational systems have become an element of everyday life for billions of users who use speech-based interfaces to services and engage with personal digital assistants on smartphones, social media chatbots, or smart speakers. One of the most complex tasks in the development of these systems is designing the dialogue model: the logic that, given a user input, selects the next answer. The dialogue model must also include mechanisms to adapt the system's response and interaction style to different groups and user profiles. Rule-based systems are difficult to adapt to phenomena that were not taken into consideration at design time, yet many commercially available systems are based on rules, as are the most widespread tools for developing chatbots and speech interfaces. In this article, we present a proposal to: (a) automatically generate dialogue rules from a dialogue corpus through the use of evolutionary algorithms, and (b) adapt the rules according to the detected user intention. We have evaluated our proposal with several conversational systems from different application domains, for which our approach provided an efficient way to adapt a set of dialogue rules based on clusters of user utterances.
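A toy evolutionary loop over dialogue rules, illustrating point (a); the intent clusters, response templates, corpus, and fitness function are all invented and far simpler than the paper's setup:

```python
import random

# Hypothetical setup: a dialogue rule maps a user-intent cluster to a response
# template; fitness counts how many corpus turns a rule set handles correctly.
CORPUS = [("greet", "hello_reply"), ("order", "ask_size"), ("greet", "hello_reply")]
INTENTS, RESPONSES = ["greet", "order"], ["hello_reply", "ask_size", "goodbye"]

def fitness(rules):
    return sum(rules.get(intent) == resp for intent, resp in CORPUS)

def mutate(rules):
    child = dict(rules)
    child[random.choice(INTENTS)] = random.choice(RESPONSES)
    return child

population = [{i: random.choice(RESPONSES) for i in INTENTS} for _ in range(10)]
for _ in range(30):                 # evolve: keep the fittest, mutate copies of them
    population.sort(key=fitness, reverse=True)
    population = population[:5] + [mutate(r) for r in population[:5]]
print(max(population, key=fitness))
```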

14.
In this paper, we introduce the Interactive Systems Laboratories multimedia data indexing and retrieval system 'View4You'. The main components of the system, namely the segmenter, the speech recognizer, and the information retrieval engine, are described in detail. In the View4You system, public television newscasts are recorded on a daily basis. The newscasts are automatically segmented, and an index is created for each segment by means of automatic speech recognition. The user can query the system in natural language, and the system returns a list of segments sorted by relevance to the query. By selecting a segment, the user can watch the corresponding part of the news show on his or her computer screen. Several end-to-end evaluations on real-world data, using questions from naive users, are described. By substituting each component of the system with a perfect (manually simulated) one, the effect of each component's imperfection on the end-to-end result can be determined. We show that the information retrieval component has the largest impact on system performance, followed by the segmentation. The quality of the speech recognizer, as long as its error rate is below approximately 25%, is shown to have only a relatively small impact.
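A minimal sketch of the retrieval step, ranking automatically transcribed segments against a natural-language query; TF-IDF with cosine similarity (via scikit-learn) is an assumption standing in for the original engine:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ASR transcripts of news segments (note the recognition error).
segments = [
    "the chancellor visited berlin to discuss the budget",
    "heavy rain caused flooding in bavaria over the weekend",
    "the national team one the match in munich",  # ASR error: "one" for "won"
]
query = ["flooding in bavaria"]

vec = TfidfVectorizer()
seg_matrix = vec.fit_transform(segments)
scores = cosine_similarity(vec.transform(query), seg_matrix)[0]
for rank, i in enumerate(scores.argsort()[::-1], start=1):
    print(rank, f"{scores[i]:.2f}", segments[i])
```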

15.
Visual interaction processes are modeled in this paper as sequences of visual sentences in which, for each visual sentence, only a limited set of user actions is possible. We introduce the notion of a "dynamic visual language" as a weakly ordered set of visual sentences characterized by the presence of common elements. We present a formal model of the derivation of visual sentences in a dynamic visual language, in which each visual sentence specifies the possible actions that can be performed on it and the possible transformations it can undergo. In this way, we offer a setting in which the interaction process can be formally specified. A user interface can be derived from the formal specification so that it embeds proper context elements that limit user disorientation. The concepts are illustrated by user interaction with a prototype of a digital library developed at the University of Bari.

16.
This paper describes a new genetic learning approach to the construction of a local model network (LMN) and the design of a local controller network (LCN), with application to a single-link flexible manipulator. The highly nonlinear flexible manipulator system is modelled using an LMN comprising autoregressive moving-average with exogenous inputs (ARMAX) local models (LMs), whereas linear proportional-integral-derivative (PID) local controllers (LCs) are used to design the LCN. In addition to allowing the simultaneous optimisation of the number of LMs and LCs, the model parameters, and the interpolation function parameters, the approach provides a flexible framework for targeting transparency and generalisation. Simulation results confirm the excellent nonlinear modelling properties of the LM network and illustrate the potential benefits of the proposed LM control scheme.
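A minimal sketch of the local controller network idea: local PID controllers blended by normalized validity (interpolation) functions of the operating point. The gains, Gaussian validity functions, and operating point are invented for illustration:

```python
import math

# Hypothetical local PID gains (kp, ki, kd) and Gaussian validity centres.
LOCAL_CONTROLLERS = [((4.0, 0.5, 0.1), 0.2), ((2.0, 0.3, 0.05), 0.8)]
WIDTH = 0.3

def lcn_output(error, error_int, error_diff, operating_point):
    """Blend local PID outputs with normalized Gaussian validity weights."""
    weights = [math.exp(-((operating_point - c) / WIDTH) ** 2)
               for _, c in LOCAL_CONTROLLERS]
    total = sum(weights)
    u = 0.0
    for ((kp, ki, kd), _), w in zip(LOCAL_CONTROLLERS, weights):
        u += (w / total) * (kp * error + ki * error_int + kd * error_diff)
    return u

print(lcn_output(error=0.1, error_int=0.02, error_diff=-0.01, operating_point=0.5))
```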

17.
Computation of probabilities for an island-driven parser
The authors describe an effort to adapt island-driven parsers to handle stochastic context-free grammars. These grammars could be used as language models (LMs) by a language processor (LP) to compute the probability of a linguistic interpretation. As different islands may compete for growth, it is important to compute the probability that an LM generates a sentence containing islands and the gaps between them. Algorithms for computing these probabilities are introduced, and their complexity is analyzed from both theoretical and practical points of view. It is shown that the computation of probabilities in the presence of gaps of unknown length requires the impractical solution of a nonlinear system of equations, whereas the computation of probabilities for gaps containing a known number of unknown words has polynomial time complexity and is practically feasible. The use of these results in automatic speech understanding systems is discussed.
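For intuition, a minimal sketch of the simplest related quantity: the inside probability that a stochastic CFG generates a complete word span, computed CKY-style. The grammar and its probabilities are invented, and handling islands with gaps (the paper's actual contribution) requires more than this:

```python
from collections import defaultdict

# Toy stochastic CFG in Chomsky normal form: P(A -> B C) and P(A -> w).
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0}
LEXICAL = {("DT", "the"): 1.0, ("NN", "parser"): 0.5, ("VP", "runs"): 0.3}

def inside(words):
    """CKY-style inside probabilities: chart[(i, j)][A] = P(A =>* words[i:j])."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for (a, token), p in LEXICAL.items():
            if token == w:
                chart[(i, i + 1)][a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    chart[(i, j)][a] += p * chart[(i, k)][b] * chart[(k, j)][c]
    return chart

words = "the parser runs".split()
print(inside(words)[(0, 3)]["S"])   # P(S =>* "the parser runs") = 0.15
```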

18.
Convergence is the phenomenon in human dialogue whereby participants adopt characteristics of each other's speech, without being aware that this is occurring. If such a phenomenon could be invoked in a natural language interface, it would provide a means of keeping user inputs within the system's lexical and syntactic coverage, while keeping the dialogue 'natural' in the sense of requiring no more conscious effort to observe conventions of format than human-human dialogue.

A 'Wizard of Oz' study was conducted to test the feasibility of this technique. Subjects were required to type queries into what they believed was a natural language database querying system. On completion of each input, the system presented a paraphrase for confirmation before showing the answer. The paraphrases were constructed using particular terms and syntactic structures, and subjects began to use these terms and structures spontaneously in subsequent queries.

The observation of convergence in human-computer dialogue suggests that the technique can be incorporated into user interfaces to improve communication. The implementation issues for natural language dialogue are discussed, and other applications of the technique in HCI are outlined.


19.
State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple sub-systems that may even be developed at different sites. Cross-system adaptation, in which model adaptation is performed using the outputs of another sub-system, can be used as an alternative to hypothesis-level combination schemes such as ROVER. Normally, cross adaptation is performed only on the acoustic models. However, there are many other levels in an LVCSR system's modelling hierarchy where complementary features may be exploited, for example the sub-word and word levels, to further improve cross-adaptation-based system combination. It is thus interesting to also cross adapt language models (LMs) to capture these additional useful features. In this paper, cross adaptation is applied to three forms of language model: a multi-level LM that models both syllable and word sequences, a word-level neural network LM, and the linear combination of the two. Significant error rate reductions of 4.0–7.1% relative were obtained over ROVER and acoustic-model-only cross adaptation when combining a range of Chinese LVCSR sub-systems used in the 2010 and 2011 DARPA GALE evaluations.

20.
The usage patterns of speech and visual input modes are investigated as a function of relative input mode efficiency for both desktop and personal digital assistant (PDA) working environments. For this purpose, the form-filling part of a multimodal dialogue system is implemented and evaluated with three multimodal modes of interaction: "Click-to-Talk," "Open-Mike," and "Modality-Selection." "Modality-Selection" implements an adaptive interface in which the system selects the most efficient input mode at each turn, effectively alternating between a "Click-to-Talk" and an "Open-Mike" interaction style, as proposed in "Modality tracking in the multimodal Bell Labs Communicator" (A. Potamianos et al., Proceedings of the Automatic Speech Recognition and Understanding Workshop, 2003). The multimodal systems are evaluated and compared with the unimodal systems. Objective and subjective measures include task completion, task duration, turn duration, and overall user satisfaction. Turn duration is broken down into interaction time and inactivity time to better measure the efficiency of each input mode. Duration statistics and empirical probability density functions are computed as a function of interaction context and user. Results show that the multimodal systems outperform the unimodal systems in terms of both objective and subjective criteria. Users also tend to use the most efficient input mode at each turn, although biases towards the default input modality, and a general bias towards the speech modality, also exist. The results demonstrate that although users exploit some of the available synergies in multimodal dialogue interaction, further efficiency gains can be achieved by designing adaptive interfaces that fully exploit these synergies.
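A minimal sketch of the turn-duration decomposition used in the evaluation; the per-turn timestamps below are hypothetical:

```python
import statistics

# Hypothetical per-turn logs: (turn_start, user_activity_start, turn_end) in seconds.
turns = [(0.0, 1.2, 5.0), (5.0, 5.8, 9.5), (9.5, 12.0, 15.0)]

inactivity = [act - start for start, act, end in turns]   # time before the user acts
interaction = [end - act for start, act, end in turns]    # time spent interacting

print("mean inactivity :", round(statistics.mean(inactivity), 2), "s")
print("mean interaction:", round(statistics.mean(interaction), 2), "s")
print("mean turn length:", round(statistics.mean(e - s for s, _, e in turns), 2), "s")
```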
