1.
Potamianos A. Fosler-Lussier E. Ammicht E. Perakakis M. 《Multimedia, IEEE Transactions on》2007,9(3):550-566
For Part I, see ibid., vol. 9, p. 3 (2007). In this paper, the task and user interface modules of a multimodal dialogue system development platform are presented. The main goal of this work is to provide a simple, application-independent solution to the problem of multimodal dialogue design for information seeking applications. The proposed system architecture clearly separates the task and interface components of the system. A task manager is designed and implemented that consists of two main submodules: the electronic form module, which handles the list of attributes that have to be instantiated by the user, and the agenda module, which contains the sequence of user and system tasks. Both the electronic forms and the agenda can be dynamically updated by the user. Next, a spoken dialogue module is designed that implements the speech interface for the task manager. The dialogue manager can handle complex error correction and clarification user input, building on the semantics and pragmatic modules presented in Part I of this paper. The spoken dialogue system is evaluated for a travel reservation task of the DARPA Communicator research program and shown to yield over 90% task completion and good performance for both objective and subjective evaluation metrics. Finally, a multimodal dialogue system that combines graphical and speech interfaces is designed, implemented and evaluated. Minor modifications to the unimodal semantic and pragmatic modules were required to build the multimodal system. It is shown that the multimodal system significantly outperforms the unimodal speech-only system both in terms of efficiency (task success and time to completion) and user satisfaction for a travel reservation task.
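The two task-manager submodules described in this abstract can be illustrated with a minimal sketch. The class names, method names, and the travel attributes below are hypothetical and are not taken from the paper; the sketch only shows one plausible way to hold a list of attributes to be instantiated and an ordered, dynamically updatable sequence of tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ElectronicForm:
    """Hypothetical form: attributes the user still has to instantiate."""
    attributes: List[str]
    values: Dict[str, str] = field(default_factory=dict)

    def fill(self, attribute: str, value: str) -> None:
        # Reject attributes that are not part of this form.
        if attribute not in self.attributes:
            raise KeyError(attribute)
        self.values[attribute] = value

    def missing(self) -> List[str]:
        # Attributes not yet instantiated, in their original order.
        return [a for a in self.attributes if a not in self.values]


@dataclass
class Agenda:
    """Hypothetical agenda: an ordered sequence of pending tasks."""
    tasks: List[str]

    def current(self) -> Optional[str]:
        return self.tasks[0] if self.tasks else None

    def complete_current(self) -> None:
        if self.tasks:
            self.tasks.pop(0)

    def insert_task(self, task: str, position: int) -> None:
        # Dynamic update: a new task can be placed anywhere in the sequence.
        self.tasks.insert(position, task)
```

A dialogue manager built on such structures could, for example, prompt for the first entry of `missing()` while working on the task returned by `current()`.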
2.
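In recent years, with the development of artificial intelligence and the spread of smart devices, human-machine intelligent dialogue technology has attracted wide attention. Spoken language understanding is an important task in spoken dialogue systems, and spoken intent detection is a key component of spoken language understanding. Because multi-turn dialogues exhibit complex linguistic phenomena such as semantic omission, frame representation, and intent shift, intent detection for multi-turn dialogue is highly challenging. To address these difficulties, this paper proposes a gating-mechanism-based information sharing network that makes full use of contextual information in multi-turn dialogues to improve detection performance. Specifically, character and phonetic features are first combined to build initial representations of the current-turn text and the context text, reducing the impact of speech recognition errors on the semantic representation. Next, a semantic encoder based on a hierarchical attention mechanism produces deep semantic representations of the current turn and the context, capturing multi-level semantic information from characters to sentences to multi-turn text. Finally, a gating mechanism is introduced into a multi-task learning framework to build the gated information sharing network, which uses contextual semantic information to assist intent detection for the current turn. Experimental results show that the proposed method uses contextual information efficiently to improve spoken intent detection, achieving 88.1% accuracy (Acc) and an 88.0% F1 score on the dataset of Technical Evaluation Task 2 of the China Conference on Knowledge Graph and Semantic Computing (CCKS2018), a significant improvement over existing methods.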
3.
4.
This article describes the various knowledge sources that, in general, are required to handle multimodal human-machine interaction efficiently: these are called the task, user, dialogue, environment and system models. The first part discusses the content of these models. Special emphasis is placed on problems that occur when speech is combined with other modalities. The second part focuses on spoken language characteristics and proposes an adapted semantic representation for the task model. It also describes a stochastic method to collect and process the information related to this model. The conclusion discusses an extension of such a stochastic method to multimodality.
5.
6.
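The openness, colloquial style, and expressive diversity of out-of-domain utterances prevent existing limited-domain spoken dialogue systems from handling them well. This paper proposes a co-processing scheme for limited-domain spoken dialogue systems. Based on the Artificial Intelligence Markup Language (AIML), it designs a set of understanding templates for user utterances with open semantics and classifies unmatched utterances into understanding templates by utterance similarity. It then adopts an extended finite-state automaton processing model that, combined with the state and information of the dialogue flow context, converts understanding templates into response templates, compensating for the relative lack of dialogue flow control in pure template matching. Tests in the Chinese mobile phone shopping guide domain show that the proposed co-processing method effectively helps the spoken dialogue system complete full limited-domain dialogue flows and achieves better user satisfaction.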
7.
8.
9.
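To address the challenge that the expressive diversity of spoken Chinese questions poses for question understanding in dialogue systems, this paper adopts the design principle of "acquiring semantic knowledge on top of grammatical structure" and proposes a question understanding method for spoken dialogue systems that combines grammar and semantics. First, a grammar knowledge base independent of domain and application is compiled manually. A sentence compression module then simplifies complex sentences to obtain structural information, after which question-type pattern recognition yields a sentence pattern that uniquely determines the question's semantic organization method, query strategy, and response mode. In parallel, the corresponding semantic information is extracted from the source sentence according to a domain semantic knowledge base and organized according to the knowledge organization method associated with the recognized sentence pattern, completing the understanding of the question. The method was applied to the Chinese mobile phone shopping guide dialogue system that was developed. Test results show that the method effectively handles user question understanding in the dialogue flow.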
10.
11.
《Advanced Robotics》2013,27(1-2):209-232
We describe an implementation integrating a complete spoken dialogue system with a mobile robot, which a human can direct to specific locations, ask for information about its status and supply with information about its environment. The robot uses an internal map for navigation, and communicates its current orientation and accessible locations to the dialogue system using a topological map as an interface. We focus on linguistic and inferential aspects of the human–robot communication process. The result is a novel approach using a principled semantic theory combined with techniques from automated deduction applied to a mobile robot platform. Due to the abstract level of the dialogue system, it is easily portable to other environments or applications.
12.
Hélène Bonneau-Maynard Matthieu Quignard Alexandre Denis 《Language Resources and Evaluation》2009,43(4):329-354
The aim of the French Media project was to define a protocol for the evaluation of speech understanding modules for dialog systems. Accordingly, a corpus
of 1,257 real spoken dialogs related to hotel reservation and tourist information was recorded, transcribed and semantically
annotated, and a semantic attribute-value representation was defined in which each conceptual relationship was represented
by the names of the attributes. Two semantic annotation levels are distinguished in this approach. At the first level, each
utterance is considered separately and the annotation represents the meaning of the statement without taking into account
the dialog context. The second level of annotation then corresponds to the interpretation of the meaning of the statement
by taking into account the dialog context; in this way a semantic representation of the dialog context is defined. This paper
discusses the data collection, the detailed definition of both annotation levels, and the annotation scheme. Then the paper
comments on both evaluation campaigns which were carried out during the project and discusses some results.
13.
《Knowledge》2006,19(3):153-163
Spoken dialogue systems can be considered knowledge-based systems designed to interact with users using speech in order to provide information or carry out simple tasks. Current systems are restricted to well-known domains that provide knowledge about the words and sentences the users will likely utter. Basically, these systems rely on an input interface comprising a speech recogniser and a semantic analyser, a dialogue manager, and an output interface comprising a response generator and a speech synthesiser. As an attempt to enhance the performance of the input interface, this paper proposes a technique based on a new type of speech recogniser comprising two modules. The first one is a standard speech recogniser that receives the sentence uttered by the user and generates a graph of words. The second module analyses the graph and produces the recognised sentence using the context knowledge provided by the current prompt of the system. We evaluated the performance of two input interfaces working in a previously developed dialogue system: the original interface of the system and a new one that features the proposed technique. The experimental results show that when the sentences uttered by the users are analysed out of context by the new interface, the word accuracy and sentence understanding rates increase by 93.71 and 77.42% absolute, respectively, relative to the original interface. The price to pay for this clear enhancement is a small reduction in the scores when the new interface analyses sentences in context, as they decrease by 2.05 and 3.41% absolute, respectively, in comparison with the original interface. Given that in real dialogues sentences may be analysed out of context, especially when they are uttered by inexperienced users, the technique can be very useful to enhance the system performance.
14.
《Artificial Intelligence in Engineering》1999,13(4):373-384
This paper describes a domain-limited system for speech understanding as well as for speech translation. An integrated semantic decoder directly converts the preprocessed speech signal into its semantic representation by a maximum a posteriori classification. With the combination of probabilistic knowledge on acoustic, phonetic, syntactic, and semantic levels, the semantic decoder extracts the most probable meaning of the utterance. No separate speech recognition stage is needed because of the integration of the Viterbi algorithm (calculating acoustic probabilities by the use of hidden Markov models) and a probabilistic chart parser (calculating semantic and syntactic probabilities by special models). The semantic structure is introduced as a representation of an utterance's meaning. It can be used as an intermediate level for a succeeding intention decoder (within a speech understanding system for the control of a running application by spoken inputs) as well as an interlingua level for a succeeding language production unit (within an automatic speech translation system for the creation of spoken output in another language). Following the above principles and using the respective algorithms, speech understanding and speech translating front-ends for the domains ‘graphic editor’, ‘service robot’, ‘medical image visualisation’ and ‘scheduling dialogues’ could be successfully realised.
15.
D. Albesano P. Baggia M. Danieli R. Gemello E. Gerbino C. Rullent 《International Journal of Speech Technology》1997,2(2):101-111
This paper presents a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users and it has proved robust enough to allow spontaneous interactions even for people with poor recognition performance. The robust behaviour of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, the tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules allows the system to deal with partial or total breakdowns at other levels of analysis. We report the field trial data of the system with respect to speech recognition metrics of word accuracy and sentence understanding rate, time-to-completion, time-to-acquisition of crucial parameters, and degree of success of the interactions in providing the speakers with the information they required. The evaluation data show that most of the subjects were able to interact fruitfully with the system. These results suggest that the design choices made to achieve robust behaviour are a promising way to create usable spoken language telephone systems.
16.
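Endowing chatbots with personal information is essential for providing natural dialogue. To this end, a dialogue model with personal information is proposed, comprising three modules: question classification, personal information response, and open-domain dialogue. In the question classification module, the effectiveness of different classification methods is analyzed and tested. The personal information response module uses a BiLSTM to encode semantic information and is trained with a contrastive loss function, and several matching models are compared experimentally. The open-domain dialogue model uses maximum mutual information as its objective function to reduce meaningless...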
17.
Rubén San-Segundo Juan M. Montero Javier Macías-Guarasa Javier Ferreiros José M. Pardo 《International Journal of Speech Technology》2005,8(1):45-66
In this paper, we propose a strategy for designing dialogue managers in spoken dialogue systems for a restricted domain. This strategy combines several information sources (intuition, observation and simulation) in order to maximize the adaptation within the system capability and the expectation of the user. These sources are combined by an iterative process consisting of five steps, where different dialogue alternatives are proposed and evaluated sequentially. The evaluation process includes different measures depending on the information required. Several measures are proposed and analyzed in each step. We also describe a user-modeling technique and an approach for designing the confirmation sub-dialogues based on recognition confidence measures. The knowledge-combining methodology is described and applied to a railway information system. In a subjective evaluation, users from the university gave the system a 3.9 score on a 5-point scale with an average call duration of 205 seconds. The employers of the railway company were more critical of the system. They gave it a score of 2.1 even though the system resolved more than half of the calls (57.8%) within an average call duration of three minutes (185 seconds).
18.
This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded, statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intractable, so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made, which improve the efficiency significantly compared to the original algorithm as well as compared to other POMDP-based dialogue state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems which uses a component-based policy with the episodic Natural Actor Critic algorithm. The framework proposed in this paper was tested on both simulations and in a user trial. Both indicated that using Bayesian updates of the dialogue state significantly outperforms traditional definitions of the dialogue state. Policy learning worked effectively and the learned policy outperformed all others on simulations. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state framework was shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems.
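For orientation, the exact belief update that this abstract calls intractable for realistic state spaces can be written down for a tiny discrete state set. The sketch below is the textbook POMDP update, b'(s') ∝ P(o | s') Σ_s P(s' | s, a) b(s); it is not the paper's loopy belief propagation approximation, and the function and parameter names are illustrative assumptions.

```python
from typing import List


def update_belief(
    belief: List[float],
    transition: List[List[float]],
    observation_likelihood: List[float],
) -> List[float]:
    """Exact Bayesian belief update over a small discrete state set.

    belief[s]                  : current probability of state s
    transition[s][s2]          : P(s2 | s, a) for the action already taken
    observation_likelihood[s2] : P(o | s2) for the observation just received
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by how well it explains the observation.
    unnormalised = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]
```

The cost of the prediction step grows with the square of the number of states, which is why factored or approximate updates are needed once the dialogue state combines many slots.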
19.
J.M. Lucas-Cuesta J. Ferreiros F. Fernández-Martı´nez J.D. Echeverry S. Lutfi 《Expert systems with applications》2013,40(4):1069-1085
We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate an LM for each one of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate an LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to significantly improve the performance of the speech recognition, which leads to an improvement in both the language understanding and the dialogue management tasks.
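The linear interpolation described in this abstract can be sketched for unigram models as follows. This is only an illustration under stated assumptions: the fixed `background_weight`, the rule of splitting the remaining weight in proportion to the posteriors, and all names are hypothetical, since the abstract does not give the exact weight estimation formula.

```python
from typing import Dict, List


def interpolate_lms(
    background: Dict[str, float],
    element_lms: List[Dict[str, float]],
    posteriors: List[float],
    background_weight: float = 0.5,
) -> Dict[str, float]:
    """Mix a background unigram LM with LMs tied to dialogue elements.

    Assumption: the background LM keeps a fixed share of the mass and the
    rest is divided among element LMs in proportion to their posteriors.
    """
    if len(element_lms) != len(posteriors):
        raise ValueError("one posterior is required per element LM")
    if not 0.0 <= background_weight <= 1.0:
        raise ValueError("background_weight must lie in [0, 1]")
    total = sum(posteriors)
    if total <= 0.0:
        # No evidence for any element: fall back to the background LM alone.
        background_weight = 1.0
        weights = [0.0] * len(element_lms)
    else:
        weights = [(1.0 - background_weight) * p / total for p in posteriors]
    vocabulary = set(background)
    for lm in element_lms:
        vocabulary.update(lm)
    return {
        word: background_weight * background.get(word, 0.0)
        + sum(w * lm.get(word, 0.0) for w, lm in zip(weights, element_lms))
        for word in vocabulary
    }
```

Because the weights sum to one and each input is a probability distribution, the mixture is again a distribution; recomputing it on every turn with fresh posteriors gives a per-turn adapted model in the spirit of the abstract.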