期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

John Dines Hui Liang Lakshmi Saheer Matthew Gibson William Byrne Keiichiro Oura Keiichi Tokuda Junichi Yamagishi Simon King Mirjam Wester Teemu Hirsimäki Reima Karhila Mikko Kurimo 《Computer Speech and Language》2013,27(2):420-437

In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation in which we employ a HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches as well as an end-to-end speaker adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios and our proposed algorithms seem to generalise well for several language pairs. We also discuss important future directions including the need for better evaluation metrics. 相似文献

2.

BBN TransTalk: Robust multilingual two-way speech-to-speech translation for mobile platforms

Rohit Prasad Prem Natarajan David Stallard Shirin Saleem Shankar Ananthakrishnan Stavros Tsakalidis Chia-lin Kao Fred Choi Ralf Meermeier Mark Rawls Jacob Devlin Kriste Krstovski Aaron Challenner 《Computer Speech and Language》2013,27(2):475-491

In this paper we present a speech-to-speech (S2S) translation system called the BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. The BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with lack of pronunciation and linguistic resources and effective modeling of ambiguity in pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall. 相似文献

3.

A formal model of emotions for an empathic rational dialog agent

Magalie Ochs David Sadek Catherine Pelachaud 《Autonomous Agents and Multi-Agent Systems》2012,24(3):410-440

Recent research has shown that virtual agents expressing empathic emotions toward users have the potential to enhance human–machine interaction. To provide empathic capabilities to a rational dialog agent, we propose a formal model of emotions based on an empirical and theoretical analysis of the users’ conditions of emotion elicitation. The emotions are represented by particular mental states of the agent, composed of beliefs, uncertainties and intentions. This semantically grounded formal representation enables a rational dialog agent to identify from a dialogical situation the empathic emotion that it should express. An implementation and an evaluation of an empathic rational dialog agent have enabled us to validate the proposed model of empathy. 相似文献

4.

The IBM speech-to-speech translation system for smartphone: Improvements for resource-constrained tasks

Bowen Zhou Xiaodong Cui Songfang Huang Martin Cmejrek Wei Zhang Jian Xue Jia Cui Bing Xiang Gregg Daggett Upendra Chaudhari Sameer Maskey Etienne Marcheret 《Computer Speech and Language》2013,27(2):592-618

This paper describes our recent improvements to IBM TRANSTAC speech-to-speech translation systems that address various issues arising from dealing with resource-constrained tasks, which include both limited amounts of linguistic resources and training data, as well as limited computational power on mobile platforms such as smartphones. We show how the proposed algorithms and methodologies can improve the performance of automatic speech recognition, statistical machine translation, and text-to-speech synthesis, while achieving low-latency two-way speech-to-speech translation on mobiles. 相似文献

5.

User modeling in dialog systems: Potentials and hazards 总被引：1，自引：0，他引：1

Alfred Kobsa 《AI & Society》1990,4(3):214-231

In order to be capable of exhibiting a wide range of cooperative behavior, a computer-based dialog system must have available assumptions about the current user's goals, plans, background knowledge and (false) beliefs, i.e., maintain a so-called user model. Apart from cooperativity aspects, such a model is also necessary for intelligent coherent dialog behavior in general. This article surveys recent research on the problem of how such a model can be constructed, represented and used by a system during its interaction with the user. Possible applications, as well as potential problems concerning the advisability of application, are then discussed. Finally, a number of guidelines are presented which should be observed in future research to reduce the risk of a potential misuse of user modeling technology. 相似文献

6.

The RavenClaw dialog management framework: Architecture and systems

Dan Bohus Alexander I. Rudnicky 《Computer Speech and Language》2009,23(3):332-361

In this paper, we describe RavenClaw, a plan-based, task-independent dialog management framework. RavenClaw isolates the domain-specific aspects of the dialog control logic from domain-independent conversational skills, and in the process facilitates rapid development of mixed-initiative systems operating in complex, task-oriented domains. System developers can focus exclusively on describing the dialog task control logic, while a large number of domain-independent conversational skills such as error handling, timing and turn-taking are transparently supported and enforced by the RavenClaw dialog engine. To date, RavenClaw has been used to construct and deploy a large number of systems, spanning different domains and interaction styles, such as information access, guidance through procedures, command-and-control, medical diagnosis, etc. The framework has easily adapted to all of these domains, indicating a high degree of versatility and scalability. 相似文献

7.

Collaborative recommendation of e‐learning resources: an experimental investigation

N. Manouselis R. Vuorikari F. Van Assche 《Journal of Computer Assisted Learning》2010,26(4):227-242

Repositories with educational resources can support the formation of online learning communities by providing a platform for collaboration. Users (e.g. teachers, tutors and learners) access repositories, search for interesting resources to access and use, and in many cases, also exchange experiences and opinions. A particular class of online services that take advantage of the collected knowledge and experience of users are collaborative filtering ones. The successful operation of such services in the context of real‐life applications requires careful testing and parameterization before their actual deployment. In this paper, the case of developing a learning resources' collaborative filtering service for an online community of teachers in Europe was examined. More specifically, a data set of evaluations of learning resources was collected from the teachers that use the European Schoolnet's learning resource portal. These evaluations were then used to support the experimental investigation of design choices for an online collaborative filtering service for the portal's learning resources. A candidate multi‐attribute utility collaborative filtering algorithm was appropriately parameterized and tested for this purpose. Results indicated that the development of such systems should be taking place considering the particularities of the actual communities that are to be served. 相似文献

8.

Model-based mixture discriminant analysis—an experimental study

Zohar Halbe 《Pattern recognition》2005,38(3):437-440

The subject of this paper is an experimental study of a discriminant analysis (DA) based on Gaussian mixture estimation of the class-conditional densities. Five parameterizations of the covariance matrixes of the Gaussian components are studied. Recommendation for selection of the suitable parameterization of the covariance matrixes is given. 相似文献

9.

The pragmatic quality of Resources‐ Events‐Agents diagrams: an experimental evaluation

Geert Poels Ann Maes Frederik Gailly Roland Paemeleire 《Information Systems Journal》2011,21(1):63-89

The Resources‐Events‐Agents (REA) model is a semantic data model for the development of enterprise information systems. Although this model has been proposed as a benchmark for enterprise information modelling, only few studies have attempted to empirically validate the claimed benefits of REA modelling. Moreover, these studies focused on the evaluation of REA‐based system implementations rather than directly assessing the REA‐modelled conceptual schemas that these systems are based on. This paper presents a laboratory experiment that measured the user understanding of diagrammatic conceptual schemas developed using the REA model. The theoretical foundation for the hypotheses are cognitive theories that explain pattern recognition phenomena and the resulting reduction in cognitive effort for understanding conceptual schemas. The results of the experiment indicate a more accurate understanding of the business processes and policies modelled when users recognize the REA model’s core pattern of enterprise information in the diagram. The implication for modelling practice is that the use of the REA model improves the requirements engineering process by facilitating the user validation of conceptual schemas produced by analysts, and thus helps ensuring the quality of the enterprise information system that is developed or implemented. 相似文献

10.

VoiceXML dialog system of the multimodal IP-Telephony—The application for voice ordering service

Min-Jen Tsai 《Expert systems with applications》2006,31(4):684-696

The development of IP-Telephony in recent years has been substantial. The improvement in voice quality, the integration between voice and data, especially the interaction with multimedia has made the 3G communication more promising. The value added services of Telephony techniques alleviate the dependence on the phone and provide a universal platform for the multimodal telephony applications. For example, the web-based application with VoiceXML has been developed to simplify the human–machine interaction because it takes the advantage of the speech-enabled services and makes the telephone-web access a reality. However, it is not cost-efficient to build voice only stand-alone web application and is more reasonable that voice interfaces should be retrofitted to be compatible or collaborate with the existing HTML or XML-based web applications. Therefore, this paper considers that the functionality of the web service should enable multiple access modalities so that users can perceive and interact with the site in either visual or speech response simultaneously. Under this principle, our research develops a prototype system of multimodal VoIP with the integrated web-based Mandarin dialog system which adopts automatic speech recognition (ASR), text-to-speech (TTS), VoiceXML browser, and VoIP technologies to create user friendly graphic user interface (GUI) and voice user interface (VUI). The users can use traditional telephone, cellular phone, or even VoIP connection via personal computer to interact with the VoiceXML server. In the mean time, the users browse the web and access the same content with common HTML or XML-based browser. The proposed system shows excellent performance and can be easily incorporated into voice ordering service for a wider accessibility. 相似文献

11.

Complementary computing: policies for transferring callers from dialog systems to human receptionists

Eric Horvitz Tim Paek 《User Modeling and User-Adapted Interaction》2007,17(1-2):159-182

We describe a study of the use of decision-theoretic policies for optimally joining human and automated problem-solving efforts. We focus specifically on the challenge of determining when it is best to transfer callers from an automated dialog system to human receptionists. We demonstrate the sensitivities of transfer actions to both the inferred competency of the spoken-dialog models and the current sensed load on human receptionists. The policies draw upon probabilistic models constructed via machine learning from cases that were logged by a call routing service deployed at our organization. We describe the learning of models that predict outcomes and interaction times and show how these models can be used to generate expected-utility policies that identify when it is best to transfer callers to human operators. We explore the behavior of the policies with simulations constructed from real-world call data. See D’Agostino (2005) for a reflection from the business community about the failure to date of automated speech recognition systems to penetrate widely. 相似文献

12.

Foundations of dialog engineering: the development of human-computer interaction. Part II

《International journal of man-machine studies》1986,24(2):101-123

The human-computer interface is increasingly the major determinant of the success or failure of computer systems. It is time that we provided foundations of engineering human-computer interaction (HCI) as explicit and well-founded as those for hardware and software engineering. Through the influences of other disciplines and their contribution to software engineering, a rich environment for HCI studies, theory and applications now exists. Many principles underlying HCI have systemic foundations independent of the nature of the systems taking part and these may be analysed control-theoretically and information-theoretically. The fundamental principles at different levels may be used in the practical design of dialog shells for engineering effective HCI. This paper surveys the development of styles of dialog through generations of computers, the principles involved, and the move towards integrated systems. It then systematically explores the foundations of HCI by analysing the various analogies to HCI possible when the parties are taken to be general systems, equipment, computers or people. 相似文献

13.

Genetic process mining: an experimental evaluation 总被引：4，自引：0，他引：4

A. K. A. de Medeiros A. J. M. M. Weijters W. M. P. van der Aalst 《Data mining and knowledge discovery》2007,14(2):245-304

One of the aims of process mining is to retrieve a process model from an event log. The discovered models can be used as objective starting points during the deployment of process-aware information systems (Dumas et al., eds., Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, New York, 2005) and/or as a feedback mechanism to check prescribed models against enacted ones. However, current techniques have problems when mining processes that contain non-trivial constructs and/or when dealing with the presence of noise in the logs. Most of the problems happen because many current techniques are based on local information in the event log. To overcome these problems, we try to use genetic algorithms to mine process models. The main motivation is to benefit from the global search performed by this kind of algorithms. The non-trivial constructs are tackled by choosing an internal representation that supports them. The problem of noise is naturally tackled by the genetic algorithm because, per definition, these algorithms are robust to noise. The main challenge in a genetic approach is the definition of a good fitness measure because it guides the global search performed by the genetic algorithm. This paper explains how the genetic algorithm works. Experiments with synthetic and real-life logs show that the fitness measure indeed leads to the mining of process models that are complete (can reproduce all the behavior in the log) and precise (do not allow for extra behavior that cannot be derived from the event log). The genetic algorithm is implemented as a plug-in in the ProM framework. 相似文献

14.

Linux update: an experimental ATM network

Jepsen T.C. Wright S.A. Klevans R.L. Ze Zhang 《Software, IEEE》1999,16(5):32-39

Both private and public networks increasingly use ATM to provide high bandwidth and guaranteed quality of service. The ability to use Linux in PC platforms and the ready availability of Linux source code make the PC-based Linux workstation an ideal platform for ATM multimedia development. This article describes some of Fujitsu's ATM tools for Linux and reports the results of prototyping using Fujitsu's ATM/Linux development environment 相似文献

15.

InParS: an experimental intelligent parallelization system

《Advances in Engineering Software》1999,30(2):93-101

In this paper we discuss the design and implementation of an intelligent program parallelization system, called InParS. This system in based on intelligent parallelization models proposed by many researchers in the area of parallelizing compilers. The presented experiment is one of few attempts toward investigating the viability of artificial intelligence techniques in automatic program parallelization. The early version of InParS was aimed at transforming Fortran-like DO loops into a vector code well-suited for vector processors. The new version of InParS targets distributed memory parallel computers. Some preliminary research results are also presented, which give an indication of how incorporating artificial intelligence techniques can contribute towards the success of automatic program parallelization. 相似文献

16.

Evaluating mass knowledge acquisition using the ALICE chatterbot: The AZ-ALICE dialog system

《International journal of human-computer studies》2006,64(11):1132-1140

In this paper, we evaluate mass knowledge acquisition using modified ALICE chatterbots. In particular we investigate the potential of allowing subjects to modify chatterbot responses to see if distributed learning from a web environment can succeed. This experiment looks at dividing knowledge into general conversation and domain specific categories for which we have selected telecommunications. It was found that subject participation in knowledge acquisition can contribute a significant improvement to both the conversational and telecommunications knowledge bases. We further found that participants were more satisfied with domain-specific responses rather than general conversation. 相似文献

17.

Active acquisition of user models: Implications for decision-theoretic dialog planning and plan recognition

Dekai Wu 《User Modeling and User-Adapted Interaction》1991,1(2):149-172

This article investigates the implications ofactive user model acquisition upon plan recognition, domain planning, and dialog planning in dialog architectures. A dialog system performs active user model acquisition by querying the user during the course of the dialog. Existing systems employ passive strategies that rely on inferences drawn from passive observation of the dialog. Though passive acquisition generally reduces unnecessary dialog, in some cases the system can effectively shorten the overall dialog length by selectively initiating subdialogs for acquiring information about the user.We propose a theory identifying conditions under which the dialog system should adoptactive acquisition goals. Active acquisition imposes a set ofrationality requirements not met by current dialog architectures. To ensure rational dialog decisions, we propose significant extensions to plan recognition, domain planning, and dialog planning models, incorporating decision-theoretic heuristics for expected utility. The most appropriate framework for active acquisition is a multi-attribute utility model wherein plans are compared along multiple dimensions of utility. We suggest a general architectural scheme, and present an example from a preliminary implementation.The author will be at the Department of Computer Science, University of Toronto, untilThe author will be at the Department of Computer Science, University of Toronto, untilThe author will be at the Department of Computer Science, University of Toronto, untilThe author will be at the Department of Computer Science, University of Toronto, untilThe author will be at the Department of Computer Science, University of Toronto, untilThe author will be at the Department of Computer Science, University of Toronto, until 相似文献

18.

融合角色、结构和语义的口语对话预训练语言模型

黄健李锋《计算机应用研究》2022,39(8)

口语语言理解是任务式对话系统的重要组件,预训练语言模型在口语语言理解中取得了重要突破,然而这些预训练语言模型大多是基于大规模书面文本语料。考虑到口语与书面语在结构、使用条件和表达方式上的明显差异,构建了大规模、双角色、多轮次、口语对话语料,并提出融合角色、结构和语义的四个自监督预训练任务：全词掩码、角色预测、话语内部反转预测和轮次间互换预测,通过多任务联合训练面向口语的预训练语言模型SPD-BERT（SPoken Dialog-BERT）。在金融领域智能客服场景的三个人工标注数据集——意图识别、实体识别和拼音纠错上进行详细的实验测试,实验结果验证了该语言模型的有效性。相似文献

19.

Hyperspectral change detection: an experimental comparative study

Mahdi Hasanlou Seyd Teymoor Seydi 《International journal of remote sensing》2013,34(20):7029-7083

ABSTRACT

The Earth’s surface is constantly changing due to variations originating from the increasing human population. In the last decade, numerous methods were presented in the literature for change detection using multispectral image data. Owing to the increasing availability of hyperspectral images, these methods are now being applied to hyperspectral images. The main objective of this study is to present different change detection methods in hyperspectral imagery. Numerous algorithms (more than 43 algorithms) have been proposed for change detection in hyperspectral imagery over the last decade. In this work, we provide a comparative review of these algorithms through experimental results. We place the algorithms in five major groups: (1) match-based, (2) transformation-based, (3) direct classification-based, (4) post-classification-based, and (5) hybrid-based. We evaluate and compare the performances of all five groups using two real-world data sets of multi-temporal hyperspectral imagery. This comparative study investigates the advantages and disadvantages of the effects of preprocessing steps in the efficiency of the hyperspectral change detection (HSCD) methods. These preprocessing steps are considered in four scenarios, including: (1) considering only spatial or geometric correction without noise reduction and spectral correction; (2) spatial, atmospheric, and radiometric corrections without noise reduction; (3) spatial correction and noise reduction without atmospheric and radiometric corrections; and (4) spatial, atmospheric, and radiometric correction with noise reduction. The empirical results, followed by a summary of the pros and cons of each algorithm, aim to help researchers select the procedures with the best characteristics for HSCD applications. 相似文献

20.

Structured flowcharts outperform pseudocode: an experimental comparison

Scanlan D.A. 《Software, IEEE》1989,6(5):28-36

The author discovered, while teaching a course on data structures, that his students overwhelmingly preferred structured flowcharts over pseudocode for comprehending the algorithms presented. He describes an experiment that he designed to find out if real differences in comprehension exist between structured flowcharts and pseudocode when used to describe conditional logic. He hypothesized that structured flowcharts (1) take less time to comprehend, (2) produce fewer errors in understanding, (3) give students more confidence in their understanding of an algorithm, (4) reduce the time spent answering questions about an algorithm, and (5) reduce the number of times students need to look at an algorithm. These hypotheses were tested on three algorithms of varying complexity. The results strongly indicate that structured flowcharts do indeed aid algorithm comprehension. A large difference was found even for the simplest algorithm.<> 相似文献