Found 20 similar documents; search took 15 ms
1.
Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions
restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and
accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support
client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands
based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse
at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that
time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the
next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate
their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared
to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via
online updating of model parameters based on individual user data. Personalization significantly increased relative reduction
in error rate by an additional 5%.
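The personalization idea in the abstract above (a population-level command model adapted online to one user at a chosen learning rate) can be sketched as follows. This is only an illustrative sketch: the abstract does not give the model form, so the class name `PersonalizedCommandModel`, the exponential-moving-average update, and the example commands are all assumptions, not the paper's method.

```python
class PersonalizedCommandModel:
    """Hypothetical sketch: a population prior adapted online to one user."""

    def __init__(self, population_probs, learning_rate=0.1):
        # population_probs: command -> probability, assumed to have been
        # learned from a large population of users.
        self.learning_rate = learning_rate
        self.user_probs = dict(population_probs)  # start from the population model

    def update(self, observed_command):
        # Assumed update rule: move each probability toward the one-hot
        # vector of the observed command. The total mass stays at 1.
        self.user_probs.setdefault(observed_command, 0.0)
        for cmd in self.user_probs:
            target = 1.0 if cmd == observed_command else 0.0
            self.user_probs[cmd] += self.learning_rate * (target - self.user_probs[cmd])

    def predict(self):
        # Most probable next command under the adapted model.
        return max(self.user_probs, key=self.user_probs.get)


model = PersonalizedCommandModel(
    {"call mom": 0.5, "call spouse": 0.3, "play music": 0.2},
    learning_rate=0.2,
)
before = model.predict()
for _ in range(3):
    model.update("call spouse")
after = model.predict()
```

A larger `learning_rate` makes the model follow the individual user faster, at the cost of forgetting the population prior sooner.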
2.
In this paper we present a statistical approach to question answering (QA). Our motivation is to build robust systems for many languages without the need for highly tuned linguistic modules. Consequently, word tokens and web data are used extensively but neither explicit linguistic knowledge nor annotated data is incorporated. A mathematical model for answer retrieval and answer classification is derived. Experiments are conducted by searching for answers in the AQUAINT corpus, as well as in web data. The redundancy inherent in web data outperforms retrieval from a fixed corpus, where there are typically relatively few answer occurrences for any given question. We participated with an implementation of this framework in the TREC 2006 QA evaluations, where we ranked 9th among 27 participants on the factoid task.
3.
Two research projects are described that explore the use of spoken natural language interfaces to virtual reality (VR) systems. Both projects combine off-the-shelf speech recognition and synthesis technology with in-house command interpreters that interface to the VR applications. Details about the interpreters and other technical aspects of the projects are provided, together with a discussion of some of the design decisions involved in the creation of speech interfaces. Questions and issues raised by the projects are presented as inspiration for future work. These issues include: requirements for object and information representation in VR models to support natural language interfaces; use of the visual context to establish the interaction context; difficulties with referencing events in the virtual world; and problems related to the usability of speech and natural language interfaces in general.
4.
The process of Natural Language Generation for a Conversational Agent translates a semantic language into its surface form expressed in natural language. In this paper, we present a Case-Based Reasoning technique that is easily extensible and adaptable to multiple domains and languages, and that generates coherent phrases and produces natural output in the context of a Conversational Agent that maintains a dialogue with the user.
5.
We discuss development of a word-unigram language model for online handwriting recognition. First, we tokenize a text corpus
into words, contrasting with tokenization methods designed for other purposes. Second, we select for our model a subset of
the words found, discussing deviations from an N-most-frequent-words approach. From a 600-million-word corpus, we generated a 53,000-word model which eliminates 45% of word-recognition
errors made by a character-level-model baseline system. We anticipate that our methods will be applicable to offline recognition
as well, and to some extent to other recognizers, such as speech recognizers and video retrieval systems.
Received: November 1, 2001 / Revised version: July 22, 2002
6.
This paper proposes the use of Maximum A Posteriori Linear Regression (MAPLR) transforms as features for language recognition. Rather than estimating the transforms using maximum likelihood linear regression (MLLR), MAPLR incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) as the estimation criterion to derive the transforms. Through multiple MAPLR adaptation, each spoken utterance is converted into one discriminative transform supervector consisting of one target-language transform vector and the other non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors. This system achieves performance comparable to that of state-of-the-art approaches and better than MLLR. Experimental results on the 2007 NIST Language Recognition Evaluation (LRE) databases show relative reductions of 4% in EER and 9% in mincost when the language recognition system uses MAPLR instead of MLLR in the 30-s tasks, and further improvement is gained by combining with state-of-the-art systems. This leads to gains of 6% in EER and 11% in minDCF compared with the combination of only the MMI system and the GMM-SVM system.
7.
The article describes aspects of the development of a conversational natural language understanding (NLU) system done during the first year of the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki) [http://www.catch2004.org]. The project is co-funded by the European Union in the scope of the IST programme (IST 1999-11103). Its objectives focus on multi-modal, multi-lingual conversational natural language access to information systems. The paper emphasises architecture and telephony-based speech and NLU components, as well as aspects of the implementation of a city event information (CEI) system in English, Finnish, German and Greek. The CEI system accesses two different databases in Athens and Helsinki using a common retrieval interface. Furthermore, the paper singles out the methodologies used for the acoustic and language models of the speech recognition component, and the parsing techniques and dialog modelling for the conversational natural language subsystem. For the implementation it outlines an incremental system refinement methodology necessary to adapt the system components to real-life data. It addresses the implementation of language-specific characteristics and a common dialog design for all four languages, and also deals with aspects of a multilingual conversational system. Finally, it presents prospects for further developments of the project.
8.
We outline an approach to parsing based on system modelling. The underlying assumption, which determines the limits of the approach, is that a narrative natural language text constitutes a symbolic model of the system described, written for the purpose of communicating static and/or dynamic system aspects.
9.
We propose a novel language model for Hangul text recognition. Without relying on prior linguistic knowledge in training, the proposed model learns variable-length Hangul character sequences, which comprise the elementary tokens of the Korean language, and their probabilities from statistics of a raw text corpus. Experiments in handwritten Hangul recognition show that the proposed language model is effective in postprocessing of recognition results.
10.
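Continuous speech recognition is a comprehensive speech processing technology that integrates speech processing, pattern recognition, and syntactic and semantic analysis. It can recognize arbitrary continuous speech, such as a sentence or a passage, which greatly improves the continuity and user experience of voice interaction, and it is one of the core technologies of speech recognition. This paper introduces the current state of research on continuous speech recognition and several common technical approaches, and analyzes and discusses its applications and development prospects.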
11.
Unconstrained off-line continuous handwritten text recognition is a very challenging task which has been recently addressed by different promising techniques. This work presents our latest contribution to this task, integrating neural network language models in the decoding process of three state-of-the-art systems: one based on bidirectional recurrent neural networks, another based on hybrid hidden Markov models and, finally, a combination of both. Experimental results obtained on the IAM off-line database demonstrate that consistent word error rate reductions can be achieved with neural network language models when compared with statistical N-gram language models on the three tested systems. The best word error rate, 16.1%, reported with ROVER combination of systems using neural network language models significantly outperforms current benchmark results for the IAM database.
12.
This paper presents an effective approach for unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. The domain of the document to recognize is variable and usually unknown a priori, so we use a two-pass recognition strategy with a pre-defined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model to match the text output by first-pass recognition: model selection, model combination and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. By model combination, we learn the combination weights via minimizing the sum of squared error with both L2-norm and L1-norm regularization. For model reconstruction, we use a group of orthogonal bases to reconstruct a language model with the coefficients learned to match the document to recognize. Moreover, we reduce the storage size of multiple language models using two compression methods of split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases CASIA-HWDB and HIT-MW show that the proposed unsupervised LMA approach improves the recognition performance impressively, particularly for ancient domain documents with the recognition accuracy improved by 7 percent. Meanwhile, the combination of the two compression methods largely reduces the storage size of language models with little loss of recognition accuracy.
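The model-selection step described above (choose the domain language model with minimum perplexity on the first-pass text) can be illustrated with a small sketch. It assumes add-one-smoothed unigram models over a shared vocabulary, which is a simplification chosen for brevity; the function names `train_unigram`, `perplexity`, and `select_model` and the toy English domains are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    # Add-one smoothed unigram model over a shared vocabulary (assumed setup).
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model, tokens):
    # exp of the average negative log-probability per token.
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def select_model(models, first_pass_tokens):
    # Return the name of the domain model with minimum perplexity.
    return min(models, key=lambda name: perplexity(models[name], first_pass_tokens))


news = "stock market rises stock prices".split()
sports = "team wins match team scores".split()
first_pass = "stock market prices".split()  # stand-in for first-pass recognizer output

vocab = set(news) | set(sports) | set(first_pass)
models = {
    "news": train_unigram(news, vocab),
    "sports": train_unigram(sports, vocab),
}
chosen = select_model(models, first_pass)
```

In a two-pass setup, the selected model would then be used to rescore or re-decode the document in the second pass.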
13.
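This paper introduces a Cantonese speech database with 2,000 speakers for mobile phone applications, intended for speech recognition research in telephony applications. After briefly describing the background of the database's development, it focuses on the structure, content, characteristics, and annotation conventions of the database.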
14.
In knowledge discovery in a text database, extracting and returning a subset of information highly relevant to a user's query is a critical task. In a broader sense, this is essentially identification of certain personalized patterns that drive such applications as Web search engine construction, customized text summarization and automated question answering. A related problem of text snippet extraction has been previously studied in information retrieval. In these studies, common strategies for extracting and presenting text snippets to meet user needs either process document fragments that have been delimited a priori or use a sliding window of a fixed size to highlight the results. In this work, we argue that text snippet extraction can be generalized if the user's intention is better utilized. It overcomes the rigidness of existing approaches by dynamically returning more flexible start-end positions of text snippets, which are also semantically more coherent. This is achieved by constructing and using statistical language models which effectively capture the commonalities between a document and the user intention. Experiments indicate that our proposed solutions provide effective personalized information extraction services.
15.
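Applying speech processing technology to language learning can improve, strengthen, and enrich traditional language learning methods and helps raise learning efficiency. However, how to use these technologies effectively to build various Computer-Aided Language Learning (CALL) systems is an important direction in current speech processing research. By analyzing the main factors involved in traditional language teaching methods, this paper studies the main levels and methods at which speech processing technology can be applied in language learning systems, as well as issues such as how to make it more effective.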
16.
Hidden Markov models have been found very useful for a wide range of applications in machine learning and pattern recognition. The wavelet transform has emerged as a new tool for signal and image analysis. Learning models for wavelet coefficients have been mainly based on fixed-length sequences, but real applications often require modeling variable-length, very long or real-time sequences. In this paper, we propose a new learning architecture for sequences analyzed on a short-term basis, but not assuming stationarity within each frame. Long-term dependencies will be modeled with a hidden Markov model which, in each internal state, will deal with the local dynamics in the wavelet domain, using a hidden Markov tree. The training algorithms for all the parameters in the composite model are developed using the expectation-maximization framework. This novel learning architecture could be useful for a wide range of applications. We detail two experiments with artificial and real data: model-based denoising and speech recognition. Denoising results indicate that the proposed model and learning algorithm are more effective than previous approaches based on isolated hidden Markov trees. In the case of the ‘Doppler’ benchmark sequence, with 1024 samples and additive white noise, the new method reduced the mean squared error from 1.0 to 0.0842. The proposed methods for feature extraction, modeling and learning increased the phoneme recognition rates by 28.13%, with better convergence than models based on Gaussian mixtures.
17.
Automatic understanding and recognition of human shopping behavior has many potential applications, attracting an increasing interest in the marketing domain. The reliability and performance of the automatic recognition system is highly influenced by the adopted theoretical model of behavior. In this work, we address the analogy between human shopping behavior and a natural language. The adopted methodology associates low-level information extracted from video data with semantic information using the proposed behavior language model. Our contribution on the action recognition level consists of proposing a new feature set which fuses Histograms of Optical Flow (HOF) with directional features. On the behavior level we propose combining smoothed bi-grams with the maximum dependency in a chain of conditional probabilities. The experiments are performed on both laboratory and real-life datasets. The introduced behavior language model achieves an accuracy of 87% on the laboratory data and 76% on the real-life dataset, an improvement of 11% and 8% respectively over the baseline model, by incorporating semantic knowledge and capturing correlations between the basic actions.
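To make the "smoothed bi-grams" over action sequences concrete, the sketch below estimates add-k smoothed bigram probabilities from a few toy action sequences. The choice of add-k smoothing, the function name `train_bigram`, and the action labels are assumptions for illustration; the abstract does not say which smoothing the authors use, and the maximum-dependency component is not modeled here.

```python
from collections import Counter


def train_bigram(sequences, k=1.0):
    # Count how often each action follows another across all sequences.
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def prob(prev, cur):
        # Add-k smoothing (assumed): unseen transitions get non-zero probability.
        return (bigram_counts[(prev, cur)] + k) / (context_counts[prev] + k * len(vocab))

    return prob, vocab


# Hypothetical action sequences recognized from video.
sequences = [
    ["pick", "inspect", "return"],
    ["pick", "inspect", "buy"],
    ["pick", "buy"],
]
prob, vocab = train_bigram(sequences)
p_inspect = prob("pick", "inspect")
p_unseen = prob("pick", "return")
```

Smoothing matters here because a recognizer that assigns zero probability to an unseen action transition could never output it, however strong the visual evidence.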
18.
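How to automatically convert elementary geometry propositions stated in natural language into a construction language that computers can understand is a gap in natural language processing and a difficulty in realizing human-computer interaction in educational software. Through the study of restricted natural language within the domain of geometry, this paper establishes an effective and feasible language understanding model, achieves automatic conversion from natural language to formal rules, and designs the corresponding software.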
19.
A language space provides a unified framework to deal with the properties of language constructs by associating them with their specification rules. The concrete syntax is represented by segments of the language space. The semantics is given by derived operations of the algebras where these constructs are interpreted by the processing tools operating on the language space. We examine in this paper only processing tools that collect syntactic information over the language space. Tools involved in semantics processing such as translators and interpreters are also integrated in the language space but are not discussed here.
20.
Based on the research model, language anxiety, prior non-native language experience, Internet self-efficacy and language self-efficacy are each analyzed for their effect on the intention to use non-native language commercial web sites. The effect of prior non-native language experience on language anxiety, language self-efficacy and the intention to use non-native language commercial web sites is analyzed as well. By the same token, whether Internet self-efficacy and language self-efficacy are affected by language anxiety is also examined. A valid sample of 418 undergraduates was tested in this study. Regression analysis results fully supported the model tested. These results suggest that language anxiety, prior non-native language experience, language self-efficacy and Internet self-efficacy have an effect on the intention to use non-native language commercial web sites. Prior non-native language experience significantly affected language anxiety, language self-efficacy and the intention to use non-native language commercial web sites. Furthermore, language anxiety significantly affected both language self-efficacy and Internet self-efficacy. Educational research and practitioner implications are provided at the end of the paper.