Similar Documents
20 similar documents found
1.
Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users' intents for call classification, to combinations of users' intents and named entities. In this paper, we present the SLU system of VoiceTone® (a service provided by AT&T where AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system includes extracting both intents and the named entities from the users' utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand-crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone®.
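A minimal sketch of the intent-determination step described above, assuming a TF-IDF bag-of-words pipeline and invented utterances and intent labels; the deployed VoiceTone® system uses its own classifiers and label set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled utterances and call-type intents (not VoiceTone data).
train_utterances = [
    "i want to check my account balance",
    "what is my current balance",
    "i need to pay my bill",
    "can i pay the bill over the phone",
    "let me talk to a customer representative",
    "i would like to speak to an agent",
]
train_intents = [
    "Request(Balance)", "Request(Balance)",
    "Request(PayBill)", "Request(PayBill)",
    "Request(Agent)", "Request(Agent)",
]

# Statistical intent classifier trained from labeled data.
intent_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                           LogisticRegression(max_iter=1000))
intent_clf.fit(train_utterances, train_intents)

print(intent_clf.predict(["please check my balance"]))
```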

2.
We propose a novel universal acoustic characterization approach to spoken language recognition (LRE). The key idea is to describe any spoken language with a common set of fundamental units that can be defined “universally” across all spoken languages. In this study, speech attributes, such as manner and place of articulation, are chosen to form this unit inventory and used to build a set of language-universal attribute models with data-driven modeling techniques. The vector space modeling approach to LRE is adopted, where a spoken utterance is first decoded into a sequence of attributes independently of its language. Then, a feature vector is generated by using co-occurrence statistics of manner or place units, and the final LRE decision is implemented with a vector space language classifier. Several architectural configurations are studied, and it is shown that the best performance is attained using a maximal figure-of-merit language classifier. Experimental evidence not only demonstrates the feasibility of the proposed techniques, but it also shows that the proposed technique attains comparable performance to standard approaches on the LRE tasks investigated in this work when the same experimental conditions are adopted.
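As an illustration of the vector-space idea above, the sketch below takes utterances that are assumed to be already decoded into articulatory-attribute sequences, builds bigram co-occurrence feature vectors, and scores languages by cosine similarity to per-language centroids; the attribute inventory, example sequences, and the nearest-centroid classifier are assumptions (the paper reports its best results with a maximal figure-of-merit classifier).

```python
import math
from collections import Counter
from itertools import product

ATTRS = ["stop", "fricative", "nasal", "vowel", "glide"]
BIGRAMS = list(product(ATTRS, repeat=2))

def bigram_vector(attr_seq):
    """Normalized co-occurrence counts of adjacent attribute units."""
    counts = Counter(zip(attr_seq, attr_seq[1:]))
    total = max(sum(counts.values()), 1)
    return [counts[b] / total for b in BIGRAMS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(a * a for a in v)) or 1.0
    return dot / (nu * nv)

# Hypothetical attribute decodings of training utterances for two languages.
train = {
    "lang_A": [["vowel", "stop", "vowel", "nasal", "vowel"],
               ["stop", "vowel", "stop", "vowel", "nasal"]],
    "lang_B": [["fricative", "stop", "fricative", "vowel", "stop"],
               ["stop", "fricative", "stop", "glide", "fricative"]],
}
centroids = {lang: [sum(col) / len(seqs)
                    for col in zip(*(bigram_vector(s) for s in seqs))]
             for lang, seqs in train.items()}

test_seq = ["vowel", "stop", "vowel", "vowel", "nasal"]
scores = {lang: cosine(bigram_vector(test_seq), c) for lang, c in centroids.items()}
print(max(scores, key=scores.get), scores)
```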

3.
This paper presents a system to detect multiple intents (MIs) in an input sentence when only single-intent (SI)-labeled training data are available. To solve the problem, this paper categorizes input sentences into three types and uses a two-stage approach in which each stage attempts to detect MIs in different types of sentences. In the first stage, the system generates MI hypotheses based on conjunctions in the input sentence, evaluates the hypotheses, and selects the best one that satisfies specified conditions. In the second stage, the system applies sequence labeling to mark intents on the input sentence. The sequence labeling model is trained based on SI-labeled training data. In experiments, the proposed two-stage MI detection method reduced errors for written and spoken input by 20.54 % and 17.34 %, respectively.
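A toy sketch of the first stage described above: multi-intent hypotheses are generated by splitting at conjunctions, and a split is kept only if every segment receives a confident single-intent prediction. The conjunction list, confidence threshold, and keyword-based classifier stub are illustrative assumptions, not the paper's models.

```python
CONJUNCTIONS = {"and", "then"}
THRESHOLD = 0.7

def single_intent(segment):
    """Stand-in for a single-intent classifier trained on SI-labeled data."""
    keywords = {"play": ("PlayMusic", 0.9),
                "weather": ("GetWeather", 0.85),
                "alarm": ("SetAlarm", 0.8)}
    for word, (intent, conf) in keywords.items():
        if word in segment:
            return intent, conf
    return "Unknown", 0.3

def detect_multi_intent(sentence):
    tokens = sentence.lower().split()
    cut_points = [i for i, tok in enumerate(tokens) if tok in CONJUNCTIONS]
    best = None
    for cut in cut_points:                        # one hypothesis per conjunction
        left, right = " ".join(tokens[:cut]), " ".join(tokens[cut + 1:])
        intents = [single_intent(left), single_intent(right)]
        if all(conf >= THRESHOLD for _, conf in intents):
            score = sum(conf for _, conf in intents)
            if best is None or score > best[0]:
                best = (score, [intent for intent, _ in intents])
    if best:
        return best[1]
    return [single_intent(sentence)[0]]           # fall back to a single intent

print(detect_multi_intent("play some jazz and tell me the weather"))
```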

4.
This work proposes a robust control system design for a single-gimbal control moment gyro (SGCMG) driven by a hollow ultrasonic motor. Considering the nonlinear characteristics of the whole system, Takagi-Sugeno fuzzy control theory was introduced to achieve robust control with stability over a wide operating range. Based on the proposed control theory, the closed-loop feedback control parameters were determined, and a whole-system model was built for simulation and robust-controller research. The simulation results show the effectiveness of the proposed control algorithm. The controller was implemented in C on an embedded microcontroller unit. On the actual ultrasonic-motor-driven SGCMG, sinusoidal speed-tracking experiments showed a speed error of less than 0.5°/s, and the closed-loop system remained stable over a reference speed range of 0.2–72°/s. Step-response experiments showed a fast response time without overshoot. The experiments verified the speed performance, robustness, and stability of the proposed robust control algorithm.
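A numeric sketch of the Takagi-Sugeno blending idea behind such a controller: local linear control laws are weighted by the fuzzy membership of the speed error. The membership function, gains, and error values are invented for illustration and are not the SGCMG system's parameters.

```python
import numpy as np

def membership_small(error, width=5.0):
    """Degree to which the speed error (deg/s) belongs to the 'small' fuzzy set."""
    return np.exp(-(error / width) ** 2)

def ts_control(error, k_small=0.8, k_large=2.5):
    """Blend two local linear control laws by their normalized rule weights."""
    w_small = membership_small(error)
    w_large = 1.0 - w_small
    u_small = k_small * error             # consequent of rule "error is small"
    u_large = k_large * error             # consequent of rule "error is large"
    return (w_small * u_small + w_large * u_large) / (w_small + w_large)

for e in [0.2, 2.0, 10.0, 40.0]:
    print(f"speed error {e:5.1f} deg/s -> control effort {ts_control(e):7.2f}")
```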

5.
A new non-linear tracking control algorithm based on an attitude error quaternion is studied in this paper. The control law developed here uses the commanded attitude rate without transformation into the body frame. The direct use of the commanded attitude rate simplifies the calculation of its derivative, which is used in the control law. The solutions and the equilibrium points of the closed-loop system, which is a time-varying non-linear system, are obtained in different scenarios. In order to analyse the stability of the system and the tracking performance, two different forms of perturbation dynamics with seven state variables are introduced. Local stability and performance analysis shows that the eigenvalues of the linearized perturbation dynamics are determined only by the gain matrices in the control algorithm and the inertia matrix. The existence of globally stable tracking control is proved using a Lyapunov function. Simulation results show that the spacecraft can track the commanded attitude and rate quickly for a non-zero acceleration rate command.
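A minimal sketch of feedback built from the attitude error quaternion: the torque command combines the vector part of the error quaternion with the body-rate error. The gains and the simplified PD-like structure are assumptions; the paper's law additionally uses the commanded attitude rate and its derivative.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of two quaternions, scalar-first convention."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def control_torque(q, q_cmd, omega, omega_cmd, Kp=2.0, Kd=4.0):
    """Feedback on the attitude error quaternion and rate error (illustrative gains)."""
    q_err = quat_mul(quat_conj(q_cmd), q)           # attitude error quaternion
    return -Kp * q_err[1:] - Kd * (omega - omega_cmd)

q_cmd = np.array([1.0, 0.0, 0.0, 0.0])              # commanded attitude (identity)
q = np.array([0.999, 0.0436, 0.0, 0.0])             # roughly a 5 degree roll offset
omega, omega_cmd = np.array([0.01, 0.0, 0.0]), np.zeros(3)
print(control_torque(q, q_cmd, omega, omega_cmd))
```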

6.
We propose three methods for extending the Boosting family of classifiers motivated by the real-life problems we have encountered. First, we propose a semisupervised learning method for exploiting the unlabeled data in Boosting. We then present a novel classification model adaptation method. The goal of adaptation is optimizing an existing model for a new target application, which is similar to the previous one but may have different classes or class distributions. Finally, we present an efficient and effective cost-sensitive classification method that extends Boosting to allow for weighted classes. We evaluated these methods for call classification in the AT&T VoiceTone® spoken language understanding system. Our results indicate that it is possible to obtain the same classification performance by using 30% less labeled data when the unlabeled data is utilized through semisupervised learning. Using model adaptation we can achieve the same classification accuracy using less than half of the labeled data from the new application. Finally, we present significant improvements in the “important” (i.e., higher weighted) classes without a significant loss in overall performance using the proposed cost-sensitive classification method.
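A small sketch of the cost-sensitive idea: examples of "important" classes receive larger weights before boosting, so their misclassifications are penalized more. The synthetic data, class costs, and the use of scikit-learn's AdaBoost in place of the paper's Boosting variant are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic 3-class problem standing in for call-type classification data.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

class_cost = {0: 1.0, 1: 1.0, 2: 3.0}           # class 2 is the "important" class
sample_weight = np.array([class_cost[c] for c in y])

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)      # boosting sees cost-weighted examples
print(clf.score(X, y))
```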

7.
李艳玲  颜永红 《计算机应用》2015,35(7):1965-1968
Obtaining labeled data has always been a challenge for supervised methods. For intent recognition in Chinese spoken language understanding, this paper studies two weakly supervised training approaches: combining active learning with self-training, and co-training. Within a cascaded framework, a subset of semantic-class features obtained from key semantic concept recognition and a subset of character features of the sentence itself are proposed as the two "views" for co-training. Experiments on a Chinese spoken-language corpus show that, compared with passive learning and active learning, the method combining active learning and self-training minimizes the amount of manual annotation; and with very little initial labeled data, co-training over the two feature subsets ultimately reduces the classification error rate on the single character-feature subset by 0.52% on average.
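A toy sketch of the co-training loop: one classifier sees the "semantic-class" feature view, the other the character-feature view, and each adds its most confident predictions on unlabeled data to the shared labeled pool. The synthetic data, confidence threshold, and logistic-regression learners are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # columns 0-4: "semantic" view, 5-9: "character" view
y_true = (X[:, 0] + X[:, 5] > 0).astype(int)

view_a, view_b = slice(0, 5), slice(5, 10)
labeled = list(range(20))               # small initial labeled pool
pseudo = {i: int(y_true[i]) for i in labeled}
unlabeled = set(range(20, 200))

for _ in range(5):                      # a few co-training rounds
    y_lab = np.array([pseudo[i] for i in labeled])
    clf_a = LogisticRegression().fit(X[labeled][:, view_a], y_lab)
    clf_b = LogisticRegression().fit(X[labeled][:, view_b], y_lab)
    for clf, view in ((clf_a, view_a), (clf_b, view_b)):
        pool = sorted(unlabeled)
        if not pool:
            break
        proba = clf.predict_proba(X[pool][:, view])
        conf = proba.max(axis=1)
        for j in np.argsort(-conf)[:10]:        # this view's 10 most confident examples
            if conf[j] < 0.9:
                continue
            idx = pool[j]
            pseudo[idx] = int(clf.predict(X[[idx]][:, view])[0])
            labeled.append(idx)
            unlabeled.discard(idx)

print(f"labeled pool grew from 20 to {len(labeled)} examples")
```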

8.
9.
The eliminative nature of constraint satisfaction over finite domains offers an interesting potential for robustness in the parsing of spoken language. An approach is presented that puts unusually ambitious demands on the design of the constraint satisfaction procedure by trying to combine preferential reasoning, dynamic scheduling, parallel processing, and incremental constraint solving within a coherent solution.

10.
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure, nor frequency of co-occurrence alone are suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring “islands of pronunciation reduction” that contain (potential) MWEs can be identified in a large speech corpus.
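A toy sketch of the combined metric idea: a pronunciation-reduction score (the fraction of canonical phones deleted in the realized forms) is weighted by log frequency to rank candidate word sequences. The candidate phrases, counts, and exact combination below are assumptions, not the study's data or metric.

```python
import math

# phrase: (canonical phone count, mean realized phone count, corpus frequency) - invented values
candidates = {
    "op een gegeven moment": (17, 9.5, 310),
    "in ieder geval":        (11, 6.0, 280),
    "rode fiets":            (8, 7.6, 40),
}

def mwe_score(canonical, realized, freq):
    reduction = 1.0 - realized / canonical        # higher = more phones deleted
    return reduction * math.log(freq)             # reduction weighted by frequency

for phrase, (c, r, f) in sorted(candidates.items(),
                                key=lambda kv: -mwe_score(*kv[1])):
    print(f"{phrase:25s} score = {mwe_score(c, r, f):.2f}")
```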

11.
We describe a spoken dialogue interface with a mobile robot, which a human can direct to specific locations, ask for information about its status, and supply information about its environment. The robot uses an internal map for navigation, and communicates its current orientation and accessible locations to the dialogue system. In this article, we focus on linguistic and inferential aspects of the human–robot communication process. This work was conducted at ICCS, School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK. This work was presented in part at the 11th International Symposium on Artificial Life and Robotics, Oita, Japan, January 23–25, 2006.

12.
Recently published results (Basher and Mukundan 1987) concerning hypersurface attractivity for systems with delays are discussed. The proof for hypersurface attractivity is in doubt, as is indicated in this note.

13.
14.
15.
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational systems which represent words, utterances, and underlying concepts in terms of sensory-motor experiences leading to richer levels of machine understanding. A key element of this work is the development of effective architectures for processing multisensory data. Inspired by theories of infant cognition, we present a computational model which learns words from untranscribed acoustic and video input. Channels of input derived from different sensors are integrated in an information-theoretic framework. Acquired words are represented in terms of associations between acoustic and visual sensory experience. The model has been implemented in a real-time robotic system which performs interactive language learning and understanding. Successful learning has also been demonstrated using infant-directed speech and images.
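A toy sketch of the information-theoretic pairing idea: acoustic word candidates and visual categories that co-occur more often than chance (high pointwise mutual information) become word-meaning associations. The co-occurrence table is invented for illustration.

```python
import math
from collections import Counter

# (acoustic word candidate, visual category) pairs observed in the same episode - invented counts
episodes = ([("b-ao-l", "ball")] * 14 + [("b-ao-l", "cup")] * 2 +
            [("k-ah-p", "cup")] * 11 + [("k-ah-p", "ball")] * 3 +
            [("d-ao-g-iy", "dog")] * 9)

pair_counts = Counter(episodes)
a_counts = Counter(a for a, _ in episodes)
v_counts = Counter(v for _, v in episodes)
n = len(episodes)

def pmi(a, v):
    """Pointwise mutual information between an acoustic unit and a visual category."""
    return math.log2((pair_counts[(a, v)] / n) /
                     ((a_counts[a] / n) * (v_counts[v] / n)))

for (a, v) in sorted(pair_counts):
    print(f"{a:10s} ~ {v:5s}  PMI = {pmi(a, v):+.2f}")
```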

16.
This paper describes the development of LSESpeak, a spoken Spanish generator for Deaf people. This system integrates two main tools: a sign language into speech translation system and an SMS (Short Message Service) into speech translation system. The first tool is made up of three modules: an advanced visual interface (where a deaf person can specify a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, an emotional text to speech (TTS) converter to generate spoken Spanish. The visual interface allows a sign sequence to be defined using several utilities. The emotional TTS converter is based on Hidden Semi-Markov Models (HSMMs) permitting voice gender, type of emotion, and emotional strength to be controlled. The second tool is made up of an SMS message editor, a language translator and the same emotional text to speech converter. Both translation tools use a phrase-based translation strategy where translation and target language models are trained from parallel corpora. In the experiments carried out to evaluate the translation performance, the sign language-speech translation system reported a 96.45 BLEU and the SMS-speech system a 44.36 BLEU in a specific domain: the renewal of the Identity Document and Driving License. In the evaluation of the emotional TTS, it is important to highlight the improvement in the naturalness thanks to the morpho-syntactic features, and the high flexibility provided by HSMMs when generating different emotional strengths.
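A toy sketch of the phrase-based translation step: a sign-gloss sequence is segmented into phrases found in a phrase table, and the highest-scoring monotone segmentation is kept. The glosses, phrase table, and log-probabilities are invented; the real system trains its translation and target language models from parallel corpora.

```python
phrase_table = {  # sign-gloss phrase -> (Spanish phrase, log probability), all invented
    ("YO",): ("yo", -0.1),
    ("QUERER",): ("quiero", -0.2),
    ("RENOVAR",): ("renovar", -0.3),
    ("CARNET", "CONDUCIR"): ("el carnet de conducir", -0.4),
    ("CARNET",): ("el carnet", -0.5),
    ("CONDUCIR",): ("conducir", -0.6),
}

def best_translation(glosses):
    """Monotone dynamic program over phrase segmentations of the gloss sequence."""
    best = {0: (0.0, [])}                         # position -> (score, words so far)
    for i in range(len(glosses)):
        if i not in best:
            continue
        score, words = best[i]
        for j in range(i + 1, len(glosses) + 1):
            src = tuple(glosses[i:j])
            if src in phrase_table:
                tgt, logp = phrase_table[src]
                cand = (score + logp, words + [tgt])
                if j not in best or cand[0] > best[j][0]:
                    best[j] = cand
    return " ".join(best[len(glosses)][1])

print(best_translation(["YO", "QUERER", "RENOVAR", "CARNET", "CONDUCIR"]))
```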

17.
Recent developments in research on humanoid robots and interactive agents have highlighted the importance of, and the expectations placed on, automatic speech recognition (ASR) as a means of endowing such an agent with the ability to communicate via speech. This article describes some of the approaches pursued at NTT Communication Science Laboratories (NTT-CSL) for dealing with such challenges in ASR. In particular, we focus on methods for fast search through finite-state machines, Bayesian solutions for modeling and classification of speech, and a discriminative training approach for minimizing errors in large vocabulary continuous speech recognition.

18.
Building on an analysis of general methods for constructing speech corpora, and taking into account practical corpus requirements and regional language characteristics, this paper proposes design specifications and speech collection and annotation methods suitable for building a telephone-channel spoken Uyghur corpus. A 300-hour telephone-channel spoken Uyghur corpus is constructed, and the effects of the telephone channel on speech feature parameters such as linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and adaptive component weighted (ACW) cepstral features are analyzed.
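A minimal sketch of extracting MFCC features from 8 kHz telephone-band audio, the kind of analysis behind the channel-effect comparison above; the synthetic sine wave stands in for a real corpus utterance, and librosa's default MFCC front end is an assumption, not the study's exact configuration.

```python
import numpy as np
import librosa

sr = 8000                                        # telephone-channel sampling rate
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.1 * np.sin(2 * np.pi * 220.0 * t)          # placeholder signal, not real speech

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)                                # (13, number of frames)
```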

19.
Spoken language resources (SLRs) are essential for both research and application development. In this article we clarify the concept of SLR validation. We define validation and how it differs from evaluation. Further, relevant principles of SLR validation are outlined. We argue that the best way to validate SLRs is to implement validation throughout SLR production and have it carried out by an external and experienced institute. We address which tasks should be carried out by the validation institute, and which not. Further, we list the basic issues that validation criteria for SLR should address. A standard validation protocol is shown, illustrating how validation can prove its value throughout the production phase in terms of pre-validation, full validation and pre-release validation.

20.
Using pre-trained language models (PLMs) to extract sentence representations has achieved remarkable results in downstream natural language understanding tasks on written text. However, when PLMs are applied to spoken language understanding (SLU) tasks, errors from front-end automatic speech recognition (ASR) degrade SLU accuracy. This paper therefore studies how to enhance PLMs so that SLU models become more robust to ASR errors. Specifically, by comparing ASR hypotheses with manual transcripts, text chunks affected by merging and deletion are identified, and the PLM is fine-tuned with a new pre-training task so that text chunks with similar pronunciations produce similar embedding representations, thereby mitigating the impact of ASR errors on the PLM. Experiments on three benchmark datasets show that the proposed method achieves substantially higher accuracy than previous methods, verifying its effectiveness.
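A conceptual sketch of the robustness objective described above: embeddings of an ASR hypothesis chunk and of the corresponding manual-transcript chunk are pulled together with a cosine-similarity loss during fine-tuning. The model choice, the example chunk pair, and the single-loss setup are assumptions, not the paper's exact pre-training task.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-chinese"                  # hypothetical choice of PLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

asr_chunk = "帮我查下明天的航班"                    # ASR hypothesis with a deleted character
ref_chunk = "帮我查一下明天的航班"                  # manual transcript of the same chunk

def embed(text):
    inputs = tokenizer(text, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state    # (1, sequence length, hidden dim)
    return hidden.mean(dim=1)                     # mean-pooled chunk embedding

# Pull the two chunk embeddings together; gradients flow back into the PLM.
loss = 1.0 - torch.nn.functional.cosine_similarity(embed(asr_chunk), embed(ref_chunk)).mean()
loss.backward()
print(float(loss))
```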
