20 similar documents found; search took 15 ms
1.
Byeongchang Kim Seonghan Ryu Gary Geunbae Lee 《Multimedia Tools and Applications》2017,76(9):11377-11390
This paper presents a system to detect multiple intents (MIs) in an input sentence when only single-intent (SI)-labeled training data are available. To solve the problem, this paper categorizes input sentences into three types and uses a two-stage approach in which each stage attempts to detect MIs in different types of sentences. In the first stage, the system generates MI hypotheses based on conjunctions in the input sentence, evaluates the hypotheses, and selects the best one that satisfies specified conditions. In the second stage, the system applies sequence labeling to mark intents on the input sentence. The sequence labeling model is trained on SI-labeled training data. In experiments, the proposed two-stage MI detection method reduced errors for written and spoken input by 20.54% and 17.34%, respectively.
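As a concrete illustration of the conjunction-based first stage described in this abstract, here is a minimal sketch; the conjunction list, the `toy_classify` stand-in for a trained single-intent model, and the 0.5 confidence threshold are all invented for illustration and are not the authors' code.

```python
# Illustrative sketch (not the paper's implementation) of stage one:
# split an utterance at conjunctions into multi-intent hypotheses, then
# accept a split only if both halves get confident, distinct intents.

CONJUNCTIONS = {"and", "then", "also"}  # invented list for the sketch

def generate_hypotheses(tokens):
    """Split the token list at each conjunction into candidate sub-utterances."""
    hypotheses = []
    for i, tok in enumerate(tokens):
        if tok in CONJUNCTIONS and 0 < i < len(tokens) - 1:
            hypotheses.append((tokens[:i], tokens[i + 1:]))
    return hypotheses

def detect_multi_intent(tokens, classify):
    """classify(tokens) -> (intent, confidence); a single-intent model stand-in."""
    for left, right in generate_hypotheses(tokens):
        left_intent, left_conf = classify(left)
        right_intent, right_conf = classify(right)
        # Keep the split only if it yields two distinct, confident intents.
        if left_intent != right_intent and min(left_conf, right_conf) > 0.5:
            return [left_intent, right_intent]
    intent, _ = classify(tokens)
    return [intent]
```

A sentence with no acceptable split falls through to the single-intent result, mirroring how the paper's second stage handles the remaining sentence types.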
2.
Gokhan Tur 《Machine Learning》2007,69(1):55-74
We propose three methods for extending the Boosting family of classifiers, motivated by real-life problems we have encountered. First, we propose a semisupervised learning method for exploiting the unlabeled data in Boosting. We then present a novel classification model adaptation method. The goal of adaptation is optimizing an existing model for a new target application, which is similar to the previous one but may have different classes or class distributions. Finally, we present an efficient and effective cost-sensitive classification method that extends Boosting to allow for weighted classes. We evaluated these methods for call classification in the AT&T VoiceTone® spoken language understanding system. Our results indicate that it is possible to obtain the same classification performance using 30% less labeled data when the unlabeled data is utilized through semisupervised learning. Using model adaptation, we can achieve the same classification accuracy using less than half of the labeled data from the new application. Finally, we present significant improvements in the “important” (i.e., higher-weighted) classes without a significant loss in overall performance using the proposed cost-sensitive classification method.
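The cost-sensitive extension can be pictured with a toy AdaBoost over decision stumps in which per-class costs scale the initial sample weights, so errors on "important" classes are penalized more. This is a generic sketch under that assumption, not the AT&T implementation; the stump learner and all names are illustrative.

```python
# Toy cost-sensitive AdaBoost sketch: class costs bias the initial
# distribution toward important classes; the rest is standard boosting
# over 1-D threshold stumps. Labels are +1 / -1.
import math

def stump_predictions(xs, threshold, sign):
    return [sign if x >= threshold else -sign for x in xs]

def train(xs, ys, class_cost, rounds=10):
    # Cost-sensitive initialization: samples of costly classes start heavier.
    w = [class_cost[y] for y in ys]
    total = sum(w)
    w = [wi / total for wi in w]
    ensemble = []
    for _ in range(rounds):
        best = None
        for t in set(xs):                      # candidate thresholds from data
            for sign in (+1, -1):
                preds = stump_predictions(xs, t, sign)
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, sign, preds)
        err, t, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Standard exponential reweighting, then renormalize.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * (sign if x >= t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1
```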
3.
Applied Intelligence - Slot filling and intent detection are two important tasks in a spoken language understanding (SLU) system, and it is becoming a trend for the two tasks to be jointly learned in SLU. ...
4.
Dilek Hakkani-Tür Frédéric Béchet Giuseppe Riccardi Gokhan Tur 《Computer Speech and Language》2006,20(4):495-514
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large-vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6–10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
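A toy picture of a word confusion network and the 1-best baseline it generalizes; the slot structure and posteriors below are invented for illustration, not derived from real lattices.

```python
# A WCN can be modeled as a list of "slots" (aligned positions), each
# holding competing word hypotheses with posterior scores. Downstream SLU
# can then consume all alternatives rather than only the top path.

def one_best(wcn):
    """The conventional ASR 1-best: the top-posterior word of every slot."""
    return [max(slot, key=lambda wp: wp[1])[0] for slot in wcn]

def word_posterior(wcn, word):
    """Highest posterior the network assigns to `word` in any slot."""
    return max((p for slot in wcn for w, p in slot if w == word), default=0.0)

# Invented example network: three slots with competing hypotheses.
wcn = [
    [("call", 0.7), ("fall", 0.3)],
    [("me", 0.6), ("be", 0.4)],
    [("back", 0.9), ("pack", 0.1)],
]
```

A classifier reading `word_posterior` scores sees that "fall" was a live alternative, information that is lost when only the 1-best string is passed on.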
5.
In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
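The second-stage filtering idea above can be sketched as weighted resampling of the synthetic corpus toward a target semantic-class distribution. The corpus, class names, and target probabilities below are invented, and the method shown is a generic stand-in for the paper's two sampling methods rather than either of them.

```python
# Generic sketch: draw from a synthetic corpus with probability
# proportional to the target probability of each sentence's semantic
# class, so the retained sample approaches the desired distribution.
import random

def resample_to_target(sentences, target, k, seed=0):
    """sentences: list of (text, semantic_class); target: class -> desired prob."""
    rng = random.Random(seed)
    weights = [target[cls] for _, cls in sentences]
    return [rng.choices(sentences, weights=weights, k=1)[0] for _ in range(k)]
```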
Stephanie Seneff
6.
Sabato Marco Siniscalchi Jeremy Reed Torbjørn Svendsen Chin-Hui Lee 《Computer Speech and Language》2013,27(1):209-227
We propose a novel universal acoustic characterization approach to spoken language recognition (LRE). The key idea is to describe any spoken language with a common set of fundamental units that can be defined “universally” across all spoken languages. In this study, speech attributes, such as manner and place of articulation, are chosen to form this unit inventory and used to build a set of language-universal attribute models with data-driven modeling techniques. The vector space modeling approach to LRE is adopted, where a spoken utterance is first decoded into a sequence of attributes independently of its language. Then, a feature vector is generated by using co-occurrence statistics of manner or place units, and the final LRE decision is implemented with a vector space language classifier. Several architectural configurations are studied, and it is shown that the best performance is attained using a maximal figure-of-merit language classifier. Experimental evidence not only demonstrates the feasibility of the proposed techniques, but also shows that they attain performance comparable to standard approaches on the LRE tasks investigated in this work when the same experimental conditions are adopted.
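A minimal sketch of the co-occurrence feature-vector step, assuming a small invented inventory of manner units; the real systems use data-driven attribute decoders and larger inventories.

```python
# Build a bigram co-occurrence count vector over an attribute-unit
# inventory. A decoded utterance (sequence of attribute labels) becomes a
# fixed-length vector that a vector-space language classifier can score.
from collections import Counter

ATTRS = ["stop", "fricative", "nasal", "vowel"]  # invented toy inventory

def cooccurrence_vector(units):
    """units: decoded attribute sequence -> flattened |ATTRS| x |ATTRS| counts."""
    bigrams = Counter(zip(units, units[1:]))
    return [bigrams[(a, b)] for a in ATTRS for b in ATTRS]
```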
7.
Cybernetics and Systems Analysis -
8.
Helmer Strik Micha Hulsbosch Catia Cucchiarini 《Language Resources and Evaluation》2010,44(1-2):41-58
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring “islands of pronunciation reduction” that contain (potential) MWEs can be identified in a large speech corpus.
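The combined metric can be sketched as pronunciation reduction weighted by log frequency, so that frequent, heavily reduced word sequences rank highest as MWE candidates. The scoring function, example sequences, and their reduction/frequency values below are hypothetical, not the paper's exact measure.

```python
# Hypothetical combined MWE score: average fraction of canonical phonemes
# deleted, scaled by log corpus frequency. Candidate phrases and their
# numbers are invented for illustration.
import math

def mwe_score(reduction, frequency):
    """reduction: fraction of canonical phonemes deleted (0..1);
    frequency: corpus occurrence count."""
    return reduction * math.log(1 + frequency)

# Invented candidates: (reduction, frequency)
candidates = {
    "op een gegeven moment": (0.45, 120),  # heavily reduced, frequent
    "in de": (0.10, 500),                  # frequent but barely reduced
}
ranked = sorted(candidates, key=lambda k: mwe_score(*candidates[k]), reverse=True)
```

The second candidate shows why frequency alone fails: it is far more frequent, yet its low reduction keeps it below the true MWE candidate.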
9.
We describe a spoken dialogue interface with a mobile robot, which a human can direct to specific locations, ask for information about its status, and supply information about its environment. The robot uses an internal map for navigation, and communicates its current orientation and accessible locations to the dialogue system. In this article, we focus on linguistic and inferential aspects of the human–robot communication process.
This work was conducted at ICCS, School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK. This work was presented in part at the 11th International Symposium on Artificial Life and Robotics, Oita, Japan, January 23–25, 2006.
10.
WOLFGANG MENZEL 《Journal of Experimental &amp; Theoretical Artificial Intelligence》2013,25(1):77-89
The eliminative nature of constraint satisfaction over finite domains offers an interesting potential for robustness in the parsing of spoken language. An approach is presented that puts unusually ambitious demands on the design of the constraint satisfaction procedure by trying to combine preferential reasoning, dynamic scheduling, parallel processing, and incremental constraint solving within a coherent solution.
11.
Manfred Stede 《Artificial Intelligence Review》1992,6(4):383-414
Practical natural language understanding systems used to be concerned with very small miniature domains only: They knew exactly what potential text might be about, and what kind of sentence structures to expect. This optimistic assumption is no longer feasible if NLU is to scale up to deal with text that naturally occurs in the "real world". The key issue is robustness: The system needs to be prepared for cases where the input data does not correspond to the expectations encoded in the grammar. In this paper, we survey the approaches towards the robustness problem that have been developed throughout the last decade. We inspect techniques to overcome both syntactically and semantically ill-formed input in sentence parsing and then look briefly into more recent ideas concerning the extraction of information from texts, and the related question of the role that linguistic research plays in this game. Finally, the robust sentence parsing schemes are classified on a more abstract level of analysis.
Dept. of Computer Science, University of Toronto. For helpful comments on earlier drafts of this paper, I thank Judy Dick, Graeme Hirst, Diane Horton, Kem Luther, and Jan Wiebe. Financial support by the University of Toronto is acknowledged. Communication and requests for reprints should be directed to the author at Department of Computer Science, University of Toronto, Toronto, Canada M5S 1A4.
12.
Multimedia information retrieval performance is constrained by characteristics such as high dimensionality, large data volume, and poor interpretability. This entry proposes an intelligent multimedia information retrieval system model based on natural language understanding. By applying natural language understanding, data mining, and self-feedback techniques, the system broadens the retrieval scope and improves retrieval accuracy to a certain extent.
13.
Sankaranarayanan Ananthakrishnan Dennis N. Mehay Sanjika Hewavitharana Rohit Kumar Matt Roy Enoch Kan 《Machine Translation》2015,29(1):25-47
Lexical ambiguity can cause critical failure in conversational spoken language translation (CSLT) systems that rely on statistical machine translation (SMT) if the wrong sense is presented in the target language. Interactive CSLT systems offer the capability to detect and pre-empt such word-sense translation errors (WSTEs) by engaging the human operators in a precise clarification dialogue aimed at resolving the problem. This paper presents an end-to-end framework for accurate detection and interactive resolution of WSTEs to minimize communication errors due to ambiguous source words. We propose (a) a novel, extensible, two-level classification architecture for identifying potential WSTEs in SMT hypotheses; (b) a constrained phrase-pair clustering mechanism for identifying the translated sense of ambiguous source words in SMT hypotheses; and (c) an interactive strategy that integrates this information to request specific clarifying information from the operator. By leveraging unsupervised and lightly supervised learning techniques, our approach minimizes the need for expensive human annotation in developing each component of this framework. Each component, as well as the overall framework, was evaluated in the context of an interactive English-to-Iraqi Arabic CSLT system.
14.
Koliya Pulasinghe Keigo Watanabe Kiyotaka Izumi Kazuo Kiguchi 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(1):293-302
We present a methodology of controlling machines using spoken language commands. The two major problems relating to the speech interfaces for machines, namely, the interpretation of words with fuzzy implications and the out-of-vocabulary (OOV) words in natural conversation, are investigated. The system proposed in this paper is designed to overcome the above two problems in controlling machines using spoken language commands. The present system consists of a hidden Markov model (HMM) based automatic speech recognizer (ASR), with a keyword spotting system to capture the machine sensitive words from the running utterances and a fuzzy-neural network (FNN) based controller to represent the words with fuzzy implications in spoken language commands. Significance of the words, i.e., the contextual meaning of the words according to the machine's current state, is introduced to the system to obtain more realistic output equivalent to users' desire. Modularity of the system is also considered to provide a generalization of the methodology for systems having heterogeneous functions without diminishing the performance of the system. The proposed system is experimentally tested by navigating a mobile robot in real time using spoken language commands.
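A toy sketch of the keyword-spotting plus fuzzy-word mapping idea, with invented vocabularies and membership values standing in for the HMM keyword spotter and the FNN controller:

```python
# Spot machine-sensitive keywords in a running utterance and map fuzzy
# modifiers ("slowly", "quickly") to a numeric speed command. The word
# lists and speed values are invented; a real system would learn the
# fuzzy mapping and use an ASR front end.
FUZZY_SPEED = {"slowly": 0.25, "normally": 0.5, "quickly": 0.9}
ACTIONS = {"go", "turn", "stop"}

def interpret(utterance):
    """Return (action, speed); speed defaults to 0.5 with no fuzzy modifier."""
    tokens = utterance.lower().split()
    action = next((t for t in tokens if t in ACTIONS), None)
    speed = next((FUZZY_SPEED[t] for t in tokens if t in FUZZY_SPEED), 0.5)
    return action, speed
```

Note how OOV words ("please", "to", "the") are simply ignored, which is the practical effect of keyword spotting on running utterances.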
15.
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational systems which represent words, utterances, and underlying concepts in terms of sensory-motor experiences leading to richer levels of machine understanding. A key element of this work is the development of effective architectures for processing multisensory data. Inspired by theories of infant cognition, we present a computational model which learns words from untranscribed acoustic and video input. Channels of input derived from different sensors are integrated in an information-theoretic framework. Acquired words are represented in terms of associations between acoustic and visual sensory experience. The model has been implemented in a real-time robotic system which performs interactive language learning and understanding. Successful learning has also been demonstrated using infant-directed speech and images.
16.
Verónica López-Ludeña Roberto Barra-Chicote Syaheerah Lutfi Juan Manuel Montero Rubén San-Segundo 《Expert systems with applications》2013,40(4):1283-1295
This paper describes the development of LSESpeak, a spoken Spanish generator for Deaf people. This system integrates two main tools: a sign language into speech translation system and an SMS (Short Message Service) into speech translation system. The first tool is made up of three modules: an advanced visual interface (where a deaf person can specify a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, an emotional text-to-speech (TTS) converter to generate spoken Spanish. The visual interface allows a sign sequence to be defined using several utilities. The emotional TTS converter is based on Hidden Semi-Markov Models (HSMMs), permitting voice gender, type of emotion, and emotional strength to be controlled. The second tool is made up of an SMS message editor, a language translator, and the same emotional text-to-speech converter. Both translation tools use a phrase-based translation strategy where translation and target language models are trained from parallel corpora. In the experiments carried out to evaluate translation performance, the sign language-to-speech system reported a 96.45 BLEU and the SMS-to-speech system a 44.36 BLEU in a specific domain: the renewal of the Identity Document and Driving License. In the evaluation of the emotional TTS, it is important to highlight the improvement in naturalness thanks to the morpho-syntactic features, and the high flexibility provided by HSMMs when generating different emotional strengths.
17.
In this paper, a spoken query system is demonstrated which can be used to access the latest agricultural commodity prices and weather information in the Kannada language using a mobile phone. The spoken query system consists of Automatic Speech Recognition (ASR) models, Interactive Voice Response System (IVRS) call flow, and the Agricultural Marketing Network (AGMARKNET) and India Meteorological Department (IMD) databases. The ASR models are developed using the Kaldi speech recognition toolkit. Task-specific speech data is collected from the different dialect regions of Karnataka (a state in India where the Kannada language is spoken) to develop the ASR models. A web crawler is used to get the commodity price and weather information from the AGMARKNET and IMD websites. The PostgreSQL database management system is used to manage the crawled data. 80% and 20% of the validated speech data are used for system training and testing, respectively. The accuracy and Word Error Rate (WER) of the ASR models are reported, and an end-to-end spoken query system is developed for the Kannada language.
18.
This entry explores a new architecture for natural language understanding systems and a typical subsystem within it. The feedback-based natural language processing system interprets a single sentence, especially a hard-to-understand one, repeatedly and with reference to the surrounding context. Serving as a large platform, it hosts subsystems based on classical algorithms and is easy to extend. Context-free grammar theory is mature, as are the algorithms based on it. Within the feedback-based system, an algorithm based on context-free grammars is used to analyze simple noun phrases.
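The context-free analysis of simple noun phrases mentioned above can be sketched as a check against a toy grammar NP -&gt; Det? Adj* Noun; the word lists are invented for illustration and stand in for a real lexicon and parser.

```python
# Toy recognizer for the context-free rule NP -> Det? Adj* Noun:
# an optional determiner, any number of adjectives, then exactly one noun.
DET = {"the", "a"}
ADJ = {"big", "red"}
NOUN = {"dog", "ball"}

def is_simple_np(tokens):
    """Accept token lists matching Det? Adj* Noun."""
    i = 0
    if i < len(tokens) and tokens[i] in DET:
        i += 1                                   # optional determiner
    while i < len(tokens) and tokens[i] in ADJ:
        i += 1                                   # zero or more adjectives
    # Exactly one token must remain, and it must be a noun.
    return i == len(tokens) - 1 and tokens[-1] in NOUN
```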
19.
Similarity computation based on natural language understanding remains a topic that computational language processing needs to study in depth. Building on the knowledge representation of HowNet, and taking both depth and density factors into account, a relatively mature improved multi-factor semantic similarity algorithm is used, together with full-text retrieval and matching, to design and implement an online question-answering system for a restricted domain. Experimental runs show that the system is highly reliable and answers questions effectively, meeting its design goals.
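A rough sketch of depth-aware concept similarity of the kind the entry builds on: similarity grows with the depth of the two concepts' lowest common ancestor and shrinks with the path length between them. The toy taxonomy and the combining formula are illustrative, not HowNet's actual algorithm.

```python
# Wu-Palmer-style similarity over a tiny invented taxonomy: deeper common
# ancestors and shorter paths yield higher scores.
PARENT = {"dog": "mammal", "cat": "mammal", "carp": "fish",
          "mammal": "animal", "fish": "animal", "animal": None}

def path_to_root(concept):
    path = [concept]
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        path.append(concept)
    return path

def similarity(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    lca = next(c for c in pa if c in pb)     # lowest common ancestor
    depth = len(path_to_root(lca))           # depth factor
    dist = pa.index(lca) + pb.index(lca)     # path-length factor
    return (2.0 * depth) / (2.0 * depth + dist)
```

Siblings under a deep node ("dog", "cat") thus score higher than concepts whose only common ancestor is near the root ("dog", "carp").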
20.
Nakamura A. Watanabe S. Hori T. McDermott E. Katagiri S. 《Computational Intelligence Magazine, IEEE》2006,1(2):5-9
Recent developments in research on humanoid robots and interactive agents have highlighted the importance of, and expectations for, automatic speech recognition (ASR) as a means of endowing such an agent with the ability to communicate via speech. This article describes some of the approaches pursued at NTT Communication Science Laboratories (NTT-CSL) for dealing with such challenges in ASR. In particular, we focus on methods for fast search through finite-state machines, Bayesian solutions for modeling and classification of speech, and a discriminative training approach for minimizing errors in large-vocabulary continuous speech recognition.