Similar Literature
20 similar documents retrieved.
1.
This paper presents a system to detect multiple intents (MIs) in an input sentence when only single-intent (SI)-labeled training data are available. To solve the problem, this paper categorizes input sentences into three types and uses a two-stage approach in which each stage attempts to detect MIs in different types of sentences. In the first stage, the system generates MI hypotheses based on conjunctions in the input sentence, evaluates the hypotheses, and selects the best one that satisfies specified conditions. In the second stage, the system applies sequence labeling to mark intents on the input sentence; the sequence labeling model is trained on the SI-labeled training data. In experiments, the proposed two-stage MI detection method reduced errors for written and spoken input by 20.54% and 17.34%, respectively.
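A minimal sketch of the first stage's conjunction-based hypothesis generation and selection, assuming a hypothetical single-intent classifier `classify_intent` and a simple average-confidence selection rule (neither is from the paper):

```python
# Sketch of conjunction-based multi-intent hypothesis generation (stage 1).
# `classify_intent` and its confidence scores are hypothetical stand-ins;
# the paper's actual classifier and selection conditions are not reproduced.
CONJUNCTIONS = {"and", "then", "also"}

def classify_intent(segment: str) -> tuple[str, float]:
    # Placeholder single-intent classifier trained on SI-labeled data.
    return ("unknown", 0.5)

def generate_hypotheses(tokens: list[str]) -> list[list[str]]:
    """Split the sentence at each conjunction to form candidate segmentations."""
    hypotheses = [[" ".join(tokens)]]          # no-split hypothesis
    for i, tok in enumerate(tokens):
        if tok.lower() in CONJUNCTIONS and 0 < i < len(tokens) - 1:
            left, right = " ".join(tokens[:i]), " ".join(tokens[i + 1:])
            hypotheses.append([left, right])
    return hypotheses

def detect_multiple_intents(sentence: str) -> list[str]:
    best_intents, best_score = [], float("-inf")
    for hyp in generate_hypotheses(sentence.split()):
        labeled = [classify_intent(seg) for seg in hyp]
        score = sum(conf for _, conf in labeled) / len(labeled)
        if score > best_score:                 # keep the best-scoring hypothesis
            best_score, best_intents = score, [intent for intent, _ in labeled]
    return best_intents

print(detect_multiple_intents("book a flight and play some music"))
```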

2.
Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, and extracting users' intents for call classification, to combinations of users' intents and named entities. In this paper, we present the SLU system of VoiceTone® (a service provided by AT&T where AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system extracts both intents and named entities from the users' utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of the semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand-crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone®.
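A sketch of the two SLU components described above: a statistical intent classifier and a rule-based named-entity grammar. The training utterances, labels, and regex rules are illustrative assumptions, not VoiceTone's.

```python
# Statistical intent classifier plus a rule-based "fixed grammar" for entities.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utts = ["I want to pay my bill", "what is my account balance",
              "pay the bill for March", "check balance please"]
train_intents = ["PayBill", "CheckBalance", "PayBill", "CheckBalance"]

intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
intent_clf.fit(train_utts, train_intents)

# Rule-based named-entity grammar, here as simple regexes.
NE_GRAMMAR = {
    "month": re.compile(r"\b(January|February|March|April|May|June)\b", re.I),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}

def understand(utterance: str) -> dict:
    entities = {name: pat.findall(utterance) for name, pat in NE_GRAMMAR.items()}
    return {"intent": intent_clf.predict([utterance])[0],
            "entities": {k: v for k, v in entities.items() if v}}

print(understand("please pay my March bill of $40"))
```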

3.
Sun Chengai, Lv Liangyu, Liu Tailu, Li Tangjun. Applied Intelligence, 2022, 52(6): 6057-6064
Slot filling and intent detection are two important tasks in a spoken language understanding (SLU) system, and it is becoming a trend that the two tasks are jointly learned in SLU...
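A minimal sketch of a joint SLU model of the kind this line of work studies: a shared encoder with an intent head and a slot head. The dimensions and the BiLSTM choice are assumptions, not this paper's architecture.

```python
import torch
import torch.nn as nn

class JointSLU(nn.Module):
    def __init__(self, vocab_size, num_intents, num_slots, emb=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, num_intents)  # utterance-level
        self.slot_head = nn.Linear(2 * hidden, num_slots)      # token-level

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        intent_logits = self.intent_head(states.mean(dim=1))   # pooled encoding
        slot_logits = self.slot_head(states)                   # per-token tags
        return intent_logits, slot_logits

model = JointSLU(vocab_size=1000, num_intents=5, num_slots=10)
intent_logits, slot_logits = model(torch.randint(0, 1000, (2, 7)))
print(intent_logits.shape, slot_logits.shape)   # (2, 5) and (2, 7, 10)
```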

4.
APL continuous system modelling program (APL-CSMP) is an APL-based compiler and simulator system that enables the user to define and execute mathematical equations which may contain certain predefined mathematical functions and special operation blocks supplied by APL-CSMP. Facilities allowing definition and changing of parametric data, end conditions, error bounds for computations, selection of the integration method, and control of the display of the results produced by the simulation run can also be included within the model. The source language has a syntax very similar to that of CSMP-III. Models are directly entered via the APL terminal in the form of mathematical equations and analysed by a compiler. The result of the compilation is a set of automatically generated APL functions incorporating an APL program equivalent to the model.
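A Python analogue (not APL-CSMP syntax) of what such a simulation block does: integrating a model equation with a selectable fixed-step method. The example model and step size are assumptions.

```python
# Continuous-system model integrated with a selectable fixed-step method.
def simulate(deriv, y0, t_end, dt=0.01, method="euler"):
    """Integrate dy/dt = deriv(t, y) from t = 0 to t_end."""
    t, y, trajectory = 0.0, y0, [(0.0, y0)]
    while t < t_end:
        if method == "euler":
            y = y + dt * deriv(t, y)
        elif method == "rk2":                      # midpoint rule
            y = y + dt * deriv(t + dt / 2, y + dt / 2 * deriv(t, y))
        t += dt
        trajectory.append((t, y))
    return trajectory

# Example model: first-order lag dy/dt = (u - y) / tau with a step input u = 1.
tau = 0.5
result = simulate(lambda t, y: (1.0 - y) / tau, y0=0.0, t_end=2.0, method="rk2")
print(result[-1])   # y approaches 1.0
```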

5.
PEP (Program Editor and Processor) is an interactive programming system based on an Algol-like language. It is intended to replace BASIC as a system for interactive program development on small computers (LSI-11). The language processed by the system allows declaration of variables, constants and procedures; it has structured statements for conditional and repetitive execution of program parts. We describe the design and implementation of the system and give our impressions after one year of experience with it.

6.
《电子技术应用》, 2016, (7): 83-86
An intelligent bionic bidirectional sign-language translation system is designed. The system consists mainly of an STM32 microcontroller, an LD3320 speaker-independent speech recognition module, and a SYN6288 speech synthesis chip, and can translate in both directions between speech and gestures. In the speech-to-gesture direction, commands are obtained through the speech recognition module and a sign-language robot performs the corresponding gestures according to the commands. In the gesture-to-speech direction, a data glove captures arm motion and posture, the sign-language gestures are recognized, and the sign-language robot is driven to produce speech output. The system offers low cost, high recognition accuracy, and ease of use, and has good application prospects.

7.
8.
This paper presents a new technique to enhance the performance of the input interface of spoken dialogue systems, based on a procedure that combines during speech recognition the advantages of prompt-dependent language models with those of a language model that is independent of the prompts generated by the dialogue system. The technique creates a new speech recognizer, termed a contextual speech recognizer, that uses a prompt-independent language model to allow recognizing any sentence permitted in the application domain, while also using contextual information (in the form of prompt-dependent language models) to account for the fact that some sentences are more likely to be uttered than others at a particular moment of the dialogue. The experiments show that the technique clearly enhances the performance of the input interface of a previously developed dialogue system based exclusively on prompt-dependent language models. More importantly, in comparison with a standard speech recognizer that uses just one prompt-independent language model without contextual information, the proposed recognizer increases the word accuracy and sentence understanding rates by 4.09% and 4.19% absolute, respectively. These scores are slightly better than those obtained using linear interpolation of the prompt-independent and prompt-dependent language models used in the experiments.
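For the linearly interpolated baseline mentioned at the end, a toy unigram sketch of P(w) = lambda * P_prompt(w) + (1 - lambda) * P_general(w); the example sentences and the value of lambda are illustrative assumptions.

```python
# Linear interpolation of a prompt-dependent and a prompt-independent LM.
from collections import Counter

def unigram_lm(corpus):
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    return lambda w: counts[w] / total if total else 0.0

# Prompt-dependent LM: sentences typically uttered after the prompt "Which city?".
p_prompt = unigram_lm(["to boston please", "boston", "new york"])
# Prompt-independent LM: all in-domain sentences.
p_general = unigram_lm(["i want a flight to boston", "show fares to new york",
                        "what time does it leave"])

def p_interp(word, lam=0.7):
    return lam * p_prompt(word) + (1 - lam) * p_general(word)

for w in ("boston", "fares"):
    print(w, round(p_interp(w), 3))
```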

9.
In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
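A sketch of the second-stage filtering idea: sample the synthetic corpus so that the retained sentences match a target distribution over semantic classes. The classes, target probabilities, and tagging function are assumptions, not the paper's models.

```python
# Filter a synthetic corpus toward a desired semantic-class distribution.
import random
random.seed(0)

TARGET = {"ask_price": 0.5, "ask_location": 0.3, "ask_hours": 0.2}

def semantic_class(sentence: str) -> str:
    # Placeholder for the real semantic tagger used to label synthetic data.
    if "cost" in sentence or "price" in sentence:
        return "ask_price"
    if "where" in sentence:
        return "ask_location"
    return "ask_hours"

def sample_to_target(synthetic: list[str], n: int) -> list[str]:
    by_class: dict[str, list[str]] = {}
    for s in synthetic:
        by_class.setdefault(semantic_class(s), []).append(s)
    selected = []
    for cls, prob in TARGET.items():
        pool = by_class.get(cls, [])
        k = min(len(pool), round(prob * n))       # quota per semantic class
        selected.extend(random.sample(pool, k))
    return selected

corpus = ["how much does dinner cost", "where is the nearest diner",
          "what time do you open", "is the price reasonable"] * 50
print(len(sample_to_target(corpus, n=100)))
```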

10.
We developed and evaluated a multimodal affect detector that combines conversational cues, gross body language, and facial features. The multimodal affect detector uses feature-level fusion to combine the sensory channels and linear discriminant analyses to discriminate between naturally occurring experiences of boredom, engagement/flow, confusion, frustration, delight, and neutral. Training and validation data for the affect detector were collected in a study where 28 learners completed a 32-min tutorial session with AutoTutor, an intelligent tutoring system with conversational dialogue. Classification results supported a channel × judgment type interaction, where the face was the most diagnostic channel for spontaneous affect judgments (i.e., at any time in the tutorial session), while conversational cues were superior for fixed judgments (i.e., every 20 s in the session). The analyses also indicated that the accuracy of the multichannel model (face, dialogue, and posture) was statistically higher than the best single-channel model for the fixed but not the spontaneous affect expressions. However, multichannel models reduced the discrepancy (i.e., variance in the precision of the different emotions) of the discriminant models for both judgment types. The results also indicated that the combination of channels yielded superadditive effects for some affective states, but additive, redundant, and inhibitory effects for others. We explore the structure of the multimodal linear discriminant models and discuss the implications of some of our major findings.
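A sketch of feature-level fusion followed by linear discriminant analysis, with random arrays standing in for the real face, dialogue, and posture features; feature dimensions are illustrative.

```python
# Feature-level fusion: concatenate per-channel features, then classify with LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 120
face = rng.normal(size=(n, 12))        # facial feature vectors
dialogue = rng.normal(size=(n, 8))     # conversational-cue features
posture = rng.normal(size=(n, 6))      # gross body-language features
labels = rng.integers(0, 6, size=n)    # boredom, flow, confusion, frustration,
                                       # delight, neutral (coded 0-5)

fused = np.hstack([face, dialogue, posture])   # feature-level fusion
clf = LinearDiscriminantAnalysis().fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))
```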

11.
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring “islands of pronunciation reduction” that contain (potential) MWEs can be identified in a large speech corpus.
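A sketch of the kind of ranking metric the study found most effective, combining a pronunciation-reduction measure with weighted frequency. The Dutch phrases, scores, and log-frequency weighting are illustrative assumptions, not the paper's values.

```python
# Rank candidate word sequences by reduction combined with weighted frequency.
import math

candidates = {
    # phrase: (mean pronunciation-reduction score, corpus frequency)
    "in ieder geval": (0.62, 410),
    "op een gegeven moment": (0.58, 350),
    "de rode auto": (0.10, 90),
}

def mwe_score(reduction: float, freq: int) -> float:
    return reduction * math.log(1 + freq)      # reduction weighted by frequency

ranked = sorted(candidates.items(),
                key=lambda kv: mwe_score(*kv[1]), reverse=True)
for phrase, (red, freq) in ranked:
    print(f"{phrase:25s} score={mwe_score(red, freq):.2f}")
```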

12.
A new interactive command language was developed to implement integrated computer-aided design in computer control applications. It is intended to operate at the supervisory level in a real-time hierarchical system. The integrated approach is based on convolution algebra and provides a unified conceptual framework for studying the behaviour of practical systems in the time domain. Consequently, as an alternative to choosing fixed system structures and design procedures, the user may evaluate experimental procedures suggested by the underlying theory. The language has been implemented on a minicomputer at Bradford University and applied to industrial control problems. The highly flexible and practical aspects of the language are demonstrated.
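To make the time-domain, convolution-based view concrete, here is a small sketch (in Python, not the command language itself) that computes a system's step response as the discrete convolution of its impulse response with the input; the first-order example is an assumption.

```python
# Time-domain response of a linear system via discrete convolution.
import numpy as np

dt = 0.1
t = np.arange(0, 5, dt)
impulse_response = np.exp(-t)          # e.g. a first-order lag, h(t) = e^{-t}
u = np.ones_like(t)                    # unit-step input

# y[n] = sum_k h[k] * u[n-k] * dt  (discrete approximation of the convolution)
y = np.convolve(impulse_response, u)[: len(t)] * dt
print(round(y[-1], 3))                 # close to the steady-state gain of 1
                                       # (up to discretization error)
```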

13.
Spoken language resources (SLRs) are essential for both research and application development. In this article we clarify the concept of SLR validation. We define validation and how it differs from evaluation. Further, relevant principles of SLR validation are outlined. We argue that the best way to validate SLRs is to implement validation throughout SLR production and have it carried out by an external and experienced institute. We address which tasks should be carried out by the validation institute, and which not. Further, we list the basic issues that validation criteria for SLR should address. A standard validation protocol is shown, illustrating how validation can prove its value throughout the production phase in terms of pre-validation, full validation and pre-release validation.

14.
Named Entity Recognition and Classification (NERC) is an important component of applications like Opinion Tracking, Information Extraction, or Question Answering. When these applications need to work in several languages, NERC becomes a bottleneck because its development requires language-specific tools and resources such as lists of names or annotated corpora. This paper presents a lightly supervised system that acquires lists of names and linguistic patterns from large raw text collections in western languages, starting with only a few seeds per class selected by a human expert. Experiments have been carried out with English and Spanish news collections and with the Spanish Wikipedia. Evaluation of NE classification on standard datasets shows that the NE lists achieve high precision and reveals that contextual patterns increase recall significantly. The approach would therefore be helpful for applications where annotated NERC data are not available, such as those that have to deal with several western languages or with information from different domains.
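A toy sketch of the lightly supervised bootstrapping idea: seed names yield contextual patterns, which in turn yield new candidate names. The corpus, seeds, and single bootstrapping pass are assumptions, not the paper's setup.

```python
# Seed-based acquisition of name lists and contextual patterns.
import re

corpus = ("President Smith visited Paris . Mayor Jones visited Lyon . "
          "President Brown visited Rome .")
seeds = {"PERSON": {"Smith", "Jones"}}

def harvest_patterns(text, names):
    """Collect one-word-left / one-word-right contexts around known names."""
    patterns = set()
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok in names and 0 < i < len(tokens) - 1:
            patterns.add((tokens[i - 1], tokens[i + 1]))
    return patterns

def harvest_names(text, patterns):
    """Find capitalized tokens that occur inside a harvested context."""
    tokens, found = text.split(), set()
    for i in range(1, len(tokens) - 1):
        if (tokens[i - 1], tokens[i + 1]) in patterns and re.match(r"[A-Z]", tokens[i]):
            found.add(tokens[i])
    return found

patterns = harvest_patterns(corpus, seeds["PERSON"])
print(harvest_names(corpus, patterns) - seeds["PERSON"])   # {'Brown'}
```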

15.
In this paper, a spoken query system is demonstrated which can be used to access the latest agricultural commodity prices and weather information in the Kannada language using a mobile phone. The spoken query system consists of Automatic Speech Recognition (ASR) models, an Interactive Voice Response System (IVRS) call flow, and the Agricultural Marketing Network (AGMARKNET) and India Meteorological Department (IMD) databases. The ASR models are developed using the Kaldi speech recognition toolkit. Task-specific speech data are collected from the different dialect regions of Karnataka (a state in India where the Kannada language is spoken) to develop the ASR models. A web crawler is used to get the commodity price and weather information from the AGMARKNET and IMD websites, and a PostgreSQL database management system is used to manage the crawled data. 80% and 20% of the validated speech data are used for system training and testing, respectively. The accuracy and Word Error Rate (WER) of the ASR models are reported, and an end-to-end spoken query system is developed for the Kannada language.
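A sketch of the lookup step that would follow recognition, with an in-memory table standing in for the PostgreSQL store of crawled AGMARKNET data; the commodity names and prices are examples only.

```python
# Map a recognized query to a commodity and market, then look up the latest price.
price_table = {
    # (commodity, market) -> (date, modal price in Rs per quintal)
    ("tomato", "mysore"): ("2024-01-10", 1800),
    ("onion", "hubli"): ("2024-01-10", 2100),
}

def answer_price_query(recognized_text: str) -> str:
    tokens = recognized_text.lower().split()
    for (commodity, market), (date, price) in price_table.items():
        if commodity in tokens and market in tokens:
            return (f"The modal price of {commodity} at {market} market "
                    f"on {date} is {price} rupees per quintal.")
    return "Sorry, no price information was found for that query."

# The recognized Kannada utterance would come from the ASR upstream;
# an English gloss is used here for readability.
print(answer_price_query("what is the tomato price in mysore today"))
```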

16.
A programming language extension, AGILE, for the processing of graphs within an interactive computer graphics environment, is defined. The language is intended to be used for expressing and illustrating graph-theoretic algorithms and applications. However it does not deal with the actual drawing or display of graphs; rather one is able to access an existing general-purpose graphics package. The language then is intended to be used, in conjunction with a graphics package, as a tool for the production of more specialised graphics systems: the language allows one to naturally exploit the underlying graph structure found in a wide class of problems, while a graphics environment permits the elegant display of (and interaction with) such representations.

AGILE extends the host language, C, by the addition of a graph database, and operators and control structures to manipulate this database. The graph structure is composed of five basic types: nodes, edges, graphs, sets and bugs (references). A general set of operators and tests are provided, including those for entity creation and deletion, node and edge traversal and tests for equality and containment of sets and graphs. Edges may be treated as being either directed or undirected; also multiple edges between nodes and self-loops are allowed. Arbitrary values and properties may be associated with each of the basic types. In particular, since a node may have a graph as value, a graph hierarchy is possible. Graphics primitives are provided by the GPAC graphics system.

Three substantial applications have been programmed in the language: a system for producing diagrams of graphs and a class of data structures, a system for animating four algorithms for finding the maximum flow in a network, and a system for animating and making films of systems dynamics models. Several examples of programmes written in AGILE are included.
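A Python analogue (not AGILE's C-embedded syntax) of the graph database it describes: value-carrying nodes and edges, multi-edges, self-loops, and graph-valued nodes that give a hierarchy.

```python
# Minimal graph database with arbitrary values and hierarchical (graph-valued) nodes.
from dataclasses import dataclass, field

@dataclass
class Node:
    value: object = None            # may itself be a Graph, giving a hierarchy

@dataclass
class Edge:
    src: Node
    dst: Node
    directed: bool = True
    value: object = None

@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, value=None):
        node = Node(value)
        self.nodes.append(node)
        return node

    def add_edge(self, src, dst, directed=True, value=None):
        self.edges.append(Edge(src, dst, directed, value))   # multi-edges allowed

subgraph = Graph()
g = Graph()
a = g.add_node(value=subgraph)      # hierarchical: node whose value is a graph
b = g.add_node(value="plain node")
g.add_edge(a, b, directed=False)
g.add_edge(a, a)                    # self-loop
print(len(g.nodes), len(g.edges))
```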

17.
In this paper, we present the building of various language resources for a multi-engine bi-directional English-Filipino Machine Translation (MT) system. Although linguistic information on Philippine languages is available, the focus to date has been on theoretical linguistics, and little has been done on the computational aspects of these languages. We therefore report attempts at the manual construction of language resources such as the grammar, lexicon, morphological information, and the corpora, which were literally built from almost non-existent digital forms. Due to the inherent difficulties of manual construction, we also discuss our experiments on various technologies for the automatic extraction of these resources to handle the intricacies of the Filipino language, designed with the intention of using them for the MT system. To implement the different MT engines and to ensure the improvement of translation quality, other language tools (such as the morphological analyzer and generator, and the part-of-speech tagger) were also developed.

18.
Although it seems that software metrics have moved beyond mere performance measurement, it is not clear how machine effectiveness, efficiency, and effort relate to human requirements on such matters. In industry as well as academia, the ISO 9241-11 norm provides the dominant view on usability, stating that usability is a function of effectiveness, efficiency, and satisfaction. Although intuitively, usability requirements should be part of a software's design at an early stage, conceptually and empirically it seems more likely that performance requirements (i.e., the absence of errors) should be the center of concern. This paper offers an elaborated view on usability, satisfaction, and performance. Certain theoretical conceptions are tested with data gathered from professional users of banking and hospital systems by means of a 4-year single-item survey and a structured questionnaire, respectively. Results suggested that performance factors (i.e., efficiency) are more important than usability in understanding why stakeholders are satisfied with a system or not. Moreover, neither dissatisfaction with a system nor low usability predicts requirements change. Instead, avoiding machine inaccuracy best predicted the variability in agreement with the “must have” requirements, while achieving human accuracy predicted the variability in agreement with the “won’t have” requirements. The present contribution provides a consistent research framework that can bring more focus to design (i.e., prioritization), clarify discussions about design trade-offs, make concepts measurable, and eventually lead to better-informed designs.

19.
We design an interactive video-on-demand (VOD) system using both the client-server paradigm and broadcast delivery paradigm. Between the VOD warehouse and customers, we adopt a client-server paradigm to provide an interactive service. Within the VOD warehouse, we adopt a broadcast delivery paradigm to support many concurrent customers. In particular, we exploit the enormous bandwidth of optical fibers for broadcast delivery, so that the system can provide many video programs and maintain a small access delay. In addition, we design and adopt an interleaved broadcast delivery scheme, so that every video stream only requires a small buffer size for temporary storage. A simple proxy is allocated to each ongoing customer, and it retrieves video from optical channels and delivers video to the customer through an information network. The proposed VOD system is suitable for large scale applications with many customers, and has several desirable features: 1) it can be scaled up to serve more concurrent customers and provide more video programs, 2) it provides interactive operations, 3) it only requires point-to-point communication between the VOD warehouse and the customer and involves no network control, 4) it has a small access delay, and 5) it requires a small buffer size for each video stream.
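A toy illustration, not the paper's exact interleaving scheme, of how segments of several videos can be interleaved on one cyclically repeating broadcast channel so that a per-customer proxy extracts only its requested video; since segments arrive in playback order, the proxy needs to buffer only a few segments at a time.

```python
# Interleave video segments onto one broadcast cycle; a proxy filters its video.
from itertools import chain

def build_broadcast_cycle(videos: dict[str, list[str]]) -> list[str]:
    """Interleave the segments of all videos round-robin into one cycle."""
    rounds = zip(*videos.values())        # one segment of each video per slot
    return list(chain.from_iterable(rounds))

def client_receive(cycle: list[str], wanted: str, total_segments: int) -> list[str]:
    received = []
    for segment in cycle * 2:             # the channel repeats the cycle
        if segment.startswith(wanted):
            received.append(segment)
            if len(received) == total_segments:
                break
    return received

videos = {"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2", "B3", "B4"]}
cycle = build_broadcast_cycle(videos)
print(cycle)                              # ['A1', 'B1', 'A2', 'B2', ...]
print(client_receive(cycle, wanted="A", total_segments=4))
```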

20.
This paper describes a new version of a speech into sign language translation system with new tools and characteristics for increasing its adaptability to a new task or a new semantic domain. This system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). In order to increase the system's adaptability, this paper presents new improvements in all three main modules for automatically generating the task-dependent information from a parallel corpus: automatic generation of Spanish variants when generating the vocabulary and language model for the speech recognizer, an acoustic adaptation module for the speech recognizer, data-oriented language and translation models for the machine translator, and a list of signs to design. The avatar animation module includes a new editor for rapid design of the required signs. These developments have been necessary to reduce the effort when adapting a Spanish into Spanish Sign Language (LSE: Lengua de Signos Española) translation system to a new domain. The whole translation system achieves a Sign Error Rate (SER) lower than 10% and a BLEU score higher than 90%, while the effort for adapting the system to a new domain has been reduced by more than 50%.
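An illustrative sketch of the word-sequence-to-sign-sequence step, written as a greedy longest-match lookup in a phrase table learned from a parallel corpus; the entries and the greedy strategy are assumptions, not the paper's data-oriented translation model.

```python
# Greedy phrase-table translation from Spanish words to LSE sign glosses.
phrase_table = {
    ("buenos", "dias"): ["SALUDO"],          # "good morning" -> greeting sign
    ("cuanto", "cuesta"): ["PRECIO", "QUE"],
    ("el", "billete"): ["BILLETE"],
}

def translate_to_signs(words: list[str]) -> list[str]:
    signs, i = [], 0
    while i < len(words):
        for span in (2, 1):                  # prefer the longest matching phrase
            phrase = tuple(words[i:i + span])
            if phrase in phrase_table:
                signs.extend(phrase_table[phrase])
                i += span
                break
        else:
            signs.append(words[i].upper())   # back off to a fingerspelled gloss
            i += 1
    return signs

print(translate_to_signs("cuanto cuesta el billete".split()))
```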
