期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

《计算机工程》2018,(4):59-65

已有语音识别方法将用户用英文语音表达的任务目标直接施加到模糊自适应环中,采取直接将识别结果匹配规则前件的方法,限制了系统的识别能力。为此,提出一种语音式任务目标的结构化转换方法。对于语音式任务目标进行句法分析和关键成分提取,对关键成分进行语义关联拓展,建立与任务目标等价的语义关联集合,基于集合完成面向模糊规则前件的结构化转换。通过搭建任务机器人实验系统,验证了该方法具有较好的语音式任务目标识别能力。相似文献

2.

面向任务口语理解研究现状综述

下载免费PDF全文

侯丽仙李艳玲李成城《计算机工程与应用》2019,55(11):7-15

口语理解是对话系统重要的功能模块，语义槽填充和意图识别是面向任务口语理解的两个关键子任务。近年来，联合识别方法已经成为解决口语理解中语义槽填充和意图识别任务的主流方法，介绍两个任务由独立建模到联合建模的方法，重点介绍基于深度神经网络的语义槽填充和意图识别联合建模方法，并总结了目前存在的问题以及未来的发展趋势。相似文献

3.

意图识别与语义槽填充的双向关联模型

下载免费PDF全文

王丽花杨文忠姚苗王婷理姗姗《计算机工程与应用》2021,57(3):196-202

意图识别与语义槽填充联合建模正成为口语理解（Spoken Language Understanding,SLU）的新趋势。但是,现有的联合模型只是简单地将两个任务进行关联,建立了两任务间的单向联系,未充分利用两任务之间的关联关系。考虑到意图识别与语义槽填充的双向关联关系可以使两任务相互促进,提出了一种基于门控机制的双向关联模型（BiAss-Gate）,将两个任务的上下文信息进行融合,深度挖掘意图识别与语义槽填充之间的联系,从而优化口语理解的整体性能。实验表明,所提模型BiAss-Gate在ATIS和Snips数据集上,语义槽填充F1值最高达95.8%,意图识别准确率最高达98.29%,对比其他模型性能得到了显著提升。相似文献

4.

基于深度学习的口语理解联合建模算法综述

魏鹏飞曾碧汪明慧曾安《软件学报》2022,33(11):4192-4216

口语理解是自然语言处理领域的研究热点之一,应用在个人助理、智能客服、人机对话、医疗等多个领域.口语理解技术指的是将机器接收到的用户输入的自然语言转换为语义表示,主要包含意图识别、槽位填充这两个子任务.现阶段,使用深度学习对口语理解中意图识别和槽位填充任务的联合建模方法已成为主流,并且获得了很好的效果.因此,对基于深度学习的口语理解联合建模算法进行总结分析具有十分重要的意义.首先介绍了深度学习技术应用到口语理解的相关工作,然后从意图识别和槽位填充的关联关系对现有的研究工作进行剖析,并对不同模型的实验结果进行了对比分析和总结,最后给出了未来的研究方向及展望. 相似文献

5.

统计中文口语理解执行策略的研究

《计算机科学与探索》2017,(6):980-987

口语理解的语义框架包括两个决策——关键语义概念识别和意图识别,主要针对这两个决策的执行策略进行研究。首先研究了并联型和级联型两种策略;然后在此基础上提出了联合型结构进行中文口语理解,即通过三角链条件随机场对意图以及关键语义概念共同建模,用一个单独的图模型结构共同表示它们的依赖关系。通过与其他几种策略进行比较实验得出结论:该模型可以将两个任务一次完成,在关键语义概念识别任务上性能优于其他的执行策略。相似文献

6.

基于两阶段分类的口语理解方法

吴尉林陆汝占段建勇刘慧高峰陈玉泉《计算机研究与发展》2008,45(5):861-868

口语理解是实现口语对话系统的关键技术之一.它主要面临两方面的挑战:1)稳健性,因为输入语句往往是病态的;2)可移植性,即口语理解单元应能够快速移植到新的领域和语言.提出了一种新的基于两阶段分类的口语理解方法:第1阶段为主题分类,用来识别用户输入语句的主题;第2阶段为主题相关的语义槽分类,根据识别的主题抽取相应的语义槽/值对.该方法能对用户输入语句进行深层理解,同时也能保持稳健性.它基本上是数据驱动的,而且训练数据的标记也比较容易,可方便地移植到新的领域和语言.实验分别在汉语交通查询领域和英语DARPA Communicator领域进行,结果表明了该方法的有效性. 相似文献

7.

人机对话系统中意图识别方法综述

下载免费PDF全文

刘娇李艳玲林民《计算机工程与应用》2019,55(12):1-7

口语理解是人机对话系统的重要组成部分,而意图识别是口语理解中的一个子任务,而且至关重要。意图识别的准确性直接关系到语义槽填充的性能并且有助于后续对话系统的研究。考虑到人机对话系统中意图识别的困难,传统的机器学习方法无法理解用户话语的深层语义信息,主要对近些年应用在意图识别研究方面的深度学习方法进行分析、比较和总结,进一步思考如何将深度学习模型应用到多意图识别任务中,从而推动基于深度神经网络的多意图识别方法的研究。相似文献

8.

基于带约束语义文法的领域相关自然语言理解方法

王东升王石王卫民符建辉诸峰《中文信息学报》2018,32(2):38-49

开放域问答系统通常可以借助一些数据冗余方法来提高问答准确性,而对于缺乏大规模领域语料的领域相关问答系统来说,准确理解用户的意图成为这类系统的关键。该文首先定义了一种带约束语义文法,与本体等语义资源相结合,可以在词汇级、句法级、语义级对自然语言句子的解析过程进行约束,解决自然语言理解歧义问题;然后给出了一个高效的文法匹配算法,其首先依据定义的各种约束条件预先过滤一些规则,然后依据提出的匹配度计算模型对候选的规则进行排序,找到最佳匹配。为了验证方法的有效性,将方法应用到两个实际的应用领域的信息查询系统。实验结果表明,本系统提出的方法切实有效,系统理解准确率分别达到了82.4%和86.2%,MRR值分别达到了91.6%和93.5%。相似文献

9.

一种基于窗口机制的口语理解异构图网络

张启辰王帅李静梅《软件学报》2024,35(4):1885-1898

口语理解(spoken language understanding, SLU)是面向任务的对话系统的核心组成部分,旨在提取用户查询的语义框架.在对话系统中,口语理解组件(SLU)负责识别用户的请求,并创建总结用户需求的语义框架, SLU通常包括两个子任务:意图检测(intent detection, ID)和槽位填充(slot filling, SF).意图检测是一个语义话语分类问题,在句子层面分析话语的语义;槽位填充是一个序列标注任务,在词级层面分析话语的语义.由于意图和槽之间的密切相关性,主流的工作采用联合模型来利用跨任务的共享知识.但是ID和SF是两个具有强相关性的不同任务,它们分别表征了话语的句级语义信息和词级信息,这意味着两个任务的信息是异构的,同时具有不同的粒度.提出一种用于联合意图检测和槽位填充的异构交互结构,采用自注意力和图注意力网络的联合形式充分地捕捉两个相关任务中异构信息的句级语义信息和词级信息之间的关系.不同于普通的同构结构,所提模型是一个包含不同类型节点和连接的异构图架构,因为异构图涉及更全面的信息和丰富的语义,同时可以更好地交互表征不同粒度节点之间的信息.此... 相似文献

10.

利用领域信息的基于字的鲁棒中文口语理解研究

包长春徐为群李亚丽潘接林颜永红《微计算机应用》2010,31(6)

鲁棒性是口语理解研究最具挑战性的关键问题之一.本文采用两个策略提高口语解析的鲁棒性:一是使用浅层统计理解框架,将口语解析简化为实体识别,并且以字取代词作为基本处理单元;二是在统计框架下,分别从特征提取和语料扩充两个角度充分利用领域信息.实验结果显示上述方法能有效提升语义解析性能.对于人机对话的测试集,当输入为语音识别结果时,解析性能(F1值)由75.27提升至90.24,输入为人工转抄结果时,性能由80.59提升至97.14. 相似文献

11.

A spoken query system for the agricultural commodity prices and weather information access in Kannada language

Thimmaraja G. Yadava H. S. Jayanna 《International Journal of Speech Technology》2017,20(3):635-644

In this paper, a spoken query system is demonstrated which can be used to access the latest agricultural commodity prices and weather information in Kannada language using mobile phone. The spoken query system consists of Automatic Speech Recognition (ASR) models, Interactive Voice Response System (IVRS) call flow, Agricultural Marketing Network (AGMARKNET) and India Meteorological Department (IMD) databases. The ASR models are developed by using the Kaldi speech recognition toolkit. The task specific speech data is collected from the different dialect regions of Karnataka (a state in India speaks Kannada language) to develop ASR models. The web crawler is used to get the commodity price and weather information from AGMARKNET and IMD websites. The postgresql database management system is used to manage the crawled data. The 80 and 20% of validated speech data is used for system training and testing respectively. The accuracy and Word Error Rate (WER) of ASR models are highlighted and end to end spoken query system is developed for Kannada language. 相似文献

12.

MARS: A Statistical Semantic Parsing and Generation-Based Multilingual Automatic tRanslation System

Yuqing Gao Bowen Zhou Zijian Diao Jeffrey Sorensen Michael Picheny 《Machine Translation》2002,17(3):185-212

We present MARS (Multilingual Automatic tRanslation System), a research prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken language translation between English and Mandarin Chinese for limited domains, such as air travel reservations. In MARS, machine translation is embedded within a complex speech processing task, and the translation performance is highly effected by the performance of other components, such as the recognizer and semantic parser, etc. All components in the proposed system are statistically trained using an appropriate training corpus. The speech signal is first recognized by an automatic speech recognizer (ASR). Next, the ASR-transcribed text is analyzed by a semantic parser, which uses a statistical decision-tree model that does not require hand-crafted grammars or rules. Furthermore, the parser provides semantic information that helps further re-scoring of the speech recognition hypotheses. The semantic content extracted by the parser is formatted into a language-independent tree structure, which is used for an interlingua based translation. A Maximum Entropy based sentence-level natural language generation (NLG) approach is used to generate sentences in the target language from the semantic tree representations. Finally, the generated target sentence is synthesized into speech by a speech synthesizer.Many new features and innovations have been incorporated into MARS: the translation is based on understanding the meaning of the sentence; the semantic parser uses a statistical model and is trained from a semantically annotated corpus; the output of the semantic parser is used to select a more specific language model to refine the speech recognition performance; the NLG component uses a statistical model and is also trained from the same annotated corpus. These features give MARS the advantages of robustness to speech disfluencies and recognition errors, tighter integration of semantic information into speech recognition, and portability to new languages and domains. These advantages are verified by our experimental results. 相似文献

13.

基于上下文信息的口语意图检测方法

徐扬王建成刘启元李寿山《计算机科学》2020,47(1):205-211

近年来,随着人工智能的发展与智能设备的普及,人机智能对话技术得到了广泛的关注。口语语义理解是口语对话系统中的一项重要任务,而口语意图检测是口语语义理解中的关键环节。由于多轮对话中存在语义缺失、框架表示以及意图转换等复杂的语言现象,因此面向多轮对话的意图检测任务十分具有挑战性。为了解决上述难题,文中提出了基于门控机制的信息共享网络,充分利用了多轮对话中的上下文信息来提升检测性能。具体而言,首先结合字音特征构建当前轮文本和上下文文本的初始表示,以减小语音识别错误对语义表示的影响;其次,使用基于层级化注意力机制的语义编码器得到当前轮和上下文文本的深层语义表示,包含由字到句再到多轮文本的多级语义信息;最后,通过在多任务学习框架中引入门控机制来构建基于门控机制的信息共享网络,使用上下文语义信息辅助当前轮文本的意图检测。实验结果表明,所提方法能够高效地利用上下文信息来提升口语意图检测效果,在全国知识图谱与语义计算大会(CCKS2018)技术评测任务2的数据集上达到了88.1%的准确率(Acc值)和88.0%的综合正确率(F1值),相比于已有的方法显著提升了性能。相似文献

14.

汉语股票实时行情查询对话系统 总被引：1，自引：0，他引：1

张琳高峰郭荣毛家菊陆汝占《计算机应用》2004,24(7):61-63

介绍了一个用于股票实时行情查询的口语化的人机对话系统，该系统集成了语音识别、语言理解、对话控制等技术。文中定义了一个情景语义框架模型，较好地处理了口语理解系统的一些难点。相似文献

15.

Increasing adaptability of a speech into sign language translation system

Verónica López-Ludeña Rubén San-Segundo Carlos González Morcillo Juan Carlos López José M. Pardo Muñoz 《Expert systems with applications》2013,40(4):1312-1322

This paper describes a new version of a speech into sign language translation system with new tools and characteristics for increasing its adaptability to a new task or a new semantic domain. This system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). In order to increase the system adaptability, this paper presents new improvements in all the three main modules for generating automatically the task dependent information from a parallel corpus: automatic generation of Spanish variants when generating the vocabulary and language model for the speech recogniser, an acoustic adaptation module for the speech recogniser, data-oriented language and translation models for the machine translator and a list of signs to design. The avatar animation module includes a new editor for rapidly design of the required signs. These developments have been necessary to reduce the effort when adapting a Spanish into Spanish sign language (LSE: Lengua de Signos Española) translation system to a new domain. The whole translation presents a SER (Sign Error Rate) lower than 10% and a BLEU higher than 90% while the effort for adapting the system to a new domain has been reduced more than 50%. 相似文献

16.

Towards situated speech understanding: visual context priming of language models

《Computer Speech and Language》2005,19(2):227-248

相似文献

17.

Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence

Ananthakrishnan S. Narayanan S.S. 《IEEE transactions on audio, speech, and language processing》2008,16(1):216-228

With the advent of prosody annotation standards such as tones and break indices (ToBI), speech technologists and linguists alike have been interested in automatically detecting prosodic events in speech. This is because the prosodic tier provides an additional layer of information over the short-term segment-level features and lexical representation of an utterance. As the prosody of an utterance is closely tied to its syntactic and semantic content in addition to its lexical content, knowledge of the prosodic events within and across utterances can assist spoken language applications such as automatic speech recognition and translation. On the other hand, corpora annotated with prosodic events are useful for building natural-sounding speech synthesizers. In this paper, we build an automatic detector and classifier for prosodic events in American English, based on their acoustic, lexical, and syntactic correlates. Following previous work in this area, we focus on accent (prominence, or ldquostressrdquo) and prosodic phrase boundary detection at the syllable level. Our experiments achieved a performance rate of 86.75% agreement on the accent detection task, and 91.61% agreement on the phrase boundary detection task on the Boston University Radio News Corpus. 相似文献

18.

Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification

Stefan Scherer John Kane Christer Gobl Friedhelm Schwenker 《Computer Speech and Language》2013,27(1):263-287

The dynamic use of voice qualities in spoken language can reveal useful information on a speakers attitude, mood and affective states. This information may be very desirable for a range of, both input and output, speech technology applications. However, voice quality annotation of speech signals may frequently produce far from consistent labeling. Groups of annotators may disagree on the perceived voice quality, but whom should one trust or is the truth somewhere in between? The current study looks first to describe a voice quality feature set that is suitable for differentiating voice qualities on a tense to breathy dimension. Further, the study looks to include these features as inputs to a fuzzy-input fuzzy-output support vector machine (F²SVM) algorithm, which is in turn capable of softly categorizing voice quality recordings. The F²SVM is compared in a thorough analysis to standard crisp approaches and shows promising results, while outperforming for example standard support vector machines with the sole difference being that the F²SVM approach receives fuzzy label information during training. Overall, it is possible to achieve accuracies of around 90% for both speaker dependent (cross validation) and speaker independent (leave one speaker out validation) experiments. Additionally, the approach using F²SVM performs at an accuracy of 82% for a cross corpus experiment (i.e. training and testing on entirely different recording conditions) in a frame-wise analysis and of around 97% after temporally integrating over full sentences. Furthermore, the output of fuzzy measures gave performances close to that of human annotators. 相似文献

19.

Handling Knowledge Sources in Human-Machine Interaction

Wolfgang Minker Françoise Néel 《International Journal of Speech Technology》2002,5(2):171-188

This article describes the various knowledge sources that, in general, are required to handle multimodal human-machine interaction efficiently: these are called the task, user, dialogue, environment and system models. The first part discusses the content of these models. Special emphasis is given on problems that occur when speech is combined with other modalities. The second part focuses on spoken language characteristics and proposes an adapted semantic representation for the task model. It also describes a stochastic method to collect and process the information related to this model. The conclusion discusses an extension of such a stochastic method to multimodality. 相似文献

20.

Introducing Syntax Information in a Stochastically-Based Semantic Case Grammar Parser

Wolfgang Minker 《International Journal of Speech Technology》2004,7(1):45-54

We study the impact of introducing syntax information into a stochastic component for natural language understanding that is based on a purely semantic case grammar formalism. The parser operates in an application for train travel information retrieval, the French ARISE (Automatic Railway Information Systems for Europe) task. This application supports the development of schedule inquiry services by telephone. The semantic case grammar has been chosen in order to enhance robustness facing spontaneous speech effects. However, this robustness is likely to turn into a drawback, if the semantic analysis ignores information that is propagated by syntactic relations. Introducing additional syntax information, whose complexity is well adapted to the size of the stochastic model, may disambiguate and therefore improve the decoding. 相似文献