Similar Documents
20 similar documents found.
1.
Humor recognition is one of the emerging research areas in natural language processing. The special structure of dialogue makes humor recognition in dialogue more challenging than in short texts. In dialogue, contextual information, in addition to the current utterance itself, is crucial for recognizing humor. Building on existing work and the structural characteristics of dialogue, this paper proposes a BERT-based dialogue humor recognition model that reinforces contextual and semantic information. The model first uses BERT to encode speaker information and utterance information; it then reinforces contextual information with sentence-level BiLSTM, CNN, and attention mechanisms, and reinforces semantic information with word-level BiLSTM and attention mechanisms. Experimental results show that the proposed method effectively improves the ability of machines to recognize humor in dialogue.
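A minimal sketch of the kind of architecture this abstract describes is given below, written in PyTorch with Hugging Face transformers. The layer sizes, the pretrained model name, and the input layout are illustrative assumptions, and the paper's sentence-level CNN branch is omitted; this is not the authors' exact implementation.

```python
# Sketch of a BERT-based dialogue humor classifier with sentence-level context
# reinforcement (BiLSTM + attention over utterance vectors) and word-level
# semantic reinforcement (BiLSTM + attention over target-utterance tokens).
# Hyperparameters and the pretrained model name are assumptions; the paper's
# sentence-level CNN branch is omitted for brevity.
import torch
import torch.nn as nn
from transformers import BertModel

class DialogueHumorClassifier(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", hidden=256, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        dim = self.bert.config.hidden_size
        # Sentence-level context encoder over per-utterance vectors.
        self.ctx_lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.ctx_attn = nn.Linear(2 * hidden, 1)
        # Word-level semantic encoder over the target utterance's tokens.
        self.word_lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.word_attn = nn.Linear(2 * hidden, 1)
        self.classifier = nn.Linear(4 * hidden, num_labels)

    @staticmethod
    def attend(states, attn_layer):
        # Soft attention pooling over a sequence of hidden states.
        weights = torch.softmax(attn_layer(states), dim=1)
        return (weights * states).sum(dim=1)

    def forward(self, ctx_ids, ctx_mask, tgt_ids, tgt_mask):
        # ctx_ids/ctx_mask: (batch, num_utterances, seq_len), each utterance
        # prefixed with its speaker token; tgt_ids/tgt_mask: (batch, seq_len).
        b, n, l = ctx_ids.shape
        ctx_out = self.bert(ctx_ids.view(b * n, l), attention_mask=ctx_mask.view(b * n, l))
        utt_vecs = ctx_out.pooler_output.view(b, n, -1)      # one vector per utterance
        ctx_states, _ = self.ctx_lstm(utt_vecs)
        ctx_vec = self.attend(ctx_states, self.ctx_attn)      # reinforced context

        tgt_out = self.bert(tgt_ids, attention_mask=tgt_mask)
        word_states, _ = self.word_lstm(tgt_out.last_hidden_state)
        word_vec = self.attend(word_states, self.word_attn)   # reinforced semantics

        return self.classifier(torch.cat([ctx_vec, word_vec], dim=-1))
```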

2.
Utterance Topic Analysis in Spoken Dialogue
This paper studies how to determine utterance topics in natural spoken dialogue based on shallow semantic analysis. The utterance topic in a dialogue is first defined as the salient semantic entity attended to by the speaker; two properties of such topics (their discourse nature and continuity) are discussed, as well as the relationship between utterance topics and (extended) sentence types (sentence types, their extensions, and the recognition of extended sentence types are therefore also introduced). An utterance topic analysis algorithm is then built on this basis and applied to real dialogue corpora. Experimental results show that topic analysis accuracy reaches 61.11%-87.16%, depending on the extended sentence type and the definition of accuracy used.

3.
王波  徐毅琼  李弼程 《计算机工程与设计》2007,28(10):2401-2402,2416
A speaker segmentation algorithm for conversational settings is proposed that segments the test speech by speaker using segment-level speech features. In its implementation, a covariance-model-based distance measure over segment-level features is derived from the Chebyshev sum inequality. A suitable segment length for the segment-level features was chosen experimentally. The results show that the segment-level-feature-based method effectively segments multi-speaker speech in conversational settings, improving both the accuracy and the recognition speed of the speaker recognition system.

4.
An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as previously, but when a speaker change is most probable to occur. This is done by estimating the next probable change point using a model of utterance durations. It is found that the inverse Gaussian best fits the distribution of utterance durations. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch-and-bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and the application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches.
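As a rough illustration of the core BIC test used in such segmentation systems, the sketch below computes the delta-BIC score for a hypothesized change point inside one analysis window. The penalty weight and the regularization constant are illustrative assumptions, and the duration-model scheduling of the tests described in the abstract is not reproduced here.

```python
# Delta-BIC test for a hypothesized speaker change at `split` within one window.
# Positive values favour a change point. lambda_penalty and the regularizer are
# illustrative; the duration-model scheduling of tests is not shown.
import numpy as np

def delta_bic(features, split, lambda_penalty=1.0):
    """features: (n, d) array of frame-level features (e.g. MFCCs)."""
    n, d = features.shape

    def logdet_cov(x):
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)  # regularize for stability
        return np.linalg.slogdet(cov)[1]

    # H0: one full-covariance Gaussian for the whole window.
    # H1: one Gaussian for each side of the split.
    gain = 0.5 * (n * logdet_cov(features)
                  - split * logdet_cov(features[:split])
                  - (n - split) * logdet_cov(features[split:]))
    penalty = 0.5 * lambda_penalty * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty
```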

5.
This paper presents empirical results of an analysis on the role of prosody in the recognition of dialogue acts and utterance mood in a practical dialogue corpus in Mexican Spanish. The work is configured as a series of machine-learning experimental conditions in which models are created by using intonational and other data as predictors and dialogue act tagging data as targets. We show that utterance mood can be predicted from intonational information, and that this mood information can then be used to recognize the dialogue act.

6.
This paper describes a plan-based agent architecture for modelling NL cooperative dialogue; in particular, the paper focuses on the interpretation of dialogue and on the explanation of its coherence by means of the recognition of the speakers' underlying intentions. The approach we propose makes it possible to analyze and explain in a uniform way several apparently unrelated linguistic phenomena, which have often been studied separately and treated via ad-hoc methods in the models of dialogue presented in the literature. Our model of linguistic interaction is based on the idea that dialogue can be seen as any other interaction among agents: therefore, domain-level and linguistic actions are treated in a similar way. Our agent architecture is based on a two-level representation of the knowledge about acting: at the metalevel, the agent modelling (AM) plans describe the recipes for plan formation and execution (they are a declarative representation of a reactive planner); at the object level, the domain and communicative actions are defined. The AM plans are used to identify the goals underlying the actions performed by an observed agent; the recognized plans constitute the dialogue context, where the intentions of all participants are stored in a structured way, in order to be used in the interpretation of the subsequent dialogue turns.

7.
刘琴  谢珺  胡勇  郝戍峰  郝雅卉 《控制与决策》2024,39(6):2031-2040
Multimodal emotion recognition in conversation aims to identify the emotion category expressed by a target utterance from the multimodal dialogue context, and is a fundamental task in building empathetic dialogue systems. Most existing methods only consider the information in the multimodal dialogue itself and ignore listener- and speaker-related knowledge, which limits the emotional features captured for the target utterance. To address this problem, a multimodal conversational emotion recognition model based on a listener-speaker knowledge fusion network (LSKFN) is proposed, which introduces external commonsense knowledge related to the listener and the speaker and organically fuses multimodal contextual information with knowledge information. LSKFN consists of four stages: multimodal context awareness, listener-speaker knowledge fusion, emotion information aggregation, and emotion decision, which respectively extract multimodal context features, incorporate listener-speaker knowledge features, remove redundant features, and predict the emotion distribution. Experimental results on two public datasets show that, compared with other baseline models, LSKFN extracts richer emotional features for the target utterance and achieves better conversational emotion recognition performance.

8.
We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure that determines the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. In the first we estimate an LM for each of them, whereas in the second we apply several clustering strategies to group together the elements that share common properties and estimate an LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which significantly improves speech recognition performance and, in turn, both the language understanding and dialogue management tasks.
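The sketch below illustrates the kind of per-turn linear interpolation described above: a background LM is mixed with the LMs of the concepts or goals addressed in the current turn, weighted by their posterior probabilities. The weighting rule and the `prob(word, history)` interface are illustrative assumptions, not the paper's exact estimation procedure.

```python
# Per-turn interpolation of a background LM with concept/goal LMs, weighted by
# the posteriors of the dialogue elements addressed in the turn. The weighting
# rule and the prob(word, history) interface are assumptions for illustration.
def interpolated_prob(word, history, background_lm, element_lms, posteriors):
    """element_lms: dict name -> LM object; posteriors: dict name -> posterior in [0, 1]."""
    relevant = {name: posteriors.get(name, 0.0) for name in element_lms}
    total = sum(relevant.values())
    # The background weight shrinks as concept/goal posteriors grow.
    prob = (1.0 / (1.0 + total)) * background_lm.prob(word, history)
    for name, lm in element_lms.items():
        prob += (relevant[name] / (1.0 + total)) * lm.prob(word, history)
    return prob
```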

9.
To increase the computational speed and robustness of a speaker recognition (SR) system, a speaker recognition algorithm based on principal component analysis (PCA) of segment-level features is proposed, built on existing frame-level speech features. The algorithm replaces frame-level features with segment-level features in both the training and recognition stages, and then applies PCA to reduce the dimensionality of the segment-level features and decorrelate them. Experimental results show that training and test times are reduced to 47.8% and 40.0% of the baseline system, respectively, while the recognition rate improves slightly and the influence of noise on the speaker recognition system is suppressed. The results verify that the algorithm achieves faster recognition with a somewhat higher recognition rate, and improves the recognition rate under various noise environments and signal-to-noise ratios.
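A minimal sketch of a segment-level-feature-plus-PCA pipeline of this kind is given below (Python with scikit-learn). The segment statistics, segment length, hop size, and PCA dimensionality are illustrative assumptions, and the random arrays merely stand in for real frame-level features.

```python
# Build segment-level features (per-segment mean and standard deviation of frame
# features) and decorrelate/reduce them with PCA. Segment length, hop size and
# PCA dimensionality are assumptions; random arrays stand in for real MFCCs.
import numpy as np
from sklearn.decomposition import PCA

def segment_level_features(frame_feats, seg_len=100, hop=50):
    """frame_feats: (n_frames, d) frame-level features for one utterance."""
    segs = []
    for start in range(0, len(frame_feats) - seg_len + 1, hop):
        seg = frame_feats[start:start + seg_len]
        segs.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.array(segs)

rng = np.random.default_rng(0)
train_frame_feats = [rng.normal(size=(500, 13)) for _ in range(10)]  # stand-in MFCCs

# Training stage: pool segment-level features and fit the PCA transform;
# the same transform is then applied to enrollment and test segments.
train_segs = np.vstack([segment_level_features(f) for f in train_frame_feats])
pca = PCA(n_components=16).fit(train_segs)
train_segs_reduced = pca.transform(train_segs)
```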

10.
11.
Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues on isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance, then the factorial-max vector quantization model (MAXVQ) is used to infer the mask signals, and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR.
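The sketch below illustrates the first stage of such a pipeline, GMM-based identification of the two speakers present in a mixed utterance, using scikit-learn. Model sizes and the synthetic data are stand-ins, and the MAXVQ mask inference and CASA resynthesis stages are not shown.

```python
# GMM-based identification of the two speakers present in a mixed utterance,
# the first stage of the separation pipeline; model sizes and synthetic data
# are stand-ins, and mask inference/resynthesis are not shown.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmms(features_per_speaker, n_components=16):
    """features_per_speaker: dict speaker_id -> (n_frames, d) clean training features."""
    return {spk: GaussianMixture(n_components=n_components, covariance_type="diag").fit(feats)
            for spk, feats in features_per_speaker.items()}

def identify_two_speakers(gmms, mixture_feats):
    """Return the two speakers whose models best explain the mixed utterance."""
    scores = {spk: gmm.score(mixture_feats) for spk, gmm in gmms.items()}
    return sorted(scores, key=scores.get, reverse=True)[:2]

rng = np.random.default_rng(0)
train = {f"spk{i}": rng.normal(loc=i, size=(300, 13)) for i in range(4)}
gmms = train_speaker_gmms(train)
mixed = np.vstack([train["spk0"][:50], train["spk3"][:50]])  # crude two-speaker stand-in
print(identify_two_speakers(gmms, mixed))
```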

12.
孙念  张毅  林海波  黄超 《计算机应用》2018,38(10):2839-2843
When the test utterance is long enough, a single feature carries sufficient information and discriminability for the speaker recognition task; but when the test utterance is very short, the speech signal lacks sufficient speaker information and recognition performance degrades sharply. To address the lack of speaker information under short-utterance conditions, a short-utterance speaker recognition algorithm based on multi-feature i-vectors is proposed. The algorithm first extracts different acoustic feature vectors and combines them into a high-dimensional feature vector, then uses principal component analysis (PCA) to remove correlations among the high-dimensional features and orthogonalize them, and finally applies linear discriminant analysis (LDA) to select the most discriminative features while reducing the dimensionality, thereby achieving better speaker recognition performance. In experiments on the TIMIT corpus with short utterances of the same duration (2 s), the proposed algorithm reduces the equal error rate (EER) by a relative 72.16%, 69.47%, and 73.62% compared with i-vector systems using single Mel-frequency cepstral coefficient (MFCC), linear prediction cepstral coefficient (LPCC), and perceptual log area ratio (PLAR) features, respectively. For short utterances of different durations, the proposed algorithm reduces both the EER and the detection cost function (DCF) by roughly 50% compared with single-feature i-vector systems. Together, these results show that the proposed algorithm can fully extract speaker-specific information in short-utterance speaker recognition systems and effectively improve recognition performance.
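A minimal sketch of the multi-feature front end described above (concatenation, then PCA decorrelation, then LDA selection) is shown below with scikit-learn. The feature dimensions and the random stand-in data are illustrative assumptions rather than the paper's actual i-vector extraction.

```python
# Concatenate several acoustic feature streams, decorrelate with PCA, then keep
# the most discriminative directions with LDA. Dimensions and the random
# stand-in data are assumptions; a real system would extract these features
# from audio and map them through an i-vector extractor.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_utts, n_speakers = 200, 20
mfcc = rng.normal(size=(n_utts, 39))   # stand-in MFCC statistics per utterance
lpcc = rng.normal(size=(n_utts, 24))   # stand-in LPCC statistics
plar = rng.normal(size=(n_utts, 18))   # stand-in PLAR statistics
labels = rng.integers(0, n_speakers, size=n_utts)

combined = np.hstack([mfcc, lpcc, plar])                    # high-dimensional combination
pca = PCA(n_components=40).fit(combined)                    # decorrelate / orthogonalize
decorrelated = pca.transform(combined)
lda = LinearDiscriminantAnalysis(n_components=n_speakers - 1).fit(decorrelated, labels)
discriminative = lda.transform(decorrelated)                # most discriminative subspace
```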

13.
Audio Segmentation of Broadcast Speech
The broadcast news segmentation system in this paper consists of three parts: segmentation, classification, and clustering. The segmentation part uses a proposed algorithm based on detecting the trend of entropy change to find acoustic change points in the continuous audio signal, thereby separating audio segments with different acoustic properties. Unlike traditional change-point detection methods that require a threshold, this method detects acoustic change points by examining, for every possible split point within a window of fixed length, the trend of the entropies of the two signal segments produced by that split, which avoids segmentation errors caused by an improperly chosen threshold. The classification part uses a conventional Gaussian classifier based on Gaussian mixture models (GMMs), and the clustering part uses a vector quantization (VQ)-based speaker clustering algorithm. Applied to three 30-minute news programs, the system successfully segmented the continuous audio, removed all background music, and grouped speech belonging to the same speaker into one cluster with high accuracy, laying a good foundation for the classification and recognition of broadcast speech.
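The sketch below gives a threshold-free change-point search in the spirit of the entropy-based idea described above: for each candidate split inside a window, it compares the duration-weighted entropies of Gaussians fitted to the two resulting segments and returns the split with the lowest total score, whose neighborhood can then be inspected for the change trend. The exact trend criterion of the paper is not reproduced, and the parameters are illustrative.

```python
# Threshold-free change-point search inside one analysis window: each candidate
# split is scored by the duration-weighted entropies of Gaussians fitted to the
# two sides; the split with the lowest total score is returned, and the trend of
# the scores around it can be inspected. Parameters are illustrative.
import numpy as np

def gaussian_entropy(x):
    # Differential entropy of a Gaussian fitted to x, up to an additive constant.
    d = x.shape[1]
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
    return 0.5 * np.linalg.slogdet(cov)[1]

def best_split(window_feats, min_seg=50):
    """window_feats: (n, d) frame-level features within one analysis window."""
    n = len(window_feats)
    scores = []
    for split in range(min_seg, n - min_seg):
        left, right = window_feats[:split], window_feats[split:]
        scores.append((split * gaussian_entropy(left)
                       + (n - split) * gaussian_entropy(right)) / n)
    best = int(np.argmin(scores)) + min_seg
    return best, scores
```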

14.

The support vector machine (SVM) is a popular classification model for speaker verification. However, although the SVM is suitable for classifying speakers, determining the values of the free parameters C and γ of the SVM model has been a challenging technical problem: an improper value set for the free parameter pair (C, γ) can lead to unsatisfactory recognition accuracy in speaker verification. Moreover, the sound source localization information of the collected acoustic data has a large effect on the recognition performance of SVM speaker verification. In response, this study developed a sound source localization-driven fuzzy scheme to help determine the optimal value set of (C, γ) for building the SVM model. Specifically, this scheme uses the time difference of arrival (TDOA) information estimated from the Kinect microphone array (containing both the angle and distance information of the speaker's acoustic data) to calculate an optimal value set for the SVM free parameters C and γ. It was demonstrated that speaker verification using an SVM with a properly estimated parameter pair (C, γ) achieves a higher recognition rate than one using an arbitrarily chosen value set for (C, γ).


15.
One of the most important ways in which an information-provider can assimilate an information-seeking dialogue is by inferring the underlying task-related plan motivating the information-seeker's queries. This paper presents a strategy for hypothesizing and tracking the changing task-level goals of an information-seeker and building a model of his task-related plan as the dialogue progresses.
Naturally occurring utterances are often imperfect. The information-provider often appears to use acquired knowledge about the information-seeker's underlying task-related plan to remedy many of the information-seeker's faulty utterances and enable the dialogue to continue without interruption. This paper presents a strategy for understanding one kind of defective utterance. Our approach relies on the information-seeker's inferred task-related plan as the primary mechanism for suggesting how an utterance should be understood, thereby considering only interpretations that are relevant to what the information-seeker is trying to accomplish. If multiple interpretations are suggested, relevance to the current focus of attention in the dialogue and similarity to the information-seeker's actual utterance are used to select the interpretation that is most likely to represent his intended meaning or satisfy his needs.

16.
A PCA-based segment-level feature (PCAULF) is proposed. Built on existing frame-level speech features, it introduces the long-term characteristics of speech by computing segment-level features. PCA is then applied to the segment-level features: on the one hand, it removes the redundancy introduced by the segment-level features, reduces the dimensionality of the data, and increases recognition speed; on the other hand, it suppresses the influence of noise on the recognition system and improves the robustness of the segment-level features. In the training stage, segment-level features are computed for all speech and the PCA transform matrix is obtained; in the test stage, the transform matrix is first used to reduce the dimensionality of the segment-level features before classification. Experimental results show that the feature effectively improves recognition accuracy and speed, making it better suited to real-time speaker recognition systems.

17.
Social media contains a large amount of conversational text, and in these conversations the speakers' emotions and intents are usually correlated. Moreover, the overall structure of a conversation also affects its emotions and intents, so emotion and intent in conversation should be learned jointly. This paper therefore proposes a joint emotion-intent learning model based on dialogue structure, which considers the latent correlation between emotion and intent within a conversation and exploits the relationship between the internal structure of the conversation and the speakers' emotions and intents to improve the classification of the emotion and intent of each utterance in multi-turn dialogue. An attention mechanism is also used to take the surrounding dialogue into account when modeling the influence of context on conversational emotion. Experiments show that the joint learning model effectively improves the performance of utterance-level emotion and intent classification.

18.
In this paper we are going to show that error coping strategies play an essential role in linguistic pragmatics. We study the effect of noisy speaker strategies within a framework of signalling games with a feedback loop. We distinguish between cases in which errors occur in message selection and cases in which they occur in signal selection. The first type of error affects the content of an utterance, and the second type its linguistic expression. The general communication model is inspired by the Shannon–Weaver communication model. We test the model on a number of benchmark examples, including examples of relevance implicatures, quantity implicatures, and presupposition accommodation.

19.

In the current study, we conducted a Wizard-of-Oz experiment using a smart speaker to investigate how smart speakers’ task performance (success vs. failure) and pragmatic levels (high vs. low) alter users’ linguistic behaviors during multiple turn-taking conversations. The linguistic behaviors analyzed in this study included the mean length of utterance, give-up and topic development frequency. Furthermore, we examined what kinds of pragmatic skills smart speakers need to sustain multiple turn-taking interactions. The results suggest that smart speakers’ performance levels and pragmatic skills have different effects on linguistic behaviors. Task performance and the pragmatic levels of smart speaker did not change participants’ utterance lengths. Giving up on conversations when tasks were not successfully completed occurred more frequently with smart speakers with low pragmatic capabilities. Topic development occurred more frequently when people interacted with smart speakers with high pragmatic capabilities or when tasks were accomplished. The notable requisite pragmatic skills for smart speakers included the abilities to specify and describe information, react to indirect behavior, and appreciate humor/ironic humor. The findings of this study may have implications for designing dialogue for artificial conversational agents in various conversational settings.


20.
In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a vocal tract information based system. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the LP residual. The speaker-specific information from each level is modeled independently using Gaussian mixture model–universal background model (GMM-UBM) modeling and then combined at the score level. The significance of the proposed speaker recognition system is demonstrated by conducting speaker verification experiments on the NIST-03 database. Two different tests, namely a Clean test and a Noisy test, are conducted. In the Clean test, the test speech signal is used as is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. Although in the Clean test the proposed source-based system still performs worse than the vocal tract information based system, it performs better in the Noisy test. Finally, in both the clean and noisy cases, by providing different and robust speaker-specific evidence, the proposed system helps the vocal tract system further improve the overall performance.
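The sketch below illustrates GMM-UBM log-likelihood-ratio scoring and the score-level fusion of several subsystems (e.g., subsegmental, segmental, suprasegmental, and vocal tract scores). The equal-weight fusion rule and the use of scikit-learn GaussianMixture objects are illustrative assumptions, not the paper's exact setup.

```python
# GMM-UBM log-likelihood-ratio scoring and score-level fusion of several
# subsystems (e.g. subsegmental, segmental, suprasegmental, vocal tract).
# speaker_gmm and ubm are assumed to be fitted sklearn GaussianMixture models;
# the equal-weight fusion rule is an illustrative choice.
import numpy as np

def gmm_ubm_score(speaker_gmm, ubm, features):
    """Average per-frame log-likelihood ratio of a test utterance."""
    return float(np.mean(speaker_gmm.score_samples(features) - ubm.score_samples(features)))

def fused_score(subsystem_scores, weights=None):
    """subsystem_scores: per-trial scores from the individual systems."""
    scores = np.asarray(subsystem_scores, dtype=float)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))  # equal-weight fusion
    return float(np.dot(weights, scores))
```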
