Similar Documents
 20 similar documents retrieved (search time: 156 ms)
1.
To further improve the accuracy of interaction for preschool-education dialogue robots, a recognition technique based on the fusion of facial-expression emotion and speech emotion is proposed, drawing on the idea of multimodal fusion. To handle video frames with abnormal facial expressions, a convolutional neural network is first used to detect faces, facial-expression features are then extracted with the Gabor wavelet transform, and facial-expression emotion is finally recognized with a residual network. To improve the accuracy of emotion recognition and help the preschool-education robot better understand children's emotions, continuous speech features are extracted with MFCC and continuous speech emotion is recognized with a residual network. The facial and speech emotion recognition results are then fused with a multiple linear regression algorithm. Validation on the AVEC2019 dataset shows that both facial-expression emotion recognition and continuous speech emotion recognition achieve high accuracy; compared with traditional single-modality emotion recognition, the multimodal fusion approach achieves the highest concordance correlation coefficient, reaching 0.77. It is therefore concluded that multimodal emotion recognition can help improve the emotion-recognition capability of preschool-education dialogue robots during interaction and increase their intelligence.
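A minimal sketch of the decision-level fusion and evaluation steps named in this abstract: multiple linear regression combines per-sample scores from a facial model and a speech model, and the concordance correlation coefficient (CCC) measures agreement with the labels. The unimodal predictions and labels below are synthetic placeholders, not AVEC2019 data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D score arrays."""
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Hypothetical unimodal predictions on a validation split (e.g. arousal scores).
rng = np.random.default_rng(0)
labels = rng.uniform(-1, 1, size=200)                 # gold continuous emotion labels
face_pred = labels + rng.normal(0, 0.4, size=200)     # facial-expression model output
speech_pred = labels + rng.normal(0, 0.5, size=200)   # speech-emotion model output

# Multiple linear regression learns fusion weights over the two modalities.
X = np.stack([face_pred, speech_pred], axis=1)
fusion = LinearRegression().fit(X, labels)
fused_pred = fusion.predict(X)

print("CCC face only  :", round(ccc(labels, face_pred), 3))
print("CCC speech only:", round(ccc(labels, speech_pred), 3))
print("CCC fused      :", round(ccc(labels, fused_pred), 3))
```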

2.
To address occlusion in student facial-expression recognition in complex classroom scenes, and to exploit the advantages of deep learning for intelligent teaching evaluation, a student expression recognition model for classroom teaching videos based on a deep attention network and an intelligent teaching evaluation algorithm are proposed. A classroom teaching video library, an expression library, and a behavior library are constructed; cropping and occlusion strategies generate multiple face-image branches, on which a multi-branch deep attention network is built, with a self-attention mechanism assigning different weights to the branches...

3.
A new facial expression recognition method based on AHP semantic knowledge   (Times cited: 1; self-citations: 1; other citations: 0)
In current facial expression recognition systems there is an essential gap between machine recognition of facial expressions and human perception, which keeps recognition rates low. To narrow the semantic gap between the low-level visual features of face images and their high-level semantics, a new facial expression recognition method based on semantic knowledge from the Analytic Hierarchy Process (AHP) is proposed. The method first uses AHP to describe the high-level semantics of the face images in the training set and to build semantic feature vectors. In the low-level visual feature extraction stage, a second-order PCA (principal component analysis) method is proposed to extract the texture features of face images. In the recognition stage, only the low-level visual features of the input face image are used: a K-NN (k-nearest neighbor) algorithm, combined with the semantic feature vectors built during learning, classifies the facial expression. The proposed method combines low-level visual features with high-level semantic knowledge and thus narrows the semantic gap between them. Experiments on the JAFFE facial expression database yield an average recognition rate of 93.92%. Theoretical analysis and experimental results show that the method achieves better recognition performance than other facial expression recognition methods.
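Illustrative sketch only: a plain PCA + K-NN expression classifier on flattened face images, showing the generic low-level-feature recognition stage this abstract builds on. The paper's specific second-order PCA variant and its AHP semantic feature vectors are not reproduced, and the data below are random placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical data: 48x48 grayscale face crops and 7 expression labels.
rng = np.random.default_rng(0)
X = rng.random((210, 48 * 48))      # flattened face images
y = rng.integers(0, 7, size=210)    # expression class labels

clf = make_pipeline(PCA(n_components=40, whiten=True),
                    KNeighborsClassifier(n_neighbors=3))
scores = cross_val_score(clf, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```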

4.
Design and implementation of a facial expression video database   (Times cited: 4; self-citations: 0; other citations: 4)
Facial expression analysis and synthesis has been a research hotspot in human-computer interaction in recent years. Because different research groups use different datasets, there is no common basis for testing the effectiveness and applicability of different methods. To this end, this paper studies the problem space of facial expression analysis and synthesis, builds a large facial expression video database, and formulates a set of technical specifications for such databases. The data were captured under full frontal lighting and recorded from three different viewpoints, comprising 1000 facial expression video clips from 70 subjects and covering the eight common emotional expression categories as well as speech-related expressions for Mandarin pronunciation. The database also provides a dynamic viseme system for Mandarin speech, and the expression sequences are annotated with FACS (the Facial Action Coding System). It is currently one of the more comprehensive basic resources for facial expression research in China and can serve as a standard test platform for face recognition and for facial expression recognition and synthesis algorithms. The paper describes the key techniques used to build the database and gives its technical standards.

5.
To address the communication barrier between people with speech impairments and unimpaired people, a neural-network-based method for converting sign language into emotional speech is proposed. First, a gesture corpus, a facial expression corpus, and an emotional speech corpus are built. Deep convolutional neural networks are then used for gesture recognition and facial expression recognition, and, with Mandarin initials and finals as synthesis units, a speaker-adaptive deep neural network acoustic model and a speaker-adaptive hybrid long short-term memory network acoustic model for emotional speech are trained. Finally, the context-dependent labels of the gesture semantics and the emotion labels corresponding to the facial expressions are fed into the emotional speech synthesis model to synthesize the corresponding emotional speech. Experimental results show gesture recognition and facial expression recognition rates of 95.86% and 92.42%, respectively, and the synthesized emotional speech achieves an EMOS score of 4.15, indicating a high degree of emotional expressiveness; the method can thus support normal communication between people with speech impairments and unimpaired people.

6.
In recent years, expression recognition has become a research hotspot in computer vision and pattern recognition. This paper presents a computer vision system covering facial feature extraction and expression recognition. By tracking compatible facial motion features in video, sequences of facial motion feature vectors are extracted. Unlike previous approaches, the extracted feature-vector stream is split into two streams: an expression feature-vector stream and a visual-speech feature-vector stream. A coupled hidden Markov model (CHMM)-based expression recognition model is then used to recognize facial expressions; the model allows the two streams to be processed asynchronously according to their respective temporal characteristics while preserving their natural temporal coupling. Experiments show that the method outperforms traditional single-channel processing.
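Highly simplified stand-in for the HMM-based recognition stage: per-class single-stream Gaussian HMMs used as a likelihood classifier. hmmlearn does not implement coupled HMMs, so the asynchronous two-stream coupling described above is not modeled here, and all data are synthetic placeholders.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def fake_sequences(offset, n_seq=20, T=30, dim=6):
    """Generate toy feature-vector sequences for one expression class."""
    return [offset + rng.normal(0, 1.0, size=(T, dim)) for _ in range(n_seq)]

train = {"happy": fake_sequences(0.0), "surprise": fake_sequences(1.5)}

# Train one HMM per expression class on that class's sequences.
models = {}
for label, seqs in train.items():
    X = np.concatenate(seqs)
    lengths = [len(s) for s in seqs]
    m = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50, random_state=0)
    m.fit(X, lengths)
    models[label] = m

# Classify a new sequence by the model with the highest log-likelihood.
test_seq = 1.5 + rng.normal(0, 1.0, size=(30, 6))
pred = max(models, key=lambda lbl: models[lbl].score(test_seq))
print("predicted expression:", pred)
```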

7.
Facial expression recognition in unconstrained environments is more challenging than in controlled settings because of factors such as head pose, occlusion, and illumination. To address the low recognition accuracy in complex environments and the low efficiency caused by the complex network architectures currently used for expression recognition, a real-time framework for expression recognition in complex environments based on face segmentation is proposed. The framework consists of FsNet (Face segmentation Network), which segments the face regions most relevant to expression recognition so as to improve the accuracy of TcNet (Tiny classification Network), which performs the expression recognition; FsNet's training set is built from existing datasets. Both networks are deliberately kept compact to meet the real-time requirement of the overall framework. Experiments on the FER-2013 and RAF-DB in-the-wild facial expression databases show that face-region segmentation helps improve expression recognition rates in complex environments, and the overall framework achieves good recognition performance while remaining real-time.
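A minimal sketch in the spirit of the "tiny classification network" idea: a few convolutional blocks over a segmented face crop, kept small for real-time use. The actual FsNet/TcNet architectures are not given in the abstract, so every layer choice below is illustrative.

```python
import torch
import torch.nn as nn

class TinyExpressionNet(nn.Module):
    """Compact expression classifier for a single segmented face crop."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                      # x: (B, 1, 48, 48) face crop
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Quick shape check with a dummy 48x48 grayscale face crop.
net = TinyExpressionNet()
logits = net(torch.randn(2, 1, 48, 48))
print(logits.shape)   # torch.Size([2, 7])
```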

8.
With the development of artificial intelligence, facial expression recognition can extract expression states from images or videos and infer the subject's emotional state, enabling better human-computer interaction. However, most methods focus on frontal, unoccluded faces and do not generalize to complex real-world scenes, which greatly limits their practicality. In recent years, for different types of occlusion such as illumination, noise, pose, and physical-object occlusion, researchers have proposed a variety of new methods for expression recognition under partial facial occlusion. This paper reviews the main principles of these methods, compares and analyzes them, and discusses future research directions.

9.
Based on the MPEG-4 standard, a method and system are implemented in which facial animation is driven jointly by ringback-tone speech and the emotion it conveys. An HMM classifier is trained to recognize five emotion categories in the speech corpus (annoyance, delight, cuteness, helplessness, and excitement), and a set of corresponding facial animation parameters (FAPs) is built for each emotion. A composite expression function derived from the speech intensity is used to blend the expression FAPs with the lip-movement FAPs, achieving multi-source fusion of facial expression information; the combined FAPs then drive the face mesh to generate the animation. Experimental results show that the emotion recognition rate on ringback-tone speech reaches 94.44%, and the facial animation generated by the system is quite realistic.
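Toy sketch of blending expression FAPs with lip-movement FAPs. The paper's composite expression function is not specified in the abstract, so this assumes a simple per-frame blend weight derived from normalized short-time speech energy; the FAP vectors and dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_faps = 100, 68

expr_fap = rng.normal(0, 30, size=(1, n_faps))         # static FAPs for the recognized emotion
lip_fap = rng.normal(0, 30, size=(n_frames, n_faps))   # per-frame lip FAPs from the speech

# Hypothetical composite expression function: louder speech -> stronger expression.
energy = np.abs(rng.normal(0, 1, size=n_frames))
w = (energy - energy.min()) / (energy.max() - energy.min() + 1e-8)   # normalize to [0, 1]

# Per-frame blend of the two FAP sources drives the face mesh.
blended_fap = w[:, None] * expr_fap + (1.0 - w[:, None]) * lip_fap
print(blended_fap.shape)   # (100, 68)
```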

10.
孙冬梅  张飞飞  毛启容 《计算机工程》2020,46(5):267-273,281
Traditional facial expression recognition methods mainly target basic expressions captured under laboratory conditions and struggle with the subtle, complex expression variation seen in real-world scenes; moreover, in-the-wild facial expression datasets generally lack sufficient training data. To address this, a label-guided generative adversarial network domain adaptation method for expression recognition is proposed, which exploits samples from laboratory-condition databases. Using emotion labels as auxiliary conditions, the generator of a GAN is trained to transform laboratory-condition samples into samples resembling those of in-the-wild databases, thereby augmenting the in-the-wild database; standard classifiers such as VGG and ResNet are then trained on the augmented database to learn the emotional characteristics of in-the-wild data. Experiments on in-the-wild facial expression databases such as RAF-DB show that, compared with the Boosting-POOF and PixelDA methods, the database augmented by this method improves the facial expression recognition rate by 6% to 9%.
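Skeleton only: a label-conditioned generator/discriminator pair of the kind used for emotion-label-guided image-to-image augmentation. The paper's architectures, losses, and training procedure are not reproduced; this just shows how an emotion label can be injected as an auxiliary condition on both networks.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS, IMG = 7, 64

class CondGenerator(nn.Module):
    """Maps a lab-condition image plus an emotion label to a wild-style image."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_EMOTIONS, IMG * IMG)   # label -> extra input channel
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, label):
        cond = self.embed(label).view(-1, 1, IMG, IMG)
        return self.net(torch.cat([x, cond], dim=1))

class CondDiscriminator(nn.Module):
    """Scores the realism of an image given the same emotion label."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_EMOTIONS, IMG * IMG)
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 1),
        )

    def forward(self, x, label):
        cond = self.embed(label).view(-1, 1, IMG, IMG)
        return self.net(torch.cat([x, cond], dim=1))

g, d = CondGenerator(), CondDiscriminator()
imgs = torch.randn(4, 1, IMG, IMG)                     # placeholder lab-condition faces
labels = torch.randint(0, NUM_EMOTIONS, (4,))
fake = g(imgs, labels)
print(fake.shape, d(fake, labels).shape)               # (4,1,64,64) (4,1)
```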

11.
Recognizing speakers in emotional conditions remains a challenging issue, since speaker states such as emotion affect the acoustic parameters used in typical speaker recognition systems. Thus, it is believed that knowledge of the current speaker emotion can improve speaker recognition in real life conditions. Conversely, speech emotion recognition still has to overcome several barriers before it can be employed in realistic situations, as is already the case with speech and speaker recognition. One of these barriers is the lack of suitable training data, both in quantity and quality—especially data that allow recognizers to generalize across application scenarios (‘cross-corpus’ setting). In previous work, we have shown that in principle, the usage of synthesized emotional speech for model training can be beneficial for recognition of human emotions from speech. In this study, we aim at consolidating these first results in a large-scale cross-corpus evaluation on eight of most frequently used human emotional speech corpora, namely ABC, AVIC, DES, EMO-DB, eNTERFACE, SAL, SUSAS and VAM, covering natural, induced and acted emotion as well as a variety of application scenarios and acoustic conditions. Synthesized speech is evaluated standalone as well as in joint training with human speech. Our results show that the usage of synthesized emotional speech in acoustic model training can significantly improve recognition of arousal from human speech in the challenging cross-corpus setting.  相似文献   

12.
Speech is a medium through which people convey information and, at the same time, express emotion and attitude, and speech emotion recognition is an important component of human-computer interaction. Starting from the concept of speech emotion recognition and its historical development, this paper reviews the field from six angles. It analyzes commonly used emotion description models, summarizes widely used emotional speech databases and the characteristics of different database types, and examines techniques for extracting speech emotion features. By comparing studies on three classes of speech emotion recognition methods, it assesses their prospective application scenarios and discusses the challenges and development trends of speech emotion recognition technology.

13.
Human-computer interaction for virtual-real fusion spans computer science, cognitive psychology, ergonomics, multimedia technology, and virtual reality. It aims to improve the efficiency of human-computer interaction while responding to human cognitive and emotional needs, and is widely applied in office and education settings, robotics, and virtual/augmented reality devices. This paper systematically reviews the state of the art along four dimensions: perceptual computing, human-robot interaction and collaboration, personalized human-machine dialogue, and data visualization. It compares domestic and international research progress and looks ahead to future trends, arguing that transferable yet personalized perceptual computing, human-machine collaboration with deep understanding of user behavior, and user-adaptive dialogue systems are important research directions in this field.

14.
Cultural dependency analysis for understanding speech emotion   (Times cited: 1; self-citations: 0; other citations: 0)
Speech has been one of the major communication media for years and will remain so until video communication becomes widely available and easily accessible. Although numerous technologies have been developed to improve the effectiveness of speech communication systems, human interaction with machines and robots is still far from ideal. It is acknowledged that humans can communicate effectively with each other through the telephony system. This situation motivates many researchers to study the human communication system in depth, with emphasis on its ability to express and infer emotion for effective social communication. Understanding the interlocutor's emotion and recognizing the listener's perception are key to boosting communication effectiveness and interaction. Nonetheless, perceived emotion is subjective and depends heavily on culture, environment, and the pre-emotional state of the listener. Attempts have been made to understand the influence of culture on speech emotion, and researchers have reported mixed findings: there appear to be common acoustic characteristics that allow similar emotions to be discriminated universally across cultures, yet also unique speech attributes that enable emotion recognition specific to a particular culture. Understanding culture dependency is thus important to the performance of a speech emotion recognition system. In this paper, three speech emotion databases, namely Berlin Emo-DB, NTU_American, and NTU_Asian, were selected to represent European, American, and Asian cultures respectively, focusing on three basic emotions (anger, happiness, and sadness) with neutral acting as a reference. Different data arrangements, reflecting varying degrees of culture dependency, were designed for the experimental setup to provide a better understanding of inter-cultural and intra-cultural effects in recognizing speech emotion. Features were extracted with the Mel Frequency Cepstral Coefficient (MFCC) method and classified with a neural network (Multi-Layer Perceptron, MLP) and with fuzzy neural networks, namely the Adaptive Network Fuzzy Inference System (ANFIS) and the Generic Self-Organizing Fuzzy Neural Network (GenSOFNN), representing precise and linguistic fuzzy rule conjuncts respectively. The experimental results show that culture influences speech emotion recognition accuracy: 75% accuracy was recorded for generalized homogeneous intra-cultural experiments, whereas accuracy dropped to nearly chance level (25% for 4 classes) for both homogeneous and heterogeneous mixed-cultural inter-cultural experiments. A two-stage culture-sensitive speech emotion recognition approach was subsequently proposed to discriminate culture and speech emotion. The analysis shows the potential of the proposed technique for recognizing culture-influenced speech emotion, which can be extended to many applications, for instance call centers and intelligent vehicles. Such analysis may help us better understand the culture dependency of speech emotion and, as a result, boost the accuracy of speech emotion recognition systems.
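Minimal sketch of the feature/classifier stage described above: time-averaged MFCC features and an MLP classifier. The corpora, the fuzzy-neural classifiers (ANFIS, GenSOFNN), and the culture-split protocol are not reproduced; the synthetic tones standing in for utterances and the class-to-pitch mapping are placeholders.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

SR = 16000
rng = np.random.default_rng(0)

def fake_utterance(f0: float) -> np.ndarray:
    """Synthetic 1-second tone standing in for a recorded utterance."""
    t = np.linspace(0, 1, SR, endpoint=False)
    return (np.sin(2 * np.pi * f0 * t) + 0.05 * rng.normal(size=t.size)).astype(np.float32)

def mfcc_vector(y: np.ndarray, n_mfcc: int = 13) -> np.ndarray:
    """Time-averaged MFCCs as a fixed-length feature vector."""
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=n_mfcc).mean(axis=1)

# Placeholder "classes": pitch bands standing in for anger/happiness/sadness/neutral.
classes = {"anger": 300.0, "happiness": 250.0, "sadness": 150.0, "neutral": 200.0}
X, y = [], []
for label, f0 in classes.items():
    for _ in range(10):
        X.append(mfcc_vector(fake_utterance(f0 + rng.normal(0, 5))))
        y.append(label)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(np.array(X), y)
print("training accuracy:", clf.score(np.array(X), y))
```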

15.
Human emotional change in speech is an abstract, dynamic process that is difficult to describe with static information, and the rise of artificial intelligence has brought new opportunities for speech emotion recognition. Starting from the concept of speech emotion recognition and its development at home and abroad, this paper summarizes recent research results from five perspectives. It introduces speech emotion features and the significance of various acoustic feature parameters for speech emotion recognition, and then gives a detailed account of the categories and characteristics of emotional speech databases, the categories, strengths, and weaknesses of speech emotion recognition algorithms, the applications of speech emotion recognition, and the challenges the field currently faces. Based on the current state of research, it offers an outlook on future research and development in speech emotion recognition.

16.
Multimedia materials are now increasingly used in curricula. However, individual preferences for multimedia materials based on visual and verbal cognitive styles may affect learners' emotions and performance. Therefore, in-depth studies that investigate how different multimedia materials affect learning performance and the emotions of learners with visual and verbal cognitive styles are needed. Additionally, many education scholars have argued that emotions directly affect learning performance. Therefore, a further study that confirms the relationships between learners' emotions and performance for learners with visual and verbal cognitive styles will provide useful knowledge in terms of designing an emotion-based adaptive multimedia learning system for supporting personalized learning. To investigate these issues, the study applies the Style of Processing (SOP) scale to identify verbalizers and visualizers. Moreover, the emotion assessment instrument emWave, which was developed by HeartMath, is applied to assess variations in emotional states for verbalizers and visualizers during learning processes. Three different multimedia materials, static text and image-based multimedia material, video-based multimedia material, and animated interactive multimedia material, were presented to verbalizers and visualizers to investigate how different multimedia materials affect individual learning performance and emotion, and to identify relationships between learning performance and emotion. Experimental results show that video-based multimedia material generates the best learning performance and most positive emotion for verbalizers. Moreover, dynamic multimedia materials containing video and animation are more appropriate for visualizers than static multimedia materials containing text and image. Finally, a partial correlation exists between negative emotion and learning performance; that is, negative emotion and pretest scores considered together and negative emotion alone can predict learning performance of visualizers who use video-based multimedia material for learning.  相似文献   
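Worked sketch of the partial-correlation idea mentioned above: correlating negative-emotion scores with post-test performance while controlling for pre-test scores, using the residual method. All numbers are synthetic placeholders, not data from the study.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out the control variable z."""
    z1 = np.column_stack([np.ones_like(z), z])
    rx = x - z1 @ np.linalg.lstsq(z1, x, rcond=None)[0]   # residual of x given z
    ry = y - z1 @ np.linalg.lstsq(z1, y, rcond=None)[0]   # residual of y given z
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
pretest = rng.normal(70, 10, 100)                         # control variable
neg_emotion = rng.normal(0.3, 0.1, 100)                   # proportion of time in negative emotion
posttest = 0.6 * pretest - 40 * neg_emotion + rng.normal(0, 5, 100)

print("plain correlation  :", np.corrcoef(neg_emotion, posttest)[0, 1])
print("partial correlation:", partial_corr(neg_emotion, posttest, pretest))
```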

17.
A social robot should be able to autonomously interpret human affect and adapt its behavior accordingly in order for successful social human–robot interaction to take place. This paper presents a modular non-contact automated affect-estimation system that employs support vector regression over a set of novel facial expression parameters to estimate a person’s affective states using a valence-arousal two-dimensional model of affect. The proposed system captures complex and ambiguous emotions that are prevalent in real-world scenarios by utilizing a continuous two-dimensional model, rather than a traditional discrete categorical model for affect. As the goal is to incorporate this recognition system in robots, real-time estimation of spontaneous natural facial expressions in response to environmental and interactive stimuli is an objective. The proposed system can be combined with affect detection techniques using other modes, such as speech, body language and/or physiological signals, etc., in order to develop an accurate multi-modal affect estimation system for social HRI applications. Experiments presented herein demonstrate the system’s ability to successfully estimate the affect of a diverse group of unknown individuals exhibiting spontaneous natural facial expressions.  相似文献   
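Minimal sketch of the regression stage described above: support vector regression mapping facial-expression parameters to continuous valence and arousal values. The paper's specific facial expression parameters are not reproduced; the features and targets below are random placeholders.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                       # facial expression parameters per frame
W = rng.normal(size=(20, 2))
y = 0.1 * (X @ W) + rng.normal(0, 0.05, size=(300, 2))   # [valence, arousal] targets

# One SVR per output dimension, with feature standardization.
model = MultiOutputRegressor(make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0)))
model.fit(X[:250], y[:250])
pred = model.predict(X[250:])
print("predicted (valence, arousal) for first test frame:", pred[0])
```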

18.
《Computers & Education》2013,60(4):1273-1285
Multimedia materials are now increasingly used in curricula. However, individual preferences for multimedia materials based on visual and verbal cognitive styles may affect learners' emotions and performance. Therefore, in-depth studies that investigate how different multimedia materials affect learning performance and the emotions of learners with visual and verbal cognitive styles are needed. Additionally, many education scholars have argued that emotions directly affect learning performance. Therefore, a further study that confirms the relationships between learners' emotions and performance for learners with visual and verbal cognitive styles will provide useful knowledge in terms of designing an emotion-based adaptive multimedia learning system for supporting personalized learning. To investigate these issues, the study applies the Style of Processing (SOP) scale to identify verbalizers and visualizers. Moreover, the emotion assessment instrument emWave, which was developed by HeartMath, is applied to assess variations in emotional states for verbalizers and visualizers during learning processes. Three different multimedia materials, static text and image-based multimedia material, video-based multimedia material, and animated interactive multimedia material, were presented to verbalizers and visualizers to investigate how different multimedia materials affect individual learning performance and emotion, and to identify relationships between learning performance and emotion. Experimental results show that video-based multimedia material generates the best learning performance and most positive emotion for verbalizers. Moreover, dynamic multimedia materials containing video and animation are more appropriate for visualizers than static multimedia materials containing text and image. Finally, a partial correlation exists between negative emotion and learning performance; that is, negative emotion and pretest scores considered together and negative emotion alone can predict learning performance of visualizers who use video-based multimedia material for learning.  相似文献   

19.
Liu  Zihe  Hou  Weiying  Zhang  Jiayi  Cao  Chenyu  Wu  Bin 《Multimedia Tools and Applications》2022,81(4):4909-4934

Automatically interpreting social relations, e.g., friendship, kinship, etc., from visual scenes has huge potential application value in areas such as knowledge graphs construction, person behavior and emotion analysis, entertainment ecology, etc. Great progress has been made in social analysis based on structured data. However, existing video-based methods consider social relationship extraction as a general classification task and categorize videos into only predefined types. Such methods are unable to recognize multiple relations in multi-person videos, which is obviously not consistent with the actual application scenarios. At the same time, videos are inherently multimodal. Subtitles in the video also provide abundant cues for relationship recognition that is often ignored by researchers. In this paper, we introduce and define a new task named “Multiple-Relation Extraction in Videos (MREV)”. To solve the MREV task, we propose the Visual-Textual Fusion (VTF) framework for jointly modeling visual and textual information. For the spatial representation, we not only adopt a SlowFast network to learn global action and scene information, but also exploit the unique cues of face, body and dialogue between characters. For the temporal domain, we propose a Temporal Feature Aggregation module to perform temporal reasoning, which assesses the quality of different frames adaptively. After that, we use a Multi-Conv Attention module to capture the inter-modal correlation and map the features of different modes to a coordinated feature space. By this means, our VTF framework comprehensively exploits abundant multimodal cues for the MREV task and achieves 49.2% and 50.4% average accuracy on a self-constructed Video Multiple-Relation(VMR) dataset and ViSR dataset, respectively. Extensive experiments on VMR dataset and ViSR dataset demonstrate the effectiveness of the proposed framework.

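Illustrative sketch of quality-weighted temporal aggregation, the general idea behind the Temporal Feature Aggregation module described above (the module's actual design is not reproduced): each frame feature receives a learned quality score, and frames are pooled by softmax-normalized scores.

```python
import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    """Pool per-frame features into a clip-level feature by learned quality weights."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.quality = nn.Linear(dim, 1)   # per-frame quality score

    def forward(self, frames):             # frames: (B, T, dim)
        scores = self.quality(frames)                   # (B, T, 1)
        weights = torch.softmax(scores, dim=1)          # normalize over time
        return (weights * frames).sum(dim=1)            # (B, dim) clip-level feature

agg = TemporalAggregation(dim=256)
clip_feat = agg(torch.randn(2, 16, 256))   # 2 clips, 16 frames each
print(clip_feat.shape)                      # torch.Size([2, 256])
```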

20.
葛磊  强彦  赵涓涓 《软件学报》2016,27(S2):130-136
Speech emotion recognition is an important topic in human-computer interaction, and a speech emotion recognition system for intervention therapy of children with autism can support their rehabilitation. However, the emotional features in speech signals are numerous and heterogeneous, and feature extraction is itself a challenging task, which limits the recognition performance of the overall system. To address this, a speech emotion feature extraction algorithm is proposed that uses an unsupervised autoencoder network to learn emotional features from speech signals automatically. A three-layer autoencoder network is built to extract speech emotion features, and the high-level features learned by the multi-layer encoder are fed into an extreme learning machine classifier. The resulting recognition rate is 84.14%, an improvement over traditional methods based on hand-crafted features.
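Toy end-to-end sketch of the pipeline described above: an autoencoder learns features without supervision, and an extreme learning machine (random hidden layer plus least-squares readout) classifies the encoded features. The layer sizes are illustrative and the input "speech features" are random placeholders, not real acoustic features.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 60)).astype(np.float32)   # placeholder frame-level speech features
y = rng.integers(0, 4, size=400)                     # 4 emotion classes

# --- unsupervised autoencoder: encoder + decoder trained to reconstruct X ---
enc = nn.Sequential(nn.Linear(60, 32), nn.ReLU(), nn.Linear(32, 16))
dec = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 60))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
Xt = torch.from_numpy(X)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(Xt)), Xt)   # reconstruction loss
    loss.backward()
    opt.step()
H = enc(Xt).detach().numpy()                          # learned emotion features

# --- extreme learning machine on the learned features ---
W = rng.normal(size=(16, 128)); b = rng.normal(size=128)
G = np.tanh(H @ W + b)                                # fixed random hidden layer
T = np.eye(4)[y]                                      # one-hot targets
beta = np.linalg.pinv(G) @ T                          # least-squares output weights
acc = (np.argmax(G @ beta, axis=1) == y).mean()
print("training accuracy:", acc)
```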
