Similar Documents
20 similar documents retrieved; search took 125 ms.
1.
In this paper, we propose a recursive framework to recognize facial expressions from images in real scenes. Unlike traditional approaches that typically focus on developing and refining algorithms to improve recognition performance on an existing dataset, we integrate three important components in a recursive manner: facial dataset generation, facial expression recognition model building, and interactive interfaces for testing and new data collection. To start, we first create the Candid Images for Facial Expression (CIFE) dataset. We then apply a convolutional neural network (CNN) to CIFE and build a CNN model for web image expression classification. To increase expression recognition accuracy, we also fine-tune the CNN model and thus obtain a better CNN facial expression recognition model. Based on the fine-tuned CNN model, we design a facial expression game engine and collect a new and more balanced dataset, GaMo, whose images come from the different expressions our users make while playing the game. Finally, we run yet another recursive step: a self-evaluation of the quality of the data labeling, and propose a self-cleansing mechanism to improve data quality. We evaluate the GaMo and CIFE datasets and show that our recursive framework helps build a better facial expression model for real-scene facial expression tasks.

2.
3.
陈师哲  王帅  金琴 《软件学报》2018,29(4):1060-1070
Automatic emotion recognition is a highly challenging problem with broad application value. This paper investigates multimodal emotion recognition in multi-cultural scenarios. We extract different emotion features from the speech-acoustic and facial-expression modalities, including traditional hand-crafted features and deep-learning-based features, combine the modalities through multimodal fusion, and compare the emotion recognition performance of individual unimodal features against fused multimodal features. Experiments are conducted on the CHEAVD Chinese multimodal emotion dataset and the AFEW English multimodal emotion dataset. Through a cross-cultural emotion recognition study, we verify the significant influence of cultural factors on emotion recognition and propose three training strategies to improve performance in multi-cultural scenarios: culture-specific model selection, multi-cultural joint training, and multi-cultural joint training based on a shared emotion space. The last strategy, which separates cultural influence from emotion features, achieves the best results in both speech and multimodal emotion recognition.

4.
Fog is a major factor affecting expressway traffic safety. Automatic recognition of expressway visibility in foggy weather from surveillance images can provide technical support for intelligent management and decision-making by traffic authorities. Based on the atmospheric scattering model, we analyze several physical factors related to fog density and propose a multi-channel fusion recognition network that integrates them. The network uses three channels to jointly learn deep visual features, transmission-matrix features, and scene-depth features, and an attention fusion module is designed to adaptively fuse these three types of features for visibility-level recognition. We also build a synthetic dataset and a real expressway-scene dataset for network parameter learning and performance evaluation; the real-scene images were collected from surveillance videos of several expressways in China. Experiments on both datasets show that the proposed method adapts to different surveillance scenes and recognizes visibility levels more accurately than existing methods.
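The adaptive attention fusion the abstract describes can be pictured as weighting each feature branch before summing. The sketch below is our own illustration, not the paper's code: the branch names, dimensions, and scores are assumptions, and the attention weights are simply softmax-normalized scalars.

```python
import math

def attention_fuse(features, scores):
    """features: list of equal-length vectors (one per branch);
    scores: one raw attention scalar per branch, softmax-normalized here."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    # Weighted sum of the branch features, dimension by dimension.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Hypothetical 2-D features for the three branches named in the abstract.
visual = [0.4, 0.8]
transmission = [0.1, 0.3]
depth = [0.6, 0.2]
fused = attention_fuse([visual, transmission, depth], scores=[2.0, 0.5, 1.0])
print([round(v, 3) for v in fused])
```

With equal scores the fusion degenerates to a plain average of the branches, which makes the role of the learned attention scores easy to see.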

5.
Action recognition is an important research topic in video understanding within computer vision. Accurately extracting and recognizing human action features from video provides valuable information for fields such as healthcare and security, making it a highly promising direction. From a data-driven perspective, this paper comprehensively reviews the development of action recognition and systematically describes representative methods and models. Action recognition data fall into RGB, depth, skeleton, and fused modalities. We first introduce the main pipeline of action recognition and the public datasets for each data modality. We then review, by modality, traditional hand-crafted-feature and deep learning methods for the RGB, depth, and skeleton modalities, as well as multimodal fusion methods combining RGB with depth and with other modalities. Hand-crafted-feature methods include approaches based on spatio-temporal volumes and spatio-temporal interest points (RGB), on motion change and appearance (depth), and on skeleton features (skeleton); deep learning methods mainly involve convolutional networks, graph convolutional networks, and hybrid networks, with emphasis on their improvements, characteristics, and model innovations. Different recognition techniques are compared on datasets grouped by modality. After analysis within and across categories, we summarize the strengths, weaknesses, and applicable scenarios of each modality, the differences between hand-crafted-feature and deep learning methods, and the advantages of multimodal fusion…

6.
To address the difficulty of segmenting the motion region in everyday two-person interactions, which prevents accurate recognition, a hierarchical two-person interaction recognition method is proposed. The method first divides the whole interaction into a start phase, an execution phase, and an end phase, using body contact between the two participants as the boundary. In the start and end phases, the rectangular regions containing the left and right person are extracted separately as regions of interest; in the execution phase, the rectangular region containing both people is extracted as a whole. HOG features are extracted from each region, and a 1-NN classifier produces a recognition probability for each participant in each phase; the final interaction label is obtained by weighted fusion of these probabilities. Experiments on the UT-Interaction dataset show that the method is simple to implement and achieves good recognition results.
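The final weighted-fusion step above can be sketched in a few lines. This is a schematic of the idea only, not the paper's implementation: the class labels, stage weights, and probabilities are made-up values for illustration.

```python
def fuse_stage_probabilities(stage_probs, weights):
    """stage_probs: list of {class_label: probability} dicts, one per stage.
    Returns the class with the highest weighted-sum score."""
    fused = {}
    for probs, w in zip(stage_probs, weights):
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + w * p
    return max(fused, key=fused.get)

# Hypothetical 1-NN outputs for the three stages of one interaction.
stages = [
    {"handshake": 0.7, "push": 0.3},   # start stage
    {"handshake": 0.4, "push": 0.6},   # execution stage
    {"handshake": 0.8, "push": 0.2},   # end stage
]
print(fuse_stage_probabilities(stages, weights=[0.25, 0.5, 0.25]))  # handshake
```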

7.
Objective: Infrared and visible image fusion aims to merge the complementary information of the two source images and enhance detailed scene information. Existing deep learning methods, however, usually define by hand which source features to preserve, reducing the saliency of thermal targets in the fused image. Moreover, the diversity and poor interpretability of features limit the design of fusion rules, and existing rules struggle to preserve source features fully. To address these two problems, this paper proposes an infrared and visible image fusion algorithm based on unique-information separation and quality-guided fusion. Method: A neural-network-based unique-information separation objectively decomposes the source images into common and unique information, and a specific fusion strategy is applied to each part. A weight encoder is designed to learn the quality-guided fusion strategy: metrics that measure fused-image quality are applied to improve the strategy's performance, and the encoder generates the corresponding weights from the extracted unique information. Results: The algorithm is compared with six leading infrared and visible image fusion algorithms on the public RoadScene dataset, and the quality-guided fusion strategy is also compared with four common fusion strategies. Qualitative results show that the fused images have more salient thermal targets, richer scene information, and greater information content. On metrics such as entropy, standard deviation, sum of correlations of differences, mutual information, and correlation coefficient, compared with the baseline algorithms…

8.
In this paper, we present a clustering method called clustering by sorting influence power, which uses influence power as the measurement among points. In our method, clustering is performed in an efficient tree-growing fashion exploiting both the hypothetical influence powers of data points and the distances among them. Since influence powers among data points evolve over time, we adopt a PageRank-like algorithm to calculate them iteratively, avoiding the issue of improper initial exemplar preference. The experimental results show that our proposed method outperforms four well-known clustering methods across seven complex and non-isotropic datasets. Moreover, our simple clustering method can easily be applied to several practical clustering problems. We evaluate the effectiveness of our algorithm on two real-world datasets, i.e., an open dataset of an Alzheimer's disease protein–protein interaction network and a race-walking recognition dataset collected by ourselves, and find that our method outperforms other methods reported in the literature.
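A PageRank-like iteration over a similarity graph, as the abstract describes, can be sketched as follows. This is our own minimal illustration under assumed details (row-normalized similarities as transition probabilities, a standard damping factor), not the authors' code.

```python
def influence_powers(sim, damping=0.85, iters=100, tol=1e-9):
    """Iteratively estimate each point's 'influence power' from a
    symmetric similarity matrix, PageRank-style."""
    n = len(sim)
    # Row-normalize similarities into transition probabilities.
    trans = []
    for row in sim:
        s = sum(row)
        trans.append([v / s for v in row] if s else [1.0 / n] * n)
    power = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - damping) / n
            + damping * sum(power[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, power)) < tol:
            power = new
            break
        power = new
    return power

# Toy similarity matrix: point 0 is similar to everyone, so it should
# accumulate the highest influence power and act as an exemplar.
sim = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.1, 0.0],
    [1.0, 0.1, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.1],
]
powers = influence_powers(sim)
print(max(range(4), key=lambda i: powers[i]))  # point 0
```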

9.
Open-domain question answering is one of the important tasks in natural language processing. Current open-domain QA models tend to perform shallow text matching between the question and the passage and often fail on simple questions, partly because reading-comprehension datasets lack some patterns that are common in real-world scenarios. This paper proposes several data augmentation methods that improve the robustness of open-domain QA and effectively reduce the influence of these common patterns. In addition, we construct and publicly release a new open-domain QA dataset that can evaluate models under realistic conditions. Experimental results show that the proposed methods improve performance in real-world scenarios.

10.
Because stress has such a powerful impact on human health, we must be able to identify it automatically in everyday life. Human activity recognition (HAR) systems use data from several kinds of sensors to recognize and evaluate human actions automatically. Using the multimodal dataset DEAP (Database for Emotion Analysis using Physiological Signals), this paper presents a deep learning (DL) technique for effectively detecting human stress. Combining vision-based and sensor-based approaches to stress recognition can improve the efficiency of current systems and predict harmful outcomes before they occur. Based on visual and EEG (electroencephalogram) data, this research aims to enhance performance and extract the dominant characteristics of stress detection. For the stress identification test, we utilized the DEAP dataset, which includes video and EEG data, and we demonstrate that combining video and EEG characteristics can increase overall performance, with the suggested stochastic features providing the most accurate results. In the first step, a CNN (convolutional neural network) extracts feature vectors from video frames and EEG data. Feature-level (FL) fusion then combines the features extracted from the two modalities, and XGBoost is used as the classifier to predict stress. The stress-recognition accuracy of the proposed method is compared with decision tree (DT), random forest (RF), AdaBoost, linear discriminant analysis (LDA), and k-nearest neighbor (KNN) classifiers. Compared with existing state-of-the-art approaches, the suggested DL methodology combining multimodal and heterogeneous inputs improves stress identification.
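Feature-level fusion, as described above, amounts to concatenating per-sample feature vectors from the two modalities before classification. The sketch below is a schematic stand-in: the vectors are made-up placeholders for real CNN outputs, and the dimensions are arbitrary.

```python
def feature_level_fusion(video_features, eeg_features):
    """Concatenate per-sample modality features into a single fused vector,
    one fused vector per sample."""
    return [v + e for v, e in zip(video_features, eeg_features)]

# Hypothetical 4-D video features and 3-D EEG features for two samples.
video = [[0.2, 0.5, 0.1, 0.9], [0.8, 0.3, 0.4, 0.2]]
eeg = [[1.1, 0.7, 0.3], [0.9, 0.2, 0.6]]
fused = feature_level_fusion(video, eeg)
print(len(fused[0]))  # 7 = 4 video dims + 3 EEG dims
```

The fused vectors would then be passed to a classifier such as XGBoost in the pipeline the abstract outlines.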

11.
Over the last few years, activity recognition in the smart home has become an active research area due to the wide range of human-centric applications. With the development of machine learning algorithms for activity classification, datasets are critically important for algorithm testing and validation. Collecting real data is a challenging process due to the budget, human resources, and annotation cost involved, which is why most researchers prefer to use existing datasets for evaluation. However, openly available smart-home datasets vary in performed activities, deployed sensors, and environment settings. Unfortunately, analyzing the characteristics of existing datasets is a bottleneck for researchers selecting a dataset for their purposes. In this paper, we develop a Framework for Smart Homes Dataset Analysis (FSHDA) to reflect their diverse dimensions in a predefined format. It analyzes a list of data dimensions covering variations in time, activities, sensors, and inhabitants. For validation, we examine the effects of the proposed data dimensions on state-of-the-art activity recognition techniques. The results show that dataset dimensions strongly affect classifiers' individual activity-label assignments and their overall performance. The outcome of our study helps upcoming researchers develop a better understanding of smart-home dataset characteristics and their relation to classifier performance.

12.
Most existing spectral clustering methods use a traditional distance metric to compute similarity between samples, which considers only pairwise similarity while ignoring neighborhood information and the global distribution structure of the data. This paper therefore proposes a new spectral clustering method that fuses the Euclidean distance and the Kendall tau distance. By combining the direct distance between sample pairs with their surrounding neighborhood information, the method exploits the advantage that different similarity measures capture structural information from different perspectives, and thus reflects the underlying structure of the data more comprehensively. Comparisons with traditional clustering algorithms on UCI benchmark datasets verify that the proposed method significantly improves clustering quality.
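One simple way to fuse the two distances the abstract names is a convex combination mapped through a Gaussian kernel to obtain a similarity for the spectral affinity matrix. The sketch below is our own illustration under that assumption; the mixing weight `alpha` and kernel width `sigma` are hypothetical parameters, not values from the paper.

```python
from itertools import combinations
import math

def kendall_tau_distance(x, y):
    """Fraction of feature-index pairs ordered differently in x and y."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(
        1 for i, j in pairs
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return discordant / len(pairs)

def fused_similarity(x, y, alpha=0.5, sigma=1.0):
    """Convex combination of Euclidean and Kendall tau distances,
    turned into a similarity via a Gaussian kernel."""
    d_euc = math.dist(x, y)
    d_tau = kendall_tau_distance(x, y)
    d = alpha * d_euc + (1 - alpha) * d_tau
    return math.exp(-d**2 / (2 * sigma**2))

a, b = [1.0, 2.0, 3.0], [1.1, 2.1, 3.1]
print(round(fused_similarity(a, b), 3))  # 0.996: near-identical rank order
```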

13.
With hyperspectral sensor technology evolving and becoming more cost-effective, hyperspectral imaging offers new opportunities for robust face recognition. Hyperspectral face cubes contain much more spectral information than face images from common RGB color cameras. Hyperspectral face recognition is robust to factors such as illumination, pose, occlusion, and spoofing, largely avoiding the limitations of visible-image-based face recognition. In this paper, we summarize the spectrum properties of hyperspectral face cubes and survey the hyperspectral face recognition methods in the literature, categorizing them into major groups for better understanding. We review existing hyperspectral face datasets and establish our own. We also discuss efficient neural networks for mobile face recognition and conduct experiments on mobile hyperspectral face recognition. Results show that under harsh conditions such as large illumination changes and pose variation, hyperspectral-cube-based methods achieve higher recognition accuracy than visible-image-based methods. Finally, we deliver insightful discussions and prospects for future work on mobile hyperspectral face recognition.

14.
Objective: In action recognition, properly exploiting spatio-temporal modeling and inter-channel correlation is crucial for capturing rich motion information. Although graph convolutional networks have made steady progress in skeleton-based action recognition, previous attention mechanisms applied to them have not clearly improved classification. Considering the importance of jointly modeling spatio-temporal interaction and channel dependency, a multi-dimensional feature fusion attention mechanism (M2FA) is proposed. Method: Unlike widely adopted designs such as the convolutional block attention module (CBAM) and the two-stream adaptive graph convolutional network (2s-AGCN), M2FA explicitly obtains comprehensive dependency information through a feature fusion module embedded in the attention framework. For a given feature map, M2FA applies global average pooling along the spatial, temporal, and channel dimensions to infer a feature descriptor for each dimension. The feature map is then filtered with the fused multi-dimensional descriptors to refine adaptive features, through a global-feature branch that compresses global dynamic information and a branch that uses only pointwise convolution layers…

15.
To address the poor weather-classification performance of traditional CNN models on aerial video images, their unsuitability for mobile devices, and the scarcity and single-scene nature of existing weather image datasets, a multi-scene UAV aerial weather image dataset covering four classes (sunny, rainy, snowy, and foggy) is constructed, and a lightweight transfer-learning model for classifying weather scenes in UAV aerial video images is proposed. The model uses transfer learning: two lightweight CNNs are pre-trained on ImageNet, and three lightweight CNN branches are designed for feature extraction. An EfficientNet-b0 improved with the ECANet attention mechanism serves as the main branch to extract features from the whole image, while two MobileNetv2 branches extract deep features specific to the sky and non-sky regions, respectively. The features from the three regions are then fused by concatenation, and a Softmax layer classifies the four weather scenes. Experiments show that on computation-constrained devices such as mobile platforms, the method achieves a weather-scene recognition accuracy of 97.3%, a good classification result.

16.
In recent years, crowd counting has drawn increasing attention due to its widespread applications in computer vision. Most existing methods rely on datasets with scarce labeled images to train networks and are prone to over-fitting. Further, these datasets usually give only manually labeled annotations of the head center position, which provides limited information. In this paper, we propose to exploit virtual synthetic crowd scenes to improve the performance of the counting network in the real world. Since people masks can be obtained easily in a synthetic dataset, we first learn to distinguish people from the background via a segmentation network using the synthetic data. We then transfer the learned segmentation priors from synthetic data to real-world data. Finally, we train a density estimation network on real-world data using the obtained people masks. Our experiments on two crowd counting datasets demonstrate the effectiveness of the proposed method.

17.
Existing face recognition models suffer reduced accuracy under occlusions such as face masks. Current mainstream approaches train occluded and unoccluded scenarios separately and then combine them for multi-scene use. To address the limitations of occluded face recognition models, an improved face feature rectification network (FFR-Net) is proposed, which handles both occluded and unoccluded faces and is applied to two occlusion scenarios: masks and glasses. The model introduces a face feature rectification module. To fully exploit feature information from unoccluded regions, the module's spatial branch adopts the involution operator to enlarge the region of image-information interaction and enhance spatial facial features; the channel branch introduces a coordinate attention mechanism to capture cross-channel information, strengthen feature representation, and help the model locate target regions accurately; and Meta-ACON serves as the module's new dynamic activation function, dynamically adjusting the degree of linearity or non-linearity to improve generalization and accuracy. Trained on a processed CASIA-Webface dataset with and without mask occlusion, the improved model achieves test accuracies of 82.50% on a processed LFW masked/unmasked dataset and 89.75% on the Meglass dataset, outperforming existing algorithms and verifying the effectiveness of the proposed method.

18.
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate this vast amount of visual information. Semantic image understanding plays a vital role in solving this problem. One important task in image understanding is object recognition, in particular generic object categorization. Critical to this problem are the issues of learning and datasets: abundant data helps train a robust recognition system, while a good object classifier can help collect a large number of images. This paper presents a novel object recognition algorithm that performs automatic dataset collection and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects as new examples are observed; our framework emulates this learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm automatically collects much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL collects image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.

19.
Face recognition is a key technology in biometrics and has long attracted extensive attention from researchers. Video face recognition refers to extracting key facial information from a video to complete identity recognition. Compared with image-based face recognition, facial variation patterns in video are more diverse and individual frames differ considerably, so extracting key facial features from long and complex videos has become a current research focus. Taking video face recognition technology as…

20.
The development of smart wearable devices has driven rapid progress in activity recognition. However, existing activity recognition methods still struggle to recognize single arm swings due to coarse-grained sensor data segmentation. Refined arm-swing-wise data segmentation is vital in some specific cases, such as the rehabilitation of disabled patients. In this paper, we propose a smartwatch-based arm-swing-wise data segmentation approach for human activity recognition, which converts original sensor signals into square-wave signals to detect the cut-off points of each arm swing. In particular, our method can adaptively adjust the window size and step size of a sliding window regardless of changes in swing speed. Empirical evaluation on two datasets, a self-collected dataset and a publicly available benchmark dataset, shows the superior performance of our approach over other methods under different settings, such as classifiers, features, and wearing positions.
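The square-wave conversion step above can be pictured as thresholding the raw signal and taking the edges of the resulting binary wave as candidate swing boundaries. This is a simplified sketch of the idea under our own assumptions (a fixed threshold, a toy signal), not the paper's actual segmentation code.

```python
def to_square_wave(signal, threshold):
    """Binarize a raw sensor signal into a 0/1 square wave."""
    return [1 if v >= threshold else 0 for v in signal]

def cutoff_points(signal, threshold):
    """Indices where the square wave changes level, i.e. candidate
    start/end points of individual arm swings."""
    square = to_square_wave(signal, threshold)
    return [i for i in range(1, len(square)) if square[i] != square[i - 1]]

# Toy accelerometer magnitude with two "swings" above a threshold of 0.5.
sig = [0.1, 0.2, 0.9, 1.1, 0.8, 0.2, 0.1, 0.7, 1.0, 0.3]
print(cutoff_points(sig, threshold=0.5))  # [2, 5, 7, 9]
```

Each consecutive pair of cut-off points then delimits one candidate swing window, which is what lets the window size follow the swing speed.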


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号