Similar Documents (20 results retrieved)
1.

The use of the iris and periocular region as biometric traits has been extensively investigated, mainly due to the distinctiveness of iris features and the usefulness of the periocular region when the image resolution is insufficient to extract iris information. Beyond an individual's identity, features extracted from these traits can also reveal other information, such as the individual's gender, the influence of drug use, the presence of contact lenses, and spoofing attempts, among others. This work presents a survey of the databases created for ocular recognition, detailing their protocols and how their images were acquired. We also describe and discuss the most popular ocular recognition competitions (contests), highlighting the submitted algorithms that achieved the best results using only the iris trait and those fusing iris and periocular information. Finally, we describe relevant works applying deep learning techniques to ocular recognition and point out new challenges and future directions. Given the large number of ocular databases, each usually designed for a specific problem, we believe this survey provides a broad overview of the challenges in ocular biometrics.


2.
3.
Facial expression and emotion recognition from thermal infrared images has attracted increasing attention in recent years. However, the features adopted in current work are either temperature statistics extracted from facial regions of interest or hand-crafted features commonly used in the visible spectrum; to date, no image features have been designed specifically for thermal infrared images. In this paper, we propose using the deep Boltzmann machine to learn thermal features for emotion recognition from thermal infrared facial images. First, the face is located and normalized in the thermal infrared image. Then, a two-layer deep Boltzmann machine is trained, and its parameters are fine-tuned for emotion recognition after the unsupervised feature-learning pre-training. Comparative experiments on the NVIE database demonstrate that our approach outperforms approaches using temperature statistics or hand-crafted features borrowed from the visible domain. The features learned from the forehead, eye, and mouth regions are more effective for discriminating the valence dimension of emotion than those from other facial areas. In addition, our study shows that adding unlabeled data from another database during training can further improve feature learning.
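As a rough illustration of the pre-training stage (not the authors' code, and simplified to greedy layer-wise training of stacked Bernoulli RBMs rather than joint DBM training), a minimal NumPy sketch with toy data:

```python
# Greedy layer-wise pretraining of two stacked Bernoulli RBMs with CD-1.
# Data, layer sizes, and learning rate are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible bias
        self.c = np.zeros(n_hid)   # hidden bias
        self.lr = lr

    def hid_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def vis_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Contrastive divergence with a single Gibbs step.
        h0 = self.hid_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.vis_probs(h_sample)
        h1 = self.hid_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Toy "thermal face" data: binarized 16x16 patches, flattened to 256 dims.
X = (rng.random((512, 256)) < 0.3).astype(float)

rbm1, rbm2 = RBM(256, 128), RBM(128, 64)
for _ in range(20):                   # pretrain layer 1
    rbm1.cd1_step(X)
H1 = rbm1.hid_probs(X)
for _ in range(20):                   # pretrain layer 2 on layer-1 activations
    rbm2.cd1_step(H1)

features = rbm2.hid_probs(H1)         # learned features for a downstream classifier
print(features.shape)                 # (512, 64)
```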

4.
5.

A new deep neural network architecture, directed acyclic graph convolutional neural networks (DAG-CNNs), is used to classify heartbeats from electrocardiogram (ECG) signals into different subject-based classes. DAG-CNNs not only fuse the feature extraction and classification stages of ECG classification into a single automated learning procedure, but also utilize multi-scale features and perform score-level fusion of multiple classifiers automatically, removing the need for hand-crafted features. Most current approaches use only the high-level features extracted by the last layer of the CNN. Instead of performing feature-level fusion manually and feeding the result into a classifier, the proposed multi-scale system automatically learns features at different levels, combines them, and predicts the output label. Results on the MIT-BIH arrhythmia benchmark database demonstrate that the proposed system achieves superior classification performance compared to most state-of-the-art methods.
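The multi-scale, score-level-fusion idea can be sketched in a few lines of PyTorch; the layer sizes, the 5-class output, and fusion by averaging below are illustrative assumptions, not the paper's configuration:

```python
# Multi-scale 1D CNN for heartbeat classification with score-level fusion:
# each depth gets its own classifier head and the class scores are averaged.
import torch
import torch.nn as nn

class MultiScaleECGNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2))
        self.block2 = nn.Sequential(nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2))
        self.block3 = nn.Sequential(nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2))
        # One classifier per scale; their scores are fused by averaging.
        self.heads = nn.ModuleList([nn.Linear(c, n_classes) for c in (16, 32, 64)])

    def forward(self, x):                        # x: (batch, 1, samples)
        feats = []
        for block in (self.block1, self.block2, self.block3):
            x = block(x)
            feats.append(x.mean(dim=2))          # global average pool each scale
        scores = [head(f) for head, f in zip(self.heads, feats)]
        return torch.stack(scores).mean(dim=0)   # score-level fusion

beats = torch.randn(8, 1, 256)                   # 8 dummy heartbeat segments
print(MultiScaleECGNet()(beats).shape)           # torch.Size([8, 5])
```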

6.
In recent years, vehicle logo recognition has received wide attention from researchers owing to its important role in intelligent transportation systems. Traditional vehicle logo recognition algorithms are mostly based on hand-crafted descriptors, which require rich prior knowledge and struggle to adapt to complex, changing real-world scenes. Compared with hand-crafted descriptors, feature learning methods perform better on computer vision problems in complex scenes. We therefore propose a vehicle logo recognition method based on objective-optimized learning: starting from a pixel gradient difference matrix extracted from the original image, the feature parameters are learned autonomously through objective optimization. The pixel gradient difference matrix is then mapped to a compact binary matrix, and the feature information is encoded with a feature codebook to generate robust feature vectors. Experiments on the public vehicle logo datasets HFUT-VL1 and XMU, with comparisons to other vehicle logo recognition methods, show that the proposed algorithm achieves a higher recognition rate than methods based on traditional feature descriptors and requires less training and testing time than deep learning-based methods.
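A loose NumPy sketch of this pipeline's flavor (neighbor pixel differences, binarization, and a histogram standing in for the learned codebook; the paper's objective-optimized parameter learning is omitted):

```python
# Neighbor-difference binary codes pooled into a histogram descriptor.
import numpy as np

def binary_difference_descriptor(img, n_bins=256):
    img = img.astype(np.int32)
    center = img[1:-1, 1:-1]
    # Differences to the 8 neighbors of every interior pixel.
    offsets = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]
    bits = []
    for di, dj in offsets:
        neighbor = img[1+di:img.shape[0]-1+di, 1+dj:img.shape[1]-1+dj]
        bits.append((neighbor - center > 0).astype(np.uint8))
    # Pack the 8 binary maps into one 8-bit code per pixel.
    codes = np.zeros_like(center, dtype=np.uint8)
    for k, b in enumerate(bits):
        codes |= b << k
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, 256))
    return hist / hist.sum()                      # normalized feature vector

logo = np.random.randint(0, 256, (64, 64))        # stand-in for a logo crop
print(binary_difference_descriptor(logo).shape)   # (256,)
```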

7.
8.
陈师哲, 王帅, 金琴. 软件学报, 2018, 29(4): 1060-1070
Automatic emotion recognition is a challenging task with broad application value. This paper studies multimodal emotion recognition in multi-cultural scenarios. We extract different emotion features from the speech acoustic and facial expression modalities, including traditional hand-crafted features and deep learning-based features, combine the modalities through multimodal fusion, and compare the emotion recognition performance of individual unimodal features against fused multimodal features. Experiments on the CHEAVD Chinese multimodal emotion dataset and the AFEW English multimodal emotion dataset, through a cross-cultural emotion recognition study, verify the important influence of cultural factors on emotion recognition. We propose three training strategies to improve emotion recognition in multi-cultural scenarios: culture-specific model selection, multi-cultural joint training, and multi-cultural joint training based on a shared emotion space. The last strategy, which separates cultural influence from emotion features, achieves the best results in both speech and multimodal emotion recognition.
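A minimal PyTorch sketch of feature-level audio-visual fusion, loosely in the spirit of joint training over a shared emotion space; the feature dimensions and the 8 emotion classes are placeholders, not from the paper:

```python
# Encode each modality, concatenate the embeddings, classify jointly.
import torch
import torch.nn as nn

class FusionEmotionNet(nn.Module):
    def __init__(self, audio_dim=384, visual_dim=512, n_emotions=8):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, 128), nn.ReLU())
        # Both modalities are projected into embeddings of the same size,
        # a rough stand-in for a shared emotion space.
        self.classifier = nn.Linear(256, n_emotions)

    def forward(self, audio_feat, visual_feat):
        fused = torch.cat([self.audio_enc(audio_feat),
                           self.visual_enc(visual_feat)], dim=1)
        return self.classifier(fused)

model = FusionEmotionNet()
logits = model(torch.randn(4, 384), torch.randn(4, 512))
print(logits.shape)                               # torch.Size([4, 8])
```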

9.
Li Chaobo, Zhou Ze, Li Hongjun, Xie Zhengguang, Zhang Guoan. Multimedia Tools and Applications, 2022, 81(15): 21027-21045

In order to integrate the feature-extraction ability of deep structures with the short training time of broad structures, we propose a novel Vertical-Cross-Horizontal Network (VCHN) for data recognition, which mainly comprises a vertical operation, a horizontal operation, nonlinear mapping, and a recognition decision. For the vertical operation, we design a hierarchical structure responsible for providing the structural conditions for the evolution of features that matter to the classification decision. For the horizontal operation, we use interpretable fuzzy systems to design an expandable group of fuzzy subsystems that extract features as diverse as possible, attempting to replace the high-level features obtained by cascading more hidden layers; this mitigates the time cost of blindly deepening the network vertically. The nonlinear mapping transforms the extracted features into nonlinear ones, which are used to compute the outputs for the recognition decision. Extensive experiments show that the recognition accuracies of the proposed method are 99.37% and 98.47% on the ORL and EYaleB datasets, respectively. The proposed VCHN can not only mine discriminative features via the vertical operation but also shorten training time via the horizontal operation, outperforming the other methods.
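A very loose sketch of the broad, "horizontal" side of such designs; the paper's fuzzy subsystems are replaced here by random nonlinear feature groups, and the closed-form ridge readout is an assumption illustrating why broad structures train quickly:

```python
# Wide feature block from independent random nonlinear groups, followed by
# a linear readout solved in closed form (no iterative training).
import numpy as np

rng = np.random.default_rng(0)

def horizontal_features(X, n_groups=4, group_dim=32):
    # Each group is a random projection + tanh, standing in for one
    # expandable fuzzy subsystem.
    groups = []
    for _ in range(n_groups):
        W = rng.standard_normal((X.shape[1], group_dim)) / np.sqrt(X.shape[1])
        groups.append(np.tanh(X @ W))
    return np.hstack(groups)

X = rng.standard_normal((200, 64))          # placeholder input features
y = rng.integers(0, 2, 200)                 # placeholder binary labels

H = horizontal_features(X)                  # wide feature block, (200, 128)
lam = 1e-2                                  # ridge regularization
W_out = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
pred = (H @ W_out > 0.5).astype(int)
print((pred == y).mean())                   # accuracy on the toy data
```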


10.

Ever-growing video streaming services require accurate quality assessment, often with no reference to the original media. One primary challenge in developing no-reference (NR) video quality metrics is achieving real-time operation while retaining accuracy. A real-time no-reference video quality assessment (VQA) method is proposed for videos encoded with the H.264/AVC codec. Temporal and spatial features are extracted from the encoded bit-stream and pixel values to train and validate a fully connected neural network. The hand-crafted features and network design are chosen to ensure high correlation with human judgment of quality while minimizing computational complexity. Proof-of-concept experiments are conducted by comparison with: 1) video sequences rated by a full-reference quality metric, and 2) H.264-encoded sequences from the LIVE video dataset that were subjectively evaluated through differential mean opinion scores (DMOS). The performance of the proposed method is verified by correlation measurements against these objective and subjective scores. The framework achieves real-time execution while outperforming state-of-the-art full-reference and no-reference video quality assessment methods.
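The training setup reduces to a small regression problem; a compact scikit-learn sketch, where the 20 features and the DMOS-like targets are random placeholders (so the printed R² is meaningless and only illustrates the pipeline):

```python
# Hand-crafted bit-stream/pixel features regressed onto quality scores
# with a small fully connected network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))           # 20 temporal/spatial features per clip
y = rng.uniform(0, 100, 300)                 # stand-in DMOS-like quality scores

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)
print(net.score(X_te, y_te))                 # R^2 on held-out clips
```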


11.
12.
Existing face frontalization research mainly addresses yaw rotation, paying little attention to profile faces in real-world scenes such as surveillance video that are affected by both yaw and pitch variation; moreover, frontal face images generated from multi-angle profiles often lose identity information. To address these problems, a generative adversarial network (GAN) based on a feature-map symmetry module and a periocular feature preservation loss is proposed. First, exploiting the symmetry prior of human faces, a feature-map symmetry module is introduced: a facial landmark detector locates the nose tip in the profile, and the feature maps extracted by the encoder are mirrored about the nose-tip position, alleviating the loss of facial information at the feature level. Second, drawing on the idea of periocular recognition, a periocular feature preservation loss is added to existing identity preservation methods to train the generator to produce realistic, identity-preserving frontal face images. Experimental results show that the generated images preserve facial details well, and the average Rank-1 recognition rate over all pitch angles of the CAS-PEAL-R1 dataset reaches 99.03%, demonstrating that the algorithm effectively handles frontalization of multi-angle profile faces.
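A hedged PyTorch sketch of the feature-map symmetry idea: each feature map is mirrored about the detected nose-tip column, and here the original and its reflection are simply averaged (the fusion rule is an assumption, not from the paper):

```python
# Mirror encoder feature maps about a per-image nose-tip column.
import torch

def mirror_about_column(feat, col):
    """feat: (B, C, H, W); col: nose-tip column index per image, (B,)."""
    B, C, H, W = feat.shape
    j = torch.arange(W, device=feat.device)                      # (W,)
    # Reflect column j about the axis at `col`, clamping at the borders.
    j_ref = (2 * col.view(B, 1) - j.view(1, W)).clamp(0, W - 1)  # (B, W)
    j_ref = j_ref.view(B, 1, 1, W).expand(B, C, H, W)
    mirrored = torch.gather(feat, dim=3, index=j_ref)
    return 0.5 * (feat + mirrored)           # assumed fusion: plain average

feat = torch.randn(2, 8, 16, 16)             # encoder feature maps
nose_col = torch.tensor([6, 10])             # nose-tip columns from a landmark detector
print(mirror_about_column(feat, nose_col).shape)   # torch.Size([2, 8, 16, 16])
```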

13.

Emotion recognition from speech signals is an interesting research area with several applications, such as smart healthcare, autonomous voice response systems, assessing situational seriousness through caller affective-state analysis in emergency centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs have square kernels and pooling operators at various layers, which suit 2D image data. In spectrograms, however, the information is encoded differently: time is represented along the x-axis, the y-axis shows the frequency of the speech signal, and amplitude is indicated by the intensity value at a given position. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling over rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and performs better than many state-of-the-art techniques when evaluated on Emo-DB and a Korean speech dataset.
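A small PyTorch sketch of rectangular convolutions and rectangular max pooling on spectrogram input; the exact kernel shapes and the 7-class head are illustrative choices, not the paper's configuration:

```python
# Rectangular kernels: tall in frequency first, then wide in time.
import torch
import torch.nn as nn

net = nn.Sequential(
    # Tall kernel spanning many frequency bins but few time frames.
    nn.Conv2d(1, 16, kernel_size=(12, 3), padding=(6, 1)), nn.ReLU(),
    nn.MaxPool2d(kernel_size=(4, 2)),                 # rectangular pooling
    # Wide kernel spanning time to capture temporal dynamics.
    nn.Conv2d(16, 32, kernel_size=(3, 9), padding=(1, 4)), nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 4)),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 7),                                 # e.g. 7 emotion classes
)

spec = torch.randn(4, 1, 128, 256)    # (batch, 1, freq bins, time frames)
print(net(spec).shape)                # torch.Size([4, 7])
```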


14.
The periocular region is the part of the face immediately surrounding the eye, and researchers have recently begun to investigate how to use it for recognition. Understanding how humans recognize faces helped computer vision researchers develop algorithms for face recognition; likewise, understanding how humans analyze periocular images could benefit researchers developing algorithms for periocular recognition. We conducted two experiments to determine how humans analyze periocular images, presenting pairs of images and asking volunteers to determine whether the two images showed eyes from the same subject or from different subjects. In the first experiment, subjects were paired randomly to create different-subject queries, and our volunteers correctly determined the relationship between the two images in 92% of the queries. In the second experiment, we considered multiple factors in forming different-subject pairs: queries were formed from pairs of subjects with the same gender and race, and with similar eye color, makeup, eyelash length, and eye occlusion. In addition, we limited the time volunteers could view a query pair. In this harder experiment, the correct verification rate was 79%. We asked volunteers to describe which features in the images helped them make their decisions; in both experiments, eyelashes were reported to be the most helpful feature.

15.
Smiling, or happiness, is one of the most universal facial expressions in daily life. Smile detection in the wild is an important and challenging problem that has attracted growing attention from the affective computing community. In this paper, we present an efficient deep learning approach to smile detection in the wild. Unlike previous two-step work that extracted hand-crafted features from face images and then trained a classifier, deep learning effectively combines feature learning and classification in a single model. We construct a deep convolutional network called Smile-CNN to perform feature learning and smile detection simultaneously. Experimental results demonstrate that although deep learning models are generally developed for tackling "big data," they can also deal effectively with "small data." We further investigate the discriminative power of the learned features, taken from the neuron activations of the last hidden layer of Smile-CNN: using them to train an SVM or AdaBoost classifier shows that the learned features have impressive discriminative ability. Experiments on the GENKI4K database demonstrate that our approach achieves promising smile detection performance.
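The two-stage evaluation (learned CNN features feeding an SVM) can be sketched as follows; the small backbone below is a stand-in, not the actual Smile-CNN architecture, and the data are random placeholders:

```python
# Take activations of the last hidden layer of a CNN and train an SVM on them.
import torch
import torch.nn as nn
from sklearn.svm import SVC

backbone = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 8 * 8, 64), nn.ReLU(),   # last hidden layer
)

faces = torch.randn(100, 1, 32, 32)          # placeholder face crops
labels = torch.randint(0, 2, (100,))         # smile / non-smile

with torch.no_grad():
    feats = backbone(faces).numpy()          # learned 64-d features

svm = SVC(kernel="rbf").fit(feats, labels.numpy())
print(svm.score(feats, labels.numpy()))      # training accuracy on toy data
```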

16.
Objective: Video action recognition, which includes individual action recognition and group activity recognition, has long attracted attention in computer vision. Group activity recognition takes crowd actions as its subject and aims to represent and classify them effectively, with important applications in intelligent surveillance, sports analysis, and video retrieval. Most existing algorithms build on multi-layer recurrent neural network (RNN) models to construct group activity features that capture the relation between individuals and the group they belong to, but they do not fully consider the mutual influence between individuals, which limits recognition accuracy. We therefore propose a group activity recognition model based on a non-local convolutional neural network that fully exploits contextual information between individuals and effectively improves group activity recognition accuracy. Method: The proposed model recognizes individual actions and group activities hierarchically in a bottom-up manner. Image patches around each person are first extracted from the raw video along each person's motion trajectory. A non-local convolutional neural network (CNN) then extracts static features that encode the influence between individuals; these static features are fed into a multi-layer long short-term memory (LSTM) temporal model to obtain individual dynamic features, which are aggregated into group activity features. Finally, the individual and group features are used jointly to recognize both individual actions and group activities. Results: Experiments on the widely used Volleyball Dataset show that the proposed model achieves 77.6% accuracy without fine-grained group division and 83.5% accuracy with fine-grained group division. Conclusion: This is the first non-local convolutional network for group activity recognition, on which a non-local group activity recognition model is built. By considering the mutual influence between individuals together with their contextual information, the model learns more discriminative group activity features from training data. These features capture inter-individual context while preserving the hierarchical structure within the group, benefiting the final group activity classification.
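The non-local building block such models rely on follows the standard embedded-Gaussian formulation of Wang et al. (2018); a hedged PyTorch sketch with example channel sizes:

```python
# Standard 2D non-local block: pairwise attention over all spatial positions,
# added back to the input through a residual connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock2D(nn.Module):
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or channels // 2
        self.theta = nn.Conv2d(channels, inner, 1)
        self.phi = nn.Conv2d(channels, inner, 1)
        self.g = nn.Conv2d(channels, inner, 1)
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):                               # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)    # (B, HW, inner)
        k = self.phi(x).flatten(2)                      # (B, inner, HW)
        v = self.g(x).flatten(2).transpose(1, 2)        # (B, HW, inner)
        attn = F.softmax(q @ k, dim=-1)                 # pairwise relations
        y = (attn @ v).transpose(1, 2).reshape(B, -1, H, W)
        return x + self.out(y)                          # residual connection

x = torch.randn(2, 64, 14, 14)
print(NonLocalBlock2D(64)(x).shape)    # torch.Size([2, 64, 14, 14])
```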

17.
With the rapid growth of the Internet of Things (IoT), smart systems and applications are equipped with an increasing number of wearable sensors and mobile devices. These sensors are used not only to collect data but, more importantly, to assist in tracking and analyzing daily human activities. Sensor-based human activity recognition is a research hotspot and has begun to employ deep learning approaches to supersede traditional shallow learning that relies on hand-crafted features. Although many successful methods have been proposed, three challenges remain: (1) a deep model's performance depends heavily on the data size; (2) a deep model cannot explicitly capture rich sample distribution characteristics; (3) a deep model cannot jointly consider sample features, sample distribution characteristics, and the relationship between the two. To address these issues, we propose a meta-learning-based graph prototypical model with a priority attention mechanism for sensor-based human activity recognition. This approach learns not only sample features and sample distribution characteristics via the meta-learning-based graph prototypical model, but also embeddings derived from the priority attention mechanism, which mines and exploits relations between sample features and sample distribution characteristics. Moreover, the knowledge learned through our approach can serve as a prior applicable to improving performance on other general reasoning tasks. Experimental results on fourteen datasets demonstrate that the proposed approach significantly outperforms other state-of-the-art methods, and experiments applying our model to two other tasks show that it effectively supports other recognition tasks related to human activity and improves performance on those datasets.
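The prototypical core shared by such models is compact enough to sketch: class prototypes are the means of support embeddings, and a query takes the label of its nearest prototype. The paper's graph construction and priority attention are omitted here:

```python
# Nearest-prototype classification over embedded support/query samples.
import torch

def prototypical_predict(support, support_y, query, n_classes):
    """support: (Ns, D) embeddings; support_y: (Ns,) labels; query: (Nq, D)."""
    protos = torch.stack([support[support_y == c].mean(dim=0)
                          for c in range(n_classes)])    # (n_classes, D)
    dists = torch.cdist(query, protos)                   # Euclidean distances
    return dists.argmin(dim=1)                           # nearest prototype

support = torch.randn(30, 16)                 # e.g. 3 activities x 10 shots
support_y = torch.arange(3).repeat_interleave(10)
query = torch.randn(5, 16)
print(prototypical_predict(support, support_y, query, n_classes=3))
```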

18.
19.
Designing hand-crafted features for action recognition has been a hot research area over the last decade. Compared to RGB video, depth sequences are less sensitive to lighting changes and more discriminative thanks to their ability to capture the geometric information of objects. Unlike many existing action recognition methods that depend on well-designed features, this paper studies deep learning-based action recognition using depth sequences and the corresponding skeleton joint information. First, we construct a 3D-based deep convolutional neural network (3D2CNN) to learn spatio-temporal features directly from raw depth sequences; then we compute a joint-based feature vector named JointVector for each sequence from simple position and angle information between skeleton joints. Finally, support vector machine (SVM) classification results from the 3D2CNN-learned features and the JointVector are fused for action recognition. Experimental results demonstrate that our method learns feature representations that are time-invariant and viewpoint-invariant from depth sequences. The proposed method achieves results comparable to the state of the art on the UTKinect-Action3D dataset and superior performance over baseline methods on the MSR-Action3D dataset. We further investigate the generalization of the trained model by transferring the learned features from one dataset (MSR-Action3D) to another (UTKinect-Action3D) without retraining, obtaining very promising classification accuracy.
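The final fusion step amounts to combining the class-probability outputs of two SVMs; a scikit-learn sketch where the random features stand in for the real 3D2CNN and JointVector descriptors, and averaging is an assumed fusion rule:

```python
# Late (score-level) fusion of two SVMs trained on different feature sets.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, n_classes = 120, 4
deep_feats = rng.standard_normal((n, 128))    # stand-in for 3D2CNN features
joint_feats = rng.standard_normal((n, 60))    # stand-in for JointVector
y = rng.integers(0, n_classes, n)

svm_deep = SVC(probability=True, random_state=0).fit(deep_feats, y)
svm_joint = SVC(probability=True, random_state=0).fit(joint_feats, y)

fused = 0.5 * (svm_deep.predict_proba(deep_feats)
               + svm_joint.predict_proba(joint_feats))
pred = svm_deep.classes_[fused.argmax(axis=1)]
print((pred == y).mean())                      # accuracy on the toy data
```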

20.

In this article, we address the effective use of feature sets extracted from deep learning models pre-trained on ImageNet; exploring this option offers a fast and attractive alternative to transfer-learning strategies. The traditional skin lesion recognition task consists of several stages, in which the automated system is typically trained on preprocessed images with known diagnoses, allowing classification of new samples into predefined categories. For this task, we propose an improved melanoma detection method based on combining linear discriminant analysis (LDA) with features extracted by the deep learning approach. We examine applying LDA to the activations of the fully-connected layer in order to increase classification accuracy and, at the same time, reduce the dimensionality of the feature space. We tested our method with five different classifiers and evaluated the results using various metrics. The presented comparison demonstrates the high effectiveness of the suggested feature reduction, which leads not only to a significant reduction in the number of features employed but also to improved performance of all tested classifiers on almost all measured characteristics.
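The pipeline described reduces to a few lines with scikit-learn; the 4096-dimensional activations and the labels below are random placeholders for the real fully-connected-layer features:

```python
# LDA reduction of deep features followed by a downstream classifier.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4096))     # e.g. fc-layer activations per image
y = rng.integers(0, 2, 200)              # melanoma vs. benign labels

# With 2 classes, LDA projects to a single dimension, a drastic reduction
# from the 4096-dimensional deep feature space.
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                    LogisticRegression())
print(cross_val_score(clf, X, y, cv=5).mean())
```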

