Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
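Feng Shu 《计算机应用》 (Journal of Computer Applications) 2017,37(2):512-516
Feature representation is a key problem in face recognition. Because face images are affected by illumination, occlusion, pose, and other factors during capture, extracting robust image features has become a research focus. Inspired by the convolutional network framework, and exploiting the stable results and fast convergence of the K-means algorithm in learning convolutional filters, a simple and effective face recognition method is proposed. It consists of three parts: convolutional filter learning, nonlinear processing, and spatial mean pooling. Specifically, local image patches are first extracted from the training images and preprocessed, the K-means algorithm is then used to learn the filters quickly, and each filter is convolved with the image; next, the convolved images undergo a nonlinear transformation through the hyperbolic tangent function; finally, spatial mean pooling is applied to denoise the image features and reduce their dimensionality. The classification stage uses only a simple linear regression classifier. Evaluation experiments on the AR and Extended Yale B datasets show that the proposed method, although simple, is very effective and highly robust to illumination and occlusion.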
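The three stages described in this abstract (K-means filter learning, tanh nonlinearity, spatial mean pooling) can be sketched roughly as follows. This is a minimal pure-Python illustration and not the authors' implementation: the function names, the mean-removal preprocessing, the patch size, the number of filters, and the pooling size are all assumptions, and the "convolution" is computed as a sliding dot product without flipping the filter.

```python
import math
import random

def extract_patches(img, k):
    """Collect every k x k patch of a 2D image as a flat, mean-removed vector.
    Mean removal stands in for the unspecified preprocessing step (assumption)."""
    h, w = len(img), len(img[0])
    patches = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            p = [img[i + a][j + b] for a in range(k) for b in range(k)]
            m = sum(p) / len(p)
            patches.append([v - m for v in p])
    return patches

def kmeans(points, n_clusters, iters=10, seed=0):
    """Plain Lloyd's K-means; the resulting centroids are used as filters."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        for idx, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centers[idx] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def filter_response(img, filt, k):
    """'Valid' sliding dot product of a flat k*k filter with the image."""
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * filt[a * k + b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]

def mean_pool(fmap, s):
    """Non-overlapping s x s spatial mean pooling."""
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[i + a][j + b] for a in range(s) for b in range(s)) / (s * s)
             for j in range(0, w - s + 1, s)]
            for i in range(0, h - s + 1, s)]

def extract_features(img, filters, k, pool):
    """Filter -> tanh -> mean pooling, concatenated over all filters."""
    feats = []
    for f in filters:
        fmap = [[math.tanh(v) for v in row] for row in filter_response(img, f, k)]
        for row in mean_pool(fmap, pool):
            feats.extend(row)
    return feats

# Toy usage on a random 8x8 "image" (hypothetical sizes).
rng = random.Random(1)
img = [[rng.random() for _ in range(8)] for _ in range(8)]
filters = kmeans(extract_patches(img, 3), n_clusters=4)
feats = extract_features(img, filters, k=3, pool=2)
```

With an 8x8 input, 3x3 filters, and 2x2 pooling, each filter yields a 6x6 response map pooled to 3x3, so four filters give a 36-dimensional feature vector. The abstract's linear regression classifier would then operate on such vectors; it is omitted here.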

2.
Multimodal representation learning has gained increasing importance in various real-world multimedia applications. Most previous approaches focused on exploring inter-modal correlation by learning a common or intermediate space in a conventional way, e.g. Canonical Correlation Analysis (CCA). These works neglected the fusion of multiple modalities at a higher semantic level. In this paper, inspired by the success of deep networks in multimedia computing, we propose a novel unified deep neural framework for multimodal representation learning. To capture the high-level semantic correlations across modalities, we adopt deep learning features as the image representation and topic features as the text representation. In joint model learning, a 5-layer neural network is designed, with supervised pre-training enforced in the first 3 layers for intra-modal regularization. Extensive experiments on the benchmark Wikipedia and MIR Flickr 25K datasets show that our approach achieves state-of-the-art results compared to both shallow and deep models in multimodal and cross-modal retrieval.

3.
Extensive research has been carried out in the past on face recognition, face detection, and age estimation. However, age-invariant face recognition (AIFR) has not been explored as thoroughly. The facial appearance of a person changes considerably over time, which introduces significant intraclass variations and makes AIFR a very challenging task. Most face recognition studies that have addressed the ageing problem have employed complex models and handcrafted features with strong parametric assumptions. In this work, we propose a novel deep learning framework that extracts age-invariant and generalized features from facial images of the subjects. The proposed model, trained on facial images from a minor part (20–30%) of the subjects' lifespan, correctly identifies them throughout their lifespan. A variety of pretrained 2D convolutional neural networks are compared in terms of accuracy, time, and computational complexity to select the most suitable network for AIFR. Extensive experiments are carried out on the popular and challenging face and gesture recognition network ageing dataset. The proposed method outperforms state-of-the-art AIFR models, achieving an accuracy of 99%, which demonstrates the effectiveness of deep learning in facial ageing research.

4.
Multimedia Tools and Applications - Nowadays, digital protection has gained greater prominence in daily digital activities. It is vital for people to keep new passwords in their minds and...

5.
International Journal on Document Analysis and Recognition (IJDAR) - In the recent past, there has been a steep increase in the use of online platforms to search for desired products. Real estate...

6.
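To make full use of the latent information in face images, a method is proposed that obtains multi-scale image features by using convolution kernels of different sizes: the Multi-Scale Convolutional Auto-Encoder (MSCAE). The features of different scales extracted by this structure reflect the essential information of the face and allow the face image to be reconstructed more faithfully. The feature extraction framework is a hierarchical structure of alternating convolution and sampling layers, which makes the features highly invariant to rotation, translation, and scaling. MSCAE is trained in encoder-decoder mode to obtain feature extractors, which are then used to extract features that are fused into a feature vector for classification. Classification results of a BP neural network on the ORL and Yale face databases show that multi-scale features outperform single-scale features in both recognition rate and performance. In addition, fusing MSCAE features with HOG (Histograms of Oriented Gradients) features yields a higher recognition rate than either feature alone.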

7.
8.
9.
With the development of deep learning, numerous models have been proposed for human activity recognition to achieve state-of-the-art recognition on wearable sensor data. Despite the improved accuracy achieved by previous deep learning models, activity recognition remains a challenge, often attributed to the complexity of some specific activity patterns. Existing deep learning models proposed to address this have often recorded high overall recognition accuracy but low recall and precision on some individual activities because of the complexity of their patterns. Existing models that have focused on tackling these issues are often bulky and complex. Since most embedded systems have resource constraints in terms of processor, memory, and battery capacity, it is paramount to propose efficient, lightweight activity recognition models that consume limited resources and are still capable of achieving state-of-the-art recognition of activities, with high individual recall and precision. This research proposes a high-performance, low-footprint deep learning model with a squeeze-and-excitation block to address this challenge. The squeeze-and-excitation block consists of a global average-pooling layer and two fully connected layers, which were placed to extract the flattened features in the model, with best-fit reduction ratios in the squeeze-and-excitation block. The block served as channel-wise attention, adjusting the weight of each channel to build more robust representations, which enabled the network to become more responsive to essential features while suppressing less important ones. By using the best-fit reduction ratio in the squeeze-and-excitation block, the parameters of the fully connected layers were reduced, which helped the model increase responsiveness to essential features. Experiments on three publicly available datasets (PAMAP2, WISDM, and UCI-HAR) showed that the proposed model outperformed the existing state of the art with fewer parameters and increased the recall and precision of some individual activities compared to the baseline and the existing models.
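The squeeze-and-excitation block mentioned in this abstract (global average pooling followed by two fully connected layers that produce per-channel weights) can be illustrated with a small sketch. This is a generic pure-Python version of the standard squeeze-and-excitation idea, not the paper's model: the function names, the ReLU and sigmoid activations, the toy input, and the all-zero placeholder weights are assumptions made only for illustration.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def squeeze_excite(features, w1, b1, w2, b2):
    """features: list of C channels, each a list of values over time.
    Returns (rescaled channels, per-channel gates in (0, 1))."""
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]
    # Scale: each channel is multiplied by its gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates

# Toy usage: C = 4 channels, reduction ratio r = 2 (hypothetical values).
C, r = 4, 2
features = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 1.0]]
w1 = [[0.0] * C for _ in range(C // r)]   # placeholder weights, not trained
b1 = [0.0] * (C // r)
w2 = [[0.0] * (C // r) for _ in range(C)]
b2 = [0.0] * C
scaled, gates = squeeze_excite(features, w1, b1, w2, b2)
```

Ignoring biases, the two fully connected layers hold 2·C·(C/r) weights, so a larger reduction ratio r shrinks the block, which is consistent with the abstract's remark that the reduction ratio reduces the fully connected parameters.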

10.
Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse language, visual, and audio modalities. However, these fusion methods are often quadratic in complexity with respect to the modal sequence length, introduce redundant information, and are inefficient. In this paper, we propose an efficient neural network to learn modality-fused representations with CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction for each of the three modalities to obtain the local structure of the sequences. Then, we design an innovative asymmetric transformer with cross-modal blocks (CB-Transformer) that enables complementary learning of different modalities, mainly divided into local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we splice the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets: IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with the mainstream methods, our approach reaches the state of the art with a minimum number of parameters.

11.
Gao Guangwei, Wang Yannan, Huang Pu, Chang Heyou, Lu Huimin, Yue Dong 《Multimedia Tools and Applications》2020,79(21-22):14903-14917
Multimedia Tools and Applications - Matching sketch facial images to mug-shot images has crucial significance in law enforcement and digital entertainment. Conventional methods always assume that...

12.
At an actual working site, equipment often operates under different working conditions, and the manufacturing system is rather complicated. However, in the fault diagnosis domain, traditional multi-label learning methods need to use a pre-defined label sequence or synchronously predict all labels of the input sample. Deep reinforcement learning (DRL) combines the perception ability of deep learning with the decision-making ability of reinforcement learning. Moreover, the curriculum learning mechanism follows the human approach of learning from easy to complex. Consequently, an improved proximal policy optimization (PPO) method, PPO being a typical DRL algorithm, is proposed in this paper as a novel method for multi-label classification. The improved PPO method can build a relationship between the several predicted labels of an input sample by means of an action history vector, which encodes all actions selected by the agent up to the current time step. In two rolling bearing experiments, the diagnostic results demonstrate that the proposed method provides higher accuracy than traditional multi-label methods for fault recognition under complicated working conditions. Moreover, compared with the same network using the pre-defined label sequence, the proposed method can distinguish the multiple labels of input samples following the curriculum mechanism from easy to complex.

13.
14.
Linear discriminant analysis (LDA) is one of the most popular supervised feature extraction techniques used in machine learning and pattern classification. However, LDA only captures the global geometrical structure of the data and ignores the geometrical structure of local data points. Though many articles have been published to address this issue, most of them are incomplete in the sense that only part of the local information is used. We show here that there are three kinds of local information in total, namely, local similarity information, local intra-class pattern variation, and local inter-class pattern variation. We first propose a new method called the enhanced within-class LDA (EWLDA) algorithm to incorporate the local similarity information, and then propose a complete framework called the complete global–local LDA (CGLDA) algorithm to incorporate all three kinds of local information. Experimental results on two image databases demonstrate the effectiveness of our algorithms.
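For reference, the "global" criterion that this abstract contrasts with local information is the classical LDA objective. The formulation below is the standard textbook one, with notation chosen here for illustration ($X_c$ is the set of samples in class $c$, $n_c$ its size, $\mu_c$ its mean, $\mu$ the overall mean); it is not the EWLDA or CGLDA objective, which the abstract does not spell out.

```latex
\begin{aligned}
S_W &= \sum_{c=1}^{C} \sum_{x_i \in X_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}, \\
S_B &= \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}, \\
W^{*} &= \arg\max_{W} \; \operatorname{tr}\!\left( (W^{\top} S_W W)^{-1} \, W^{\top} S_B W \right).
\end{aligned}
```

Both scatter matrices are built only from class means and the overall mean, which is why the classical criterion reflects global structure and carries no information about neighbourhood relations between individual samples.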

15.
Multimedia Tools and Applications - The progressive growth of today’s digital world has made news spread exponentially faster on social media platforms like Twitter, Facebook, and Weibo....

16.
This paper presents an online learning approach to video-based face recognition that does not make any assumptions about the pose, expressions, or prior localization of facial landmarks. Learning is performed online while the subject is imaged and gives near-real-time feedback on the learning status. Face images are automatically clustered based on the similarity of their local features. The learning process continues until the clusters have a required minimum number of faces and the distance of the farthest face from its cluster mean is below a threshold. A voting algorithm is employed to pick the representative features of each cluster. Local features are extracted from arbitrary keypoints on faces as opposed to pre-defined landmarks, and the algorithm is inherently robust to large-scale pose variations and occlusions. During recognition, video frames of a probe are sequentially matched to the clusters of all individuals in the gallery, and its identity is decided on the basis of the best temporally cohesive cluster matches. Online experiments (using live video) were performed on a database of 50 enrolled subjects and another 22 unseen impostors. The proposed algorithm achieved a recognition rate of 97.8% and a verification rate of 100% at a false accept rate of 0.0014. For comparison, experiments were also performed using the Honda/UCSD database, and a 99.5% recognition rate was achieved.
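The stopping rule stated in this abstract (every cluster has a minimum number of faces, and its farthest face lies within a threshold of the cluster mean) can be sketched as below. This is only an illustration under assumptions: faces are represented as plain feature vectors, distance is Euclidean, and the names `learning_complete`, `min_faces`, and `max_dist` are hypothetical. The paper's actual local-feature similarity measure is not given in the abstract.

```python
import math

def cluster_mean(faces):
    """Component-wise mean of the feature vectors in one cluster."""
    return [sum(col) / len(faces) for col in zip(*faces)]

def learning_complete(clusters, min_faces, max_dist):
    """True when every cluster has at least `min_faces` members and its
    farthest member is strictly closer than `max_dist` to the cluster mean."""
    for faces in clusters:
        if len(faces) < min_faces:
            return False
        mean = cluster_mean(faces)
        if max(math.dist(f, mean) for f in faces) >= max_dist:
            return False
    return True

# Toy usage with two clusters of 2D feature vectors (hypothetical data).
clusters = [
    [[0.0, 0.0], [0.0, 2.0]],   # mean (0, 1), farthest member at distance 1
    [[1.0, 1.0], [1.0, 1.0]],   # mean (1, 1), farthest member at distance 0
]
done_loose = learning_complete(clusters, min_faces=2, max_dist=1.5)
done_tight = learning_complete(clusters, min_faces=2, max_dist=0.5)
done_few = learning_complete(clusters, min_faces=3, max_dist=1.5)
```

In an online setting this check would be re-evaluated as each new frame is added to a cluster, which is what allows the near-real-time feedback on learning status that the abstract describes.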

17.
A novel cascade face recognition system using hybrid feature extraction is proposed. Three sets of face features are extracted. The merits of the Two-Dimensional Complex Wavelet Transform (2D-CWT) are analyzed. For face recognition feature extraction, it is shown that 2D-CWT compares favorably with the traditionally used 2D Gabor transform in terms of computational complexity and the features' stability. The proposed recognition system combines three Artificial Neural Network classifiers (ANNs) and a gating network trained on the three feature sets. A computationally efficient fitness function for the genetic algorithm is proposed to evolve the best weights of the ensemble classifier. Experiments demonstrated that the overall recognition rate and reliability were significantly improved in both still face recognition and video-based face recognition.

18.
Multimedia Tools and Applications - Age variation is a major problem in face recognition under uncontrolled conditions such as pose, illumination, and expression. Most of the works of this...

19.
Pattern Analysis and Applications - Video-based group emotion recognition is an important research area in computer vision and is of great significance for the intelligent understanding of videos...

20.
Sharma Sahil, Kumar Vijay 《Multimedia Tools and Applications》2020,79(25-26):17303-17330
Multimedia Tools and Applications - In this paper, a novel 3D face reconstruction technique is proposed along with a sequential deep learning-based framework for face recognition. It uses the...
