Found 20 similar documents; search took 15 ms
1.
Multimodal representation learning has gained increasing importance in various real-world multimedia applications. Most previous approaches focused on exploring inter-modal correlation by learning a common or intermediate space in a conventional way, e.g. Canonical Correlation Analysis (CCA). These works neglected the exploration of fusing multiple modalities at a higher semantic level. In this paper, inspired by the success of deep networks in multimedia computing, we propose a novel unified deep neural framework for multimodal representation learning. To capture the high-level semantic correlations across modalities, we adopt deep learning features as the image representation and topic features as the text representation. In joint model learning, a 5-layer neural network is designed and enforced with supervised pre-training in the first 3 layers for intra-modal regularization. Extensive experiments on the benchmark Wikipedia and MIR Flickr 25K datasets show that our approach achieves state-of-the-art results compared to both shallow and deep models in multimodal and cross-modal retrieval.
2.
Multimedia Tools and Applications - Nowadays, digital protection has gained greater prominence in daily digital activities. It is vital for people to keep new passwords in their minds and...
3.
Sharma D. Gupta N. Chattopadhyay C. Mehta S. 《International Journal on Document Analysis and Recognition》2019,22(4):417-429
International Journal on Document Analysis and Recognition (IJDAR) - In the recent past, there has been a steep increase in the use of online platforms for the search of desired products. Real estate...
4.
5.
6.
Gao Guangwei Wang Yannan Huang Pu Chang Heyou Lu Huimin Yue Dong 《Multimedia Tools and Applications》2020,79(21-22):14903-14917
Multimedia Tools and Applications - Matching sketch facial images to mug-shot images has crucial significance in law enforcement and digital entertainment. Conventional methods always assume that...
7.
Linear discriminant analysis (LDA) is one of the most popular supervised feature extraction techniques used in machine learning and pattern classification. However, LDA only captures global geometrical structure information of the data and ignores the geometrical structure information of local data points. Though many articles have been published to address this issue, most of them are incomplete in the sense that only part of the local information is used. We show here that there are in total three kinds of local information, namely, local similarity information, local intra-class pattern variation, and local inter-class pattern variation. We first propose a new method called the enhanced within-class LDA (EWLDA) algorithm to incorporate the local similarity information, and then propose a complete framework called the complete global–local LDA (CGLDA) algorithm to incorporate all three kinds of local information. Experimental results on two image databases demonstrate the effectiveness of our algorithms.
8.
Palani Balasubramanian Elango Sivasankar Viswanathan K Vignesh 《Multimedia Tools and Applications》2022,81(4):5587-5620
Multimedia Tools and Applications - The progressive growth of today’s digital world has made news spread exponentially faster on social media platforms like Twitter, Facebook, and Weibo....
9.
Ajmal Mian 《Pattern recognition》2011,44(5):1068-1075
This paper presents an online learning approach to video-based face recognition that does not make any assumptions about the pose, expressions or prior localization of facial landmarks. Learning is performed online while the subject is imaged and gives near realtime feedback on the learning status. Face images are automatically clustered based on the similarity of their local features. The learning process continues until the clusters have a required minimum number of faces and the distance of the farthest face from its cluster mean is below a threshold. A voting algorithm is employed to pick the representative features of each cluster. Local features are extracted from arbitrary keypoints on faces as opposed to pre-defined landmarks and the algorithm is inherently robust to large scale pose variations and occlusions. During recognition, video frames of a probe are sequentially matched to the clusters of all individuals in the gallery and its identity is decided on the basis of best temporally cohesive cluster matches. Online experiments (using live video) were performed on a database of 50 enrolled subjects and another 22 unseen impostors. The proposed algorithm achieved a recognition rate of 97.8% and a verification rate of 100% at a false accept rate of 0.0014. For comparison, experiments were also performed using the Honda/UCSD database and 99.5% recognition rate was achieved.
10.
A novel cascade face recognition system using hybrid feature extraction is proposed. Three sets of face features are extracted. The merits of the Two-Dimensional Complex Wavelet Transform (2D-CWT) are analyzed. For face recognition feature extraction, it is shown that 2D-CWT compares favorably with the traditionally used 2D Gabor transform in terms of computational complexity and feature stability. The proposed recognition system combines three Artificial Neural Network classifiers (ANNs) and a gating network trained by the three feature sets. A computationally efficient fitness function for the genetic algorithm is proposed to evolve the best weights of the ensemble classifier. Experiments demonstrated that the overall recognition rate and reliability have been significantly improved in both still face recognition and video-based face recognition.
11.
Multimedia Tools and Applications - Age variation is a major problem in the area of face recognition under uncontrolled environment such as pose, illumination, expression. Most of the works of this...
12.
Pu Ying-Hung Chiu Po-Sheng Tsai Yu-Shiuan Liu Meng-Tsung Hsieh Yi-Zeng Lin Shih-Syun 《The Journal of supercomputing》2022,78(4):5285-5305
The Journal of Supercomputing - Because of the rise of deep learning and neural networks, algorithms based on deep learning have also been developed and subtly applied in daily life. This paper...
13.
Multimedia Tools and Applications - In this paper, a novel 3D face reconstruction technique is proposed along with a sequential deep learning-based framework for face recognition. It uses the...
14.
Santu Rana Wanquan Liu Svetha Venkatesh 《Pattern recognition》2009,42(11):2850-2862
In this paper we propose a new optimization framework that unites some of the existing tensor based methods for face recognition on a common mathematical basis. Tensor based approaches rely on the ability to decompose an image into its constituent factors (i.e. person, lighting, viewpoint, etc.) and then utilize these factor spaces for recognition. We first develop a multilinear optimization problem relating an image to its constituent factors and then develop our framework by formulating a set of strategies that can be followed to solve this optimization problem. The novelty of our research is that the proposed framework offers an effective methodology for explicit non-empirical comparison of the different tensor methods as well as providing a way to determine the applicability of these methods with respect to different recognition scenarios. Importantly, the framework allows comparative analysis on the basis of the quality of solutions offered by these methods. Our theoretical contribution has been validated by extensive experimental results using four benchmark datasets which we present along with a detailed discussion.
15.
A unified framework for subspace face recognition (total citations: 2, self-citations: 0, citations by others: 2)
PCA, LDA, and Bayesian analysis are the three most representative subspace face recognition approaches. In this paper, we show that they can be unified under the same framework. We first model face difference with three components: intrinsic difference, transformation difference, and noise. A unified framework is then constructed by using this face difference model and a detailed subspace analysis on the three components. We explain the inherent relationship among different subspace methods and their unique contributions to the extraction of discriminating information from the face difference. Based on the framework, a unified subspace analysis method is developed using PCA, Bayes, and LDA as three steps. A 3D parameter space is constructed using the three subspace dimensions as axes. Searching through this parameter space, we achieve better recognition performance than standard subspace methods.
16.
In this paper, a novel learning methodology for face recognition, the LearnIng From Testing data (LIFT) framework, is proposed. Considering that many face recognition problems are characterized by inadequate training examples and the availability of vast testing examples, we aim to explore the useful information in the testing data to facilitate learning. The one-against-all technique is integrated into the learning system to recover the labels of the testing data, and the training population is then expanded with the recovered data. In this paper, neural networks and support vector machines are used as the base learning models. Furthermore, we integrate two other transductive methods, the consistency method and the LRGA method, into the LIFT framework. Experimental results and various hypothesis tests over five popular face benchmarks illustrate the effectiveness of the proposed framework.
17.
The quality of biometric samples plays an important role in biometric authentication systems because it has a direct impact on verification or identification performance. In this paper, we present a novel 3D face recognition system which performs quality assessment on input images prior to recognition. More specifically, a reject option is provided to allow the system operator to eliminate incoming images of poor quality, e.g. failed acquisition of the 3D image, exaggerated facial expressions, etc. Furthermore, an automated approach for preprocessing is presented to reduce the number of failure cases in that stage. The experimental results show that the 3D face recognition performance is significantly improved by taking the quality of 3D facial images into account. The proposed system achieves a verification rate of 97.09% at a False Acceptance Rate (FAR) of 0.1% on the FRGC v2.0 data set.
18.
Gu-Min Jeong Hyun-Sik Ahn Sang-Il Choi Nojun Kwak Chanwoo Moon 《International Journal of Control, Automation and Systems》2010,8(1):141-148
In this paper, we propose a new pattern recognition method using feature feedback and present its application to face recognition. Conventional pattern recognition methods extract the features employed for classification using PCA, LDA and so on. On the other hand, in the proposed method, the extracted features are analyzed in the original space using feature feedback. Using reverse mapping from the extracted features to the original space, we can identify the important part of the original data that affects the classification. In this way, we can modify the data to obtain a higher classification rate, make it more compact or reduce the number of required sensors. To verify the applicability of the proposed method, we apply it to face recognition using the Yale Face Database. Each face image is divided into two parts, the important part and the unimportant part, using feature feedback, and the classification is performed using the feature mask obtained from feature feedback. Also, we combine face recognition with image compression. The experimental results show that the proposed method works well.
19.
Alahmadi Amani Hussain Muhammad Aboalsamh Hatim A. Zuair Mansour 《Pattern Analysis & Applications》2020,23(2):673-682
Pattern Analysis and Applications - The human face is a widely used biometric modality for verification and revealing the identity of a person. In spite of a great deal of research on face recognition,...
20.
This paper presents an unsupervised deep learning framework that derives spatio-temporal features for human–robot interaction. The respective models extract high-level features from low-level ones through a hierarchical network, viz. the Hierarchical Temporal Memory (HTM), providing at the same time a solution to the curse of dimensionality in shallow techniques. The presented work incorporates the tensor-based framework within the operation of the nodes and, thus, enhances the feature derivation procedure. This is due to the fact that tensors allow the preservation of the initial data format and their respective correlation and, moreover, attain more compact representations. The computational nodes form spatial and temporal groups by exploiting multilinear algebra and subsequently express the samples according to those groups in terms of proximity. This generic framework may be applied to diverse visual data; it has been examined on sequences of color and depth images, exhibiting remarkable performance.