期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Facial expression recognition of a speaker using front-view face judgment, vowel judgment, and thermal image processing 总被引：1，自引：1，他引：0

Tomoko Fujimura Yasunari Yoshitomi Taro Asada Masayoshi Tabuse 《Artificial Life and Robotics》2011,16(3):411-417

For facial expression recognition, we selected three images: (i) just before speaking, (ii) speaking the first vowel, and (iii) speaking the last vowel in an utterance. In this study, as a pre-processing module, we added a judgment function to distinguish a front-view face for facial expression recognition. A frame of the front-view face in a dynamic image is selected by estimating the face direction. The judgment function measures four feature parameters using thermal image processing, and selects the thermal images that have all the values of the feature parameters within limited ranges which were decided on the basis of training thermal images of front-view faces. As an initial investigation, we adopted the utterance of the Japanese name “Taro,” which is semantically neutral. The mean judgment accuracy of the front-view face was 99.5% for six subjects who changed their face direction freely. Using the proposed method, the facial expressions of six subjects were distinguishable with 84.0% accuracy when they exhibited one of the intentional facial expressions of “angry,” “happy,” “neutral,” “sad,” and “surprised.” We expect the proposed method to be applicable for recognizing facial expressions in daily conversation. 相似文献

2.

A system for facial expression recognition of a speaker using front-view face judgment, vowel judgment, and thermal image processing 总被引：1，自引：1，他引：0

Taro Asada Yasunari Yoshitomi Masayoshi Tabuse 《Artificial Life and Robotics》2012,17(2):263-269

For facial expression recognition, we previously selected three images: (1) just before speaking, and speaking (2) the first vowel and (3) the last vowel in an utterance. A frame of the front-view face in a dynamic image was selected by estimating the face direction. Based on our method, we have been developing an on-line system for recognizing the facial expression of a speaker using front-view face judgment, vowel judgment, and thermal image processing. In the proposed system, we used three personal computers connected by cables to form a local area network. As an initial investigation, we adopted the utterance of the Japanese name ??Taro,?? which is semantically neutral. Using the proposed system, the facial expressions of one male subject were discriminable with 76?% accuracy when he exhibited one of the intentional facial expressions of ??angry,?? ??happy,?? ??neutral,?? ??sad,?? and ??surprised.?? 相似文献

3.

Facial expression recognition of a speaker using thermal image processing and reject criteria in feature vector space

Yuu Nakanishi Yasunari Yoshitomi Taro Asada Masayoshi Tabuse 《Artificial Life and Robotics》2014,19(1):76-88

In our previously developed method for the facial expression recognition of a speaker, the positions of feature vectors in the feature vector space in image processing were generated with imperfections. The imperfections, which caused misrecognition of the facial expression, tended to be far from the center of gravity of the class to which the feature vectors belonged. In the present study, to omit the feature vectors generated with imperfections, a method using reject criteria in the feature vector space was applied to facial expression recognition. Using the proposed method, the facial expressions of two subjects were discriminable with 86.8 % accuracy for the three facial expressions of “happy”, “neutral”, and “others” when they exhibited one of the five intentional facial expressions of “angry”, “happy”, “neutral”, “sad”, and “surprised”, whereas these expressions were discriminable with 78.0 % accuracy by the conventional method. Moreover, the proposed method effectively judged whether the training data were acceptable for facial expression recognition at the moment. 相似文献

4.

Robust facial expression recognition of a speaker using thermal image processing and updating of fundamental training data

Yuu Nakanishi Yasunari Yoshitomi Taro Asada Masayoshi Tabuse 《Artificial Life and Robotics》2013,17(3-4):342-349

We previously developed a method for the facial expression recognition of a speaker. For facial expression recognition, we selected three static images at the timing positions of just before speaking and while speaking the phonemes of the first and last vowels. Then, only the static image of the front-view face was used for facial expression recognition. However, frequent updates of the training data were time-consuming. To reduce the time for updates, we found that the classifications of “neutral”, “happy”, and “others” were efficient and accurate for facial expression recognition. Using the proposed method with updated training data of “happy” and “neutral” after an interval such as approximately three and a half years, the facial expressions of two subjects were discriminable with 87.0 % accuracy for the facial expressions of “happy”, “neutral”, and “others” when exhibiting the intentional facial expressions of “angry”, “happy”, “neutral”, “sad”, and “surprised”. 相似文献

5.

Speech synthesis of emotions using vowel features of a speaker

Kanu Boku Taro Asada Yasunari Yoshitomi Masayoshi Tabuse 《Artificial Life and Robotics》2014,19(1):27-32

Recently, methods for adding emotion to synthetic speech have received considerable attention in the field of speech synthesis research. We previously proposed a case-based method for generating emotional synthetic speech by exploiting the characteristics of the maximum amplitude and the utterance time of vowels, and the fundamental frequency of emotional speech. In the present study, we propose a method in which our reported method is further improved by controlling the fundamental frequency of emotional synthetic speech. As an initial investigation, we adopted the utterance of a Japanese name that is semantically neutral. By using the proposed method, emotional synthetic speech made from the emotional speech of one male subject was discriminable with a mean accuracy of 83.9 % when 18 subjects listened to the emotional synthetic utterances of “angry,” “happy,” “neutral,” “sad,” or “surprised” when the utterance was the Japanese name “Taro,” or “Hiroko.” Further adjustment of fundamental frequency in the proposed method made a much clearer impression on the subjects for emotional synthetic speech. 相似文献

6.

Facial expression recognition using bag of distances

Fu-Song Hsu Wei-Yang Lin Tzu-Wei Tsai 《Multimedia Tools and Applications》2014,73(1):309-326

The automatic recognition of facial expressions is critical to applications that are required to recognize human emotions, such as multimodal user interfaces. A novel framework for recognizing facial expressions is presented in this paper. First, distance-based features are introduced and are integrated to yield an improved discriminative power. Second, a bag of distances model is applied to comprehend training images and to construct codebooks automatically. Third, the combined distance-based features are transformed into mid-level features using the trained codebooks. Finally, a support vector machine (SVM) classifier for recognizing facial expressions can be trained. The results of this study show that the proposed approach outperforms the state-of-the-art methods regarding the recognition rate, using a CK+ dataset. 相似文献

7.

Facial expression recognition using ASM-based post-processing technique

D. J. Kim 《Pattern Recognition and Image Analysis》2016,26(3):576-581

Facial expression recognition is a challenging field in numerous researches, and impacts important applications in many areas such as human-computer interaction and data-driven animation, etc. Therefore, this paper proposes a facial expression recognition system using active shape model (ASM) landmark information and appearance-based classification algorithm, i.e., embedded hidden Markov model (EHMM). First, we use ASM landmark information for facial image normalization and weight factors of probability resulted from EHMM. The weight factor is calculated through investigating Kullback-Leibler (KL) divergence of best feature with high discrimination power. Next, we introduce the appearance-based recognition algorithm for classification of emotion states. Here, appearance-based recognition means the EHMM algorithm using two-dimensional discrete cosine transform (2D-DCT) feature vector. The performance evaluation of proposed method was performed with the CK facial expression database and the JAFFE database. As a result, the method using ASM information showed performance improvements of 6.5 and 2.5% compared to previous method using ASM-based face alignment for CK database and JAFFE database, respectively. 相似文献

8.

基于生物启发特征和SVM的人脸表情识别

穆国旺王阳郭蔚《计算机工程与应用》2014,50(17):164-168

将C1特征应用于静态图像人脸表情识别,提出了一种新的基于生物启发特征和SVM的表情识别算法。提取人脸图像的C1特征,利用PCA+LDA方法对特征进行降维,用SVM进行分类。在JAFFE和Extended Cohn-Kanade（CK+）人脸表情数据库上的实验结果表明,该算法具有较高的识别率,是一种有效的人脸表情识别方法。相似文献

9.

Facial expression recognition using constructive feedforward neural networks 总被引：6，自引：0，他引：6

Ma L. Khorasani K. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(3):1588-1595

A new technique for facial expression recognition is proposed, which uses the two-dimensional (2D) discrete cosine transform (DCT) over the entire face image as a feature detector and a constructive one-hidden-layer feedforward neural network as a facial expression classifier. An input-side pruning technique, proposed previously by the authors, is also incorporated into the constructive learning process to reduce the network size without sacrificing the performance of the resulting network. The proposed technique is applied to a database consisting of images of 60 men, each having five facial expression images (neutral, smile, anger, sadness, and surprise). Images of 40 men are used for network training, and the remaining images of 20 men are used for generalization and testing. Confusion matrices calculated in both network training and generalization for four facial expressions (smile, anger, sadness, and surprise) are used to evaluate the performance of the trained network. It is demonstrated that the best recognition rates are 100% and 93.75% (without rejection), for the training and generalizing images, respectively. Furthermore, the input-side weights of the constructed network are reduced by approximately 30% using our pruning method. In comparison with the fixed structure back propagation-based recognition methods in the literature, the proposed technique constructs one-hidden-layer feedforward neural network with fewer number of hidden units and weights, while simultaneously provide improved generalization and recognition performance capabilities. 相似文献

10.

Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points

Thirumuru Ramakrishna Gangashetty Suryakanth V. Vuppala Anil Kumar 《Multimedia Tools and Applications》2018,77(4):4753-4767

Multimedia Tools and Applications - Vowels are produced with an open configuration of the vocal tract, without any audible friction. The acoustic signal is relatively loud with varying strength of... 相似文献

11.

类间学习神经网络的人脸表情识别*

周书仁梁昔明杨秋芬刘国英《计算机应用研究》2008,25(7):2219-2222

针对目前表情识别类间信息无关状态,提出了一种表情类间学习的神经网络分类识别算法。该算法首先构建一个BP网络学习对和一个距离判据单元,该距离判据单元仅用来计算类间的实际距离,类间期望距离是根据大量实验结果获得的;然后通过类内实际输出和类间期望距离来修正该网络;最后给出一组实例样本进行表情分类识别。实验结果表明,该算法能有效地识别人脸表情,能紧密地将各类表情间的信息联系起来,效率和准确性均有明显提高。相似文献

12.

Pattern recognition in image processing using interpixel correlation

Sclove SL 《IEEE transactions on pattern analysis and machine intelligence》1981,(2):206-208

When within-object interpixel correlation varies appreciably from object to object, it may be important for the classifier to utilize this correlation, as well as the mean and variance of pixel intensities. In this correspondence interpixel correlation is brought into the classification scheme by means of a two-dimensional Markov model. 相似文献

13.

基于VGG-NET的特征融合面部表情识别

李校林钮海涛《计算机工程与科学》2020,42(3):500-509

为了解决在面部表情特征提取过程中卷积神经网络CNN和局部二值模式LBP只能提取面部表情图像的单一特征,难以提取与面部变化高度相关的精确特征的问题,提出了一种基于深度学习的特征融合的表情识别方法。该方法将LBP特征和CNN卷积层提取的特征通过加权的方式结合在改进的VGG-16网络连接层中,最后将融合特征送入Softmax分类器获取各类特征的概率,完成基本的6种表情分类。实验结果表明,所提方法在CK+和JAFFE数据集上的平均识别准确率分别达到了97.5%和97.62%,利用融合特征得到的识别结果明显优于利用单一特征识别的效果。与其他方法相比较,该方法能有效提高表情识别准确率,对光照变化更加鲁棒。相似文献

14.

基于DNN处理的鲁棒性I-Vector说话人识别算法

下载免费PDF全文

王昕张洪冉《计算机工程与应用》2018,54(22):167-172

提出了一种将基于深度神经网络（Deep Neural Network,DNN）特征映射的回归分析模型应用到身份认证矢量（identity vector,i-vector）/概率线性判别分析（Probabilistic Linear Discriminant Analysis,PLDA）说话人系统模型中的方法。DNN通过拟合含噪语音和纯净语音i-vector之间的非线性函数关系,得到纯净语音i-vector的近似表征,达到降低噪声对系统性能影响的目的。在TIMIT数据集上的实验验证了该方法的可行性和有效性。相似文献

15.

The facial expression recognition technology under image processing and neural network

Zhao Dezhu Qian Yufeng Liu Jun Yang Min 《The Journal of supercomputing》2022,78(4):4681-4708

The Journal of Supercomputing - A facial expression recognition (FER) algorithm is built on the advanced convolutional neural network (CNN) to improve the current FER algorithms’ recognition... 相似文献

16.

Multimedia document retrieval using speech and speaker recognition

Mahesh Viswanathan Homayoon S.M. Beigi Satya Dharanipragada Fereydoun Maali Alain Tritschler 《International Journal on Document Analysis and Recognition》2000,2(4):147-162

Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text and speaker based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions. 相似文献

17.

Text-independent speaker recognition using LSTM-RNN and speech enhancement

El-Moneim Samia Abd Nassar M. A. Dessouky Moawad I. Ismail Nabil A. El-Fishawy Adel S. Abd El-Samie Fathi E. 《Multimedia Tools and Applications》2020,79(33-34):24013-24028

Speaker recognition revolution has lead to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of some degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum and log-spectrum are used for feature extraction from the speech signals. These features are processed with the Long-Short Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize the speakers efficiently in a text-independent manner, when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, while it is increased to 98.7% when using spectrum or log-spectrum. However, the system has some challenges to recognize speakers from different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority, when compared to the algorithm of R. Togneri and D. Pullella (2011).

相似文献

18.

人脸性别约束下的深度随机森林表情识别

钟良骥廖海斌《控制与决策》2021,36(7):1693-1698

由于人脸表情类内变化和类间干扰因素的存在,人脸表情识别仍面临着巨大挑战.提出一种基于性别条件约束随机森林的深度人脸表情识别方法,解决人脸表情识别中噪声、性别等变化和干扰问题.首先,采用深度多示例学习方法提取鲁棒性人脸特征,解决人脸光照、遮挡和低分辨率等图像变化问题;其次,采用性别条件随机森林分类方法进行人脸表情分类器设... 相似文献

19.

Facial expression recognition using iterative universum twin support vector machine

《Applied Soft Computing》2019

Facial expressions are one of the most important characteristics of human behaviour. They are very useful in applications on human computer interaction. To classify facial emotions, different feature extraction methods are used with machine learning techniques. In supervised learning, information about the distribution of data is given by data points not belonging to any of the classes. These data points are known as universum data. In this work, we use universum data to perform multiclass classification of facial emotions from human facial images. Moreover, the existing universum based models suffer from the drawback of high training cost, so we propose an iterative universum twin support vector machine (IUTWSVM) using Newton method. Our IUTWSVM gives good generalization performance with less computation cost. To solve the optimization problem of proposed IUTWSVM, no optimization toolbox is required. Further, improper selection of universum points always leads to degraded performance of the model. For generating better universum, a novel scheme is proposed in this work based on information entropy of data. To check the effectiveness of proposed IUTWSVM, several numerical experiments are performed on benchmark real world datasets. For multiclass classification of facial emotions, the performance of IUTWSVM is compared with existing algorithms using different feature extraction techniques. Our proposed algorithm shows better generalization performance with less training cost in both binary as well as multiclass classification problems. 相似文献

20.

Discriminative speaker recognition using large margin GMM

Reda Jourani Khalid Daoudi Régine André-Obrecht Driss Aboutajdine 《Neural computing & applications》2013,22(7-8):1329-1336

Most state-of-the-art speaker recognition systems are based on discriminative learning approaches. On the other hand, generative Gaussian mixture models (GMM) have been widely used in speaker recognition during the last decades. In an earlier work, we proposed an algorithm for discriminative training of GMM with diagonal covariances under a large margin criterion. In this paper, we propose an improvement of this algorithm, which has the major advantage of being computationally highly efficient, thus well suited to handle large-scale databases. We also develop a new strategy to detect and handle the outliers that occur in the training data. To evaluate the performances of our new algorithm, we carry out full NIST speaker identification and verification tasks using NIST-SRE’2006 data, in a Symmetrical Factor Analysis compensation scheme. The results show that our system significantly outperforms the traditional discriminative support vector machines (SVM)-based system of SVM-GMM supervectors, in the two speaker recognition tasks. 相似文献