
1.

Spoken emotion recognition using hierarchical classifiers

Enrique M. Albornoz Diego H. Milone Hugo L. Rufiner 《Computer Speech and Language》2011,25(3):556-570

The recognition of the emotional state of speakers is a multi-disciplinary research area that has received great interest in recent years. One of the most important goals is to improve voice-based human–machine interaction. Several works in this domain use the prosodic features or the spectral characteristics of the speech signal, with neural networks, Gaussian mixtures and other standard classifiers. Usually, there is no acoustic interpretation of the types of errors in the results. In this paper, the spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models and Multilayer Perceptron are tested. These classifiers have been evaluated with different configurations and input features, in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method for seven emotions, based on spectral and prosodic information, improves performance over the standard classifiers and the fixed features.

2.

Pruning optimum-path forest ensembles using metaheuristic optimization for land-cover classification

Silas Evandro Nachif Fernandes André Nunes de Souza Danilo Sinkiti Gastaldello Danillo Roberto Pereira 《International journal of remote sensing》2017,38(20):5736-5762

Machine learning techniques have been actively pursued in recent years, mainly due to the increasing number of applications that make use of some sort of intelligent mechanism for decision-making processes. In this context, we highlight ensemble pruning strategies, which provide heuristics to select from a collection of classifiers the ones that can really improve recognition rates, and which provide efficiency by reducing the ensemble size prior to combining the model. In this article, we present an ensemble pruning approach for Optimum-Path Forest (OPF) classifiers based on metaheuristic optimization. We first validate its effectiveness and efficiency on general-purpose real and synthetic benchmark data sets using distinct configurations, and thereafter apply it to remote-sensing images to investigate the behaviour of the OPF classifier under pruning strategies. The image data sets were obtained from CBERS-2B, LANDSAT-5 TM, IKONOS-2 MS, and GEOEYE sensors, covering some areas of Brazil. The well-known Indian Pines data set was also used. In this work, we evaluate five different optimization algorithms for ensemble pruning, including Particle Swarm Optimization, Harmony Search, Cuckoo Search, and Firefly Algorithm. In addition, we performed an empirical comparison between Support Vector Machine and OPF using the strategy of ensemble pruning. Experimental results showed the effectiveness and efficiency of ensemble pruning using OPF-based classification, especially with Harmony Search, which proved effective without degrading performance on large data sets and also generalized well.

3.

ECG arrhythmia classification based on optimum-path forest

Eduardo José da S. Luz Thiago M. Nunes Victor Hugo C. de Albuquerque João P. Papa David Menotti 《Expert systems with applications》2013,40(9):3561-3573

An important tool for heart disease diagnosis is the analysis of electrocardiogram (ECG) signals, owing to the non-invasive nature and simplicity of the ECG exam. Depending on the application, ECG data analysis consists of steps such as preprocessing, segmentation, feature extraction and classification, aiming to detect cardiac arrhythmias (i.e., cardiac rhythm abnormalities). Aiming at a fast and accurate cardiac arrhythmia signal classification process, we apply and analyze a recent and robust supervised graph-based pattern recognition technique, the optimum-path forest (OPF) classifier. To the best of our knowledge, this is the first time that the OPF classifier has been applied to the ECG heartbeat signal classification task. We then compare the performance (in terms of training and testing time, accuracy, specificity, and sensitivity) of the OPF classifier to that of three other well-known expert system classifiers, i.e., support vector machine (SVM), Bayesian and multilayer artificial neural network (MLP), using features extracted from six main approaches considered in the literature for ECG arrhythmia analysis. In our experiments, we use the MIT-BIH Arrhythmia Database and the evaluation protocol recommended by The Association for the Advancement of Medical Instrumentation. A discussion of the obtained results shows that the OPF classifier presents a robust performance, i.e., there is no need for parameter setup, as well as high accuracy at an extremely low computational cost. Moreover, on average, the OPF classifier yielded better performance than the MLP and SVM classifiers in terms of classification time and accuracy, and produced performance quite similar to that of the Bayesian classifier, showing it to be a promising technique for ECG signal analysis.

4.

Efficient supervised optimum-path forest classification for large datasets

João P. Papa Alexandre X. Falcão Victor Hugo C. de Albuquerque João Manuel R.S. Tavares 《Pattern recognition》2012,45(1):512-520

Today, data acquisition technologies produce large datasets with millions of samples for statistical analysis. This creates a tremendous challenge for pattern recognition techniques, which need to be more efficient without losing their effectiveness. We have tried to circumvent the problem by reducing it to the fast computation of an optimum-path forest (OPF) in a graph derived from the training samples. In this forest, each class may be represented by multiple trees rooted at some representative samples. The forest is a classifier that assigns to a new sample the label of its most strongly connected root. The methodology has been successfully used with different graph topologies and learning techniques. In this work, we have focused on one of the supervised approaches, which has offered considerable advantages over Support Vector Machines and Artificial Neural Networks in handling large datasets. We propose (i) a new algorithm that speeds up classification and (ii) a solution to reduce the training set size with negligible effects on the accuracy of classification, therefore further increasing its efficiency. Experimental results show the improvements with respect to our previous approach and advantages over other existing methods, which make the new method a valuable contribution to large dataset analysis.
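
The idea of a forest whose roots label new samples can be illustrated with a minimal sketch. It assumes the commonly described supervised OPF setup, which the abstract itself does not spell out: a complete graph over the training samples, Euclidean arc weights, prototypes taken from minimum-spanning-tree edges that join different classes, and a path cost equal to the largest arc weight along the path. It does not implement the faster classification algorithm or the training-set reduction proposed in the paper, and all function names are hypothetical.

```python
import heapq
import math


def find_prototypes(X, y):
    """Prim's MST on the complete graph; endpoints of MST edges that
    connect samples of different classes are returned as prototypes."""
    n = len(X)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1 and y[parent[u]] != y[u]:
            prototypes.update((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Assumption: with a single class there is no boundary, so use sample 0.
    return prototypes or {0}


def train_opf(X, y):
    """Dijkstra-like propagation from the prototypes, where the cost of a
    path is the maximum arc weight along it. Returns (cost, label)."""
    n = len(X)
    cost = [math.inf] * n
    label = list(y)
    heap = []
    for p in find_prototypes(X, y):
        cost[p] = 0.0
        heap.append((0.0, p))
    heapq.heapify(heap)
    done = [False] * n
    while heap:
        c, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v in range(n):
            if not done[v]:
                new_cost = max(c, math.dist(X[u], X[v]))
                if new_cost < cost[v]:
                    cost[v] = new_cost
                    label[v] = label[u]  # v joins the tree of u's root
                    heapq.heappush(heap, (new_cost, v))
    return cost, label


def classify(X, cost, label, t):
    """Give t the label of the training sample offering the cheapest path."""
    s = min(range(len(X)), key=lambda i: max(cost[i], math.dist(X[i], t)))
    return label[s]


# Toy data (hypothetical): two well-separated 2-D clusters.
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = [0, 0, 0, 1, 1, 1]
cost, label = train_opf(X_train, y_train)
print(classify(X_train, cost, label, (0.5, 0.5)))
print(classify(X_train, cost, label, (5.5, 5.5)))
```

This naive classification step scans every training sample, which is the cost that motivates a faster variant for large datasets.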

5.

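Research on a speech emotion recognition algorithm based on glottal feature parameters

何凌 黄华 刘肖珩 《计算机工程与设计》 (Computer Engineering and Design) 2013,34(6)

To build a more effective automatic speech emotion recognition system, an emotion recognition algorithm based on glottal signal feature parameters and Gaussian mixture models is proposed. Based on the human speech production mechanism, the algorithm estimates the glottal signal by means of inverse filtering and linear prediction, and extracts time-domain feature parameters of the glottal signal to characterize the different emotion categories. The experiments use the public BES (Berlin emotion speech database) emotional corpus to automatically recognize seven emotions: anger, boredom, disgust, fear, happiness, neutral and sadness. The experimental results show that the proposed speech emotion recognition system can effectively recognize the various emotional states; its recognition accuracy is close to that of human listeners and better than that obtained with traditional pitch frequency and formant parameters.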

6.

Active learning paradigms for CBIR systems based on optimum-path forest classification

André Tavares da Silva Alexandre Xavier Falcão Léo Pini Magalhães 《Pattern recognition》2011,44(12):2971-2978

This paper discusses methods for content-based image retrieval (CBIR) systems based on relevance feedback according to two active learning paradigms, named greedy and planned. In greedy methods, the system aims to return the most relevant images for a query at each iteration. In planned methods, the most informative images are returned during a few iterations and the most relevant ones are only presented afterward. In the past, we proposed a greedy approach based on optimum-path forest classification (OPF) and demonstrated its gain in effectiveness with respect to a planned method based on support-vector machines and another greedy approach based on multi-point query. In this work, we introduce a planned approach based on the OPF classifier and demonstrate its gain in effectiveness over all methods above using more image databases. In our tests, the most informative images are better obtained from images that are classified as relevant, which differs from the original definition. The results also indicate that both OPF-based methods require less user involvement (efficiency) to satisfy the user's expectation (effectiveness), and provide interactive response times.

7.

Improving optimum-path forest learning using bag-of-classifiers and confidence measures

Fernandes Silas Evandro Nachif Papa João Paulo 《Pattern Analysis & Applications》2019,22(2):703-716

Pattern Analysis and Applications - Machine learning techniques have been actively pursued in the last years, mainly due to the great number of applications that make use of some sort of...

8.

Spoken emotion recognition via locality-constrained kernel sparse representation

Xiaoming Zhao Shiqing Zhang 《Neural computing & applications》2015,26(3):735-744


9.

Incorporating multiple distance spaces in optimum-path forest classification to improve feedback-based learning

André Tavares da Silva Jefersson Alex dos Santos Alexandre Xavier Falcão Ricardo da S. Torres Léo Pini Magalhães 《Computer Vision and Image Understanding》2012,116(4):510-523

In content-based image retrieval (CBIR) using feedback-based learning, the user marks the relevance of returned images and the system learns how to return more relevant images in a next iteration. In this learning process, image comparison may be based on distinct distance spaces due to multiple visual content representations. This work improves the retrieval process by incorporating multiple distance spaces in a recent method based on optimum-path forest (OPF) classification. For a given training set with relevant and irrelevant images, an optimization algorithm finds the best distance function to compare images as a combination of their distances according to different representations. Two optimization techniques are evaluated: a multi-scale parameter search (MSPS), never used before for CBIR, and a genetic programming (GP) algorithm. The combined distance function is used to project an OPF classifier and to rank images classified as relevant for the next iteration. The ranking process takes into account relevant and irrelevant representatives, previously found by the OPF classifier. Experiments show the advantages in effectiveness of the proposed approach with both optimization techniques over the same approach with single distance space and over another state-of-the-art method based on multiple distance spaces.

10.

Human emotion recognition from videos using spatio-temporal and audio features

Munaf Rashid S. A. R. Abu-Bakar Musa Mokji 《The Visual computer》2013,29(12):1269-1275

In this paper, we present a human emotion recognition system based on audio and spatio-temporal visual features. The proposed system has been tested on an audio-visual emotion data set with different subjects of both genders. The mel-frequency cepstral coefficient (MFCC) and prosodic features are first identified and then extracted from emotional speech. For facial expressions, spatio-temporal features are extracted from the visual streams. Principal component analysis (PCA) is applied to reduce the dimensionality of the visual features while capturing 97% of the variance. A codebook is constructed for both audio and visual features using Euclidean space. Then occurrences of the histograms are employed as input to the state-of-the-art SVM classifier to realize the judgment of each classifier. Moreover, the judgments from each classifier are combined using the Bayes sum rule (BSR) as a final decision step. The proposed system is tested on a public data set to recognize human emotions. Experimental results and simulations showed that using visual features only yields on average 74.15% accuracy, while using audio features only gives an average recognition accuracy of 67.39%. By combining both audio and visual features, the overall system accuracy improved significantly, up to 80.27%.
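
The final decision step can be sketched as follows, assuming that the sum rule here means adding the per-class posterior estimates produced by the audio and visual classifiers and choosing the class with the largest total. The abstract does not give the exact formulation, and the function name, emotion labels and posterior values below are hypothetical.

```python
def sum_rule_fusion(posteriors_per_classifier):
    """Add the per-class scores of all classifiers and return
    (index of the winning class, list of summed scores)."""
    n_classes = len(posteriors_per_classifier[0])
    totals = [0.0] * n_classes
    for posteriors in posteriors_per_classifier:
        for k, p in enumerate(posteriors):
            totals[k] += p
    winner = max(range(n_classes), key=lambda k: totals[k])
    return winner, totals


# Hypothetical posteriors over three emotions for one test clip.
emotions = ["anger", "happiness", "sadness"]
audio_posteriors = [0.6, 0.3, 0.1]   # the audio classifier prefers "anger"
visual_posteriors = [0.2, 0.7, 0.1]  # the visual classifier prefers "happiness"

winner, totals = sum_rule_fusion([audio_posteriors, visual_posteriors])
print(emotions[winner], totals)
```

In this toy case the two classifiers disagree, and the fused decision follows the more confident one.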

11.

Machine learning techniques for speech emotion recognition using paralinguistic acoustic features

Jha Tulika Kavya Ramisetty Christopher Jabez Arunachalam Vasan 《International Journal of Speech Technology》2022,25(3):707-725

International Journal of Speech Technology - Speech emotion recognition is one of the fastest growing areas of interest in the field of affective computing. Emotion detection aids...

12.

Spoken character classification using abductive network

Isah Abdullahi Lawal 《International Journal of Speech Technology》2017,20(4):881-890

In this paper, we address the problem of learning a classifier for spoken characters. We present a solution based on the Group Method of Data Handling (GMDH) learning paradigm for the development of a robust abductive network classifier. We improve the reliability of the classification process by introducing the concept of a multiple abductive network classifier system. We evaluate the performance of the proposed classifier using three different speech datasets: spoken Arabic digits, spoken English letters, and spoken Pashto digits. The performance of the proposed classifier surpasses that reported in the literature for other classification techniques on the same speech datasets.

13.

Speech emotion recognition: Features and classification models

Lijiang Chen Xia Mao Yuli Xue Lee Lung Cheng 《Digital Signal Processing》2012,22(6):1154-1160

To solve the speaker-independent emotion recognition problem, a three-level speech emotion recognition model is proposed to classify six speech emotions (sadness, anger, surprise, fear, happiness and disgust) from coarse to fine. For each level, appropriate features are selected from 288 candidates by using the Fisher rate, which is also regarded as an input parameter for the Support Vector Machine (SVM). To evaluate the proposed system, principal component analysis (PCA) for dimension reduction and an artificial neural network (ANN) for classification are adopted to design four comparative experiments: Fisher + SVM, PCA + SVM, Fisher + ANN and PCA + ANN. The experimental results showed that Fisher is better than PCA for dimension reduction, and that SVM is more expansible than ANN for speaker-independent speech emotion recognition. The average recognition rates for the three levels are 86.5%, 68.5% and 50.2% respectively.
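
Fisher-based feature ranking can be sketched as below. The abstract does not define its "Fisher rate", so this sketch assumes the common two-class Fisher ratio, the squared difference of class means divided by the sum of class variances, computed per feature. The function names and the toy samples are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Two-class Fisher ratio of one feature: (mean_a - mean_b)^2 / (var_a + var_b).
    Assumes at least one class has non-zero variance for this feature."""
    return (mean(values_a) - mean(values_b)) ** 2 / (
        pvariance(values_a) + pvariance(values_b)
    )


def rank_features(class_a, class_b):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(class_a[0])
    scores = [
        fisher_ratio([row[j] for row in class_a], [row[j] for row in class_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Hypothetical samples with two features each; feature 0 separates the
# classes well, while feature 1 has the same mean in both classes.
class_a = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0)]
class_b = [(3.0, 4.5), (3.2, 3.5), (2.8, 4.0)]
print(rank_features(class_a, class_b))
```

Keeping only the top-ranked indices would give a reduced feature set for the downstream classifier.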

14.

Learning CNN features from DE features for EEG-based emotion recognition

Hwang Sunhee Hong Kibeom Son Guiyoung Byun Hyeran 《Pattern Analysis & Applications》2020,23(3):1323-1335

Pattern Analysis and Applications - Recently, deep neural networks (DNNs) have shown the remarkable success of feature representations in computer vision, audio analysis, and natural language...

15.

Vocal emotion recognition in five native languages of Assam using new wavelet features

Aditya Bihar Kandali Aurobinda Routray Tapan Kumar Basu 《International Journal of Speech Technology》2009,12(1):1-13

The present work investigates the following specific research questions concerning voice emotion recognition: whether vocal emotion expressions of discrete emotion (i) can be distinguished from no-emotion (i.e. neutral), (ii) can be distinguished from one another, (iii) of surprise, which is actually a cognitive component that could be present with any emotion, can also be recognized as a distinct emotion, (iv) can be recognized cross-lingually. This study will enable us to get more information regarding the nature and function of emotion. Furthermore, this work will help in developing a generalized voice emotion recognition system, which will increase the efficiency of human-machine interaction systems. In this work an emotional utterance database is created with 140 acted utterances per speaker, consisting of short sentences expressing six full-blown basic emotions and neutral, in five native languages of Assam. This database is validated by a Listening Test. Four feature sets are extracted based on WPCC2 (Wavelet-Packet-Cepstral-Coefficients computed by method 2), MFCC (Mel-Frequency-Cepstral-Coefficients), tfWPCC2 (Teager-energy-operated-in-Transform-domain WPCC2) and tfMFCC. The Gaussian Mixture Model (GMM) is used as the classifier. The performances of all these feature sets are compared with respect to classification accuracy in two experiments: (i) text-and-speaker independent vocal emotion recognition in individual languages, and (ii) cross-lingual vocal emotion recognition. tfWPCC2 is a new wavelet feature set proposed by the same authors in one of their recent papers in a National Seminar in India as cited in References.

16.

An optimal two stage feature selection for speech emotion recognition using acoustic features

Swarna Kuchibhotla Hima Deepthi Vankayalapati Koteswara Rao Anne 《International Journal of Speech Technology》2016,19(4):657-667

Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features, such as energy, pitch and mel frequency cepstral coefficients. However, the performance of the system is not optimal because of its computational complexity, which arises from the high-dimensional, correlated feature set produced by feature fusion. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused together for speech emotion recognition. In the second stage, optimal feature subset selection techniques [sequential forward selection (SFS) and sequential floating forward selection (SFFS)] are used to eliminate the curse of dimensionality caused by the high-dimensional feature vector after feature fusion. Finally, the emotions are classified using several classifiers: Linear Discriminant Analysis (LDA), Regularized Discriminant Analysis (RDA), Support Vector Machine (SVM) and K Nearest Neighbor (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained by using SFS and SFFS individually. Results reveal that SFFS is the better choice as a feature subset selection method because SFS suffers from the nesting problem, i.e., it is difficult to discard a feature once it has been retained in the set. SFFS eliminates this nesting problem by not fixing the set at any stage but letting it float up and down during selection according to the objective function. Experimental results showed that the efficiency of the classifier is improved by 15–20% with the two-stage feature selection method compared with the classifier using feature fusion alone.
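
Sequential forward selection can be sketched in a few lines. This is a generic greedy SFS, not the authors' implementation: the objective function `score` is a hypothetical stand-in for whatever classification-rate criterion is used, and the toy merit values are invented for illustration. The floating variant (SFFS), which can also remove features, is not shown.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedy SFS: start from an empty set and repeatedly add the feature
    that maximizes score(subset) until k features are selected.
    A feature is never removed once added (the nesting problem)."""
    selected = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        best = max(sorted(remaining), key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical objective: each feature has an individual merit, and
# features 0 and 1 are strongly correlated, so using both is penalized.
MERIT = [0.9, 0.85, 0.3, 0.5]


def toy_score(subset):
    total = sum(MERIT[f] for f in subset)
    if 0 in subset and 1 in subset:
        total -= 0.8  # redundancy penalty for the correlated pair
    return total


print(sequential_forward_selection(4, toy_score, 2))
```

With this toy objective the second pick skips feature 1, despite its high individual merit, because it is redundant with feature 0.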

17.

Mammogram classification using contourlet features with forest optimization-based feature selection approach

Mohanty Figlu Rup Suvendu Dash Bodhisattva Majhi Banshidhar Swamy M. N. S. 《Multimedia Tools and Applications》2019,78(10):12805-12834

Multimedia Tools and Applications - Breast cancer continues to be one of the major health issues across the world and it is mostly observed in females. However, the actual cause of this cancer is...

18.

An art therapy evaluation method based on emotion recognition using EEG deep temporal features

Tang Zhichuan Li Xintao Xia Dan Hu Yidan Zhang Lingtao Ding Jun 《Multimedia Tools and Applications》2022,81(5):7085-7101

Self-assessment methods are widely used in art therapy evaluation, but emotion recognition methods using features of physiological signals are more objective and accurate. In this study, we propose an electroencephalogram (EEG)-based art therapy evaluation method that evaluates the therapeutic effect based on the emotional changes before and after the art therapy. Twelve participants were recruited for a two-step experiment (an emotion stimulation step and a drawing therapy step), and their EEG signals and self-assessment scores were collected. The self-assessment model (SAM) was used to obtain and label the actual emotional states; the long short-term memory (LSTM) network was used to extract the deep temporal features of EEG to recognize emotions. Further, the classification performance for different sequence lengths, time-window lengths and frequency combinations was compared and analyzed. The results showed that emotion recognition models with LSTM deep temporal features achieved better classification performance than the state-of-the-art methods with non-temporal features; the classification accuracies in high-frequency bands (α, β, and γ bands) were higher than those in low-frequency bands (δ and θ bands); and the highest emotion classification accuracy (93.24%) was obtained with a 10-s sequence length, a 2-s time-window length and the 5-band frequency combination. Our proposed method can be used for emotion recognition effectively and accurately, and is an objective approach to assist therapists or patients in evaluating the effect of art therapy.


19.

Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

Margarita Kotti Fabio Paternò 《International Journal of Speech Technology》2012,15(2):131-150


20.

Real-time EEG-based emotion monitoring using stable features

Zirui Lan Olga Sourina Lipo Wang Yisi Liu 《The Visual computer》2016,32(3):347-358

In human–computer interaction (HCI), electroencephalogram (EEG) signals can be added as an additional input to the computer. Integrating real-time EEG-based human emotion recognition algorithms into human–computer interfaces can make the user's experience more complete, more engaging, and less or more emotionally stressful, depending on the target of the application. Currently, the most accurate EEG-based emotion recognition algorithms are subject-dependent, and a training session is needed for the user each time right before running the application. In this paper, we propose a novel real-time subject-dependent algorithm that uses the most stable features and gives better accuracy than other available algorithms when it is crucial to have only one training session for the user and no re-training is allowed subsequently. The proposed algorithm is tested on an affective EEG database that contains five subjects. For each subject, four emotions (pleasant, happy, frightened and angry) are induced, and the affective EEG is recorded in two sessions per day on eight consecutive days. Testing results show that the novel algorithm can be used in real-time emotion recognition applications without re-training and with adequate accuracy. The proposed algorithm is integrated with the real-time applications “Emotional Avatar” and “Twin Girls” to monitor the users' emotions in real time.