Spoken emotion recognition through optimum-path forest classification using glottal features |
| |
Authors: | Alexander I Iliev Michael S Scordilis João P Papa Alexandre X Falcão |
| |
Affiliation: | 1. Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA; 2. Institute of Computing, University of Campinas, Campinas, São Paulo, Brazil |
| |
Abstract: | A new method for the recognition of spoken emotions is presented based on features of the glottal airflow signal. Its effectiveness is tested on the new optimum-path forest (OPF) classifier as well as on six other previously established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network – multilayer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC), and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment with ten speakers (five male and five female), each speaking ten sentences in four different emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths, both for the glottal signal and the corresponding speech signal. Experimental results indicate that the best performance is obtained with the glottal-only features, with SVM and OPF generally providing the highest recognition rates, while performance with GMM, or with the combination of glottal and speech features, was relatively inferior. For this text-dependent, multi-speaker task, the top-performing classifiers achieved perfect recognition rates for the case of 6th-order glottal MFCCs. |
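The abstract's core preprocessing step, extracting the glottal waveform from speech via inverse filtering, is commonly done with linear prediction: fit an LPC model of the vocal tract, filter the speech through the inverse of that model to remove the vocal-tract resonances, then integrate the residual to approximate the glottal flow. The sketch below illustrates that generic LPC-based approach (it is not the authors' exact pipeline, which is not specified here); the function names, the model order heuristic, and the leaky-integrator coefficient are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def lpc(x, order):
    """Autocorrelation-method LPC coefficients via Levinson-Durbin.

    Returns the polynomial a = [1, a1, ..., a_order] of the all-pole
    vocal-tract model A(z)."""
    n = len(x)
    # Biased autocorrelation sequence r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def glottal_flow_estimate(frame, fs, order=None):
    """Rough glottal-flow estimate for one voiced frame (assumed sketch).

    1. Fit LPC on a windowed copy of the frame (vocal-tract model).
    2. Inverse-filter the frame with A(z) to obtain the residual.
    3. Leaky integration of the residual approximates glottal flow.
    """
    if order is None:
        order = int(fs / 1000) + 2   # common rule of thumb, not from the paper
    windowed = frame * np.hamming(len(frame))
    a = lpc(windowed, order)
    residual = lfilter(a, [1.0], frame)          # whitened excitation
    flow = lfilter([1.0], [1.0, -0.99], residual)  # leaky integrator
    return flow
```

From such a glottal-flow estimate one could then compute the glottal symmetry and glottal MFCC features the paper feeds to the classifiers.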
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases.
|