Similar literature
 20 similar articles found
1.
To solve the speaker-independent emotion recognition problem, a three-level speech emotion recognition model is proposed to classify six speech emotions (sadness, anger, surprise, fear, happiness and disgust) from coarse to fine. For each level, appropriate features are selected from 288 candidates by using the Fisher rate, which is also used as an input parameter for the Support Vector Machine (SVM). To evaluate the proposed system, principal component analysis (PCA) for dimension reduction and an artificial neural network (ANN) for classification are adopted to design four comparative experiments: Fisher + SVM, PCA + SVM, Fisher + ANN and PCA + ANN. The experimental results show that Fisher is better than PCA for dimension reduction, and that SVM is more expansible than ANN for speaker-independent speech emotion recognition. The average recognition rates for the three levels are 86.5%, 68.5% and 50.2%, respectively.
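The abstract does not define its Fisher rate, so the following is only a minimal sketch under an assumed definition: the common two-class Fisher ratio, (difference of class means)² divided by the sum of class variances, computed per feature and used to rank candidates. The function names `fisher_ratio` and `rank_features` are hypothetical and not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Two-class Fisher ratio of one feature: squared mean gap over summed variances."""
    gap = mean(class_a) - mean(class_b)
    spread = pvariance(class_a) + pvariance(class_b)
    # Assumption: a feature with zero spread and a non-zero gap separates perfectly.
    if spread == 0:
        return float("inf") if gap != 0 else 0.0
    return gap * gap / spread


def rank_features(samples_a, samples_b):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(samples_a[0])
    scores = [
        fisher_ratio([row[j] for row in samples_a], [row[j] for row in samples_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the two classes, feature 1 does not.
angry = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
sad = [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
print(rank_features(angry, sad))
```

Keeping only the top-ranked indices at each level would give a feature subset for the classifier; how many features the paper keeps per level is not stated in the abstract.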

2.
3.
Automatic emotion recognition from speech signals is an important research area that adds value to machine intelligence. Pitch, duration, energy and Mel-frequency cepstral coefficients (MFCC) are the widely used features in the field of speech emotion recognition. A single classifier or a combination of classifiers is used to recognize emotions from the input features. The present work investigates the performance of the features of autoregressive (AR) parameters, which include gain and reflection coefficients, in addition to the traditional linear prediction coefficients (LPC), to recognize emotions from speech signals. The classification performance of the features of AR parameters is studied using discriminant, k-nearest neighbor (KNN), Gaussian mixture model (GMM), back-propagation artificial neural network (ANN) and support vector machine (SVM) classifiers, and we find that the features of reflection coefficients recognize emotions better than the LPC. To improve the emotion recognition accuracy, we propose a class-specific multiple-classifier scheme built from multiple parallel classifiers, each of which is optimized for one class. Each classifier for an emotional class is built from a feature identified from a pool of features and a classifier identified from a pool of classifiers that optimize the recognition of that particular emotion. The outputs of the classifiers are combined by a decision-level fusion technique. The experimental results show that the proposed scheme improves the emotion recognition accuracy. Further improvement in recognition accuracy is obtained when MFCC features are included in the pool of features.

4.
An important tool for heart disease diagnosis is the analysis of electrocardiogram (ECG) signals, owing to the non-invasive nature and simplicity of the ECG exam. Depending on the application, ECG data analysis consists of steps such as preprocessing, segmentation, feature extraction and classification, aiming to detect cardiac arrhythmias (i.e., cardiac rhythm abnormalities). Aiming at a fast and accurate cardiac arrhythmia signal classification process, we apply and analyze a recent and robust supervised graph-based pattern recognition technique, the optimum-path forest (OPF) classifier. To the best of our knowledge, this is the first time that the OPF classifier has been applied to the ECG heartbeat signal classification task. We then compare the performance (in terms of training and testing time, accuracy, specificity, and sensitivity) of the OPF classifier with that of three other well-known expert system classifiers, i.e., support vector machine (SVM), Bayesian and multilayer artificial neural network (MLP), using features extracted by six main approaches considered in the literature for ECG arrhythmia analysis. In our experiments, we use the MIT-BIH Arrhythmia Database and the evaluation protocol recommended by the Association for the Advancement of Medical Instrumentation. A discussion of the results shows that the OPF classifier performs robustly, i.e., there is no need for parameter setup, and achieves high accuracy at an extremely low computational cost. Moreover, on average, the OPF classifier outperformed the MLP and SVM classifiers in terms of classification time and accuracy, and performed quite similarly to the Bayesian classifier, which shows it to be a promising technique for ECG signal analysis.

5.
Secondary phases such as Laves and carbides are formed during the final solidification stages of nickel-based superalloy coatings deposited by the gas tungsten arc welding cold wire process. However, when aged at high temperatures, other phases can precipitate in the microstructure, such as the γ″ and δ phases. This work presents a new application and evaluation of artificial intelligence techniques to classify ultrasound signals (background echo and backscattered) in order to characterize the microstructure of a Ni-based alloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasound signals were acquired using transducers with frequencies of 4 and 5 MHz. Using feature extraction techniques, i.e., detrended fluctuation analysis and the Hurst method, the accuracy and speed of the classification of the secondary phases from ultrasound signals could be studied. The classifiers under study were the recent optimum-path forest (OPF) and the more traditional support vector machines and Bayesian classifiers. The experimental results revealed that the OPF classifier was the fastest and most reliable. In addition, the OPF classifier proved to be a valid and adequate tool for microstructure characterization through ultrasound signal classification due to its speed, sensitivity, accuracy and reliability.

6.
A method to improve the voicing decision using glottal activity features is proposed for statistical parametric speech synthesis. In existing methods, the voicing decision relies mostly on the fundamental frequency F0, which may result in errors when the prediction is inaccurate. Even though F0 is a glottal activity feature, other features that characterize this activity may help in improving the voicing decision. The glottal activity features used in this work are the strength of excitation (SoE), normalized autocorrelation peak strength (NAPS), and higher-order statistics (HOS). These features are obtained from approximated source signals such as the zero-frequency filtered signal and the integrated linear prediction residual. To improve the voicing decision and to avoid a heuristic threshold for classification, glottal activity features are trained using different statistical learning methods such as the k-nearest neighbor, support vector machine (SVM), and deep belief network. The voicing decision works best with the SVM classifier, and its effectiveness is tested using statistical parametric speech synthesis. The glottal activity features SoE, NAPS, and HOS are modeled along with F0 and Mel-cepstral coefficients in a hidden Markov model and a deep neural network to obtain the voicing decision. The objective and subjective evaluations demonstrate that the proposed method improves the naturalness of synthetic speech.

7.
Traditional strategies, such as fingerprinting and face recognition, are becoming more and more susceptible to fraud. As a consequence, new and more fraud-proof biometric modalities have been considered, one of them being the heartbeat pattern acquired by an electrocardiogram (ECG). While methods for subject identification based on the ECG signal work with signals sampled at high frequencies (>100 Hz), the main goal of this work is to evaluate the use of ECG signals sampled at low frequencies for this purpose. In this work, the ECG signal is sampled at low frequencies (30 Hz and 60 Hz) and represented by four feature extraction methods available in the literature, which are then fed to a Support Vector Machine (SVM) classifier to perform the identification. In addition, a classification approach based on majority voting using multiple samples per subject is employed and compared to the traditional classification based on the presentation of a single sample per subject each time. Considering a database composed of 193 subjects, results show identification accuracies higher than 95% and close to optimal (i.e., 100%) when the ECG signal is sampled at 30 Hz and 60 Hz, respectively, the latter being very close to the accuracies obtained when the signal is sampled at 360 Hz (the maximum frequency in our database). We also evaluate the impact of: (1) the number of training and testing samples for learning and identification, respectively; (2) the scalability of the biometric (i.e., an increase in the number of subjects); and (3) the use of multiple samples for person identification.
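The majority-voting step mentioned above can be sketched as follows. This is an illustration only: the per-sample predictions would come from the SVM, which is not reproduced here, and the tie-breaking rule (first label seen wins) and the function name `majority_vote` are assumptions rather than details from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the label seen first."""
    if not predictions:
        raise ValueError("need at least one per-sample prediction")
    # Counter keeps insertion order, and most_common lists equal counts
    # in the order first encountered (Python 3.7+).
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-heartbeat identity predictions for one test recording.
per_sample = ["subject_17", "subject_17", "subject_04", "subject_17", "subject_09"]
print(majority_vote(per_sample))
```

A single misclassified sample is outvoted as long as most samples from the recording are assigned to the correct subject, which is the intuition behind using multiple samples per identification.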

8.
Background: Detection and monitoring of respiratory-related illness is an important aspect of pulmonary medicine. Acoustic signals extracted from the human body are used to detect respiratory pathology accurately. Objectives: The aim of this study is to develop a prototype telemedicine tool to detect respiratory pathology using computerized respiratory sound analysis. Methods: Around 120 subjects (40 normal, 40 with continuous lung sounds (20 wheeze and 20 rhonchi) and 40 with discontinuous lung sounds (20 fine crackles and 20 coarse crackles)) were included in this study. The respiratory sounds were segmented into respiratory cycles using a fuzzy inference system, and the S-transform was then applied to these respiratory cycles. From the S-transform matrix, statistical features were extracted. The extracted features were statistically significant with p < 0.05. To classify the respiratory pathology, KNN, SVM and ELM classifiers were implemented using the statistical features obtained from the data. Results: The validation showed that the training classification rate of the ELM classifier with RBF kernel was higher than that of the SVM and KNN classifiers. The time taken to train the classifier was also lower for ELM than for the SVM and KNN classifiers. The overall mean classification rate for the ELM classifier was 98.52%. Conclusion: The telemedicine software tool was developed using the ELM classifier. The telemedicine tool has performed extraordinarily well in detecting respiratory pathology and is well validated.

9.
Graphics processing units (GPUs) provide substantial processing power at little cost. We explore the application of GPUs to speech pattern processing, using language identification (LID) to demonstrate their benefits. Realizing the full potential of GPUs requires both effective coding of predetermined algorithms and, if there is a choice, selection of the algorithm or technique for a specific function that is best able to exploit the GPU. We demonstrate these principles using the NIST LRE 2003 standard LID task, a batch processing task that involves the analysis of over 600 h of speech. We focus on two parts of the system, namely the acoustic classifier, which is based on a 2048-component Gaussian Mixture Model (GMM), and acoustic feature extraction. For the latter, we compare a conventional FFT-based analysis with IIR and FIR filter banks, both in terms of their ability to exploit the GPU architecture and in terms of LID performance. With no increase in error rate, our GPU-based system, with an FIR-based front-end, completes the NIST LRE 2003 task in 16 h, compared with 180 h for the conventional FFT-based system on a standard CPU (a speed-up factor of more than 11). This includes a 61% decrease in front-end processing time. In the GPU implementation, front-end processing accounts for 8% and 10% of the total computing times during training and recognition, respectively. Hence the reduction in front-end processing achieved in the GPU implementation is significant.
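As a quick check of the quoted speed-up, using only the two run times stated in the abstract (180 h on the CPU and 16 h on the GPU):

```latex
\[
  \text{speed-up} \;=\; \frac{T_{\text{CPU}}}{T_{\text{GPU}}}
  \;=\; \frac{180\ \text{h}}{16\ \text{h}} \;=\; 11.25 ,
\]
```

which is consistent with the stated factor of more than 11.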

10.
A new architecture for intelligent audio emotion recognition is proposed in this paper. It fully utilizes both prosodic and spectral features in its design. It has two main paths in parallel and can recognize 6 emotions. Path 1 is designed based on intensive analysis of different prosodic features. Significant prosodic features are identified to differentiate emotions. Path 2 is designed based on analysis of spectral features. Extraction of Mel-Frequency Cepstral Coefficient (MFCC) features is followed by Bi-directional Principal Component Analysis (BDPCA), Linear Discriminant Analysis (LDA) and Radial Basis Function (RBF) neural classification. This path has a structure of 3 parallel BDPCA + LDA + RBF sub-paths, each of which handles two emotions. Fusion modules are also proposed for weight assignment and decision making. The performance of the proposed architecture is evaluated on the eNTERFACE’05 and RML databases. Simulation results and comparisons reveal good performance of the proposed recognizer.

11.
Breast cancer is the most common cancer among women. In CAD systems, several studies have investigated the use of the wavelet transform as a multiresolution analysis tool for texture analysis, whose output can serve as input to a classifier. For classification, the polynomial classifier has been used because it provides a single model for optimal separation of classes, which is taken as the solution of the problem. In this paper, a system is proposed for texture analysis and classification of lesions in mammographic images. Multiresolution analysis features were extracted from the region of interest of a given image. These features were computed based on three different wavelet functions: Daubechies 8, Symlet 8 and bi-orthogonal 3.7. For classification, we used the polynomial classification algorithm to label the mammogram images as normal or abnormal. We also made a comparison with other artificial intelligence algorithms (Decision Tree, SVM, K-NN). A Receiver Operating Characteristic (ROC) curve is used to evaluate the performance of the proposed system. Our system is evaluated using 360 digitized mammograms from the DDSM database, and the result shows that the algorithm has an area under the ROC curve Az of 0.98 ± 0.03. The polynomial classifier performed better than the other classification algorithms.

12.
The work presented in this paper explores the effectiveness of incorporating excitation source parameters such as the strength of excitation and the instantaneous fundamental frequency (\(F_0\)) for the emotion recognition task from speech and electroglottographic (EGG) signals. The strength of excitation (SoE) is an important parameter indicating the pressure with which the glottis closes at the glottal closure instants (GCIs). The SoE is computed by the popular zero frequency filtering (ZFF) method, which accurately estimates the glottal signal characteristics by attenuating or removing the high-frequency vocal tract interactions in speech. The arbitrary impulse sequence, obtained from the estimated GCIs, is used to derive the instantaneous \(F_0\). The SoE and the instantaneous \(F_0\) parameters are combined with the conventional mel frequency cepstral coefficients (MFCC) to improve the recognition rates of distinct emotions (Anger, Happy and Sad) using Gaussian mixture models as the classifier. The performances of the proposed combination of SoE and instantaneous \(F_0\) and their dynamic features with MFCC coefficients are compared on emotion utterances (4 emotions and neutral) from the classical German full-blown emotion speech database (EmoDb), which has simultaneous speech and EGG signals, and on the Surrey Audio Visual Expressed Emotion database (3 emotions and neutral), for both speaker-dependent and speaker-independent emotion recognition scenarios. To reinforce the effectiveness of the proposed features and for better statistical consistency of the emotion analysis, a fairly large emotion speech database of 220 utterances per emotion in the Tamil language, with simultaneous EGG recordings, is used in addition to EmoDb. The effectiveness of SoE and instantaneous \(F_0\) in characterizing different emotions is also confirmed by the improved emotion recognition performance on the Tamil speech-EGG emotion database.

13.
This paper explores the excitation source features of the speech production mechanism for characterizing and recognizing emotions from the speech signal. The excitation source signal is obtained from the speech signal using linear prediction (LP) analysis, and it is also known as the LP residual. The glottal volume velocity (GVV) signal is also used to represent the excitation source, and it is derived from the LP residual signal. The speech signal has a high signal-to-noise ratio around the instants of glottal closure (GC). These instants of glottal closure are also known as epochs. In this paper, the following excitation source features are proposed for characterizing and recognizing emotions: the sequence of LP residual samples and their phase information, parameters of epochs and their dynamics at syllable and utterance levels, and samples of the GVV signal and its parameters. Auto-associative neural networks (AANN) and support vector machines (SVM) are used for developing the emotion recognition models. Telugu and Berlin emotion speech corpora are used to evaluate the developed models. Anger, disgust, fear, happy, neutral and sadness are the six emotions considered in this study. An average emotion recognition performance of about 42% to 63% is observed using different excitation source features. Further, the combination of excitation source and spectral features is shown to improve the emotion recognition performance up to 84%.

14.
Biodiversity conservation is a global priority in which the study of every type of living form is a fundamental task. Among the huge number of species on the planet, spiders play an important role in almost every habitat. This paper presents a comprehensive study on the reliability of the most widely used feature extractors for the problem of spider species recognition from their cobwebs, in both identification and verification modes. We have applied preprocessing to the cobweb images in order to retain only the valid information and to compute the optimal size for the highest performance. We have used principal component analysis (PCA), independent component analysis (ICA), Discrete Cosine Transform (DCT), Wavelet Transform (DWT) and discriminative common vectors as feature extractors, and proposed the fusion of several of them to improve the system’s performance. Finally, we have used the Least Squares Support Vector Machine with a radial basis function as the classifier. We have implemented K-Fold and Hold-Out cross-validation techniques in order to obtain reliable results. PCA provided the best performance, reaching a success rate of 99.65% ± 0.21 in identification mode and 99.98% ± 0.04 of the area under the Receiver Operating Characteristic (ROC) curve in verification mode. The best combination of feature extractors was PCA, DCT, DWT and ICA, which achieved a success rate of 99.96% ± 0.16 in identification mode and perfect verification.

15.
Tetrazino-tetrazine-tetraoxide (TTTO) is an attractive high-energy compound, but unfortunately it has not yet been synthesized experimentally. Isomerization of TTTO leads to five isomers. Bond-separation energies were employed to compare the global stability of the six compounds; isomer 1 has the highest bond-separation energy (1204.6 kJ/mol), compared with 1151.2 kJ/mol for TTTO. Thermodynamic properties of the six compounds were calculated theoretically, including standard formation enthalpies (solid and gaseous), standard fusion enthalpies, standard vaporization enthalpies, standard sublimation enthalpies, lattice energies, normal melting points and normal boiling points. Their detonation performances were also computed, including detonation heat (Q, cal/g), detonation velocity (D, km/s), detonation pressure (P, GPa) and impact sensitivity (h50, cm). Compared with TTTO (Q = 1311.01 J/g, D = 9.228 km/s, P = 40.556 GPa, h50 = 12.7 cm), isomer 5 exhibits better detonation performance (Q = 1523.74 J/g, D = 9.389 km/s, P = 41.329 GPa, h50 = 28.4 cm).

16.

The number of traffic accidents in Brazil has reached alarming levels, and traffic accidents are currently one of the leading causes of death in the country. With the number of vehicles on the roads increasing rapidly, these problems will tend to worsen. Consequently, huge investments in resources to increase road safety will be required. The vertical R-19 system for optical character recognition of regulatory traffic signs (maximum speed limits) according to Brazilian Standards developed in this work uses a camera positioned at the front of the vehicle, facing forward, so that images of traffic signs can be captured, enabling the use of image processing and analysis techniques for sign detection. This paper proposes the detection and recognition of speed limit signs based on a cascade of boosted classifiers working with Haar-like features. Recognition of the detected sign is based on the optimum-path forest (OPF) classifier, support vector machines (SVM), multilayer perceptron, k-nearest neighbor (kNN), extreme learning machine, least mean squares, and least squares machine learning techniques. The SVM, OPF and kNN classifiers had average accuracies higher than 99.5%; the OPF classifier with a linear kernel took an average time of 87 μs to recognize a sign, while kNN took 11,721 μs and SVM 12,595 μs. This sign detection approach successfully found and recognized 11,320 road signs from a set of 12,520 images, leading to an overall accuracy of 90.41%. Analyzing the system globally, recognition accuracy was 89.19%, as 11,167 road signs from a database with 12,520 signs were correctly recognized. The processing speed of the embedded system varied between 20 and 30 frames per second. Therefore, based on these results, the proposed system can be considered a promising tool with high commercial potential.


17.
18.
A multilayer perceptron (MLP) trained with the back-propagation learning algorithm takes a large amount of computational time. The complexity of the network increases as the number of layers and the number of nodes in the layers increase. Further, it is very difficult to decide a priori the number of nodes in a layer and the number of layers in the network required for solving a problem. In this paper, an improved particle swarm optimization (IPSO) is used to train the functional link artificial neural network (FLANN) for classification, and we name it ISO-FLANN. In contrast to MLP, FLANN has less architectural complexity, is easier to train, and may give more insight into the classification problem. Further, we rely on the global classification capabilities of IPSO to explore the entire weight space, which is plagued by a host of local optima. Using the functionally expanded features, FLANN overcomes the non-linear nature of problems. We believe that the combined efforts of FLANN and IPSO (IPSO + FLANN = ISO-FLANN), by harnessing their best attributes, can give rise to a robust classifier. An extensive simulation study is presented to show the effectiveness of the proposed classifier. Results are compared with MLP, support vector machine (SVM) with radial basis function (RBF) kernel, FLANN with gradient descent learning and fuzzy swarm net (FSN).
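To make the idea of functionally expanded features concrete, here is a minimal sketch of a FLANN forward pass. The abstract does not say which expansion is used, so the trigonometric basis, the tanh output and the names `trig_expand` and `flann_output` are all assumptions chosen for illustration; the IPSO training of the weights is not shown.

```python
import math


def trig_expand(x, order=2):
    """Expand each input x_i into [x_i, sin(k*pi*x_i), cos(k*pi*x_i)] for k = 1..order."""
    expanded = []
    for value in x:
        expanded.append(value)
        for k in range(1, order + 1):
            expanded.append(math.sin(k * math.pi * value))
            expanded.append(math.cos(k * math.pi * value))
    return expanded


def flann_output(x, weights, bias=0.0, order=2):
    """Single-layer network: tanh of a weighted sum of the expanded features."""
    phi = trig_expand(x, order)
    if len(phi) != len(weights):
        raise ValueError("need one weight per expanded feature")
    return math.tanh(bias + sum(w * f for w, f in zip(weights, phi)))


# Two raw inputs become 2 * (1 + 2 * order) = 10 expanded features.
features = trig_expand([0.5, -0.25], order=2)
print(len(features))
```

Because the non-linearity lives in the fixed expansion, there are no hidden layers to size; only one weight vector has to be searched, for example by a swarm optimizer.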

19.
Joint moment is one of the most important factors in human gait analysis. It can be calculated using multi-body dynamics, but this may not be straightforward. This study had two main purposes: first, to develop a generic multi-dimensional wavelet neural network (WNN) as a real-time surrogate model to calculate lower extremity joint moments and compare them with those determined by the multi-body dynamics approach; second, to compare the calculation accuracy of the WNN with that of a feed-forward artificial neural network (FFANN) as a traditional intelligent predictive structure in biomechanics. To these ends, data from four patients who walked under three different conditions were obtained from the literature. A total of 10 inputs, including eight electromyography (EMG) signals and two ground reaction force (GRF) components, were determined as the most informative inputs for the WNN based on the mutual information technique. The prediction ability of the network was tested at two different levels of inter-subject generalization. The WNN predictions were validated against outputs from the multi-body dynamics method in terms of normalized root mean square error (NRMSE (%)) and cross-correlation coefficient (ρ). Results showed that the WNN can predict joint moments to a high level of accuracy (NRMSE < 10%, ρ > 0.94) compared to the FFANN (NRMSE < 16%, ρ > 0.89). A generic WNN could also calculate joint moments much faster and more easily than the multi-body dynamics approach, based on GRFs and EMG signals, which removed the need for motion capture. This indicates that the WNN can be a surrogate model for real-time gait biomechanics evaluation.
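The two validation metrics can be sketched as below. The abstract does not state how the RMSE is normalized or how ρ is computed, so this sketch assumes normalization by the range of the reference curve and a zero-lag Pearson correlation; the function names and the sample values are hypothetical.

```python
import math


def nrmse_percent(reference, predicted):
    """RMSE divided by the range of the reference curve, in percent."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("curves must be non-empty and of equal length")
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference curve has zero range")
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    return 100.0 * math.sqrt(mse) / span


def correlation(reference, predicted):
    """Pearson correlation coefficient between the two curves at zero lag."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)


# Hypothetical joint-moment samples: the prediction is offset by a constant 0.4.
moment_dynamics = [0.0, 1.0, 2.0, 3.0, 4.0]
moment_network = [0.4, 1.4, 2.4, 3.4, 4.4]
print(nrmse_percent(moment_dynamics, moment_network), correlation(moment_dynamics, moment_network))
```

The toy curves show why both metrics are reported: a constant offset leaves the correlation at essentially 1 while the NRMSE is clearly non-zero, so ρ measures agreement in shape and NRMSE measures agreement in magnitude.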

20.