Similar literature
 20 similar documents found; search took 328 ms
1.
Traditional acoustic speech recognition accuracies have been shown to deteriorate in highly noisy environments. A secondary information source is exploited using surface myoelectric signals (MES) collected from facial articulatory muscles during speech. Words are classified at the phoneme level using a hidden Markov model (HMM) classifier. Acoustic and MES data were collected while the words "zero" through "nine" were spoken. An acoustic expert classified the 18 formative phonemes in low noise levels [signal-to-noise ratio (SNR) of 17.5 dB] with an accuracy of 99%, but its accuracy deteriorated to approximately 38% under simulations with SNR approaching 0 dB. A fused acoustic-myoelectric multiexpert system, without knowledge of SNR, improved on acoustic classification results at all noise levels. A multiexpert system incorporating SNR information obtained accuracies of 99% at low noise levels while maintaining accuracies above 94% during low SNR (0 dB) simulations. Results improve on previous full word MES speech recognition accuracies by almost 10%.

2.
In this communication, we present an on-line real-time simulation system used to evaluate a new method of controlling a multifunctional hand prosthesis. The system employs two computers. Acquisition, analysis, and classification of EMG signals are performed by a minicomputer, and the animation model of the hand prosthesis, which is displayed on a TV monitor, is controlled by a microcomputer. From three EMG signals picked up from the most suitable muscles of the forearm, three amplitude and three frequency components are derived. Six voluntary movements are then classified by a pattern recognition technique based on linear discriminant analysis. Besides the six basic motions, the model can also make eight combined motions using the method of selecting control signals. Experimental results from healthy adults indicated that the recognition rate was above 90 percent, and nearly 100 percent in some of the better cases. The subjects could then easily operate the model on the TV screen.

3.
Electromyographic (EMG) signal recognition is a complex pattern recognition problem because of the large variations in signals and features. This paper proposes a novel EMG classifier called the cascaded kernel learning machine (CKLM) to achieve high-accuracy EMG recognition. First, the EMG signals are acquired by three surface electrodes placed on three different muscles. Second, EMG features are extracted by an autoregressive model (ARM) and an EMG histogram. After feature extraction, the CKLM classifies the features. CKLM is composed of two different kinds of kernel learning machines: the generalized discriminant analysis (GDA) algorithm and the support vector machine (SVM). GDA achieves both dimensionality reduction of the input features and selection of discriminating features, named kernel FisherEMG. Then, SVM combined with a one-against-one strategy classifies the kernel FisherEMG. By cascading SVM with GDA, the input features are nonlinearly mapped twice by a radial-basis function (RBF). As a result, a linear optimal separating hyperplane can be found with the largest margin of separation between each pair of posture classes in the implicit dot product feature space. In addition, we develop a digital signal processor (DSP)-based EMG classification system for the control of a multi-degrees-of-freedom prosthetic hand as a practical implementation. Clinical experiments show that the proposed CKLM is superior to other frequently used methods, such as the k-nearest neighbor algorithm, the multilayer neural network, and SVM. The best EMG recognition rate, 93.54%, is obtained by CKLM.

4.
Identification of the innervation zone is widely used to optimize the accuracy and precision of noninvasive surface electromyography (EMG) signals because the EMG signal is strongly influenced by innervation zones. However, simply structured fusiform muscles, such as the biceps brachii, have mainly been employed because the propagation can easily be observed in their raw EMG signals. In this study, the optimum electrode location (OEL), free from innervational influence, was investigated from the propagation pattern of action potentials for biceps brachii muscles and the more complicated deltoid muscle structures using an automated signal analysis technique. The technique employed newly developed computer software with additional clinical uses and minimized subjective differences. EMG signals were recorded using surface array electrodes during voluntary isometric contractions from 12 healthy male subjects. Peaks in EMG signals were detected and averaged for each muscle. The propagation patterns and OEL were examined from biceps brachii muscles for all subjects and from deltoid muscles for seven subjects. The estimated locations were partially confirmed by comparing the root mean squares of the EMG signals. These results show that propagation patterns and OEL could be estimated simply and automatically even from the surface EMG signals of deltoid muscles.

5.
Speech-driven facial animation combines techniques from different disciplines such as image analysis, computer graphics, and speech analysis. Active shape models (ASM) used in image analysis are excellent tools for characterizing lip contour shapes and approximating their motion in image sequences. By controlling the coefficients for an ASM, such a model can also be used for animation. We design a mapping of the articulatory parameters used in phonetics into ASM coefficients that control nonrigid lip motion. The mapping is designed to minimize the approximation error when articulatory parameters measured on training lip contours are taken as input to synthesize the training lip movements. Since articulatory parameters can also be estimated from speech, the proposed technique can form an important component of a speech-driven facial animation system.

6.
Electromyographic (EMG) recordings detected over the skin may be mixtures of signals generated by different active muscles due to the phenomena related to volume conduction. Separation of the sources is necessary when single muscle activity has to be detected. Signals generated by different muscles may be considered uncorrelated but in general overlap in time and frequency. Under certain assumptions, mixtures of surface EMG signals can be considered as linear instantaneous but no a priori information about the mixing matrix is available when different muscles are active. In this study, we applied blind source separation (BSS) methods to separate the signals generated by two active muscles during a force-varying task. As the signals are nonstationary, an algorithm based on spatial time-frequency distributions was applied on simulated and experimental EMG signals. The experimental signals were collected from the flexor carpi radialis and the pronator teres muscles which could be activated selectively for wrist flexion and rotation, respectively. From the simulations, correlation coefficients between the reference and reconstructed sources were higher than 0.85 for signals largely overlapping both in time and frequency and for signal-to-noise ratios as low as 5 dB. The Choi-Williams and Bessel kernels, in this case, performed better than the Wigner-Ville one. Moreover, the selection of time-frequency points for the procedure of joint diagonalization used in the BSS algorithm significantly influenced the results. For the experimental signals, the interference of the other source in each reconstructed source was significantly attenuated by the application of the BSS method. The ratio between root-mean-square values of the signals from the two sources detected over one of the muscles increased from (mean ± standard deviation) 2.33 ± 1.04 to 4.51 ± 1.37 and from 1.55 ± 0.46 to 2.72 ± 0.65 for wrist flexion and rotation, respectively. This increment was statistically significant. It was concluded that the BSS approach applied is promising for the separation of surface EMG signals, with applications ranging from muscle assessment to detection of muscle activation intervals, and to the control of myoelectric prostheses.

7.
Pattern recognition techniques have been applied to extract information from electromyographic (EMG) signals that can be used to control electrically powered hand prostheses. In this paper, optimized spatial filters that enhance separation properties of EMG signals are investigated. In particular, different multiclass extensions of the common spatial patterns (CSP) algorithm are applied to high-density surface EMG signals acquired from the forearms of ten healthy subjects. Visualization of the obtained filter coefficients provides insight into the physiology of the muscles related to the performed contractions. The CSP methods are compared with a commonly used pattern recognition approach in a six-class classification task. Cross-validation results show a significant improvement in performance and a higher robustness against noise than commonly used pattern recognition methods.

8.
Articulation errors seriously reduce speech intelligibility and the ease of spoken communication. Speech-language pathologists manually identify articulation error patterns based on their clinical experience, which is a time-consuming and expensive process. This study proposes an automatic pronunciation error identification system that uses a novel dependence network (DN) approach. In order to derive a subject's articulatory information, a photo naming task is performed to obtain the subject's speech patterns. Based on clinical knowledge about speech evaluation, a DN scheme was used to model the relationships of a test word, a subject, a speech pattern, and an articulation error pattern. To integrate DN into automatic speech recognition (ASR), a pronunciation confusion network is proposed to model the probability of DN and is then used to guide the search space of the ASR. Further, to increase the accuracy of the ASR, an appropriate threshold based on a histogram of pronunciation errors is selected in order to disregard rare pronunciation errors. Finally, the articulation error patterns were well identified by integrating the likelihoods of the DNs of each phoneme. The results of this study indicate that it is feasible to clinically implement this dependence network approach to achieve satisfactory performance in articulation evaluation.

9.
Myoelectric signals (MESs) from the speaker's mouth region have been shown to improve the noise robustness of automatic speech recognizers (ASRs), thus promising to extend their usability in implementing noise-robust ASR. In the recognition system presented herein, extracted audio and facial MES features were integrated by a decision fusion method, where the likelihood score of the audio-MES observation vector was given by a linear combination of class-conditional observation log-likelihoods of two classifiers, using appropriate weights. We developed a weighting process adaptive to SNRs. The main objective of the paper is to determine the optimal SNR classification boundaries and to construct a set of optimum stream weights for each SNR class. These two parameters were determined by a method based on a maximum mutual information criterion. Acoustic and facial MES data were collected from five subjects, using a 60-word vocabulary. Four types of acoustic noise (babble, car, aircraft, and white noise) were acoustically added to clean speech signals with SNR ranging from -14 to 31 dB. The classification accuracy of the audio ASR was as low as 25.5%, whereas that of the MES ASR was 85.2%. The classification accuracy could be further improved by employing the proposed audio-MES weighting method, reaching as high as 89.4% in the case of babble noise. A similar result was also found for the other types of noise.
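
The decision fusion described in this abstract can be illustrated with a minimal Python sketch. It assumes that the two stream weights sum to one and that the SNR class is found by comparing the SNR against sorted boundaries; the function names, boundary values, and weight values are hypothetical placeholders, not the ones the authors obtained with their maximum mutual information criterion.

```python
from typing import Dict, Sequence


def pick_audio_weight(snr_db: float,
                      boundaries: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Return the audio stream weight of the SNR class containing snr_db.

    boundaries must be sorted ascending; weights needs len(boundaries) + 1
    entries, one per SNR class (hypothetical values, not from the paper).
    """
    if len(weights) != len(boundaries) + 1:
        raise ValueError("need one weight per SNR class")
    class_index = sum(1 for b in boundaries if snr_db >= b)
    return weights[class_index]


def fuse_log_likelihoods(audio_ll: Dict[str, float],
                         mes_ll: Dict[str, float],
                         audio_weight: float) -> Dict[str, float]:
    """Linear combination of class-conditional log-likelihoods.

    Assumption: the MES weight is 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {label: audio_weight * audio_ll[label]
            + (1.0 - audio_weight) * mes_ll[label]
            for label in audio_ll}


def classify(audio_ll: Dict[str, float],
             mes_ll: Dict[str, float],
             snr_db: float,
             boundaries: Sequence[float],
             weights: Sequence[float]) -> str:
    """Pick the class with the highest fused score."""
    w = pick_audio_weight(snr_db, boundaries, weights)
    fused = fuse_log_likelihoods(audio_ll, mes_ll, w)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    audio = {"yes": -10.0, "no": -12.0}   # audio classifier favors "yes"
    mes = {"yes": -20.0, "no": -11.0}     # MES classifier favors "no"
    bounds = [0.0, 15.0]                  # hypothetical SNR class boundaries (dB)
    w_audio = [0.1, 0.5, 0.9]             # hypothetical audio weights per class
    print(classify(audio, mes, -5.0, bounds, w_audio))
    print(classify(audio, mes, 20.0, bounds, w_audio))
```

With these placeholder numbers the MES stream dominates at low SNR and the audio stream dominates at high SNR, which is the qualitative behavior an SNR-adaptive weighting is meant to produce.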

10.
In this study, power spectral density functions (PSDF's) were computed for interference EMG of various facial and jaw-elevator muscles during nonfatiguing submaximal static contractions, recorded with surface electrodes. A distinct peak was found in the PSDF's in the frequency region below 40 Hz. It was shown that the peak was due to genuine EMG activity and that it could not be considered an artifact caused by electrode displacements during contraction. An increase of contraction strength resulted in a shift of the peak to higher frequencies and a decrease of peak amplitude relative to the power spectral estimates above 40 Hz, which were shown to be determined by the shape of the motor unit (MU) action potentials. In accordance with mathematical models of the EMG PSDF, it was demonstrated that the peak indicates the dominant firing rate of the sampled MU's. Our results suggest that this can be defined as the firing rate of the first recruited low-threshold MU's, which may be expected to dominate the interference EMG signal because of their preponderance in number. The data further suggest that the peak can be more readily observed in PSDF's of facial and jaw-elevator muscles than in PSDF's of limb muscles. This might be related to differences in MU firing statistics.

11.
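Wang Biao (王彪). 《电子设计工程》 (Electronic Design Engineering), 2011, 19(21): 59-61
To improve the recognition rate of speech signals, an improved speech feature extraction algorithm is proposed. Building on the MFCC parameters, the algorithm adds the short-time energy and short-time zero-crossing rate of each frame, so that the new parameters characterize the speech signal more accurately. Simulation experiments show that the new feature parameters achieve a higher recognition rate.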
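
As a rough illustration of the feature augmentation this abstract describes, the following Python sketch appends a short-time energy and a zero-crossing rate to each frame's MFCC vector. The MFCC vectors are taken as given input (their computation is omitted), and the framing parameters, the unnormalized energy definition, and all function names are assumptions made for illustration rather than details from the paper.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into frames of frame_len samples, advancing by hop."""
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in the frame (no window, no normalization)."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def augment_features(mfcc_frames: Sequence[Sequence[float]],
                     samples: Sequence[float],
                     frame_len: int,
                     hop: int) -> List[List[float]]:
    """Append [energy, zero-crossing rate] to each frame's MFCC vector."""
    frames = frame_signal(samples, frame_len, hop)
    if len(frames) != len(mfcc_frames):
        raise ValueError("MFCC frame count must match the signal framing")
    return [list(m) + [short_time_energy(f), zero_crossing_rate(f)]
            for m, f in zip(mfcc_frames, frames)]


if __name__ == "__main__":
    signal = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0]
    mfcc = [[0.1, 0.2], [0.3, 0.4]]   # placeholder MFCC vectors, one per frame
    print(augment_features(mfcc, signal, frame_len=4, hop=4))
```

In the toy signal, the first frame alternates in sign (high zero-crossing rate, low energy) and the second is constant (no crossings, higher energy), so the two appended values separate the frames even where MFCCs alone might not.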

12.
The success of the articulatory approach to speech coding depends heavily on a good distortion measure between a given speech signal and the entries in a stored codebook of impulse responses and corresponding vocal-tract shapes (articulatory codebook). One promising distortion measure is the weighted cepstral distortion. Since the impulse responses in the articulatory codebook do not include glottal characteristics, the authors derive optimal weighting functions (cepstral lifters) to reduce the influence of a varying glottal source on the cepstral distortion measure. This is done by examining the ensemble of cepstral coefficients of speech produced by an articulatory speech synthesizer that also includes a vocal-cord model. The obtained cepstral lifters are optimal for the given ensemble of cepstral coefficients and for given constraints on the weighting function. They are different for cepstral coefficients derived from the power spectrum (FFT cepstra) and for those derived from LPC (linear predictive coding) coefficients (LPC cepstra). The performances of the obtained cepstral lifters are compared in an articulatory codebook search.
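
A weighted cepstral distortion and the codebook search it drives can be sketched in a few lines of Python. The sketch assumes the common form in which the lifter weights are applied to the coefficient differences before taking a squared Euclidean distance; the lifter values below are arbitrary placeholders, not the optimal lifters derived in the paper, and the function names are hypothetical.

```python
from typing import Sequence


def weighted_cepstral_distortion(c_a: Sequence[float],
                                 c_b: Sequence[float],
                                 lifter: Sequence[float]) -> float:
    """Sum over k of (w_k * (c_a[k] - c_b[k]))**2 (assumed form)."""
    if not (len(c_a) == len(c_b) == len(lifter)):
        raise ValueError("all vectors must have the same length")
    return sum((w * (a - b)) ** 2 for a, b, w in zip(c_a, c_b, lifter))


def search_codebook(target: Sequence[float],
                    codebook: Sequence[Sequence[float]],
                    lifter: Sequence[float]) -> int:
    """Index of the codebook entry with the smallest weighted distortion."""
    return min(range(len(codebook)),
               key=lambda i: weighted_cepstral_distortion(target, codebook[i], lifter))


if __name__ == "__main__":
    target = [1.0, 0.0]
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    # Emphasizing the first coefficient favors entry 1; emphasizing the
    # second coefficient favors entry 0.
    print(search_codebook(target, codebook, lifter=[2.0, 1.0]))
    print(search_codebook(target, codebook, lifter=[1.0, 2.0]))
```

The example shows why the choice of lifter matters: the same target vector selects a different codebook entry depending on which cepstral coefficients the weighting emphasizes.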

13.
Emotion recognition is an active research topic in modern intelligent systems. The technique is pervasively used in autonomous vehicles, remote medical services, and human–computer interaction (HCI). Traditional speech emotion recognition algorithms cannot be effectively generalized since both training and testing data are from the same domain and have the same data distribution. In practice, however, speech data is acquired from different devices and recording environments. Thus, the data may differ significantly in terms of language, emotional types, and tags. To solve this problem, we propose a bimodal fusion algorithm for speech emotion recognition, in which facial expression and speech information are optimally fused. We first combine a CNN and an RNN to achieve facial emotion recognition. Subsequently, we use MFCCs to convert the speech signal to images, so that an LSTM and a CNN can be used to recognize speech emotion. Finally, we use a weighted decision fusion method to fuse facial expression and speech signal to achieve speech emotion recognition. Comprehensive experimental results demonstrate that, compared with unimodal emotion recognition, bimodal feature-based emotion recognition achieves better performance.

14.
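Qiu Yu (邱玉), Zhao Jieyu (赵杰煜), Wang Yanfang (汪燕芳). 《电子学报》 (Acta Electronica Sinica), 2016, 44(6): 1307-1313
The spatiotemporal relationships among facial muscles play an important role in facial expression recognition, but current models cannot efficiently capture the complex global spatiotemporal relationships of the face, so they have not been widely applied. To address this problem, this paper proposes a facial expression modeling method based on an interval algebra Bayesian network. The method captures not only the spatial relationships of the face but also its complex temporal relationships, and can therefore recognize facial expressions more effectively. Moreover, it uses only tracking-based features and does not require manual labeling of peak frames, which speeds up training and recognition. Experiments on the standard databases CK+ and MMI show that the method effectively improves accuracy in facial expression recognition.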

15.
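Li Yongwei (李永伟), Tao Jianhua (陶建华), Li Kai (李凯). 《信号处理》 (Journal of Signal Processing), 2023, 39(4): 632-638
Speech emotion recognition is an indispensable part of natural human-computer interaction and an important component of artificial intelligence. The control of the articulatory organs causes differences in the acoustic features of emotional speech, which are then perceived as different emotions. Traditional speech emotion recognition classifies emotions using only the acoustic or auditory features of the speech signal, ignoring the important role that articulatory features such as the glottal wave and the vocal tract play in emotion perception. In our previous work, we analyzed theoretically the important influence of the glottal wave and vocal tract shape on perceived emotion, but did not use glottal wave and vocal tract features for speech emotion recognition. This paper therefore re-examines, from the perspective of speech production, the feasibility of using glottal wave and vocal tract features for speech emotion recognition, and proposes a speech emotion recognition method based on glottal wave and vocal tract features from the source-filter model. First, the Liljencrants-Fant and Auto-Regressive eXogenous (ARX-LF) model is used to separate the glottal wave and vocal tract features of emotional speech from the speech signal. Then, the separated glottal wave and vocal tract features are fed into a bidirectional gated recurrent unit (BiGRU) for emotion classification. Validation on the public emotion dataset IEMOCAP shows that glottal wave and vocal tract features can effectively distinguish emotions and that the recognition performance exceeds that of some traditional features. By studying speech emotion recognition through the articulation-related glottal wave and vocal tract, this paper offers a new approach to speech emotion recognition technology.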

16.
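Chen Yanxiang (陈雁翔), Liu Ming (刘鸣). 《电子学报》 (Acta Electronica Sinica), 2010, 38(12): 2920-2924
Human perception of speech is multimodal and is influenced by both hearing and vision. Focusing on the fusion of speech and its visual features, and based on the deeper cause of the asynchronous correlation between audio and video revealed by the mechanism of articulation, this work uses the asynchronous correlation of multiple articulatory features to describe the asynchrony observed on the surface between audio and video. A joint model of speech and lip motion based on a dynamic Bayesian network is proposed, and multilevel fusion of the audio and video modalities improves the robustness of a speaker recognition system. Experiments on an audio-visual bimodal database show that multilevel fusion achieves better performance under different speech signal-to-noise ratios.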

17.
In this paper, we propose techniques of surface electromyographic (EMG) signal detection and processing for the assessment of muscle fiber conduction velocity (CV) during dynamic contractions involving fast movements. The main objectives of the study are: 1) to present multielectrode EMG detection systems specifically designed for dynamic conditions (in particular, for CV estimation); 2) to propose a novel multichannel CV estimation method for application to short EMG signal bursts; and 3) to validate on experimental signals different choices of the processing parameters. Linear adhesive arrays of electrodes are presented for multichannel surface EMG detection during movement. A new multichannel CV estimation algorithm is proposed. The algorithm provides maximum likelihood estimation of CV from a set of surface EMG signals with a window limiting the time interval in which the mean square error (mse) between aligned signals is minimized. The minimization of the windowed mse function is performed in the frequency domain, without limitation in time resolution and with an iterative computationally efficient procedure. The method proposed is applied to signals detected from the vastus lateralis and vastus medialis muscles during cycling at 60 cycles/min. Ten subjects were investigated during a 4-min cycling task. The method provided reliable assessment of muscle fatigue for these subjects during dynamic contractions.

18.
In applying hidden Markov modeling for recognition of speech signals, the matching of the energy contour of the signal to the energy contour of the model for that signal is normally achieved by appropriate normalization of each vector of the signal prior to both training and recognition. This approach, however, is not applicable when only noisy signals are available for recognition. A unified approach is developed for gain adaptation in recognition of clean and noisy signals. In this approach, hidden Markov models (HMMs) for gain-normalized clean signals are designed using maximum-likelihood (ML) estimates of the gain contours of the clean training sequences. The models are combined with ML estimates of the gain contours of the clean test signals, obtained from the given clean or noisy signals, in performing recognition using the maximum a posteriori decision rule. The gain-adapted training and recognition algorithms are developed for HMMs with Gaussian subsources using the expectation-maximization (EM) approach.

19.
In many voice-related applications, the presence of echoes and overlapping speech signals can degrade the quality or intelligibility of a desired speech signal to be processed. It is, therefore, important to cancel the echoes and to separate overlapping speech signals from a mixture of these components, so that a specific function of the system, for instance, transmission, speech identification, or recognition, can be accomplished with better performance. However, in many cases we do not know the properties of the communication channel, and sometimes even the number of speech sources is unknown. In this paper, we propose to use a reference signal to determine the channel characteristics. Once the estimated channel parameter matrices are obtained, a recurrence formula can be used to separate various speech signals including their reverberant counterparts. As a finite impulse response (FIR) model is used to describe the observation model of the sources in the reverberant environment, it is not necessary for the speech signals being processed to be uncorrelated. Because it involves only simple computation, our approach can be used in online applications. We investigate the validity of our algorithm and compare it with extended fourth-order blind identification (EFOBI). It is found that our method preserves both signal waveforms and their amplitudes even in a noisy environment, whereas EFOBI has not been able to achieve similar performance.

20.
A constrained point-process filtering mechanism for prediction of electromyogram (EMG) signals from multichannel neural spike recordings is proposed here. Filters from the Kalman family are inherently suboptimal in dealing with non-Gaussian observations, or a state evolution that deviates from the Gaussianity assumption. To address these limitations, we modeled the non-Gaussian neural spike train observations by using a generalized linear model that encapsulates covariates of neural activity, including the neurons' own spiking history, concurrent ensemble activity, and extrinsic covariates (EMG signals). In order to predict the envelopes of EMGs, we reformulated the Kalman filter in an optimization framework and utilized a nonnegativity constraint. This structure characterizes the nonlinear correspondence between neural activity and EMG signals reasonably well. The EMGs were recorded from 12 forearm and hand muscles of a behaving monkey during a grip-force task. In the case of limited training data, the constrained point-process filter improved the prediction accuracy when compared to a conventional Wiener cascade filter (a linear causal filter followed by a static nonlinearity) for different bin sizes and delays between input spikes and EMG output. For longer training datasets, results of the proposed filter and that of the Wiener cascade filter were comparable.
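
The baseline mentioned in this abstract, a Wiener cascade (a linear causal filter followed by a static nonlinearity), can be sketched in Python as follows. This is a single-channel toy version under stated assumptions: the filter is a short FIR filter, the nonlinearity is a polynomial, and the tap and coefficient values are placeholders rather than fitted parameters. It does not implement the constrained point-process filter that the paper proposes.

```python
from typing import List, Sequence


def causal_fir(inputs: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[t] = sum_k taps[k] * inputs[t - k], with t - k >= 0."""
    output = []
    for t in range(len(inputs)):
        acc = 0.0
        for k, h in enumerate(taps):
            if t - k >= 0:
                acc += h * inputs[t - k]
        output.append(acc)
    return output


def static_polynomial(x: float, coeffs: Sequence[float]) -> float:
    """Memoryless nonlinearity: coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def wiener_cascade(spike_counts: Sequence[float],
                   taps: Sequence[float],
                   coeffs: Sequence[float]) -> List[float]:
    """Apply the linear causal filter, then the static nonlinearity."""
    return [static_polynomial(v, coeffs) for v in causal_fir(spike_counts, taps)]


if __name__ == "__main__":
    binned_spikes = [1.0, 0.0, 2.0]   # hypothetical spike counts per time bin
    taps = [0.5, 0.5]                 # placeholder filter taps
    coeffs = [0.0, 0.0, 1.0]          # placeholder nonlinearity: squaring
    print(wiener_cascade(binned_spikes, taps, coeffs))
```

In practice the taps and the nonlinearity would be fitted to training data, and the input would span many neurons and time lags; the sketch only shows the two-stage structure.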
