Similar literature
1.
The problem of attending to the health of elderly people who live alone has become an important issue in developed countries. One way of addressing the problem is to check their health condition with a remote-monitoring technique and support them with timely treatment. The purpose of this study is to develop an automatic system that monitors health condition in real time using acoustical information and detects abnormal symptoms. In this study, cough sound was chosen as a representative acoustical symptom of abnormal health conditions. To distinguish cough sounds from other environmental sounds, a hybrid model was proposed that consists of an artificial neural network (ANN) model and a hidden Markov model (HMM). The ANN model used energy cepstral coefficients obtained by filter banks based on human auditory characteristics as input parameters representing the spectral features of a sound signal. Subsequently, the output of this ANN model and a filtered envelope of the signal were used to form an input sequence for the HMM, which deals with the temporal variation of the sound signal. Compared with the conventional HMM using Mel-frequency cepstral coefficients, the proposed hybrid model improved recognition rates at low SNRs from 5 dB down to -10 dB. Finally, a preliminary prototype of the automatic detection system was briefly illustrated.

2.
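To improve the recognition rate and robustness of marine mammal sound recognition algorithms, a method is proposed that fuses Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and time-domain features as the feature parameters for sound recognition. The method fuses different cepstral coefficients to strengthen the representation of different frequency bands, and fuses time-domain features to describe the sound more completely. After preprocessing tailored to the marine environment, feature extraction, and fusion, the sound samples are classified with a support vector machine. Whereas conventional algorithms recognize only one or a few mammal species, this method was tested on a sample library containing the sounds of 61 marine mammal species. The test results show that the algorithm improves the recognition rate by 5.5% over conventional Mel-frequency cepstral coefficients and performs better in low-SNR marine environments.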

3.
A discriminative temporal feature processing method for robust speech recognition is presented that combines knowledge-based and statistical methods. The cepstral features are first filtered by a RASTA method based on human hearing perception and then processed using the minimum classification error algorithm. Improved recognition performance can be achieved in both quiet and noisy environments.
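The RASTA step mentioned above band-pass filters the time trajectory of each cepstral coefficient. The sketch below is a minimal causal version using the commonly cited RASTA coefficients; the function name `rasta_filter`, the pole value 0.98, and the zero initial state are assumptions, since the abstract gives no filter details.

```python
def rasta_filter(trajectory, pole=0.98):
    """Causal RASTA-style band-pass filter over one coefficient trajectory.

    Difference equation (assumed, not taken from the abstract):
        y[n] = pole*y[n-1] + 0.1*(2x[n] + x[n-1] - x[n-3] - 2x[n-4])
    Samples before the start of the trajectory are treated as zero.
    """
    numerator = (0.2, 0.1, 0.0, -0.1, -0.2)  # sums to zero, so DC is removed
    out = []
    prev = 0.0
    for n in range(len(trajectory)):
        acc = pole * prev
        for k, b in enumerate(numerator):
            if n - k >= 0:
                acc += b * trajectory[n - k]
        out.append(acc)
        prev = acc
    return out


# A constant offset (e.g. a fixed channel effect in the cepstral domain)
# decays towards zero after the initial transient.
filtered = rasta_filter([1.0] * 1000)
```

Because the numerator taps sum to zero, slowly varying components of the trajectory are suppressed, which is the property that makes this kind of filter attractive for channel-robust features.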

4.
A segment-based speech recognition scheme is proposed. The basic idea is to model explicitly the correlation among successive frames of speech signals by using features representing contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each constituent acoustic segment is of variable length and is represented by a fixed-dimensional feature vector formed by the coefficients of discrete orthonormal polynomial expansions that approximate its spectral parameter contours. In training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. In testing, a frame-based dynamic programming procedure is employed to calculate the matching score between the test utterance and each reference template. The performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for 408 highly confusable isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved using 5-segment, 8-reference template models with cepstral and delta-cepstral coefficients as the recognition features. This is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta-cepstral, and delta-delta-cepstral coefficients.

5.
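To improve the accuracy of heart sound classification using Mel-frequency cepstral coefficient (MFCC) feature vectors, this paper proposes an MFCC feature vector optimization method based on independent component analysis (ICA) and weight optimization. First, the heart sound signal is preprocessed by detrending, denoising, cardiac cycle extraction, and segmentation of the fundamental heart sounds. Next, Mel-spectrum transformation and cepstral analysis are applied to the extracted fundamental heart sounds to obtain MFCC feature vectors; in this step ICA replaces the discrete cosine transform to remove higher-order correlations among the components, and correlation coefficients are used as weights to optimize the overall mixing matrix. Finally, the F-ratio is used to measure the contribution of each feature dimension and serves as the weight for optimizing each dimension of the feature vector. The extracted MFCC feature vectors are fed to a support vector machine (SVM) classifier to recognize the first and second heart sounds, and the results are compared with a manually annotated set of heart sound states. The experimental results show that MFCC feature vectors based on ICA and weight optimization effectively improve the recognition rate of the SVM classifier, and that the optimization algorithm has a degree of noise robustness.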
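The F-ratio weighting described in the last step can be sketched as follows. This is a minimal illustration assuming the usual definition of the F-ratio (variance of the class means divided by the mean within-class variance); the function names, the sum-to-one normalization of the weights, and the toy values are assumptions, not details from the paper.

```python
from statistics import mean, pvariance


def f_ratio(values_by_class):
    """F-ratio of one feature dimension.

    values_by_class holds one list of feature values per class
    (here: first heart sound S1 and second heart sound S2).
    """
    class_means = [mean(v) for v in values_by_class]
    between = pvariance(class_means)                      # spread of class means
    within = mean([pvariance(v) for v in values_by_class])  # average class spread
    return between / within


def weight_features(vector, ratios):
    """Scale each dimension by its F-ratio, normalized to sum to 1."""
    total = sum(ratios)
    return [x * r / total for x, r in zip(vector, ratios)]


# Toy data: dimension 0 separates the two classes well, dimension 1 does not.
dim0 = [[1.0, 1.2, 0.8], [3.0, 3.2, 2.8]]
dim1 = [[1.0, 3.0, 2.0], [1.5, 2.5, 2.0]]
ratios = [f_ratio(dim0), f_ratio(dim1)]
weighted = weight_features([2.0, 2.0], ratios)
```

In the toy data the second dimension has identical class means, so its F-ratio is zero and it is weighted out entirely, while the discriminative first dimension keeps its value.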

6.
Heart sounds are the main unavoidable interference in lung sound recording and analysis. Hence, several techniques have been developed to reduce or cancel heart sounds (HS) from lung sound records. The first step in most HS cancellation techniques is to detect the segments that include HS. This paper proposes a novel method for HS localization using the entropy of the lung sounds. We investigated both Shannon and Renyi entropies, and the results of the method using Shannon entropy were superior. Another HS localization method, based on the multiresolution product of lung sound wavelet coefficients and adopted from prior work, was also implemented for comparison. The methods were tested on data from 6 healthy subjects recorded at low (7.5 ml/s/kg) and medium (15 ml/s/kg) flow rates. The error of the entropy-based method using Shannon entropy was found to be 0.1 +/- 0.4% and 1.0 +/- 0.7% at low and medium flow rates, respectively, which is significantly lower than that of the multiresolution product method and those of other methods reported in previous studies. The proposed method is fully automated and detects HS-included segments in a completely unsupervised manner.
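The idea of an entropy profile over a recording can be illustrated with a short sketch. It estimates Shannon entropy from a fixed-range histogram in each sliding window; the histogram estimator, the bin count, the window parameters, and the function names are all assumptions, and the abstract does not state how the entropy profile is thresholded to mark HS segments.

```python
import math


def shannon_entropy(window, lo, hi, bins=8):
    """Shannon entropy (bits) of a window, from a histogram over [lo, hi]."""
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for x in window:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def entropy_profile(signal, size, hop, bins=8):
    """Entropy of each sliding window, using one amplitude range for all
    windows so that the values are comparable along the recording."""
    lo, hi = min(signal), max(signal)
    return [
        shannon_entropy(signal[i:i + size], lo, hi, bins)
        for i in range(0, len(signal) - size + 1, hop)
    ]


# A flat stretch followed by a stretch that sweeps the amplitude range.
toy_signal = [0.0] * 8 + [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
profile = entropy_profile(toy_signal, size=8, hop=8)
```

For the toy signal the flat window has zero entropy and the window that spreads evenly over all eight bins reaches the maximum of 3 bits, so a change in signal character shows up directly in the profile.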

7.
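Qiu Jingyun, Li Lin. 《电子世界》 (Electronics World), 2012, (9): 136-138
This paper proposes a new speaker feature classification method. Based on computational verb similarity theory, an evaluation model of distance and trend is established. By computing the similarity matrix between the feature vectors and the cluster centers obtained by k-means clustering, speaker-specific characteristics are mapped from the MFCC feature domain into a speaker similarity attribute space, forming a new set of feature vectors; each speaker's feature vectors are thus grouped into the k classes that are most similar in distance and in trend of change. A GMM is then used for joint probability analysis and matching in the attribute space to build a new speaker recognition system. A series of experiments was run on this system using the standard TIMIT and NIST speech corpora. The results show that, compared with a conventional speaker recognition system, the system based on the new optimized feature classification achieves a markedly better equal error rate.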
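The equal error rate used as the evaluation metric above is the operating point where the false acceptance rate and the false rejection rate coincide. The sketch below estimates it from two lists of verification scores by sweeping the observed scores as thresholds; the function name, the "accept if score >= threshold" convention, and the toy scores are assumptions for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest.

    genuine:  scores of true-speaker trials (higher means more similar)
    impostor: scores of impostor trials
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores: one genuine trial scores below one impostor trial.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With finite score lists the two error curves rarely cross exactly, so the sketch reports the average of FAR and FRR at the closest point; a lower value indicates a better system.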

8.
A text-independent speaker recognition system based on hard-limited eigenfunctions derived from the Karhunen-Loeve transform is proposed. Two databases, each with 100 Mandarin speakers, were collected for system evaluation. It is demonstrated that correct classification rates above 94% can be achieved using the first 32 hard-limited features.

9.
Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; therefore, it affects the uniqueness of the sound produced by the speaker. Hence, dysarthric speaker recognition is a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for the task of identifying a speaker suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with well-known Mel-frequency cepstral coefficient features. For classification, a multi-layer perceptron neural network with two structures is proposed. Our evaluations using the Universal Access speech database produced promising results and outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved using the proposed system is 97.3%.

10.
In this research work, we present a new fingertip electrocardiogram (ECG) data acquisition device capable of recording the lead-1 ECG signal through the right and left thumbs. The proposed device is highly sensitive, dry-contact, portable, user-friendly, and inexpensive, and it does not require conventional components that are cumbersome and irritating, such as wet adhesive Ag/AgCl electrodes. Another advantage of the device is that the lead-1 ECG signal can be recorded easily anywhere and under any conditions, and used with any platform for advanced applications such as biometric recognition and clinical diagnostics. Furthermore, we propose a biometric identification method based on combining autocorrelation and discrete cosine transform-based features, cepstral features, and QRS beat information. The proposed method was evaluated on three fingertip ECG signal databases recorded with the proposed device. The experimental results demonstrate that the proposed biometric identification method achieves person recognition rates of 100% (30 out of 30), 100% (45 out of 45), and 98.33% (59 out of 60) for 30, 45, and 60 subjects, respectively.
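The autocorrelation and discrete cosine transform-based features named above can be sketched in a few lines: compute the normalized autocorrelation of an ECG window, then take a DCT of the lag sequence to compact it. The function names, the number of lags, the unnormalized DCT-II, and the toy signal are assumptions; the abstract does not specify window lengths or how many coefficients are kept.

```python
import math


def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a mean-removed window for lags 0..max_lag."""
    m = sum(x) / len(x)
    c = [v - m for v in x]
    r0 = sum(v * v for v in c)  # lag-0 energy, used for normalization
    return [
        sum(c[i] * c[i + k] for i in range(len(c) - k)) / r0
        for k in range(max_lag + 1)
    ]


def dct2(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


# Toy periodic "beat" pattern with period 4 samples.
toy_ecg = [0.0, 1.0, 0.0, -1.0] * 8
ac = autocorrelation(toy_ecg, max_lag=8)
ac_dct_features = dct2(ac)
```

Working on the autocorrelation rather than on the raw window avoids the need to align individual beats, and the DCT concentrates most of the information in the first few coefficients, which would then be kept as the feature vector.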

11.
In this paper, a novel subspace projection approach is proposed for the analysis of speech signals under stressed conditions. The subspace projection method is based on the assumption of orthogonality between the speech subspace and the stress subspace, which contain speech and stress information, respectively. Projecting stressed speech vectors onto the speech subspace separates the speech-specific information. In this work, the speech subspace consists of neutral speech vectors. Speech and stress recognition techniques are used to verify the orthogonal relation between the speech and stress subspaces. The evaluation database consists of a 119-word vocabulary under neutral, angry, sad, and Lombard conditions. Hidden Markov models for speech and stress recognition are used with mel-frequency cepstral coefficient features to evaluate the estimated speech and stress information.

12.
A new technique for classifying patterns of movement via electromyographic (EMG) signals is presented. Two methods for extracting features from EMG signals (conventional autoregressive (AR) coefficients and cepstral coefficients) and three classification algorithms for discriminating signals representative of broad classes of movements (Euclidean Distance Measure (EDM), Weighted Distance Measure (WDM), and Maximum Likelihood Method (MLM)) are described and compared. The three classifiers are derived from the Bayes classifier under some assumptions, and the relationship among them is discussed. The conventional MLM is modified to avoid heavy matrix inversion. Six able-bodied subjects with two pairs of surface electrodes located on the bilateral sternocleidomastoid and upper trapezius muscles were studied in the experiment. The EMG signals of 20 repetitions of 10 motions were analyzed for each subject. Experimental results showed that the mean recognition rate of the cepstral coefficients was at least 5% higher than that of the AR coefficients. The improvement achieved by the cepstral method was statistically significant for all three classifiers. Reasons for the superiority of the cepstral features were investigated in the feature space and in the frequency domain. The cepstral coefficients had better cluster separability in feature space, and they emphasized the more informative part of the frequency domain. The discrimination rate of the MLM was the highest among the three classifiers. Combining the cepstral features with the MLM reduced the misclassification rate by 10.6% compared with the combination of AR coefficients and EDM. A proper choice of five of the ten motions could further raise the recognition rate to more than 95%.

13.
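Human action recognition based on feature fusion   Cited by: 5 in total (self-citations: 5, citations by others: 0)
To overcome the limited effectiveness of any single action representation, a human action recognition (HAR) method based on multi-feature fusion and a support vector machine (SVM) is proposed. First, background subtraction is used to extract the motion-salient regions; then silhouette histograms and optical-flow histograms of those regions are extracted and combined under a fusion strategy, and the fused feature is used with an SVM to recognize human actions. Experiments on the widely used public Weizmann dataset achieved a correct recognition rate above 99.8%. The results show that the proposed feature fusion and recognition method recognizes human actions effectively; moreover, because it avoids time-consuming sequence matching, the computational cost is reduced.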

14.
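Timbre is an important cue for music perception and speech recognition. Conventional timbre analysis methods cannot achieve ideal time resolution and frequency resolution simultaneously, and they do not handle the non-stationary nature of audio well. This paper uses time-varying-filtering-based empirical mode decomposition (TVF-EMD) to extract the intrinsic mode functions of the audio for the Hilbert transform, and constructs two timbre features: a Hilbert spectrum distribution feature and a Hilbert contour feature. For instrument classification, the two extracted timbre features are combined with Mel-frequency cepstral coefficients (MFCCs), and a timbre sequence classifier based on a bidirectional long short-term memory network is constructed; instrument classification experiments were carried out on a public database of instrument performance audio. The results show that the proposed timbre features effectively supplement the nonlinear, non-stationary information that conventional features such as Mel cepstral features cannot express, greatly improving the adaptability and robustness of this timbre representation for complex audio.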

15.
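Wu Qian, Wu Fei, Luo Lizhi. 《电子科技》 (Electronic Science and Technology), 2009, 33(11): 79-83
Skeleton-based action recognition has the advantages of robustness and viewpoint invariance. To further raise the recognition rate of skeleton action recognition, and to move beyond the limitation that most previous deep-learning methods take human joint coordinates as input, this paper proposes a human skeleton action recognition algorithm that combines geometric features with an LSTM network. The algorithm uses geometric features based on the distances between joints and selected lines as the network input, and introduces a temporal-selection LSTM network for training. Exploiting the ability of the temporal-selection LSTM network to pick out the features of the most discriminative time segments, recognition rates of 99.36% and 99.20% were obtained on the SBU Interaction and UT Kinect datasets, respectively. The experimental results demonstrate the effectiveness of the method for human skeleton action recognition.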
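The joint-to-line geometric feature described above reduces to the distance from a 3D point to the line through two other points. A minimal sketch using the cross-product formula follows; the function names and the example coordinates are hypothetical, and the abstract does not say which joints define the selected lines.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def joint_line_distance(joint, line_a, line_b):
    """Distance from `joint` to the infinite line through line_a and line_b.

    Uses |(joint - line_a) x (line_b - line_a)| / |line_b - line_a|.
    """
    direction = _sub(line_b, line_a)
    return _norm(_cross(_sub(joint, line_a), direction)) / _norm(direction)


# Hypothetical example: a hand joint relative to a line between two other joints.
hand = (1.0, 1.0, 1.0)
line_start, line_end = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d = joint_line_distance(hand, line_start, line_end)
```

Computing this distance for every chosen joint and line in each frame gives one feature vector per frame, and the sequence of such vectors is what a recurrent network would consume in place of raw joint coordinates.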

16.
A new method is presented that uses a wearable wrist sensor to estimate the acoustic parameters S1 and S2 of the heart sounds based on a neural network technique. Using this signal processing method, heart conditions can be analyzed and monitored in real time, and potentially over the long term, with a wrist device. The velocities and time delays of the cardiac pulse waves in blood vessels were experimentally acquired and calculated at different artery locations on the human body. Signal attenuation of the pulses from the heart to the wrist radial artery was analyzed, and a pulse-waveform travel model in blood vessels was proposed. A band-pass filter is applied to the pulse waves at various artery locations to reveal the heart sound features S1 and S2 present in the pulse waves. To obtain accurate acoustic parameters, a neural network with two layers and 500 nonlinear tansig neurons was employed to estimate the heart sounds from the pulse waveforms at the wrist radial artery. It is encouraging to find that the acoustic parameters of the heart sounds estimated by the trained neural network have an average error of only 1% compared with the original heart sounds. The effects of various analog-to-digital conversion resolutions and sample rates were empirically analyzed. When the maximum error is allowed to be within 2.15%, a 10,000-Hz sample rate and 12-bit resolution should be an appropriate selection for lower power consumption. Using the trained neural network, the new estimation method has been verified with a Bluetooth-enabled sensor strapped on the wrist, so mobility is not limited for the person whose heart sounds need to be monitored.

17.
Tracheal sound average power is directly related to the breathing flow rate, and it has recently attracted considerable attention for acoustical flow estimation. However, the flow-sound relationship is highly variable among people, and it also changes for the same person at different flow rates. Hence, a robust model capable of estimating flow from tracheal sounds at different flow rates in a large group of individuals does not exist. In this paper, a model is proposed to estimate respiratory flow from tracheal sounds. The proposed model eliminates the dependence of previous methods on calibrating the model for every individual and at different flow rates. To validate the model, it was applied to the respiratory sound and flow data of 93 healthy individuals. We investigated the statistical correlation between the model parameters and anthropometric features of the subjects. The results show that gender, height, and smoking are the most significant factors that affect the model parameters. Hence, we grouped nonsmoker subjects into four groups based on their gender and height. The average of the model parameters in each group was defined as the group-calibrated model parameters. These models were applied to estimate flow from the data of subjects within the same group and in the other groups. The results show that the flow estimation error based on the group-calibrated model is less than 10%. The low estimation errors confirm the possibility of defining a general flow estimation model for subjects with similar anthropometric features, with no need to calibrate the model parameters for every individual. This technique simplifies acoustical flow estimation in general applications, including sleep studies and patient screening in health care facilities.

18.
Heart sounds are the main obstacle in lung sound analysis. To tackle this obstacle, we propose a diagnosis algorithm that uses singular spectrum analysis (SSA) and frequency features of heart and lung sounds. In particular, we introduce a frequency coefficient that shows the frequency difference between heart and lung sounds. The proposed algorithm is applied to a synthetic mixture of heart and lung sounds. The results show that heart sounds can be extracted successfully and that the first and second heart sounds are localized remarkably well. An error analysis of the localization results shows that the proposed algorithm has fewer errors than the SSA method, which is one of the most powerful methods for the localization of heart sounds. The presented algorithm is also applied to respiratory sounds recorded from the chest walls of five healthy subjects. The efficiency of the algorithm in extracting heart sounds from the recorded breathing sounds is verified with power spectral density evaluations and listening. Most studies have used only normal respiratory sounds, whereas we additionally use abnormal breathing sounds to validate the strength of our results.

19.
Automatic speech recognition under adverse noise conditions has been a challenging problem. Under noise conditions in which the stationarity assumption is valid, effective techniques have been established that provide excellent recognition accuracies. Under conditions in which this assumption does not hold, recognition performance declines rapidly. Missing data (MD) theory is a promising method for robust automatic speech recognition (ASR) under any noise condition. Unfortunately, the choice of features used in the recognition process is commonly limited to spectral-based representations. The combination-of-recognizers approach to MD ASR allows the use of cepstral-based features within the MD framework through a fusion-of-features mechanism in the pattern recognition stage. It was found that under two types of non-stationary noise conditions, the combined effect of the fusion process increased recognition accuracies substantially over traditional MD and cepstral-based recognizers.

20.
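Bird sound recognition using anti-noise power-normalized cepstral coefficients   Cited by: 3 in total (self-citations: 0, citations by others: 3)
Yan Xin, Li Ying. 《电子学报》 (Acta Electronica Sinica), 2013, 41(2): 295-300
To address bird sound recognition under the various background noises of real environments, a bird sound recognition technique based on a new noise-robust feature extraction is proposed. First, the noise power spectrum is obtained using a noise estimation algorithm suited to highly non-stationary environments. Second, multi-band spectral subtraction is applied to denoise the sound power spectrum. Next, anti-noise power-normalized cepstral coefficients (APNCC) are extracted from the denoised sound power spectrum. Finally, a support vector machine (SVM) is used in comparative experiments on the sounds of 34 bird species under different environments and SNRs, with the extracted APNCC, power-normalized cepstral coefficients (PNCC), and Mel-frequency cepstral coefficients (MFCC) as features. The experiments show that the extracted APNCC gives better average recognition performance and stronger noise robustness, and is better suited to bird sound recognition in environments with an SNR below 30 dB.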
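The multi-band spectral subtraction step can be illustrated with a small sketch that subtracts a scaled noise estimate from the noisy power spectrum band by band and applies a spectral floor. The function name, the way bands and over-subtraction factors are passed in, and the floor value are assumptions; the abstract does not give the band layout or how the factors are chosen.

```python
def multiband_subtract(noisy_power, noise_power, bands, factors, floor=0.01):
    """Band-wise power spectral subtraction with a spectral floor.

    noisy_power, noise_power: power per frequency bin for one frame
    bands:   list of (start, stop) bin ranges, stop exclusive
    factors: one over-subtraction factor per band (assumed supplied by caller)
    floor:   fraction of the noisy power kept when the result would go negative
    """
    clean = list(noisy_power)
    for (start, stop), alpha in zip(bands, factors):
        for k in range(start, stop):
            residual = noisy_power[k] - alpha * noise_power[k]
            clean[k] = max(residual, floor * noisy_power[k])
    return clean


# Toy frame with two bands: the second band is dominated by noise.
denoised = multiband_subtract(
    noisy_power=[10.0, 10.0, 4.0, 4.0],
    noise_power=[2.0, 2.0, 5.0, 5.0],
    bands=[(0, 2), (2, 4)],
    factors=[1.0, 1.0],
    floor=0.1,
)
```

Treating the bands separately lets the subtraction be more aggressive where the noise is strong and gentler elsewhere, and the floor prevents negative power values that would otherwise break the later logarithm or power-law compression in cepstral feature extraction.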
