Similar literature
 Found 20 similar documents; search took 31 ms
1.
Acoustic parameters extracted from recorded voice samples are actively pursued for the accurate detection of vocal fold pathology. Most systems for detecting vocal fold pathology use high-quality voice samples. This paper proposes a hybrid expert system approach that detects vocal fold pathology from compressed, low-quality voice samples; it comprises feature extraction with the wavelet packet transform, clustering-based feature weighting, and classification. To improve the robustness and discrimination ability of the wavelet packet transform features (raw features), we propose clustering-based feature weighting methods, including k-means clustering (KMC), fuzzy c-means (FCM) clustering, and subtractive clustering (SBC). We investigated the effectiveness of the raw and weighted features (obtained after applying the feature weighting methods) using four classifiers: least squares support vector machine (LS-SVM) with a radial basis kernel, k-nearest neighbor (kNN) classifier, probabilistic neural network (PNN), and classification and regression tree (CART). The proposed hybrid expert system approach achieves a promising classification accuracy of 100% with the feature weighting methods and has potential application in the remote detection of vocal fold pathology.

2.
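To improve the accuracy of dysarthria recognition, a dysarthric speech recognition method based on a multi-feature combination is proposed. A genetic algorithm performs feature selection: from a combined feature set made up of five feature classes (prosodic, spectral, auditory-perception, voice-quality, and vocal-tract-model features), it selects the feature subset with the highest classification accuracy, and an SVM classifier then performs recognition on the selected features. On the Torgo acoustic and articulatory database, different types of speech stimuli were ...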

3.
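An automatic detection method is proposed for speech tampering carried out with voice transformation technology. Based on an analysis of the basic voice transformation model and the distortion in transformed speech, vocal tract parameters and related signal statistics are extracted from the speech signal. Support vector machine recursive feature elimination then selects the features most sensitive to voice transformation as classification features, and a support vector machine is used both to detect voice transformation and to determine the gender of the speaker of the transformed speech. Experiments on one voice transformation software package show that the method achieves high detection accuracy: the average accuracy is 94.90% for voice transformation detection and 92.09% for determining the speaker's gender from transformed speech.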

4.
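To address source cell-phone identification under real environmental noise, a source cell-phone identification method based on linear discriminant analysis and a temporal convolutional network is proposed. First, by analyzing the classification performance of different cell-phone speech features under real environmental noise, a new hybrid cell-phone speech feature is derived from the band energy descriptor, the constant-Q transform domain, and linear discriminant analysis. Then, with this hybrid feature as input, a temporal convolutional network is used for training and classification. Finally, on 10 brands and 47 cell-phone models ...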

5.
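To provide a basis for feature parameter selection in pathological voice recognition, an asymmetric mechanical model of the vocal folds is proposed to simulate and analyze diseased vocal folds. A mechanical vocal fold model is built from the layered structure and tissue properties of the vocal folds and coupled with the glottal airflow to obtain the glottal source excitation waveform output by the model. A combined optimization algorithm, genetic particle swarm optimization based on a quasi-Newton method (GPSO-QN), matches the model's glottal source output to the actual target glottal wave and extracts the optimized model parameters. Simulation results show that the vocal fold model can produce glottal waveforms consistent with the actual glottal source, and they also demonstrate that asymmetry between the physiological tissues of the left and right vocal folds is an important cause of pathological voice.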

6.
Biometric speech recognition systems are often subject to spoofing attacks, the most common of which are speech synthesis and speech conversion attacks. A system that incorrectly accepts such an attack has its security compromised. Researchers have made many efforts to address this problem, and existing studies have used the physical features of speech to identify spoofing attacks. However, recent studies have shown that speech contains a large number of physiological features related to the human face. For example, we can determine the speaker’s gender, age, mouth shape, and other information from the voice. Inspired by this research, we propose a spoofing attack recognition method based on the fusion of physiological and physical features. The method involves feature extraction, a densely connected convolutional neural network with squeeze-and-excitation blocks (SE-DenseNet), and feature fusion strategies. We first extract physiological features from audio with a pre-trained convolutional network. We then use SE-DenseNet to extract physical features. The dense connection pattern has high parameter efficiency, and the squeeze-and-excitation blocks enhance the transmission of features. Finally, we integrate the two feature types into the classification network to identify spoofing attacks. Experimental results on the ASVspoof 2019 data set show that our model is effective for voice spoofing detection. In the logical access scenario, our model improves the tandem detection cost function and equal error rate scores by 5% and 7%, respectively, compared with existing methods.
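
The equal error rate mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a higher score means "more likely genuine" and that a trial is accepted when its score reaches the threshold; the function name and the threshold sweep are illustrative choices, not the authors' evaluation code, and the tandem detection cost function is not covered.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a trial is accepted when score >= threshold. The function
    returns the mean of the false acceptance rate (FAR) and the false
    rejection rate (FRR) at the threshold where the two are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, best_rate = float("inf"), None
    for t in thresholds:
        # Spoofed trials that would be wrongly accepted at this threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # Genuine trials that would be wrongly rejected at this threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# One genuine trial scores below one spoofed trial, so both rates meet at 25%.
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
```

A lower value is better; with perfectly separated score sets the sketch returns 0.0.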

7.
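To improve the accuracy of voice endpoint detection in in-vehicle noise environments, a neural network structure based on a GRU RNN is proposed. It processes the Log Mel feature sequence of noisy speech to separate speech from noise, thereby recovering the Log Mel feature sequence of the clean speech. On this basis, a new feature, Log Mel Sum, is proposed and used for endpoint detection. Experimental results show that the method achieves very good endpoint detection performance in in-vehicle environments.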

8.

Parkinson’s disease (PD) is a neurological disorder marked by decreased dopamine levels in the brain. People with PD exhibit vocal symptoms such as dysphonia and dysarthria. Speech impairments in PD are grouped together under the name hypokinetic dysarthria. Traditional PD management is based on a patient’s clinical history and physical examination, as there are currently no known biomarkers for its diagnosis. Automatic analysis techniques aid clinicians in diagnosing and monitoring patients using speech and provide frequent, cost-effective, and objective assessment. This paper presents a pilot experiment that detects the presence of dysarthria in speech and its level of severity using a deep learning approach. Automated feature extraction and classification with a convolutional neural network achieves 77.48% accuracy on test samples from the TORGO database with five-fold validation. Using transfer learning, system performance is further analyzed for gender-specific performance as well as for detection of disease severity.


9.
This paper studies the role of a speech recognition system in the assessment of dysarthric speech, based on a method called the Elman back propagation network (EBN). Dysarthria is a neurological disability that impairs control of the motor speech articulators. The speech intelligibility of people with dysarthria may vary from low (2%) to high (95%). EBN is a recurrent network: a fully connected neural network is built such that the speech characteristics are represented simultaneously by neuron activation states. It is an efficient self-supervised training algorithm. For parametric representation of the speech signal, we used glottal features along with mel frequency cepstral coefficients. The outputs of the two feature sets are then compared after evaluation with different neural networks and modeling methods. The proposed method is evaluated on a subset of the Universal Access Research database. The subset consists of 9 dysarthric speakers out of 19 speakers, each uttering 100 words, repeated 3 times. Given its promising performance, the proposed system can help people who work with persons with voice disorders.

10.
Electroencephalography signals are typically used for analyzing epileptic seizures. These signals are highly nonlinear and nonstationary, and the specific patterns associated with certain disease types make it hard to develop an automatic epileptic seizure detection system. This paper discusses the statistical mechanics of complex networks, which inherit the characteristic properties of electroencephalography signals, for feature extraction via a horizontal visibility algorithm in order to reduce processing time and complexity. The algorithm transforms a time series signal into a complex network in which some features are abbreviated. The statistical mechanics are calculated to capture distinctions pertaining to certain diseases and form a feature vector. The feature vector is classified in a multiclass setting by a k-nearest neighbor classifier, a multilayer perceptron neural network, and a support vector machine under a 10-fold cross-validation criterion. In the performance evaluation of the proposed method with healthy, seizure-free interval, and seizure signals, the input data length is first chosen among practical signal sample lengths by balancing accuracy against processing time. The proposed method then yields outstanding average classification accuracy on 3-class problems, mainly for detecting seizure-free interval and seizure signals, and acceptable results on 2-class and 5-class problems compared with conventional methods. The proposed method is another tool for classifying signal patterns, as an alternative to time/frequency analyses.
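
The series-to-network step can be sketched as follows, assuming the standard horizontal visibility criterion: two samples are linked when every sample strictly between them is lower than both. The function names are illustrative, and the degree sequence stands in for the network statistics, which the abstract does not specify.

```python
def horizontal_visibility_graph(series):
    """Return the edge set of the horizontal visibility graph of a sequence.

    Samples i < j are linked when every sample strictly between them is
    lower than both series[i] and series[j].
    """
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # tallest sample strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # a sample at least as tall as series[i] blocks the view
    return edges


def degree_sequence(edges, n):
    """Node degrees, one simple network statistic usable as a feature."""
    degrees = [0] * n
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees
```

For the series [3, 1, 2, 4], the sketch links every pair except samples 1 and 3, because the value 2 between them is higher than sample 1. The early break keeps the inner loop short on typical signals.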

11.
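To make full use of noisy speech features and improve the speech enhancement performance of deep neural networks, a speech enhancement method that fuses time-domain and frequency-domain features is proposed. With the waveform of noisy speech as the training feature and the log power spectrum of clean speech as the training target, the mapping from time-domain features of noisy speech to frequency-domain features of clean speech is learned. The waveform and the log power spectrum of noisy speech are then used together as training features to build a deep neural network that fuses time-domain and frequency-domain features of noisy speech for speech enhancement. Experimental results show that, compared with speech enhancement that uses frequency-domain features alone, the method clearly improves the quality and intelligibility of the enhanced speech and delivers better enhancement performance.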
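
A minimal sketch of the two input features, computed for a single frame. The plain concatenation, the natural logarithm, the flooring constant, and the function names are all assumptions made for illustration; the abstract does not say how the network combines the two feature types.

```python
import cmath
import math


def log_power_spectrum(frame, floor=1e-10):
    """Log power of the non-negative-frequency DFT bins of one frame.

    `floor` (an assumed constant) keeps the logarithm finite for empty bins.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT; a real system would use an FFT routine instead.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(math.log(abs(coeff) ** 2 + floor))
    return spectrum


def fused_features(frame):
    """Concatenate the time-domain samples with the log power spectrum."""
    return list(frame) + log_power_spectrum(frame)
```

A frame of N samples yields N + N // 2 + 1 values, so a 4-sample frame gives a 7-element input vector.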

12.
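Objective: Facial expression recognition is one of the core problems in computer vision. On the one hand, an expression arises from a continuous, dynamic change of the facial muscles; on the other hand, the peak expression frame in that motion usually contains the complete information needed to recognize the expression. Most existing facial expression recognition algorithms are based either on expression video sequences or on a single peak expression image. To this end, a deep neural network that fuses temporal and spatial features is proposed to analyze and understand expression information in video sequences and thereby improve recognition performance. Method: The network contains two feature extraction modules, which learn the static "spatial features" of the expression from a single peak expression image and the dynamic "temporal features" of the expression from the video sequence, respectively. First, a triplet-based deep metric fusion technique is proposed: by using different thresholds in the triplet loss function, several different expression feature representations are learned from a single peak expression image and combined into a robust and more discriminative expression "spatial feature". Second, to make effective use of prior knowledge about key facial components and accurately extract the motion features of facial expressions in the time domain, a convolutional neural network based on facial landmark trajectories is proposed; it learns the dynamic "temporal feature" of the expression by analyzing facial landmark trajectories in the video sequence. Finally, a fine-tuning fusion strategy is proposed that achieves the best fusion of the temporal and spatial features. Results: On three commonly used video-based facial expression data sets, CK+ (the e...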

13.
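Because LTE network data is massive in volume and highly varied, manual drive-test analysis can no longer meet today's demand for detecting poor-quality cells from drive-test data. To improve the efficiency and accuracy of poor-quality cell detection, machine learning has gradually been applied to this task. For drive-test data covering a small number of cells, this paper proposes a poor-quality cell detection method based on distance-based four-dimensional features. The method labels the drive-test data by combining a clustering algorithm with manual judgment, compares the extraction results of the distance-based four-dimensional features with those of traditional two-dimensional features, and performs classification with four classifiers: logistic regression, decision tree, support vector machine, and k-nearest neighbor. Experimental results show that the distance-based four-dimensional features are better suited to poor-quality cell detection than the traditional two-dimensional features, and that with four-dimensional features the support vector machine classifier performs best.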

14.
15.
The evolution of robust speech recognition systems that maintain a high level of recognition accuracy in difficult and dynamically varying acoustical environments is becoming increasingly important as speech recognition technology becomes a more integral part of mobile applications. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs the feature parameter extraction, that is, the front-end of the speech recognition system. These features are transmitted over a data channel to the remote back-end recogniser. DSR provides particular benefits for mobile device applications, such as improved recognition performance compared with using the voice channel and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into the DSR system is required to operate in real time and at the lowest possible computational cost. In this paper, two innovative front-end processing techniques for noise-robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques include different forms of frame-attenuation, an improvement of spectral subtraction based on minimum statistics, and a mel-cepstrum feature extraction procedure. Tests are performed using the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database together with the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited memory and processing power.

16.
A modified k-nearest neighbour (k-NN) classifier is proposed for supervised remote sensing classification of hyperspectral data. Its classification accuracy and computational cost were compared against those of k-NN and a back-propagation neural network classifier. The proposed classifier achieved a classification accuracy of 91.2% on the data set used. Results from this study suggest that the accuracy achieved with this classifier is significantly better than that of k-NN and comparable to that of a back-propagation neural network. The comparison of computational cost also suggests that the modified k-NN classifier is effective for hyperspectral data classification. A fuzzy entropy-based filter approach was used for feature selection to compare the performance of the modified k-NN and k-NN classifiers on a reduced data set. The results suggest that, with the selected features, the modified k-NN classifier achieves a significant increase in classification accuracy over the k-NN classifier.
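
For reference, the baseline that the modified classifier is compared against can be sketched as a plain majority-vote k-NN. This shows only the standard algorithm with Euclidean distance; the modification proposed in the paper is not described in the abstract and is not reproduced here. The function name and the default k are illustrative.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Every prediction scans the whole training set, which is why computational cost is a natural axis of comparison for k-NN variants.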

17.
This paper presents an artificial neural network (ANN) for speaker-independent isolated word speech recognition. The network consists of three subnets in concatenation. The static information within one frame of the speech signal is processed in the probabilistic mapping subnet, which converts an input vector of acoustic features into a probability vector whose components are the estimated probabilities that the feature vector belongs to each of the phonetic classes that constitute the words in the vocabulary. The dynamics capturing subnet computes the first-order cross correlation between the components of the probability vectors to serve as the discriminative feature derived from the interframe temporal information of the speech signal. These dynamic features are passed for decision-making to the classification subnet, which is a multilayer perceptron (MLP). The architectures of these three subnets are described, and the associated adaptive learning algorithms are derived. The recognition results for a subset of the DARPA TIMIT speech database are reported. The correct recognition rate of the proposed ANN system is 95.5%, whereas that of the best continuous hidden Markov model (HMM)-based system is only 91.0%.

18.
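In continuous speech recognition systems, complex environments (including variability in speakers and environmental noise) cause a mismatch between training and test data that leads to low recognition rates. To address this problem, a speech recognition algorithm based on an adaptive deep neural network is proposed. An improved regularized adaptation criterion is combined with a feature-space adaptive deep neural network to improve data matching. Speaker identity vectors (i-vectors) are fused with noise-aware training to overcome the problems caused by speaker and environmental noise variation, and the classification function of the output layer of the traditional deep neural network is modified to ensure intra-class compactness and inter-class separation. Tests were run on the TIMIT English speech data set and a Microsoft Chinese speech data set with various background noises superimposed. The results show that, compared with the currently popular GMM-HMM and traditional DNN acoustic models, the proposed algorithm lowers the word error rate by 5.151% and 3.113%, respectively, improving the generalization and robustness of the model to some extent.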

19.
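An end-to-end single-channel speech separation algorithm based on deep acoustic features is proposed. Traditional acoustic feature extraction requires operations such as the Fourier transform and the discrete cosine transform, which cause loss of speech energy and long delays. To mitigate these problems, the raw waveform of the speech signal is used as the input to a deep neural network, and the network model learns deeper acoustic features of the speech signal to achieve end-to-end speech separation. Objective evaluation experiments show that the proposed separation algorithm not only effectively improves speech separation performance but also reduces the latency of speech separation.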

20.
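To improve the accuracy of emotion recognition and to address the low recognition rates of single-modality emotion features and traditional feature fusion methods, a multi-feature fusion emotion recognition method based on kernel canonical correlation analysis (KCCA), called MF-KCCA, is proposed. Speech prosodic features and fractional Fourier domain facial expression features are extracted separately. Exploiting the complementarity of the two feature types, KCCA fuses them and reduces the dimensionality of the feature vector, and a nearest neighbor classifier performs emotion classification and recognition. Simulation experiments on the Ryerson University (Canada) database show that MF-KCCA effectively improves the recognition rate of speech emotion.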
