Found 20 similar documents (search time: 15 ms)
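Genre is the most widely used way of organizing digital music databases. This paper proposes an automatic music classification scheme based on hidden Markov models (HMMs). In addition to the traditional timbre features, another important feature, tempo, is taken into account, and two groups of HMMs are trained with bagging for classification, achieving good results. The parameters are optimized in three respects (model structure, number of states, and number of Gaussian mixture components), and the best HMM parameters are identified. The conventional model and the new model are tested on the GTZAN music dataset, and the results show that the HMM that incorporates tempo features classifies better.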

Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a robust and effective approach for speech/music discrimination, which relies on a set of features derived from fundamental frequency (F0) estimation. Comparison between the proposed set of features and some commonly used timbral features is performed, aiming to assess the good discriminatory power of the proposed F0-based feature set. The classification scheme is composed of a classical Statistical Pattern Recognition classifier followed by a Fuzzy Rules Based System. Comparison with other well-proven classification schemes is also performed. Experimental results reveal that our speech/music discriminator is robust enough, making it suitable for a wide variety of multimedia applications.

This paper proposes the use of speech-specific features for speech/music classification. Features representing the excitation source, vocal tract system and syllabic rate of speech are explored. The two source features are the normalized autocorrelation peak strength of the zero frequency filtered signal and the peak-to-sidelobe ratio of the Hilbert envelope of the linear prediction residual. The log mel energy feature represents the vocal tract information. The modulation spectrum represents the slowly-varying temporal envelope corresponding to the speech syllabic rate. The novelty of the present work is in analyzing the behavior of these features for the discrimination of speech and music regions. These features are non-linearly mapped and combined to perform the classification task using a threshold based approach. Further, the performance of speech-specific features is evaluated using classifiers such as Gaussian mixture models and support vector machines. It is observed that the speech-specific features perform better than existing features. Additional improvement for speech/music classification is achieved when speech-specific features are combined with the existing ones, indicating that the former exploit different aspects of information.

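Speech feature coefficients computed with MFCC cannot fully reflect the information in speech, because of the dynamic nature of the speech signal, the overlap between frames, and the influence of noise. A noise-robust speech recognition method based on a hybrid of a hidden Markov model (HMM) and a wavelet neural network (WNN) is proposed. The method trains a wavelet neural network on the MFCC feature coefficients to obtain new MFCC feature coefficients. Experimental results show that, in noisy environments, the hybrid model is more robust to noise than an HMM alone and clearly improves the performance of the speech recognition system.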

Feature extraction is an important component of pattern classification and speech recognition. Extracted features should discriminate classes from each other while being robust to environmental conditions such as noise. For this purpose, several feature transformations have been proposed, which can be divided into two main categories: data-dependent transformations and classifier-dependent transformations. The drawback of data-dependent transformations is that their optimization criteria differ from the measure of classification error, which can potentially degrade the classifier’s performance. In this paper, we propose a framework to optimize data-dependent feature transformations such as PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and HLDA (Heteroscedastic LDA) using minimum classification error (MCE) as the main objective. The classifier itself is based on the Hidden Markov Model (HMM). In our proposed HMM minimum classification error technique, the transformation matrices are modified to minimize the classification error for the mapped features, and the dimension of the feature vector is not changed. To evaluate the proposed methods, we conducted several experiments on the TIMIT phone recognition and the Aurora2 isolated word recognition tasks. The experimental results show that the proposed methods improve the performance of the PCA, LDA and HLDA transformations for mapping Mel-frequency cepstral coefficients (MFCC).

This paper describes a fast training algorithm for feedforward neural nets, as applied to a two-layer neural network to classify segments of speech as voiced, unvoiced, or silence. The speech classification method is based on five features computed for each speech segment and used as input to the network. The network weights are trained using a new fast training algorithm which minimizes the total least squares error between the actual output of the network and the corresponding desired output. The iterative training algorithm uses a quasi-Newtonian error-minimization method and employs a positive-definite approximation of the Hessian matrix to quickly converge to a locally optimal set of weights. Convergence is fast, with a local minimum typically reached within ten iterations; in terms of convergence speed, the algorithm compares favorably with other training techniques. When used for voiced-unvoiced-silence classification of speech frames, the network performance compares favorably with current approaches. Moreover, the approach used has the advantage of requiring no assumption of a particular probability distribution for the input features.

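Color difference classification technology based on artificial intelligence (total citations: 1; self-citations: 0; citations by others: 1)
Using artificial intelligence and image processing techniques, a new method for automatically classifying the color difference of ceramic tiles is proposed. Building on the traditional image color histogram method, it takes full account of the effect that the spatial distribution of color has on human vision, and uses H (hue), S (saturation), P (color probability) and D (spatial distribution of color) of the dominant image colors as inputs to a neural network. This overcomes the problem that classification based on the color histogram alone cannot be accurate because spatial color information is lost. Fuzzy techniques are used to classify the color difference of tile surfaces precisely.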

In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus essential for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state of the art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques on both noisy and noise-free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed to decide whether plaque is present in the carotid artery using a probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, the effect of the proposed segmentation technique on classification of carotid artery ultrasound images has also been investigated.

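Neural network ensemble based on a fuzzy clustering algorithm (total citations: 3; self-citations: 0; citations by others: 3)
Based on the idea of fuzzy clustering, a neural network ensemble method is proposed. A distribution function is constructed from the membership function, and the training data are sampled according to this distribution. The sampled data serve as the training samples for the individual neural networks, multiple individual networks form the ensemble, and the ensemble output is determined by plurality voting. Theoretical analysis and experimental results show that the method performs well on pattern classification.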
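The two mechanical steps named in this abstract, sampling training data according to a membership-derived distribution and combining the individual networks by plurality voting, can be sketched as follows. This is only an illustration under assumptions: the abstract does not give the actual distribution function, so the sketch simply normalizes hypothetical membership degrees into sampling weights, and the names `sample_by_membership` and `plurality_vote` are invented here.

```python
import random
from collections import Counter

def sample_by_membership(data, memberships, k, rng):
    """Draw k samples (with replacement), each item chosen with probability
    proportional to its membership degree (an assumed distribution)."""
    total = sum(memberships)
    weights = [m / total for m in memberships]
    return rng.choices(data, weights=weights, k=k)

def plurality_vote(labels):
    """Return the label predicted by the largest number of individual networks."""
    return Counter(labels).most_common(1)[0][0]

rng = random.Random(0)
data = ["a", "b", "c", "d"]
memberships = [0.9, 0.6, 0.2, 0.1]  # hypothetical membership degrees for one cluster
subset = sample_by_membership(data, memberships, k=3, rng=rng)

votes = ["cat", "dog", "cat"]  # hypothetical outputs of three individual networks
print(plurality_vote(votes))   # cat
```

Plurality voting only needs the most frequent label, not a majority, which is why it still returns an answer when the networks split three or more ways.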

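The classical hidden Markov model (HMM) is a statistical signal model that plays an important role in content-based audio retrieval systems. Because audio classification emphasizes type over content, a single-state HMM is used for audio classification. This avoids a drawback of multi-state HMMs: the initial state probabilities and transition probabilities assigned at model initialization rest on inaccurate assumptions. Experimental results show that the single-state HMM audio classification method effectively reduces the misrecognition rate and improves classification accuracy.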
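With a single state, the initial probability and the self-transition probability are both 1, so the likelihood of a frame sequence reduces to the product of per-frame emission probabilities and nothing has to be guessed for the initial or transition parameters. The sketch below illustrates that reduction. It assumes a diagonal Gaussian emission density, which the abstract does not specify, and the feature values and function names are made up for the example.

```python
import math

def fit_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(frames, model):
    """Single-state HMM log-likelihood: the sum of per-frame emission log-densities."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def classify(frames, models):
    """Pick the class whose single-state model gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

# Hypothetical two-dimensional training frames for two audio classes.
models = {
    "speech": fit_gaussian([[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]),
    "music": fit_gaussian([[5.0, 4.9], [5.2, 5.1], [4.8, 5.0]]),
}
print(classify([[0.1, 0.05], [-0.1, 0.0]], models))  # speech
```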

Many audio signal applications are corrupted by noise. In particular, adaptive filters are frequently applied to white noise reduction in audio. Recent work offers some insights into using an artificial intelligence method called artificial hydrocarbon networks (AHNs) for filtering audio signals. Thus, the scope of this paper is to design and implement a novel approach that applies artificial hydrocarbon networks to adaptive filtering of audio signals. Three experiments were developed. Results demonstrate that AHNs can reduce noise in audio signals. A comparison between the proposed algorithm and a FIR filter is also provided. The short-time objective intelligibility (STOI) value and the signal-to-noise ratio (SNR) were used for evaluation. Finally, the proposed training method for finding the parameters involved in the AHN filter can also be used in other fields of application.

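As users demand ever greater precision and accuracy from data mining, Markov models and hidden Markov models have come into wide use in the field. This paper describes the applications of Markov models and hidden Markov models in data mining, and the problems that hidden Markov models can solve, as a reference for other researchers.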

The kernelized fuzzy c-means algorithm uses kernel methods to improve the clustering performance of the well-known fuzzy c-means algorithm by non-linearly mapping a given dataset into a higher dimensional space. Thus, the newly obtained dataset is more likely to be linearly separable. However, to further improve the clustering performance, an optimization method is required to overcome the drawbacks of the traditional algorithms, such as sensitivity to initialization, trapping in local minima, and lack of prior knowledge of the optimum parameters of the kernel functions. In this paper, to overcome these drawbacks, a new clustering method is proposed that is based on the kernelized fuzzy c-means algorithm and a recently proposed ant based optimization algorithm, hybrid ant colony optimization for continuous domains. The proposed method is applied to a dataset obtained from the MIT–BIH arrhythmia database. The dataset consists of six types of ECG beats: Normal Beat (N), Premature Ventricular Contraction (PVC), Fusion of Ventricular and Normal Beat (F), Atrial Premature Beat (A), Right Bundle Branch Block Beat (R) and Fusion of Paced and Normal Beat (f). Four time domain features are extracted for each beat type, and training and test sets are formed. After several experiments it is observed that the proposed method outperforms the traditional fuzzy c-means and kernelized fuzzy c-means algorithms.
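As a rough illustration of the kernelized membership step only, the sketch below computes fuzzy memberships of one point with respect to fixed cluster centers. It assumes a Gaussian kernel, for which the squared distance in the mapped space is 2(1 − K(x, v)); the abstract does not state which kernel is used. The ant colony optimization of the kernel parameters and centers is not shown, and `sigma`, `m` and the function names are illustrative choices.

```python
import math

def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel between a point x and a center v."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2 * sigma ** 2))

def kfcm_memberships(x, centers, sigma=1.0, m=2.0, eps=1e-12):
    """Fuzzy memberships of x to each center, using kernel-induced distances."""
    # For a Gaussian kernel, K(x, x) = 1, so the squared distance in the
    # mapped space is 2 * (1 - K(x, v)); eps guards against division by zero.
    dists = [max(2.0 * (1.0 - gaussian_kernel(x, v, sigma)), eps) for v in centers]
    inv = [d ** (-1.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [w / total for w in inv]

u = kfcm_memberships([0.1], [[0.0], [3.0]])
print(u)  # memberships sum to 1; the nearer center gets the larger share
```

The fuzzifier `m` controls how soft the assignment is: values close to 1 push memberships toward 0 or 1, while larger values spread them more evenly.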

Content based music genre classification is a key component for next generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial Neural Networks (ANNs), specifically multi-layered perceptrons (MLPs), are implemented to perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which are used as input vectors to a parallel neural architecture that performs the classification. This paper examines a combination of linear predictive coding (LPC), mel frequency cepstrum coefficients (MFCCs), Haar Wavelet, Daubechies Wavelet and Symlet coefficients as feature sets for the proposed audio classifier. In parallel with the MLP, a Gaussian radial basis function (GRBF) based ANN is also implemented and analyzed. The obtained prediction accuracy of 87.3% in determining the audio genres supports the efficiency of the proposed architecture. The ANN prediction values are processed by a rule based inference engine (IE) that presents the final decision.

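A speech emotion recognition classifier based on HMM and ANN (total citations: 2; self-citations: 0; citations by others: 2)
罗毅, 《微计算机信息》, 2007, 23(34): 218-219, 296
To address the inherently poor classification ability of the hidden Markov model (HMM) when it is used alone for speech emotion recognition, this paper proposes a method that uses an HMM together with a radial basis function (RBF) neural network to recognize five speech emotions: surprise, anger, joy, sadness and disgust. The method uses the HMM to normalize the speech emotion feature vectors and the RBF network as the final decision classifier. Experimental results show that, under the experimental conditions of this paper, the method performs better than an HMM alone, and the recognition rate for disgust improves considerably.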

This paper addresses dynamic classification of different ranges of ballistic missiles (BM) for air defense applications, based on kinematic attributes acquired by radars, so that appropriate measures can be taken to intercept them. The problem of dynamic classification is formulated using a real-time neural network (RTNN) and a hidden Markov model (HMM). The idea behind these algorithms is to calculate the output in one pass rather than training and computing over a large number of iterations. In addition, to meet the conflicting requirements of classifying short-range as well as long-range trajectories, we also propose a formulation for partitioning the trajectory using a moving window concept. This concept allows us to use parameters in a localized frame, which helps a wide range of trajectories fit into the same network. These algorithms are evaluated using simulated data generated from a 6 degree-of-freedom (6DOF) mathematical model of missile trajectories. Experimental results show that both networks classify with above 95% accuracy, with the real-time neural network outperforming the HMM in computation time on the same data. The small classification time enables the use of the real-time classification neural network in the complex scenario of multi-radar, multi-target engagement by interceptor missiles. To the best of our knowledge, this is the first attempt to classify ballistic missiles using RTNN and HMM.

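The current technical framework of speech recognition is mainly based on pattern recognition, which places high demands on how well the data match. The ability to handle dialects, accents and colloquial speech remains a major bottleneck, and even standard pronunciation requires a high degree of cooperation from the user. This paper introduces the state of research on speech signal processing technology and several common technical methods, and analyzes and discusses the applications and development prospects of speech signal processing technology.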

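Synchronization in speech-driven lip animation is one of the difficulties of facial animation. First, taking the syllable as the recognition unit, the syllable sequence and timing information in a speech file are recognized with the HTK toolkit, using a strict initial/final (声韵母) modeling method. Then a basic lip shape library and a syllable-to-lip-shape mapping table are used to obtain the lip shape sequence corresponding to the syllable sequence. Finally, the lip shape sequence is played back with interpolation according to its timing information, which produces the speech-driven lip animation. Experiments show that the method not only greatly reduces the number of models but also accurately recognizes the syllable sequence and its timing information, and effectively synchronizes speech and lip movement.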
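The last two steps of this pipeline, mapping timed syllables to lip shapes and interpolating between them for playback, can be sketched as below. Everything concrete here is an assumption made for illustration: the abstract does not say which interpolation is used (linear is assumed), the two-parameter lip shape (width, openness) is hypothetical, and the syllables, times and table entries are invented. The HTK recognition step is not shown.

```python
def interpolate_lip(keyframes, t):
    """Linearly interpolate a lip shape at time t.

    keyframes: list of (time_sec, shape_vector), sorted by strictly increasing time.
    """
    if t <= keyframes[0][0]:
        return list(keyframes[0][1])
    for (t0, s0), (t1, s1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return [x0 + a * (x1 - x0) for x0, x1 in zip(s0, s1)]
    return list(keyframes[-1][1])  # hold the last shape after the final keyframe

# Hypothetical syllable-to-lip-shape table: [width, openness].
lip_table = {"ba": [0.0, 0.0], "o": [1.0, 0.5]}
# Hypothetical recognizer output: (syllable, start time in seconds).
syllables = [("ba", 0.0), ("o", 1.0)]
keyframes = [(start, lip_table[syl]) for syl, start in syllables]

print(interpolate_lip(keyframes, 0.5))  # [0.5, 0.25]
```

Sampling `interpolate_lip` at the animation frame rate yields one lip shape per rendered frame, so the mouth motion stays tied to the recognized syllable times.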

This paper presents an HMM-MLP hybrid system for segmenting and recognizing complex date images written on Brazilian bank checks. Through the recognition process, the system makes use of an HMM-based approach to segment a date image into subfields. Then the three obligatory date subfields (day, month, and year) are processed. A neural approach has been adopted to decipher strings of digits (day and year) and a Markovian strategy to recognize and verify words (month). The final decision module makes an accept/reject decision. We also introduce the concept of metaclasses of digits to reduce the lexicon size of the day and year and improve the precision of their segmentation and recognition. Experiments show interesting results on date recognition.
Received: 17 December 2002; Accepted: 16 July 2003; Published online: 17 November 2003. Correspondence to: Marisa Morita
