20 similar documents found; search took 15 ms
Automated audio segmentation and classification play important roles in multimedia content analysis. In this paper, we propose an enhanced approach, called the correlation intensive fuzzy c-means (CIFCM) algorithm, to audio segmentation and classification that is based on audio content analysis. While conventional methods work by considering the attributes of only the current frame or segment, the proposed CIFCM algorithm efficiently incorporates the influence of neighboring frames or segments in the audio stream. With this method, audio-cuts can be detected efficiently even when the signal contains audio effects such as fade-in, fade-out, and cross-fade. A number of audio features are analyzed in this paper to explore the differences between various types of audio data. The proposed CIFCM algorithm works by detecting the boundaries between different kinds of sounds and classifying them into clusters such as silence, speech, music, speech with music, and speech with noise. Our experimental results indicate that the proposed method outperforms the state-of-the-art FCM approach in terms of audio segmentation and classification.
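For orientation, the baseline that this abstract compares against, plain fuzzy c-means (FCM), can be sketched as below. This is a minimal illustration on scalar features, not the CIFCM algorithm: the neighbour-frame correlation term is not described in the abstract and is not reproduced here. The function name `fcm_1d`, the evenly spaced initialisation, and the default parameters are assumptions made for the example.

```python
def fcm_1d(data, c, m=2.0, max_iter=100, tol=1e-6):
    """Plain fuzzy c-means on scalar features (baseline FCM, not CIFCM).

    Returns (centers, u), where u[k][i] is the membership of data[k]
    in cluster i, taken from the last membership update.
    """
    lo, hi = min(data), max(data)
    # Assumed initialisation: centres spread evenly over the data range.
    centers = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1))
        u = []
        for x in data:
            d = [abs(x - v) for v in centers]
            zeros = [i for i, dist in enumerate(d) if dist == 0.0]
            if zeros:
                # The point sits exactly on one or more centres.
                row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** power for j in range(c))
                       for i in range(c)]
            u.append(row)
        # Centre update: mean of the data weighted by u^m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            new_centers.append(sum(wk * x for wk, x in zip(w, data)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

In an audio setting each element of `data` would be a per-frame feature; a method that also looks at neighbouring frames, as the abstract describes, would modify the membership step rather than the centre update.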

Phonemes are the smallest distinguishable units of a speech signal. Segmenting a phoneme from the word that contains it is a fundamental and crucial step in speech processing, because an initial phoneme is used to activate words starting with that phoneme. This work describes an artificial neural network-based algorithm for segmenting and classifying consonant phonemes of the Assamese language. The algorithm uses weight vectors, obtained by training a self-organising map (SOM) with different numbers of iterations, as segments of the different phonemes constituting the word whose linear prediction coefficient samples are used for training. The algorithm shows a marked rise in success rate over conventional discrete wavelet-based speech segmentation. A two-class probabilistic neural network trained on clean Assamese phonemes is used to identify the phoneme segments. The phoneme segments are classified according to the consonant phoneme structure of the Assamese language, which consists of six phoneme families. Experimental results establish the superiority of the SOM-based segmentation over the discrete wavelet transform-based approach.

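At present, speaker clustering takes the speech segments produced by speaker segmentation as the initial clusters, and clustering this very large number of segments directly is computationally expensive. To reduce the computational cost of speaker clustering, an initial-cluster generation method oriented to speaker clustering is proposed. The feature parameters of the speech segments after speaker segmentation, and the centroids of those feature parameters, are extracted; hierarchical clustering is then combined with the Bayesian information criterion (BIC) to "pre-cluster" the segments under a relaxed stopping criterion, generating the initial clusters. Compared with directly clustering the segments after speaker segmentation, the method reduces computation time by 40.04% while maintaining the original clustering performance, and by more than 60.03% when a slight drop in clustering performance is allowed.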
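As a reminder of how BIC is commonly used as a merge and stopping test in hierarchical speaker clustering, the usual Gaussian form is sketched below. It is the textbook criterion, not necessarily the exact variant used in this paper; the relaxed stopping rule itself is not specified in the abstract. The symbols ($n_i$, $n_j$ for segment lengths in frames, $\Sigma_i$, $\Sigma_j$, $\Sigma$ for the covariances of the two clusters and of their union, $d$ for the feature dimension, $\lambda$ for the penalty weight) are assumptions introduced for the illustration.

```latex
\Delta\mathrm{BIC}(i,j) =
  \frac{n_i + n_j}{2}\,\log\lvert\Sigma\rvert
  - \frac{n_i}{2}\,\log\lvert\Sigma_i\rvert
  - \frac{n_j}{2}\,\log\lvert\Sigma_j\rvert
  - \lambda\,\frac{1}{2}\left(d + \frac{d(d+1)}{2}\right)\log(n_i + n_j)
```

With this sign convention, the pair with the smallest $\Delta\mathrm{BIC}$ is merged while that value is negative, and merging stops once every remaining pair has $\Delta\mathrm{BIC} \ge 0$.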

This article focuses on the systematic design of a segment database that has been used to support a time-domain speech synthesis system for the Greek language. A methodology is presented for generating a corpus containing all possible instances of the segments for the specific language. Issues such as phonetic coverage, sentence selection, and iterative evaluation techniques employing custom-built tools are examined. Emphasis is placed on comparing the process-derived corpus with naturally occurring corpora with respect to their suitability for use in time-domain speech synthesis. The proposed methodology generates a corpus of near-minimal size that provides complete coverage of the Greek language. Furthermore, within this corpus, the distribution of segmental units is similar to that of natural corpora, allowing multiple units to be extracted for the most frequently occurring segments. The corpus creation algorithm incorporates mechanisms that enable fine-tuning of the segment database's language-dependent characteristics and thus assists in the generation of high-quality text-to-speech synthesis.

韩明  李磊民  黄玉清 《计算机应用》2010,30(12):3278-3280
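To address the segmentation of images of touching or overlapping particles, a method based on feature fuzzy inference for computing local morphological reconstruction parameters is proposed, improving the traditional algorithm that combines the distance transform with the watershed. Building on the traditional distance-transform-plus-watershed approach, the particle image is divided into several connected regions, each processed separately, and local morphological reconstruction is used to suppress watershed over-segmentation. Statistical analysis of the maxima within each connected region of the distance image yields the particle morphology features of that region. With the particle morphology features as the fuzzy input and the reconstruction parameter features as the fuzzy output, fuzzy inference computes the reconstruction parameters adaptively, which resolves the uncertainty in choosing them. Finally, the watershed transform is applied to the reconstructed image to obtain the segmented particle image. Experimental results show that the method segments particles well in various touching states and overcomes the over-segmentation and adaptive parameter selection problems of the traditional method.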

兰蓉  赵强 《控制与决策》2020,35(10):2345-2362
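To address the slow convergence and pixel misclassification that arise when the suppressed fuzzy C-means clustering algorithm is applied to gray-level image segmentation, a double-center combined-iteration suppressed fuzzy C-means image segmentation algorithm is proposed, by exploiting the correlation between pixels within homogeneous image regions and analyzing the influence of pixel position on class assignment. First, good initial cluster centers are selected on the image through point selection, expansion, and extraction steps. Then, for each center, the positions of pixels whose gray value equals that of the center are located, and hidden centers are selected from them. Next, a negative exponential function normalizes the Euclidean distance between each pixel position and the hidden center, giving a position feature. This feature is then weighted and used to correct the fuzzy partition matrix directly. Finally, the suppression idea is applied to further reduce the number of iterations. Compared with several existing related algorithms, experimental results show that the proposed algorithm obtains compact, well-separated clusters while improving the accuracy and execution efficiency of image segmentation.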
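The "suppression idea" mentioned in this abstract can be illustrated with the commonly used suppression rule of suppressed FCM: after the ordinary membership update, the largest membership of each pixel is rewarded and the others are scaled down. The sketch below shows only that step, under the assumption that the paper builds on this standard rule; the double-center initialisation and the position-feature correction are not reproduced. The function name `suppress_memberships` and the default `alpha` are hypothetical.

```python
def suppress_memberships(u_row, alpha=0.5):
    """Apply the suppression step to one pixel's membership row.

    u_row: memberships of one pixel in each cluster (sums to 1).
    alpha: suppression rate in [0, 1]; alpha = 1 leaves the row unchanged
           (plain FCM), alpha = 0 gives a hard assignment.
    """
    winner = max(range(len(u_row)), key=lambda i: u_row[i])
    # Non-winning memberships are scaled down by alpha ...
    out = [alpha * u for u in u_row]
    # ... and the winner takes the remainder, so the row still sums to 1.
    out[winner] = 1.0 - alpha + alpha * u_row[winner]
    return out
```

Because the winning cluster is strengthened at every pass, the centres typically settle in fewer iterations than with plain FCM, which is the effect the abstract refers to.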

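A segmentation algorithm for the initials and finals of cleft palate speech is designed. Subjective waveform tests and objective F-tests and t-tests show that cleft palate speech differs significantly from normal speech. Syllables whose initial has the characteristics of an unvoiced phoneme are defined as class I syllables, and syllables whose initial has the characteristics of a voiced phoneme as class II syllables. First, class I and class II syllables are discriminated automatically with a hierarchical clustering model. A voiced-like weight function and an unvoiced-like probability function are then defined to perform first-level initial/final segmentation of class I syllables, and the first-order difference of the number of peaks in the short-time autocorrelation function gives the second-level segmentation of class I syllables. For class II syllables, initial/final segmentation is based on the waveform differences between initials and finals and detects the energy jump points of the short-time autocorrelation function. Large-sample experiments show that the proposed automatic initial/final discrimination algorithm for cleft palate speech achieves high accuracy: 90.72% for class I syllables and 92.90% for class II syllables.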
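The peak count of the short-time autocorrelation function, which this abstract uses as a cue, can be sketched as follows. This is an illustrative reading only: the frame length, the peak definition, and the threshold `rel_threshold` are assumptions, and the paper's weight and probability functions are not reproduced. Tracking how the count changes from frame to frame gives the first-order difference the abstract mentions.

```python
import math

def short_time_autocorr(frame):
    """Return R[k] = sum_n frame[n] * frame[n + k] for k = 0 .. len(frame) - 1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(n)]

def count_peaks(r, rel_threshold=0.3):
    """Count local maxima of r (lag 0 excluded) that exceed rel_threshold * r[0]."""
    if not r or r[0] <= 0:
        return 0
    floor = rel_threshold * r[0]
    return sum(1 for k in range(1, len(r) - 1)
               if r[k] > r[k - 1] and r[k] >= r[k + 1] and r[k] > floor)

# A periodic (voiced-like) frame gives regularly spaced autocorrelation peaks.
voiced_like = [math.sin(2 * math.pi * n / 20) for n in range(200)]
peaks_in_frame = count_peaks(short_time_autocorr(voiced_like))
```

A periodic, voiced-like frame produces a run of strong peaks at multiples of the pitch period, while a noise-like frame produces few or none, so a jump in the count between consecutive frames is a plausible boundary cue.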

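Because three-dimensional underwater acoustic data have complex backgrounds and suffer severe noise interference, an HFCM segmentation algorithm combined with a three-dimensional FMF is proposed to improve the accuracy and efficiency of underwater acoustic data segmentation. The algorithm first selects a three-dimensional filter window and computes the fuzzy threshold with the maximum entropy threshold method; it then applies fuzzy median filtering to the data using a half-Gaussian fuzzy membership function; finally, the HFCM algorithm segments the filtered data. Results on two different sets of three-dimensional underwater acoustic data show that the algorithm effectively reduces noise interference, segments better than HFCM without filtering and HFCM with balanced FMF, and is clearly more efficient than the traditional fuzzy C-means algorithm.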

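Objective: Existing algorithms that combine region merging with graph cuts ignore the fuzzy characteristics of ore-rock images, which leads to low segmentation accuracy and efficiency and fails to segment blurred edges effectively. To solve this, the likelihood energy of a graph-cut model whose vertices are regions is set using maximum fuzzy 2-partition entropy information obtained by fast recursive computation. Methods: First, a bilateral filter and the watershed algorithm preprocess the ore-rock image and divide it into a number of regions with good homogeneity. Then, the fuzzy membership functions of object and background, obtained when computing the image's maximum fuzzy 2-partition entropy, are used to design the likelihood energy of the graph-cut energy function, bringing the energy function closer to the real situation of fuzzy images. To search for the maximum fuzzy 2-partition entropy more efficiently, a recursive algorithm with time complexity O(n²) is proposed that turns the fuzzy entropy computation into a recursion and keeps the non-repeated recursion results for the subsequent exhaustive search. Finally, the designed graph-cut algorithm labels the regions to complete the segmentation. Results: The segmentation accuracy is about 23% higher than that of other combined region-merging and graph-cut algorithms; the count of ore-rock particles after segmentation has an error rate of about 2% relative to manual counting; and the running time is about 60% shorter than that of other algorithms. Conclusion: The algorithm improves the efficiency of ore-rock image segmentation while maintaining accuracy, and provides an important guide for engineering practice in efficient automated ore-rock image segmentation.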

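A fuzzy texture image segmentation algorithm based on the dual-tree complex wavelet transform is proposed. The method has two stages: texture feature extraction and texture classification. Feature extraction is built on the dual-tree complex wavelet transform. Texture classification could be done directly by clustering with the fuzzy C-means algorithm, but the membership function in that algorithm is designed from the distance between a sample and the cluster center, which is unreasonable for data with a non-spherical distribution. To address this, the compactness between samples is introduced to measure the relationships among the samples in a class, the membership function is corrected accordingly, and the result is used for texture classification. Experimental results show that, with a running time close to that of fuzzy C-means, the improved method is clearly better in segmentation accuracy, edge accuracy, and region consistency.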

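The time-domain pitch-synchronous overlap-add algorithm for speech analysis and synthesis is briefly introduced, and on this basis a time-domain tone conversion method for Chinese speech is proposed. The method can convert speech of one tone into speech of another tone and, apart from a slight loss in timbre, preserves good speech quality. It operates directly on the speech waveform, is computationally simple, and can run in real time on an ordinary microcomputer. Used in a speech synthesis system, it greatly reduces the size of the sound library, since syllables with the same initial and final need speech data stored for only one tone; synthesizing sentences with this method according to the intonation patterns of Chinese sentences also improves the naturalness of synthesized Chinese sentences.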

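Besides spectral features, high spatial resolution (high-resolution for short) remote sensing images contain rich texture features. To segment such images with high accuracy, a segmentation method combining multiple features and fuzzy preference relations is proposed. First, several statistical features are defined from pixel spectral measures; a feature image is extracted for each defined feature and segmented separately, and the results are used to build a fuzzy decision matrix. Then, a pixel-based fuzzy preference relation matrix between features is defined, the weight of each feature in the final segmentation decision is computed, and the fuzzy decision matrix is weighted to emphasize strong features and suppress weak ones. Finally, defuzzifying the decision matrix gives the optimal segmentation. Qualitative and quantitative evaluation on synthetic and real high-resolution remote sensing images shows an overall segmentation accuracy of 99.8% and a Kappa value of 0.998 on the synthetic image, indicating that the proposed algorithm achieves high-accuracy segmentation by combining the strengths of the individual features.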

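Magnetic resonance image segmentation method using a fuzzy B-spline basis neural network   Total citations: 1 (self-citations: 0, citations by others: 1)
Considering the characteristics of magnetic resonance image segmentation, a segmentation method based on a fuzzy B-spline basis neural network is proposed. The method uses B-spline basis functions as the fuzzy membership functions, implements fuzzy inference with a neural network, and trains the network with the error backpropagation algorithm. Experimental results show that, compared with segmentation by an ordinary neural network, this method achieves higher segmentation accuracy and faster training convergence.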

田元  王乘  管涛 《图学学报》2010,31(2):123
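To improve segmentation when the foreground and background colors are similar, an interactive image segmentation method based on fuzzy C-means clustering (FCM) and graph cuts is proposed. First, the watershed algorithm preprocesses the image, dividing it into many small regions, and the analysis uses regions instead of pixels. Then, fuzzy C-means clusters the user-marked foreground and background regions separately to mine the hidden information provided by the user interaction. The similarity energy is the minimum distance from the color components of an unmarked region to the cluster centers of the foreground and background regions, and the prior energy is the correlation between an unmarked region and its neighboring regions. Finally, the max-flow/min-cut algorithm finds the global optimum of the energy function. Compared with other methods, the proposed method segments well, extracts objects of interest fairly accurately from images with similar foreground and background, and requires only simple user operations.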

There are many speech and language processing problems which require cascaded classification tasks. While model adaptation has been shown to be useful in isolated speech and language processing tasks, it is not clear what constitutes system adaptation for such complex systems. This paper studies the following questions: In cases where a sequence of classification tasks is employed, how important is it to adapt the earlier or later systems? Is the performance improvement obtained in the earlier stages via adaptation carried over to later stages in cases where the later stages perform adaptation using similar data and/or methods? In this study, as part of a larger scale multiparty meeting understanding system, we analyze various methods for adapting dialog act segmentation and tagging models trained on conversational telephone speech (CTS) to meeting style conversations. We investigate the effect of using adapted and unadapted models for dialog act segmentation with those of tagging, showing the effect of model adaptation for cascaded classification tasks. Our results indicate that we can achieve significantly better dialog act segmentation and tagging by adapting the out-of-domain models, especially when the amount of in-domain data is limited. Experimental results show that it is more effective to adapt the models in the later classification tasks, in our case dialog act tagging, when dealing with a sequence of cascaded classification tasks.

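The standard fuzzy C-means clustering algorithm takes no account of information about the spatial continuity of the image and is therefore highly sensitive to noise. To address this, an FCM clustering segmentation algorithm based on image spatial information is proposed. The algorithm introduces the spatial information of image pixels into the similarity measure and the membership function, where the spatial information is determined by the relative positions of pixels and the features of the pixels in the neighborhood. Experimental results show that the method effectively segments images containing a certain amount of noise and has good noise resistance.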

We propose a novel clustering algorithm using fast global kernel fuzzy c-means-F (FGKFCM-F), where F refers to kernelized feature space. This algorithm proceeds in an incremental way to derive the near-optimal solution by solving all intermediate problems using kernel-based fuzzy c-means-F (KFCM-F) as a local search procedure. Due to the incremental nature and the nonlinear properties inherited from KFCM-F, this algorithm overcomes the two shortcomings of fuzzy c-means (FCM): sensitivity to initialization and inability to handle nonlinearly separable data. An accelerating scheme is developed to reduce the computational complexity without significantly affecting the solution quality. Experiments are carried out to test the proposed algorithm on a nonlinear artificial dataset and a real-world dataset of speech signals for consonant/vowel segmentation. Simulation results demonstrate the effectiveness of the proposed algorithm in improving clustering performance on both types of datasets.
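To make the "F" in KFCM-F concrete: when prototypes live in the kernelized feature space, they are never formed explicitly, and the distance from a mapped sample to a prototype is evaluated through the kernel alone. The standard expression is sketched below as an illustration; the abstract does not give it, so the notation ($K$ for the kernel, $\phi$ for the feature map, $u_{ij}$ for the membership of sample $x_j$ in cluster $i$, $m$ for the fuzzifier) is an assumption.

```latex
v_i = \frac{\sum_{j} u_{ij}^{m}\,\phi(x_j)}{\sum_{j} u_{ij}^{m}},
\qquad
\lVert \phi(x_k) - v_i \rVert^{2}
  = K(x_k, x_k)
  - \frac{2\sum_{j} u_{ij}^{m}\,K(x_j, x_k)}{\sum_{j} u_{ij}^{m}}
  + \frac{\sum_{j}\sum_{l} u_{ij}^{m}\,u_{il}^{m}\,K(x_j, x_l)}{\bigl(\sum_{j} u_{ij}^{m}\bigr)^{2}}
```

The double sum in the last term is what makes each iteration costly on large datasets, which is consistent with the abstract's mention of an accelerating scheme.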
