期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Single Channel multi-speaker speech Separation based on quantized ratio mask and residual network

Ke Shanfa Hu Ruimin Wang Xiaochen Wu Tingzhao Li Gang Wang Zhongyuan 《Multimedia Tools and Applications》2020,79(43-44):32225-32241

Multimedia Tools and Applications - The recently-proposed deep clustering-based algorithms represent a fundamental advance towards the single-channel multi-speaker speech sep- aration problem.... 相似文献

2.

基于清浊音分离的优化小波阈值去噪方法 总被引：2，自引：0，他引：2

下载免费PDF全文

张君昌刘红姜菲《计算机工程与应用》2009,45(31):130-133

结合小波阈值去噪和清浊音分离技术,提出了一种优化的语音去噪新方法。首先,针对语音清音部分往往包含有许多类似噪声的高频成分的特点,对其直接进行小波阈值去噪很可能误除了这些高频成分,造成失真,因此有必要先对语音进行清浊音分离。其次,通过对不同小波函数、阈值选取规则以及阈值处理函数的优化,选择最佳的小波去噪方法。仿真结果表明,与经典小波阈值去噪方法相比,提出的方法既尽可能地去除噪声,又保留了原来语音的特征,较大地提高了语音质量。相似文献

3.

Discrimination of speech from nonspeech based on multiscale spectro-temporal Modulations

Mesgarani N. Slaney M. Shamma S.A. 《IEEE transactions on audio, speech, and language processing》2006,14(3):920-930

We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals in high level of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002 and Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation. 相似文献

4.

Emotion modeling from speech signal based on wavelet packet transform

Varsha N. Degaonkar Shaila D. Apte 《International Journal of Speech Technology》2013,16(1):1-5

The recognition of emotion in human speech has gained increasing attention in recent years due to the wide variety of applications that benefit from such technology. Detecting emotion from speech can be viewed as a classification task. It consists of assigning, out of a fixed set, an emotion category e.g. happiness, anger, to a speech utterance. In this paper, we have tackled two emotions namely happiness and anger. The parameters extracted from speech signal depend on speaker, spoken word as well as emotion. To detect the emotion, we have kept the spoken utterance and the speaker constant and only the emotion is changed. Different features are extracted to identify the parameters responsible for emotion. Wavelet packet transform (WPT) is found to be emotion specific. We have performed the experiments using three methods. Method uses WPT and compares the number of coefficients greater than threshold in different bands. Second method uses energy ratios of different bands using WPT and compares the energy ratios in different bands. The third method is a conventional method using MFCC. The results obtained using WPT for angry, happy and neutral mode are 85 %, 65 % and 80 % respectively as compared to results obtained using MFCC i.e. 75 %, 45 % and 60 % respectively for the three emotions. Based on WPT features a model is proposed for emotion conversion namely neutral to angry and neutral to happy emotion. 相似文献

5.

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial,spectral and temporal modeling of sounds

Marc Delcroix Keisuke Kinoshita Tomohiro Nakatani Shoko Araki Atsunori Ogawa Takaaki Hori Shinji Watanabe Masakiyo Fujimoto Takuya Yoshioka Takanobu Oba Yotaro Kubo Mehrez Souden Seong-Jun Hahm Atsushi Nakamura 《Computer Speech and Language》2013,27(3):851-873

Research on noise robust speech recognition has mainly focused on dealing with relatively stationary noise that may differ from the noise conditions in most living environments. In this paper, we introduce a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources as found in a typical family living room. To deal with such severe noise conditions, our recognition system exploits all available information about speech and noise; that is spatial (directional), spectral and temporal information. This is realized with a model-based speech enhancement pre-processor, which consists of two complementary elements, a multi-channel speech–noise separation method that exploits spatial and spectral information, followed by a single channel enhancement algorithm that uses the long-term temporal characteristics of speech obtained from clean speech examples. Moreover, to compensate for any mismatch that may remain between the enhanced speech and the acoustic model, our system employs an adaptation technique that combines conventional maximum likelihood linear regression with the dynamic adaptive compensation of the variance of the Gaussians of the acoustic model. Our proposed system approaches human performance levels by greatly improving the audible quality of speech and substantially improving the keyword recognition accuracy. 相似文献

6.

Feature determination for heart sounds based on divergence analysis

Zümray Dokur Tamer Ölmez 《Digital Signal Processing》2009,19(3):521-531

Heart auscultation (the interpretation of heart sounds by a physician) is a fundamental component of cardiac diagnosis. It is, however, a difficult skill to acquire. In decision making, it is important to analyze heart sounds by an algorithm to give support to medical doctors. In this study, two feature extraction methods are comparatively examined to represent different heart sound (HS) categories. First, a rectangular window is formed so that one period of HS is contained in this window. Then, the windowed time samples are normalized. Discrete wavelet transform is applied to this windowed one period of HS. Based on the wavelet detail coefficients at several bands, the time locations of S1–S2 sounds are determined by an adaptive peak detector. In the first feature extraction method, sub-bands belonging to the detail coefficients are partitioned into ten segments. Powers of the detail coefficients in each segment are computed. In the second feature extraction method, the power of the signal in a window which consists of 64 samples is computed without filtering the HSs. In the study, performances of these two feature extraction methods are comparatively examined by the divergence analysis. The analysis quantitatively measures the distribution of vectors in the feature space. 相似文献

7.

用Bark频谱投影识别低信噪比动物声音

下载免费PDF全文

黄鸿铿李应《智能系统学报》2018,13(4):610-618

复杂环境声影响低信噪比动物声音的自动识别。为解决这一问题,本文提出一种不同声场景下低信噪比动物声音识别的方法。该方法把声音信号进行Bark尺度的小波包分解,再使用分解系数生成重构信号的频谱,并对频谱进行投影生成Bark频谱投影特征,通过随机森林分类器实现低信噪比动物声音的识别。该文分别在流水声环境、公路环境、风声环境和嘈杂说话声环境下,以不同的信噪比,对40种动物声音进行识别实验。结果表明,结合短时谱估计法、Bark频谱投影特征和随机森林的方法对不同信噪比的各种环境声音中动物声音的平均识别率可以达到80.5%,且在-10 dB的情况下依然保持平均60%以上的识别率。相似文献

8.

PC based monitoring of human heart sounds

R.S. Anand^{Author Vitae} 《Computers & Electrical Engineering》2005,31(2):166-173

The paper presents an instrumentation system developed for monitoring of human heart sounds. A condenser microphone senses the heart sounds converting them into an equivalent electrical signal. The signal is suitably amplified and filtered in desired frequency band. This signal is converted to equivalent digital signal by an A/D conversion circuitry developed for the purpose. This digital signal is then fed to PC through the printer port (Syntronix). With the help of the supporting software, the signal for a specific duration is accessed and stored in the PC memory. This is further processed for its frequency contents. The system is simple and compact and does not require any external A/D card for the PC. It can be very useful for monitoring human heart sounds and interpreting the sounds on the basis of their time domain and frequency domain representation to diagnose heart disorders. 相似文献

9.

基于PSoC4的可视化心音采集系统

《电子技术应用》2016,(4)

相似文献

10.

基于HOPE-CTC的端到端语音识别

徐冬冬蒋志翔《计算机工程与设计》2021,42(2):462-467

为增强端到端语音识别模型的鲁棒性和特征提取的有效性,对瓶颈特征提取网络进行研究,提出采用基于联合优化正交投影和估计的端到端语音识别模型.通过连接时序分类损失函数训练瓶颈特征提取网络,摆脱对语言学和对齐信息的先验知识的依赖,在解码输出部分添加注意力机制,实现两种不同的端到端模型的融合.在中文数据集AISHELL-1上的实... 相似文献

11.

基于WaveNet的端到端语音合成方法

邱泽宇屈丹张连海《计算机应用》2019,39(5):1325-1329

针对端到端语音合成系统中Griffin-Lim算法恢复相位信息合成语音保真度较低、人工处理痕迹明显的问题，提出了一种基于WaveNet网络架构的端到端语音合成方法。以序列映射Seq2Seq结构为基础，首先将输入文本转化为one-hot向量，然后引入注意力机制获取梅尔声谱图，最后利用WaveNet后端处理网络重构语音信号的相位信息，从而将梅尔频谱特征逆变换为时域波形样本。实验的测试语料为LJSpeech-1.0和THchs-30，针对英语、汉语两个语种进行了实验，实验结果表明平均意见得分（MOS）分别为3.31、3.02，在合成自然度方面优于采用Griffin-Lim算法的端到端语音合成系统以及参数式语音合成系统。相似文献

12.

人脸语音动画中基于PSOLA的情感语音合成系统

王华樊养余《计算机应用研究》2012,29(3):1002-1004

提出一种基于时域基音同步叠加TD-PSOLA算法的情感语音合成系统。根据情感语音库分析总结情感规则,在此基础上利用TD-PSOLA算法对中性语音的韵律参数进行改变,并提出一种能够对基频曲线尾部形状改变的方法,使句子表达出丰富的情感。实验表明,合成出的语音具有明显的情感色彩,证明了该系统能以简单明了的方式实现情感语音的合成,有助于提高人脸语音动画表达的丰富性和生动性。相似文献

13.

基于能量检测的复杂环境下的鸟鸣识别

张小霞李应《计算机应用》2013,33(10):2945-2949

针对实际环境噪声使得鸟鸣识别准确率受到影响的问题,提出一种基于能量检测的抗噪鸟鸣识别方法。首先,对包含有噪声的鸟鸣信号用能量检测方法检测并筛选出有用鸟鸣信号;其次,根据梅尔尺度的分布,对有用鸟鸣信号提取小波包分解子带倒谱系数（WPSCC）特征;最后,用支持向量机（SVM）分类器分别对提取的小波包分解子带倒谱系数（WPSCC）和梅尔频率倒谱系数（MFCC）特征进行建模分类识别。同时还对比了在添加不同信噪比的噪声下15类鸟鸣在能量检测前后的识别性能差异。实验结果表明,提取的WPSCC特征具有较好的抗噪功能,且经过能量检测后的识别性能更佳,更适用于复杂环境下的鸟鸣识别相似文献

14.

基于语音识别的汉语发音自动评分系统的设计与实现 总被引：6，自引：0，他引：6

吕军曹效英《计算机工程与设计》2007,28(5):1232-1235

语音识别技术的发展使得人与计算机的交互成为可能,针对目前对外汉语中发音教学的不足,在结合了语音识别的相关原理,提出了在对外汉语教学领域中汉语自动发音水平评价系统的设计,详细地描述了系统的结构、功能及流程.介绍了系统实现中的关键技术和步骤:动态时间弯折算法、语料库的建立、声韵分割技术以及评价分级标准.通过小范围的试验,表明该系统对留学生汉语发音水平的测试有一定的参考价值. 相似文献

15.

Monaural speech separation based on MAXVQ and CASA for robust speech recognition 总被引：1，自引：0，他引：1

Peng Li Yong Guan Shijin Wang Bo Xu Wenju Liu 《Computer Speech and Language》2010,24(1):30-44

Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues on isolated clean data for each speaker. Given an utterance, speaker identification is firstly performed to identify the two speakers presented in the utterance, then the factorial-max vector quantization model (MAXVQ) is used to infer the mask signals and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus prove that this proposed system can improve the robustness of ASR significantly. 相似文献

16.

Automatic speech recognition based on spectrogram reading

《International journal of man-machine studies》1986,24(6):611-621

相似文献

17.

一种基于帧跳跃的语音识别改进算法

黄翔宇张明《微型机与应用》2014,(7):38-40

引入帧跳跃的概念,从而改进了传统的端点检测算法和DTW算法,实现了一个改进的实时语音识别系统,并在计算机上进行了模拟仿真。实验结果表明,改进后的算法能有效提高孤立词的识别速度和识别精度。相似文献

18.

基于合成语音的计算安全隐写方法

下载免费PDF全文

李梦涵陈可江张卫明俞能海《网络与信息安全学报》2022,8(3):134-141

计算安全的隐写理论很早就被提出,但一直不能用于主流的以多媒体数据为载体的隐写术。原因在于计算安全隐写的前提是可以获得载体的精确分布或可以按照载体分布精确采样,而自然采集的图像、音/视频无法满足这个前提条件。近几年,随着深度学习的发展,多媒体生成技术逐渐成熟且在互联网上的应用越来越普遍,生成媒体成为合理的隐写载体,隐写者可以用正常的生成媒体掩盖秘密通信,即在媒体生成过程中隐写信息,并与正常的生成媒体不可区分。一些生成模型学到的分布是可知或可控的,这将为计算安全隐写推向实用提供契机。以当前广泛应用的合成语音模型为例,设计并实现了计算安全的对称密钥隐写算法,即在音频生成过程中,根据样本点的条件概率,按算术编码的译码过程将消息解压缩到合成音频中,消息接收方拥有相同的生成模型,通过复现音频合成过程完成消息提取。在该算法的基础上进一步设计了公钥隐写算法,为实现包括隐蔽密钥交换在内的全流程隐蔽通信提供了算法支撑,在保证隐写内容安全的同时,还可以实现隐写行为安全。理论分析显示,所提隐写算法的安全性由嵌入消息的随机性决定,隐写分析实验进一步验证了当前技术下攻击者无法区分合成的载体音频与载密音频。相似文献

19.

基于LV-AMDF的自适应基音检测算法 总被引：1，自引：0，他引：1

张康杰赵欢饶居华《计算机应用》2007,27(7):1674-1676,1679

根据语音信号的基音周期范围有限和周期相对稳定的特点,改进了可变长平均幅度差函数法(LV-AMDF),提出一种自适应幅度差法检测基音周期.它在语音非稳定段通过简单的谷值点评选机制,筛选当前谷值点以及历史谷值点,得到较精确的基音周期;在语音稳定段依据历史谷值点缩短语音段的比较范围,减少计算代价.还改进了浊音起止点检测算法,使浊音起止点的定位更精确.实验证明,该方法在不同的信噪比环境下有效地降低了半周期和倍周期点的发生率. 相似文献

20.

一种改进的基于EASI的语音分离算法

孟东霞马建芬乔永凤《计算机工程与应用》2007,43(33):214-216

独立分量分析是近年来发展起来的一门新的数字信号处理方法,因为不需要知道信号的先验信息而得到广泛应用。论文简单介绍了ICA的原理及EASI算法,并根据神经网络理论提出一种改进的EASI语音分离算法。相似文献