Similar Documents
20 similar documents found (search time: 734 ms)
1.
A method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) has been developed for binaural sound source localization (SSL) and tracking of multiple sound sources. Accurate binaural audition is important for applying inexpensive and widely applicable auditory capabilities to robots and systems. Conventional SSL based on the GCC-PHAT method is degraded by the low resolution of the time-difference-of-arrival estimation, by the interference created when sound waves arrive at a microphone from two directions around the robot head, and by impaired performance when there are multiple speakers. The low-resolution problem is solved by using a maximum-likelihood-based SSL method in the frequency domain. The multipath interference problem is avoided by incorporating a new time delay factor into the GCC-PHAT method, assuming a spherical robot head. Performance with multiple speakers was improved by using a multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering. The standard K-means clustering algorithm was extended to track an unknown, time-varying number of speakers by adding two steps that automatically increase the number of clusters and eliminate clusters containing incorrect direction estimates. Experiments conducted on the SIG-2 humanoid robot show that this method outperforms the conventional SSL method; it reduces localization errors by 18.1° on average and by over 37° in the side directions. It also tracks multiple speakers in real time with tracking errors below 4.35°.
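The GCC-PHAT weighting at the core of this method fits in a few lines; the following is a minimal numpy sketch (the function name and toy delayed-noise signals are illustrative, not from the paper):

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time difference of arrival (TDOA) between two
    channels with the GCC-PHAT weighting: the cross-power spectrum is
    normalized to unit magnitude so only phase (delay) information remains."""
    n = len(sig) + len(ref)                        # pad to avoid circular wrap
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15                         # PHAT weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # TDOA in seconds

# Toy check: broadband noise delayed by 8 samples between the channels.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.concatenate((np.zeros(8), x))               # y lags x by 8 samples
tau = gcc_phat(y, x, fs=16000)
print(round(tau * 16000))                          # -> 8
```

PHAT works well for broadband signals like speech; for narrowband signals the whitened phase becomes ambiguous, which is one reason the abstract pairs it with a maximum-likelihood frequency-domain method.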

2.
张毅  汪培培  罗元 《信息与控制》2016,45(3):355-360
To address the sharp drop in recognition rate of speech recognition systems under noise interference, this paper analyzes the shortcomings of traditional robust speech feature extraction methods in spectral estimation of the speech signal and proposes a feature extraction algorithm with good robustness and recognition performance across different signal-to-noise ratios. The algorithm combines the multiple signal classification (MUSIC) method and the minimum-norm method (MNM) for spectral estimation. Validation experiments on a mobile robot platform show that the algorithm effectively improves the speech recognition rate and enhances recognition robustness.
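The MUSIC method this abstract combines with the minimum-norm method can be illustrated on a toy frequency-estimation task; this is a sketch of standard MUSIC over snapshot covariances, not the authors' exact feature pipeline:

```python
import numpy as np

def music_spectrum(x, p, m, freqs):
    """MUSIC pseudospectrum: eigendecompose the sample covariance of
    overlapping length-m snapshots, keep the m-p smallest-eigenvalue
    (noise-subspace) eigenvectors, and scan candidate steering vectors."""
    snaps = np.array([x[i:i + m] for i in range(len(x) - m + 1)]).T
    R = snaps @ snaps.conj().T / snaps.shape[1]
    _, V = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = V[:, :m - p]                      # noise subspace
    spec = []
    for f in freqs:
        a = np.exp(2j * np.pi * f * np.arange(m))   # steering vector
        spec.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(spec)

rng = np.random.default_rng(1)
n = np.arange(256)
x = np.exp(2j * np.pi * 0.12 * n)                   # tone at f = 0.12
x += 0.1 * (rng.standard_normal(256) + 1j * rng.standard_normal(256))
freqs = np.linspace(0.0, 0.5, 501)
f_hat = freqs[np.argmax(music_spectrum(x, p=1, m=16, freqs=freqs))]
print(round(f_hat, 3))
```

The pseudospectrum peaks sharply where steering vectors are orthogonal to the noise subspace, which is what gives subspace methods their resolution advantage over periodogram-style estimates.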

3.
《Advanced Robotics》2013,27(15):2093-2111
People usually talk face to face when communicating with a partner. Therefore, in robot audition, recognition of the frontal talker is critical for smooth interaction. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (SNR) beamformer. The CSCC-based VAD can classify speech signals retrieved in the frontal region of two microphones embedded on the robot. The system works in real time, without filter coefficients trained in advance, even in a noisy environment (SNR > 0 dB), and it can cope with speech noise generated from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that the system enhanced extracted target speech signals by more than 12 dB (SNR) and increased the success rate of automatic speech recognition for Japanese words by about 17 points.

4.
A new wavelet-transform-based speech parameter extraction algorithm is proposed to improve the robustness of speech recognition systems to environmental noise. By introducing multiresolution wavelet analysis, the features provide high time resolution at high frequencies and high frequency resolution at low frequencies. The improved algorithm recognizes spoken words more accurately while greatly simplifying the computation. Compared with the traditional MFCC extraction algorithm, experimental results show that wavelet-based speech features perform better.
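The multiresolution trade-off described here can be demonstrated with the Haar wavelet, the simplest case (the paper does not specify which wavelet it uses; this is a minimal illustration):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass half-band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass half-band
    return a, d

def multiresolution(x, levels):
    """Recursive decomposition: fine time resolution at high frequencies,
    fine frequency resolution at low frequencies."""
    coeffs, a = [], x
    for _ in range(levels):
        a, d = haar_dwt(a)
        coeffs.append(d)
    coeffs.append(a)                       # [d1, d2, ..., approximation]
    return coeffs

x = np.array([4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0])
coeffs = multiresolution(x, levels=3)
# d1 and d2 are all zero (the signal is locally constant);
# the single level-3 detail coefficient captures the step.
print([c.tolist() for c in coeffs])
```

Each level halves the sampling rate, so the computation is O(N) overall, which is the simplification over fixed-resolution transforms that the abstract alludes to.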

5.

Automatic speech recognition systems have been developed and tested for recognizing the speech of typical speakers in various languages. This paper emphasizes the need for a more challenging speaker-independent speech recognition system for the hearing impaired, able to recognize speech uttered by any hearing-impaired (HI) speaker. In this work, Gammatone energy features with filters spaced on the equivalent rectangular bandwidth (ERB), Mel, and Bark scales, together with MFPLPC features, are used at the front end, with vector quantization (VQ) and multivariate hidden Markov models (MHMM) at the back end. System performance is compared across three modeling techniques, VQ, FCM (fuzzy C-means) clustering, and MHMM, for the recognition of isolated digits and simple continuous sentences in Tamil. Recognition accuracy (RA) is 81.5% for speaker-independent isolated digit recognition when speech from eight speakers is used for training and speech from the remaining two speakers for testing. Accuracy reaches 91% and 87.5% for speaker-independent isolated digit and continuous speech recognition, respectively, when 90% of the data is used for training and 10% for testing. Accuracy can be further enhanced with a more extensive database for creating models/templates. Receiver operating characteristic (ROC) curves of true positive rate against false positive rate are used to assess the system's performance for HI speakers. The system can be used to understand speech uttered by any hearing-impaired speaker and facilitates the provision of necessary assistance, ultimately improving the social standing and confidence of hearing-impaired people.
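The ERB and Mel warpings used to space the front-end filters are standard formulas; a minimal sketch (the ERB form is the common Glasberg-Moore approximation, assumed rather than taken from this paper):

```python
import math

def hz_to_mel(f):
    """Mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_erb(f):
    """ERB-rate scale (Glasberg & Moore approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

print(round(hz_to_mel(1000.0), 1), round(hz_to_erb(1000.0), 2))  # -> 1000.0 15.62
```

Filter center frequencies are chosen uniformly on the warped axis and mapped back to Hz, so low frequencies get denser coverage, mimicking cochlear resolution.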


6.
A Survey of Sound Source Localization for Robot Audition
Sound source localization (SSL) determines the direction and position of external sound sources relative to the robot, and an SSL system for robot audition can greatly improve a robot's ability to interact with its surroundings. Summarizing and analyzing SSL techniques for robot audition is therefore important for the development of intelligent robotics. This survey first summarizes the characteristics of SSL systems for robot audition and reviews the key techniques: time difference of arrival (TDOA), steerable beamforming, high-resolution spectral estimation, binaural audition, active audition, and audio-visual fusion. It then classifies microphone array models and compares the performance of seven representative systems based on 3-D microphone arrays, 2-D microphone arrays, and binaural setups. Finally, it summarizes applications of robot-audition SSL systems and analyzes open problems and future trends.
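In the two-microphone far-field case, the TDOA techniques surveyed here reduce to a simple bearing formula; a small illustration (the geometry and values are ours, for demonstration only):

```python
import math

def tdoa_bearing(tau, mic_distance, c=343.0):
    """Far-field bearing from the TDOA between two microphones:
    theta = arcsin(c * tau / d), measured from broadside."""
    s = max(-1.0, min(1.0, c * tau / mic_distance))  # clamp against noise
    return math.degrees(math.asin(s))

d = 0.2                                              # 0.2 m baseline
tau = d * math.sin(math.radians(30.0)) / 343.0       # source 30 deg off broadside
print(round(tdoa_bearing(tau, d), 1))                # -> 30.0
```

The arcsine is steep near endfire, which is why TDOA-based systems lose angular resolution at the sides, a weakness several of the surveyed methods (beamforming, subspace estimation, active audition) try to compensate.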

7.
Multilingual Identification of Broadcast Speech Based on Sub-band GMM-UBM
This paper proposes a probabilistic, language-content-independent language identification method that can identify dozens of languages without requiring linguistic expertise in each language. To cope with the heavy noise interference in broadcast speech, a GMM-UBM model is adopted as the language model, improving the system's noise robustness. Because the background noise of broadcast speech is not simply full-band additive white noise, a language identification system with multiple subsystems based on sub-band GMM-UBM models is constructed, with a neural network performing system-level fusion at the back end. Identification experiments on 37 languages and dialects demonstrate the effectiveness of the sub-band GMM-UBM method.
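GMM-UBM scoring reduces to a log-likelihood ratio between a language-specific GMM and the universal background model; a toy diagonal-covariance sketch (1-D frames and single-component models, chosen for brevity, not the paper's configuration):

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X under a
    diagonal-covariance GMM, via log-sum-exp over components."""
    X = np.atleast_2d(X)
    logps = []
    for w, mu, var in zip(weights, means, variances):
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        logps.append(np.log(w) + ll)
    logps = np.stack(logps)                    # (components, frames)
    m = logps.max(axis=0)
    return float(np.mean(m + np.log(np.exp(logps - m).sum(axis=0))))

# Toy example: a "language" model centered at 2 vs. a broad UBM at 0.
rng = np.random.default_rng(0)
frames = rng.normal(2.0, 1.0, size=(200, 1))   # test-utterance frames
lang = ([1.0], [np.array([2.0])], [np.array([1.0])])
ubm = ([1.0], [np.array([0.0])], [np.array([4.0])])
score = diag_gmm_loglik(frames, *lang) - diag_gmm_loglik(frames, *ubm)
print(score > 0)                               # -> True: frames fit the language model
```

In the sub-band variant described above, one such score is produced per frequency band and the per-band scores are fused (here, by the back-end neural network).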

8.
This paper analyzes the key points in designing an embedded speech recognition system and presents an embedded speaker-dependent speech recognition system based on the Sunplus SPCE061A microcontroller, focusing on the relevant algorithms and the system architecture. The system offers a high recognition rate, low cost, and good portability, and has been successfully applied to an intelligent robot control platform.

9.
Recently, there has been a significant increase in research interest in biologically inspired systems, which, in the context of speech communication, attempt to learn from human auditory perception and cognition so as to derive knowledge and benefits currently unavailable in practice. One particular pursuit is to understand why the human auditory system generally performs with much more robustness than an engineered system, say a state-of-the-art automatic speech recognizer. In this study, we adopt a computational model of the mammalian central auditory system and develop a methodology to analyze and interpret its behavior for an enhanced understanding of its end product, which is a data-redundant, dimension-expanded representation of neural firing rates in the primary auditory cortex (A1). Our first approach is to reinterpret the well-known Mel-frequency cepstral coefficients (MFCCs) in the context of the auditory model. We then present a framework for interpreting the cortical response as a place-coding of speech information and identify some key advantages of the model's dimension expansion. The framework consists of a model of "source"-invariance that predicts how speech information is encoded in a class-dependent manner, and a model of "environment"-invariance that predicts the noise robustness of class-dependent, signal-respondent neurons. The validity of these ideas is experimentally assessed within an existing recognition framework by selecting features that demonstrate their effects and applying them to a conventional phoneme classification task. The results are discussed quantitatively and qualitatively, and our insights motivate future research on category-dependent features and speech classification using the auditory model.
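The MFCC pipeline the authors reinterpret is standard: windowed power spectrum, triangular mel filterbank, log compression, then a DCT. A compact numpy sketch (frame length, filter count, and coefficient count are typical defaults, not values from this paper):

```python
import numpy as np

def mfcc(signal, fs, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal single-frame MFCC: power spectrum -> triangular mel
    filterbank -> log -> type-II DCT (cosine matrix, no scipy needed)."""
    spec = np.abs(np.fft.rfft(signal * np.hamming(len(signal)), n_fft)) ** 2
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = imel(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):               # triangular filters between edges
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(fb @ spec + 1e-10)
    k = np.arange(n_ceps)[:, None]
    j = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * j + 1) / (2 * n_mels))
    return dct @ logmel

fs = 16000
t = np.arange(400) / fs                   # one 25 ms frame
frame = np.sin(2 * np.pi * 300 * t)
c = mfcc(frame, fs)
print(c.shape)                            # (13,)
```

Seen through the auditory model, the mel filterbank plays the role of cochlear frequency analysis and the DCT a crude stand-in for cortical decorrelation, which is the correspondence the abstract develops.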

10.
A Blind Source Separation Algorithm Based on Maximum Signal-to-Noise Ratio
This paper proposes a new blind separation algorithm for instantaneous linear mixtures with low computational complexity. Its separation criterion is that the signal-to-noise ratio is maximized when statistically independent signals are fully separated. The source signals are approximated by moving averages of the estimated signals, and a function of the covariance matrices of the source and noise signals is formulated as a generalized eigenvalue problem, so the separation matrix is obtained without any iterative computation. Compared with typical information-theoretic methods, the algorithm has very low computational complexity. Computer simulations show that it can separate linear mixtures of super-Gaussian and sub-Gaussian source signals and can effectively separate speech signals.
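The non-iterative solution via a generalized eigenvalue problem can be sketched as follows; the moving-average window and the toy sources are our choices, standing in for the setup the abstract describes:

```python
import numpy as np

def max_snr_separation(X, win=20):
    """Blind separation by maximizing SNR: a moving average of the mixtures
    stands in for the 'signal', the residual for the 'noise', and the
    unmixing vectors are generalized eigenvectors of the two covariance
    matrices -- no iterative optimization is needed."""
    kernel = np.ones(win) / win
    S = np.array([np.convolve(row, kernel, mode="same") for row in X])
    N = X - S
    Cs = S @ S.T / S.shape[1]
    Cn = N @ N.T / N.shape[1]
    # Generalized eigenproblem Cs w = lambda Cn w, solved via inv(Cn) @ Cs.
    w, V = np.linalg.eig(np.linalg.inv(Cn) @ Cs)
    W = V[:, np.argsort(w.real)[::-1]].T.real
    return W @ X

rng = np.random.default_rng(2)
t = np.arange(2000)
s1 = np.sin(2 * np.pi * t / 100)        # slow, highly "smoothable" source
s2 = rng.standard_normal(2000)          # broadband source
A = np.array([[1.0, 0.6], [0.5, 1.0]])  # mixing matrix
X = A @ np.vstack([s1, s2])
Y = max_snr_separation(X)
# |correlation| between each output row and each true source
corr = np.abs(np.corrcoef(np.vstack([Y, s1, s2]))[:2, 2:])
print(corr.round(2))
```

Separation succeeds when the two sources yield distinct smoothed-to-residual power ratios (distinct generalized eigenvalues); each output then correlates strongly with exactly one source, up to scale and permutation.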

11.
This letter presents a new algorithm for blind dereverberation and echo cancellation based on independent component analysis (ICA) for actual acoustic signals. We focus on frequency-domain ICA (FD-ICA) because its computational cost and speed of learning convergence are sufficiently reasonable for practical applications such as hands-free speech recognition. In applying conventional FD-ICA as a preprocessing step for automatic speech recognition in noisy environments, one of the most critical problems is how to cope with reverberation. To extract a clean signal from the reverberant observation, we model the separation process in the short-time Fourier transform domain and apply the multiple input/output inverse-filtering theorem (MINT) to the FD-ICA separation model. A naive implementation of this method is computationally expensive, because its time complexity is quadratic in the reverberation time. Therefore, the main issue in dereverberation is reducing the high computational cost of ICA. In this letter, we reduce the computational complexity to linear order in the reverberation time using two techniques: (1) a separation model based on the independence of delayed observed signals with MINT and (2) spatial sphering for preprocessing. Experiments show that the computational cost grows in proportion to the reverberation time and that our method improves the word correctness of automatic speech recognition by 10 to 20 points in an RT60 = 670 ms reverberant environment.

12.
孙林慧  叶蕾  杨震 《计算机仿真》2005,22(5):231-234
Test-utterance duration is one of the main factors affecting speaker recognition. This paper studies the relationship between test duration and speaker recognition rate in distributed speech recognition. Using text-independent training templates, a basic speaker identification system is first tested with clean and noisy speech; the results show that the recognition rate increases with test duration, and the optimal test duration for noisy speech is obtained under laboratory conditions. To reduce this optimal duration, an improved speaker identification system first classifies the speaker's gender and then identifies the speaker, which not only shortens the required optimal test duration but also improves the system's noise robustness. Finally, the simulation results are analyzed.

13.
A Novel Embedded Speech Recognition Robot System
This paper investigates a novel industrial speech recognition robot system based on an embedded system and a DSP. The embedded-plus-DSP design strikes a better balance among the robot's performance, cost, configurability, and extensibility. For speech recognition, an improved MFCC method is used for feature extraction, and an HMM based on K-means segmentation is used for real-time speech learning and recognition, improving the real-time performance and portability of the algorithms.

14.
This paper studies automatic language identification of web-collected speech in five languages (English, German, Japanese, French, and Spanish) together with existing Chinese speech. A Gaussian mixture model is built for each language using RASTA-PLP feature parameters and a greedy expectation-maximization algorithm, and open-set tests are conducted with speech from multiple speakers. The effects of web versus non-web speech on the identification results are discussed, as is the relationship between the recognition rate, the amount of training data, and the order of the GMM. Experimental results show that the improved acoustic-feature-based method can be effectively applied to automatic language identification of web speech.

15.
With the continuous development of robotics, this paper presents speech recognition as an intelligent human-robot interaction method. Based on the principles of HMM-based speech recognition, an isolated-word speech recognition system is built on a laboratory robot platform using the open-source HTK and Julius toolkits. The system extracts voice commands that can be used to control the robot.

16.
Beamspace MUSIC for Vector Hydrophone Arrays
A vector hydrophone simultaneously picks up acoustic pressure and particle velocity, so an array of them carries more inter-sensor phase information. Beamforming with a vector hydrophone array clearly outperforms a pressure hydrophone array under the same conditions, but its spatial resolution is still limited by the physical aperture of the array. High-resolution spectral estimation (the MUSIC algorithm) has been studied for vector hydrophone arrays, but as direct element-space processing it is computationally expensive. This paper proposes a beamspace MUSIC algorithm (BMUSIC) for vector hydrophone arrays: the element-space data are first transformed into beamspace, and the MUSIC algorithm is then applied to the transformed data. This not only reduces the dimensionality and the computational load but also further suppresses noise outside the scanned sector. Simulations comparing BMUSIC with conventional MUSIC show that it achieves bearing resolution comparable to element-space MUSIC.

17.
Sensory information is indispensable for living things, and integrating multiple types of senses is important for understanding one's surroundings. In human communication, humans must further integrate the multimodal senses of audition and vision to understand intention. In this paper, we describe speech-related modalities, since speech is the most important medium for transmitting human intention. To date there have been many studies of speech communication technologies, but performance still has room for improvement. For instance, although speech recognition has achieved remarkable progress, its performance still degrades seriously in acoustically adverse environments. On the other hand, perceptual research has proved the existence of complementary integration of audio speech and visual face movements in human perception mechanisms. Such research has stimulated attempts to apply visual face information to speech recognition and synthesis. This paper introduces work on audio-visual speech recognition, speech-to-lip-movement mapping for audio-visual speech synthesis, and audio-visual speech translation.

18.
Motivated by the human autonomous development process from infancy to adulthood, we have built a robot that develops its cognitive and behavioral skills through real-time interaction with the environment. We call such a robot a developmental robot. In this paper, we present the theory and the architecture for implementing a developmental robot and discuss the related techniques that address an array of challenging technical issues. As an application, experimental results on a real robot, the self-organizing, autonomous, incremental learner (SAIL), are presented with emphasis on its auditory perception and audition-related action generation. In particular, the SAIL robot learns from unsegmented and unlabeled speech streams without any prior knowledge about the auditory signals, such as the designated language or phoneme models. Nor are the actions the robot is expected to perform specified before learning starts. SAIL learns auditory commands and the desired actions through physical contact with the environment, including the trainers.

19.
贾晶  李英 《电脑开发与应用》2012,25(2):40-42,46
This paper analyzes and studies the design and implementation of an industrial speech recognition system whose speech enhancement method cascades acoustic coupling with a speech enhancement module, and models the algorithm. The spectral subtraction and MMSE-LSA speech enhancement algorithms are compared, and the experimental data are analyzed. The result improves the recognition rate of the industrial robot speech recognition system in noisy environments.
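The spectral subtraction method compared in this paper follows a standard recipe: subtract an over-estimated noise magnitude spectrum, floor the result, and resynthesize with the noisy phase. A single-frame numpy sketch (the over-subtraction factor and spectral floor are common defaults, not this paper's values):

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, alpha=2.0, beta=0.01):
    """Magnitude spectral subtraction on one frame, resynthesized
    with the noisy phase."""
    spec = np.fft.rfft(noisy)
    mag = np.abs(spec)
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * mag)  # floor
    return np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), n=len(noisy))

rng = np.random.default_rng(3)
fs, n = 16000, 512
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 500 * t)
noisy = clean + 0.3 * rng.standard_normal(n)
# Noise estimate from an independent noise-only segment.
noise_mag = np.abs(np.fft.rfft(0.3 * rng.standard_normal(n)))
out = spectral_subtraction(noisy, noise_mag)

snr = lambda ref, x: 10 * np.log10(np.sum(ref**2) / np.sum((x - ref)**2))
print(round(snr(clean, noisy), 1), round(snr(clean, out), 1))
```

The spectral floor `beta * mag` limits the "musical noise" artifacts that plain subtraction produces, which is one reason MMSE-LSA, the other method compared here, is often preferred for recognition front ends.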

20.

Recognition of the speech of typical speakers has been studied for many years. A complete system for recognizing the speech of persons with a speech impairment, however, is still under development. In this work, an isolated digit recognition system is developed to recognize the speech of people affected by dysarthria. Since speech uttered by dysarthric speakers exhibits erratic behavior, developing a robust speech recognition system is especially challenging; even manual recognition of their speech can be futile. This work analyzes the use of multiple features and speech enhancement techniques in implementing a cluster-based speech recognition system for dysarthric speakers. Speech enhancement techniques are used to improve intelligibility and reduce the distortion level of their speech. The system is evaluated using Gammatone energy (GFE) features with filters calibrated on different non-linear frequency scales, Stockwell features, the modified group delay cepstrum (MGDFC), speech enhancement techniques, and a VQ-based classifier. Decision-level fusion of all features and speech enhancement techniques yielded a 4% word error rate (WER) for a speaker with 6% speech intelligibility. Experimental evaluation provided better results than subjective assessment of the speech uttered by dysarthric speakers. The system was also evaluated for a dysarthric speaker with 95% speech intelligibility; the WER is 0% for all digits with decision-level fusion of speech enhancement techniques and GFE features. This system can be utilized as an assistive tool by caretakers of people affected by dysarthria.
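The word error rate (WER) reported here is the word-level Levenshtein distance (substitutions + deletions + insertions) normalized by the reference length; a minimal implementation:

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed by dynamic-programming edit distance over words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(r)

# One substitution ("two" -> "too") and one deletion ("four") in 4 words.
print(word_error_rate("one two three four", "one too three"))  # 0.5
```

Note that WER can exceed 100% when the hypothesis contains many insertions, so a 0% WER, as reported for the high-intelligibility speaker, means an exact match on every digit.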

