Similar Literature
20 similar documents found.
1.
Simulations of the human hearing system can help in a number of research fields, including work with the speech- and hearing-impaired, as well as improving the accuracy of speech recognition systems and the naturalness of speech synthesis technology. The results of psychoacoustic experiments carried out over the last few decades have enabled models of the human peripheral hearing system to be developed. Conventionally, analyses such as the Fast Fourier Transform are used to analyze speech and other sounds to establish the acoustic cues that are important for human perception. Such analyses can be shown to be inappropriate in a number of ways. Additional insight into the importance of various acoustic cues could be gained if analyses based on hearing models were used. This paper describes an implementation of a real-time spectrograph based on a contemporary model of the peripheral human hearing system, executing on a network of T9000 transputers. The differences between it and conventional spectrographs are illustrated by means of test signals and speech sounds.
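The original system ran on a network of T9000 transputers; as a rough single-machine illustration of the same idea, the sketch below builds a crude auditory spectrogram from a gammatone filterbank with ERB-spaced centre frequencies, half-wave rectification, and short-time smoothing. It uses SciPy's gammatone filter design (SciPy 1.6+); the channel count, frequency range, and 10 ms frames are illustrative choices, not the parameters of the paper's hearing model.

```python
import numpy as np
from scipy.signal import gammatone, lfilter

def erb_space(f_lo, f_hi, n):
    """Centre frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inv = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return inv(np.linspace(erb(f_lo), erb(f_hi), n))

def auditory_spectrogram(x, fs, n_channels=32, f_lo=80.0, f_hi=6000.0, frame_ms=10):
    frame = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame
    rows = []
    for fc in erb_space(f_lo, f_hi, n_channels):
        b, a = gammatone(fc, 'iir', fs=fs)        # 4th-order gammatone channel
        y = np.maximum(lfilter(b, a, x), 0.0)     # half-wave rectify (hair-cell stage)
        rows.append(y[:n_frames * frame].reshape(n_frames, frame).mean(axis=1))
    return np.array(rows)                         # (channels, frames), low fc first
```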

2.
The objective of this study was twofold: to identify the initial consonant sounds in Mandarin Chinese that can be selectively lost by aging listeners, and to propose a filtering method that improves the hearing performance of the aging Chinese population. Past literature reported that high-frequency hearing loss can cause difficulties with English sounds such as f, s, t and z. This study found that the high-frequency initial consonants in Mandarin Chinese include t, f, s, p, q (as ch in chance), ch (no corresponding pronunciation in English), sh (as sh in sheep) and zh (no corresponding pronunciation in English). Furthermore, a filtering method to enhance the hearing performance of aging Chinese people was proposed: it attenuates the higher spectral components and boosts the lower spectral components for older male Chinese listeners, and attenuates the lower spectral components and boosts the higher spectral components for older female Chinese listeners. Thirty volunteers whose native language is Mandarin Chinese took part in the experiment. The results indicated that the above-mentioned eight initial consonants were associated with a higher speech recognition error rate than the other Chinese initials. At the same time, the proposed method successfully reduced the speech recognition error rate, achieving the goal of improving the speech recognition ability of older Chinese listeners.
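As a concrete reading of the proposed filtering scheme (attenuate one end of the spectrum and boost the other, with the direction depending on the listener group), here is a minimal FFT-domain sketch. The 2 kHz split point and the ±6 dB gains are assumptions for illustration, not values from the study.

```python
import numpy as np

def tilt_filter(x, fs, split_hz=2000.0, boost_db=6.0, cut_db=-6.0, boost_low=True):
    """Boost spectral components on one side of split_hz and attenuate the other.
    boost_low=True follows the recipe described for older male listeners;
    boost_low=False gives the older-female variant (boost high, cut low)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low_gain = boost_db if boost_low else cut_db
    high_gain = cut_db if boost_low else boost_db
    gain = np.where(f < split_hz, 10 ** (low_gain / 20), 10 ** (high_gain / 20))
    return np.fft.irfft(X * gain, n=len(x))
```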

Relevance for Industry

The results of this study will benefit information technology companies that wish to develop software or information services for older Chinese users. In addition, by applying the filtering method proposed in this study, the performance of hearing aids is also expected to improve.


3.
An Optimized Wavelet Threshold Denoising Method Based on Voiced/Unvoiced Separation
Combining wavelet threshold denoising with voiced/unvoiced separation, a new optimized speech denoising method is proposed. First, because the unvoiced parts of speech often contain many noise-like high-frequency components, applying wavelet threshold denoising to them directly is likely to remove these components by mistake and cause distortion, so the speech should first be separated into voiced and unvoiced segments. Second, the best wavelet denoising scheme is selected by optimizing over different wavelet functions, threshold selection rules, and thresholding functions. Simulation results show that, compared with classical wavelet threshold denoising, the proposed method removes as much noise as possible while preserving the characteristics of the original speech, considerably improving speech quality.
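A minimal sketch of the denoising skeleton described above, using PyWavelets: frames classified as unvoiced by a crude zero-crossing test are left untouched, while the remaining frames get soft universal-threshold wavelet denoising. The wavelet, decomposition level, frame size, and V/UV rule are placeholder choices; the paper's point is precisely that these should be optimized.

```python
import numpy as np
import pywt

def denoise_frame(frame, wavelet='db8', level=4):
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise level estimate
    thr = sigma * np.sqrt(2 * np.log(len(frame)))         # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(frame)]

def is_unvoiced(frame):
    """Crude zero-crossing-rate decision; real systems also use energy."""
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    return zcr > 0.25

def denoise(x, fs, frame_ms=32):
    n = int(fs * frame_ms / 1000)
    out = np.array(x, dtype=float)
    for i in range(0, len(x) - n, n):
        fr = out[i:i + n]
        if not is_unvoiced(fr):        # leave noise-like unvoiced frames untouched
            out[i:i + n] = denoise_frame(fr)
    return out
```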

4.
Quality and intelligibility of narrowband telephone speech can be improved by artificial bandwidth extension (ABE), which extends the speech bandwidth using only information available in the narrowband speech signal. This paper reports a three-language evaluation of an ABE method that has recently been launched in several of Nokia's mobile telephone models. The method extends the speech bandwidth to frequencies above the telephone band by first utilizing spectral folding and then modifying the magnitude spectrum of the extension band with spline curves. The performance of the method was evaluated by formal listening tests in American English, Russian, and Mandarin Chinese. The results of the listening tests indicate that ABE processing improved the subjective quality of coded narrowband speech in all these languages. Differences between bandwidth-extended American English test sentences and their original wideband counterparts were also evaluated using both an objective distance measure that simulates the characteristics of human hearing and a conventional spectral distortion measure. The average objective error was calculated for different categories of speech sounds. The error was found to be smallest in nasals and semivowels and largest in fricative sounds.
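A hedged sketch of the bandwidth-extension recipe as described: zero-insertion upsampling folds the 0–4 kHz narrowband spectrum into a mirror image at 4–8 kHz, and a spline-defined gain curve then shapes the extension band. The spline anchor points below are invented for illustration; the tuned curves of the deployed method are not given in the abstract.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, resample_poly, sosfilt

def bandwidth_extend(nb, fs_nb=8000):
    """Extend narrowband speech (sampled at fs_nb) to twice the bandwidth."""
    fs = 2 * fs_nb
    folded = np.zeros(2 * len(nb))
    folded[::2] = nb                        # zero insertion: mirror image above cutoff
    cutoff = fs_nb / 2.0                    # 4 kHz telephone band edge
    X = np.fft.rfft(folded)
    f = np.fft.rfftfreq(len(folded), 1.0 / fs)
    gain = np.ones_like(f)
    hi = f >= cutoff
    # spline-shaped rolloff for the extension band; anchor values are invented
    gain[hi] = CubicSpline([cutoff, 5500.0, fs / 2.0], [0.4, 0.15, 0.02])(f[hi])
    extension = np.fft.irfft(X * gain, n=len(folded))
    sos = butter(8, cutoff, 'highpass', fs=fs, output='sos')
    baseband = resample_poly(nb, 2, 1)      # properly interpolated original band
    return baseband + sosfilt(sos, extension)
```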

5.
The development of an audiovisual pronunciation teaching and training method and software system is discussed in this article. The method is designed to help children with speech and hearing disorders gain better control over their speech production. The teaching method is drawn up for progression from individual sound preparation to practice of sounds in sentences for four languages: English, Swedish, Slovenian, and Hungarian. The system is a general language-independent measuring tool and database editor. This database editor makes it possible to construct modules for all participant languages and for different sound groups. Two modules are under development for the system in all languages: one for teaching and training vowels to hearing-impaired children and the other for correction of misarticulated fricative sounds. In the article we present the measuring methods, the distance-score calculations applied to the visualized speech spectra, and problems in the evaluation of the new multimedia tool.

6.
Machine hearing is an emerging research field that is analogous to machine vision in that it aims to equip computers with the ability to hear and recognise a variety of sounds. It is a key enabler of natural human–computer speech interfacing, as well as of areas such as automated security surveillance, environmental monitoring, and smart homes/buildings/cities. Recent advances in machine learning allow current systems to accurately recognise a diverse range of sounds under controlled conditions. However, doing so in real-world noisy conditions remains a challenging task. Several front-end feature extraction methods have been used for machine hearing, employing speech recognition features like MFCC and PLP, as well as image-like features such as AIM and SIF. The best choice of feature is found to be dependent upon the noise environment and the machine learning techniques used. Machine learning methods such as deep neural networks have been shown capable of inferring discriminative classification rules from less structured front-end features in related domains. In the machine hearing field, spectrogram image features have recently shown good performance for noise-corrupted classification using deep neural networks. However, there are many methods of extracting features from spectrograms. This paper explores a novel data-driven feature extraction method that uses variance-based criteria to define spectral pooling of features from spectrograms. The proposed method, based on maximising the pooled spectral variance of foreground and background sound models, is shown to achieve very good performance for robust classification.
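The paper defines pooling regions from variance criteria over foreground and background sound models; the sketch below shows a simplified, model-free reading of the idea, in which spectrogram rows are partitioned so each pool carries roughly equal variance and is then averaged into one feature row.

```python
import numpy as np
from scipy.signal import spectrogram

def pooled_spectral_features(x, fs, n_pools=12):
    """Variance-driven spectral pooling sketch (equal-variance partition
    stands in for the paper's variance-maximising criterion)."""
    f, t, S = spectrogram(x, fs, nperseg=512, noverlap=256)
    logS = np.log(S + 1e-10)
    v = logS.var(axis=1)                       # per-bin variance over time
    cum = np.cumsum(v) / v.sum()
    edges = np.searchsorted(cum, np.linspace(0.0, 1.0, n_pools + 1)[1:-1])
    pools = np.split(np.arange(len(f)), edges)
    return np.stack([logS[idx].mean(axis=0) for idx in pools])  # (n_pools, frames)
```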

7.
A method is described for estimating the fundamental frequencies of several concurrent sounds in polyphonic music and multiple-speaker speech signals. The method consists of a computational model of the human auditory periphery, followed by a periodicity analysis mechanism in which fundamental frequencies are iteratively detected and canceled from the mixture signal. The auditory model needs to be computed only once, and a computationally efficient strategy is proposed for implementing it. Simulation experiments were carried out using mixtures of musical sounds and mixed speech utterances. The proposed method outperformed two reference methods in the evaluations and showed a high level of robustness in processing signals where important parts of the audible spectrum were deleted to simulate bandlimited interference. Different system configurations were studied to identify the conditions where pitch analysis using an auditory model is advantageous over conventional time or frequency domain approaches.
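A toy version of the iterative detect-and-cancel loop: estimate the strongest fundamental by harmonic summation over the magnitude spectrum, zero out its harmonics, and repeat. The published method applies its periodicity analysis to an auditory-model representation with a more careful cancellation rule; a plain FFT is used here only to keep the sketch short.

```python
import numpy as np

def iterative_f0(x, fs, n_sources=2, f0_range=(60.0, 400.0)):
    """Return n_sources fundamental-frequency estimates, strongest first."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    found = []
    for _ in range(n_sources):
        candidates = np.arange(f0_range[0], f0_range[1], 1.0)
        saliences = []
        for f0 in candidates:                 # harmonic summation salience
            idx = np.searchsorted(freqs, np.arange(f0, fs / 2, f0))
            saliences.append(spec[np.clip(idx, 0, len(spec) - 1)].sum())
        f0 = candidates[int(np.argmax(saliences))]
        found.append(f0)
        for h in np.arange(f0, fs / 2, f0):   # cancel the detected source
            k = int(np.argmin(np.abs(freqs - h)))
            spec[max(0, k - 2): k + 3] = 0.0
    return found
```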

8.

For traditional broadcasting formats, implementation of accessible audio strategies for hard of hearing people has used a binary, intelligibility-based approach. In this approach, sounds are categorized either as speech, contributing to comprehension of content, or non-speech, which can mask the speech and reduce intelligibility. Audio accessibility solutions have therefore focused on speech-enhancement-type methods, for which several useful standard objective measures of quality exist. Recent developments in next-generation broadcast audio formats, in particular the roll-out of object-based audio, facilitate more in-depth personalisation of the audio experience based on user preferences and needs. Recent research has demonstrated that many non-speech sounds do not strictly behave as maskers but can be critical for comprehension of the narrative for some viewers. This complex relationship between speech, non-speech audio and the viewer necessitates a more holistic approach to understanding quality of experience of accessible media. This paper reviews previous work and outlines such an approach, discussing accessibility strategies using next-generation audio formats and their implications for developing effective assessments of quality.


9.
Assisted by soft computing methods, the work presented in this paper focuses on the design of energy-efficient algorithms for binaural hearing aids that aim to separate speech from other sounds the hearing-impaired person is not interested in. To do this, the right and left hearing aids need to wirelessly transmit to each other some parameters involved in the speech separation algorithm. The problem is that this transmission appreciably reduces battery life, the most important constraint in designing advanced algorithms for hearing aids. Reducing the number of bits used to represent the transmitted parameters will bring down the power consumption, but at the expense of degrading the system's ability to separate the speech from the other sound sources. Aiming to solve this problem, our approach, based on quantizing the parameters to be transmitted, consists in computing the adequate number of quantization bits by means of a combination of neural networks and genetic algorithms, seeking a balance between a low bit rate (and thus low power consumption) and good speech separation. The results show that even with only 2 bits per quantized sample, the quality of the separation reaches 70% of the limiting non-quantized separation quality factor, which was found to be 85%.
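The core trade-off here, resolution of the transmitted parameters versus battery life, comes down to a quantizer like the one sketched below. The paper's neural-network/genetic-algorithm search chooses the number of bits; this sketch simply takes it as an argument.

```python
import numpy as np

def quantize(params, n_bits=2, lo=-1.0, hi=1.0):
    """Uniform quantizer for the parameters exchanged between the aids:
    fewer bits mean a lower bit rate (longer battery life) but coarser
    parameters (worse separation). [lo, hi] is assumed known to both sides."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    codes = np.clip(((params - lo) / step).astype(int), 0, levels - 1)
    decoded = lo + (codes + 0.5) * step   # reconstruction at the far ear
    return codes, decoded                 # transmit codes; decode on arrival
```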

10.
Electronic hearing protection devices are increasingly used in noisy environments. These devices feature a miniaturized external microphone and internal loudspeaker in addition to an analog or digital electronic circuit. They can transmit useful audio signals such as speech and warning signals to the protected ear and can reduce the sound pressure level using dynamic range compression. In the case of a digital electronic circuit, the transmission of audio signals may be noticeably delayed because of the latency introduced by the digital signal processor and by the analog-to-digital and digital-to-analog converters. These delayed audio signals will hence interfere with the audio signals perceived naturally through the passive acoustical path of the device. The proposed study presents an original procedure to evaluate, for two representative passive earplugs, the shortest delay at which human listeners start to perceive two sounds composed of the signal transmitted through the electronic circuit and the passively transmitted signal. This shortest delay is called the echo threshold and marks the boundary between perceiving one fused sound and perceiving two separate sounds. In this study, a transient signal, a clean speech signal, a speech signal corrupted by factory noise, and a speech signal corrupted by babble noise are used to determine the echo thresholds of the two earplugs. Twenty untrained listeners participated in this study and were asked to determine the echo thresholds using a test software in which attenuated signals are delayed from the original signals in real time. The findings show that when using hearing devices, the echo threshold depends on four parameters: (a) the attenuation function of the device, (b) the duration of the signal, (c) the level of the background noise, and (d) the type of background noise. Defined here as the shortest time delay at which at least 20% of the participants noticed an echo, the echo threshold was found to be 8 ms for a bell signal, 16 ms for clean speech, and 22 ms for speech corrupted by babble noise when using a shallow earplug fit. When using a deep fit, the echo threshold was found to be 18 ms for a bell signal, 26 ms for clean speech, and 68 ms for speech in factory noise. No echo threshold could be clearly determined for the speech signal in babble noise with a deep earplug fit.

11.
Advanced Robotics, 2013, 27(1-2): 47-67
Depending on the emotion of speech, the meaning of the speech or the intention of the speaker differs. Therefore, speech emotion recognition, as well as automatic speech recognition, is necessary for precise communication between humans and robots in human–robot interaction. In this paper, a novel feature extraction method is proposed for speech emotion recognition using separation of phoneme classes. In feature extraction, the signal variation caused by different sentences usually overrides the emotion variation, which lowers the performance of emotion recognition. However, as the proposed method extracts features from speech in parts that correspond to limited ranges of the center of gravity of the spectrum (CoG) and formant frequencies, the effects of phoneme variation on the features are reduced. Based on the range of CoG, obstruent sounds are discriminated from sonorant sounds. Moreover, the sonorant sounds are categorized into four classes by the resonance characteristics revealed by formant frequencies. The results show that the proposed method, evaluated on corpora from 30 different speakers, improves emotion recognition accuracy over other methods at the 99% significance level. Furthermore, the proposed method was applied to extract several features, including prosodic and phonetic features, and was implemented on 'Mung' robots as an emotion recognizer for users.

12.
In hearing aid (HA) design, real-time speech enhancement is required. Digital hearing aids should provide a high signal-to-noise ratio, improved gain, and freedom from feedback. In generic hearing aids, performance varies across frequencies and is non-uniform. Existing noise cancellation and speech separation methods reduce the voice magnitude in noisy environments, and existing noise suppression methods also weaken the desired signal, so uniform sub-band analysis performs poorly where hearing aids are concerned. In this paper, a speech separation method using the Non-negative Matrix Factorization (NMF) algorithm is proposed on a wavelet decomposition. The proposed non-uniform filter bank was validated by parameters such as band power, signal-to-noise ratio (SNR), mean square error (MSE), signal-to-noise-and-distortion ratio (SINAD), spurious-free dynamic range (SFDR), error, and time. Speech recordings before and after separation were evaluated for quality using the objective speech quality measure defined in International Telecommunication Union Telecommunication (ITU-T) standard P.862.
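A minimal sketch of NMF-based separation on a magnitude spectrogram with a Wiener-style soft mask, using scikit-learn. Assigning the first n_speech components to speech is a placeholder: in practice the speech bases would be learned from clean training data. Note also that the paper applies NMF on a wavelet (non-uniform) decomposition, whereas this sketch uses a plain STFT for brevity.

```python
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

def nmf_separate(x, fs, n_components=20, n_speech=10):
    f, t, Z = stft(x, fs, nperseg=512)
    V = np.abs(Z)                                  # magnitude spectrogram
    model = NMF(n_components=n_components, init='random', max_iter=400,
                beta_loss='kullback-leibler', solver='mu', random_state=0)
    W = model.fit_transform(V)                     # (freq, components) bases
    H = model.components_                          # (components, frames) activations
    V_speech = W[:, :n_speech] @ H[:n_speech]      # placeholder speech subset
    mask = V_speech / (W @ H + 1e-10)              # Wiener-style soft mask
    _, y = istft(mask * Z, fs, nperseg=512)
    return y
```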

13.
A classification method using a hidden Markov model (HMM) and a support vector machine (SVM) as a two-stage classifier is proposed to classify five kinds of environmental sounds: speech, cup/dish collisions, door opening and closing, whistles, and telephone rings. For the collected and preprocessed environmental sound signals, HMMs are first used in the first stage for preliminary classification, finding the two classes with the highest probabilities, i.e. the classes each sound is most likely to belong to; a second-stage SVM classifier then makes the final decision. Experimental results show that, compared with using either model alone as the classifier, the method achieves higher classification accuracy on environmental sounds.
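A compressed sketch of the two-stage scheme with hmmlearn and scikit-learn: per-class Gaussian HMMs produce log-likelihoods, the two best-scoring classes are retained, and a pairwise SVM makes the final decision. The HMM topology, the frame-averaged SVM feature, and the feature extraction itself (e.g. MFCCs, assumed computed elsewhere) are all illustrative choices, not the paper's configuration.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.svm import SVC

def train(train_feats):
    """train_feats maps class name -> list of (n_frames, n_dims) feature arrays."""
    hmms, svms = {}, {}
    classes = sorted(train_feats)
    for c in classes:                                  # stage 1: one HMM per class
        seqs = train_feats[c]
        m = GaussianHMM(n_components=4, covariance_type='diag', n_iter=50)
        m.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])
        hmms[c] = m
    for i, a in enumerate(classes):                    # stage 2: pairwise SVMs
        for b in classes[i + 1:]:
            X = [s.mean(axis=0) for s in train_feats[a] + train_feats[b]]
            y = [a] * len(train_feats[a]) + [b] * len(train_feats[b])
            svms[(a, b)] = SVC().fit(X, y)
    return hmms, svms

def classify(feats, hmms, svms):
    scores = {c: m.score(feats) for c, m in hmms.items()}      # HMM log-likelihoods
    top2 = tuple(sorted(sorted(scores, key=scores.get)[-2:]))  # two most likely classes
    return svms[top2].predict([feats.mean(axis=0)])[0]         # SVM makes the final call
```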

14.
Interactive Evolutionary Computation-Based Hearing Aid Fitting
An interactive evolutionary computation (EC) fitting method is proposed that applies interactive EC to hearing aid fitting; the method is evaluated with human subjects using a hearing aid simulator. The advantages of the method are that it can optimize a hearing aid based on how a user actually hears and that it realizes whatever+whenever+wherever (W3) fitting. Conventional fitting methods are based on the user's partially measured auditory characteristics, the fitting engineer's experience, and the user's linguistic explanation of his or her hearing. These conventional methods therefore suffer from the fundamental problem that no one can experience another person's hearing. Because interactive EC fitting uses EC to optimize a hearing aid based on the user's own evaluation of his or her hearing, this problem is addressed. Moreover, whereas conventional fitting methods must use pure tones and bandpass noise for measuring hearing characteristics, our proposed method has no such restrictions. Evaluating the proposed method using speech sources, we demonstrate that it gives significantly better results than either the conventional method or the unprocessed case in terms of both speech intelligibility and speech quality. We also evaluate our method using musical sources, which conventional methods cannot use for evaluation, and demonstrate that its sound quality is preferred over the unprocessed case.
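The essence of interactive EC fitting is that the fitness function is the user's own listening judgment. The skeleton below is an illustrative reduction, not the paper's algorithm: it evolves a vector of per-band gains, and rate(genome) is a placeholder that would play speech processed with those gains and return the user's subjective score.

```python
import numpy as np

def interactive_fit(rate, n_bands=8, pop=8, generations=10):
    """Evolve per-band gains (dB); `rate(genome)` returns the user's score."""
    genomes = np.random.uniform(-10.0, 30.0, size=(pop, n_bands))
    for _ in range(generations):
        fitness = np.array([rate(g) for g in genomes])
        parents = genomes[np.argsort(fitness)[::-1][:pop // 2]]  # keep the best half
        children = []
        while len(parents) + len(children) < pop:
            a, b = parents[np.random.choice(len(parents), 2, replace=False)]
            cut = np.random.randint(1, n_bands)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child += np.random.normal(0.0, 1.0, n_bands)         # Gaussian mutation
            children.append(child)
        genomes = np.vstack([parents, children])
    return genomes[0]   # best genome of the final evaluated generation
```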

15.
To address the low efficiency of speech transmission in emergency broadcasting, a speech compression method based on the wavelet transform and K-singular value decomposition (K-SVD) is proposed to improve the timeliness of emergency broadcast transmission. First, the high-frequency components obtained from the wavelet decomposition of the speech are discarded and replaced with a random signal during wavelet synthesis. Second, in the compressed sensing of the low-frequency components, an overcomplete dictionary trained with the K-SVD dictionary learning algorithm is used for their sparse representation. Finally, the signal is reconstructed with an improved generalized orthogonal matching pursuit algorithm based on subspace backtracking. Experimental results show that, at a compression ratio of 50%, the Perceptual Evaluation of Speech Quality (PESQ) score of the reconstructed emergency broadcast speech reaches 3.717, an improvement of 3%-47% over the comparison algorithms, indicating that the proposed method improves the reconstruction quality of emergency broadcast speech while maintaining compression efficiency, thereby ensuring the timeliness of emergency broadcasting.

16.
Sensorineural hearing loss is associated with widening of auditory filter bandwidths, leading to increased spectral masking and degraded speech perception. Multi-band frequency compression can be used for reducing the effects of spectral masking. In this technique, the speech spectrum is divided into a number of analysis bands and spectral samples in each of these bands are compressed towards the band center by a constant compression factor. Implementation of the scheme with different types of frequency mappings, bandwidths, and segmentation for processing is investigated. Listening tests conducted for assessing the quality and intelligibility of the processed speech gave best results for critical bandwidth based compression using spectral segment mapping and pitch-synchronous processing.
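A sketch of the core operation: within each analysis band, spectral magnitude samples are moved toward the band centre by a constant factor c < 1, compacting spectral detail away from the skirts of widened auditory filters. Uniform bands and frame-wise processing are used here for brevity; the study found critical-band spacing, spectral-segment mapping, and pitch-synchronous processing to work best.

```python
import numpy as np

def multiband_compress(frame, n_bands=18, c=0.6):
    """Compress spectral samples toward each band centre by factor c."""
    X = np.fft.rfft(frame)
    mag, phase = np.abs(X), np.angle(X)
    out = np.zeros_like(mag)
    edges = np.linspace(0, len(X), n_bands + 1).astype(int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        centre = (lo + hi) // 2
        for k in range(lo, hi):
            j = centre + int(round(c * (k - centre)))   # sample moved toward centre
            out[j] += mag[k]
    return np.fft.irfft(out * np.exp(1j * phase), n=len(frame))  # keep original phase
```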

17.
18.
Traditional mouth muscle modeling methods divide muscle movements in a fine-grained, independent way, require many control parameters, and lack both a decomposition of the muscle movements and an analysis of the correlations among movement controls. This paper proposes a mouth sub-movement unit model based on movement decomposition. The method first builds an improved mouth mesh model based on the standard Candide-3 model; then, according to anatomical structure and muscle movement analysis, mouth movement is decomposed into three basic movement units: jaw rotation, lip contraction/relaxation, and lip protrusion/upturning. Finally, speech-synchronized mouth animation is obtained from the input text, the inter-phoneme visual weighting function, and the synthesis of the mouth sub-movements. Experimental results show that the method can rapidly synthesize lip animation matched to Mandarin Chinese speech.

19.
Glottal stop sounds in Amharic are produced by abrupt closure of the glottis without any significant gesture in the accompanying articulatory organs in the vocal tract system. It is difficult to observe the features of the glottal stop through spectral analysis, as spectral features mostly emphasize the characteristics of the vocal tract system. In order to spot glottal stop sounds in continuous speech, it is necessary to also extract features of the excitation source, which may require non-spectral methods of analysis. In this paper the linear prediction (LP) residual is used as an approximation to the excitation source signal, and the excitation features are extracted from the LP residual using zero frequency filtering (ZFF). The glottal closure instants (GCIs), or epochs, are identified from the ZFF signal. At each GCI, the cross-correlation coefficients of successive glottal cycles of the LP residual, the normalized jitter, and the logarithm of the peak normalized excitation strength (LPNES) are calculated. Further, the parameters of Gaussian approximation models are derived from the distributions of the excitation parameters. These model parameters are used to identify the regions of the glottal stop sounds in continuous speech. For the database used in this study, 92.89% of the glottal stop regions are identified correctly, with 8.50% false indications.
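A compact sketch of the zero-frequency filtering step used to locate glottal closure instants: difference the speech, pass it through a cascade of two zero-frequency resonators (double poles at z = 1), remove the slowly varying trend by repeated local-mean subtraction, and take negative-to-positive zero crossings as epochs. The 10 ms mean-removal window is a stand-in for the usual choice of roughly one average pitch period.

```python
import numpy as np
from scipy.signal import lfilter

def zff_epochs(s, fs, mean_win_ms=10):
    """Return epoch (glottal closure instant) sample indices from speech s."""
    x = np.diff(s, prepend=s[0])                  # pre-emphasis difference
    y = lfilter([1.0], [1.0, -2.0, 1.0], x)       # first zero-frequency resonator
    y = lfilter([1.0], [1.0, -2.0, 1.0], y)       # second resonator
    N = int(fs * mean_win_ms / 1000)
    for _ in range(3):                            # successive trend removal
        y = y - np.convolve(y, np.ones(2 * N + 1) / (2 * N + 1), mode='same')
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0]   # - to + zero crossings
```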

20.
Although technologies for automatically adjusting the volume of mobile phone ringtones according to the ambient noise level have been developed, few studies have investigated the volume (dB) of the ringtone itself. This study suggests design recommendations for ringtone volume under loud ambient noise. Based on signal detection theory, two-alternative forced-choice tracking was performed by thirty subjects to obtain hearing thresholds under noisy conditions. Six experimental conditions were examined: all combinations of three pure-tone frequencies (500 Hz, 1000 Hz, 4000 Hz) and two white noise levels (70 dB, 80 dB). The results showed that the ringtone volume should increase by 10-15 dB on average when the noise level increases from 70 dB to 80 dB. When adjusting the volume according to the ambient noise level, the volume should be changed differently according to the frequencies of the ringtone. The ringtone should be composed of low-frequency sounds under loud ambient noise, because the subjects were very sensitive to the 500 Hz pure tone. The results of this study could be used when developing design guidelines for the adaptive ringtone of a mobile phone. Moreover, designers can use this method to design other auditory signals, such as notification and emergency alarms, that have different chances of signal detectability.

Relevance to Industry

The results of this study may provide useful information to designers who consider the volume and frequency of a ringtone when adjusting ringtone volume according to ambient noise level. Moreover, the method used in the study could also be widely used to design auditory signals of mobile devices other than mobile phones.
