Similar literature
 Found 20 similar documents; search took 31 ms
1.
This study deals with the numerical solution of a 2D unsteady flow of a compressible viscous fluid in a channel for low inlet airflow velocity. The unsteadiness of the flow is caused by a prescribed periodic motion of a part of the channel wall with large amplitudes, nearly closing the channel during oscillations. The channel is a simplified model of the glottal space in the human vocal tract and the flow can represent a model of airflow coming from the trachea, through the glottal region with periodically vibrating vocal folds to the human vocal tract. The flow is described by the system of Navier–Stokes equations for laminar flows. The numerical solution is implemented using the finite volume method (FVM) and the predictor–corrector MacCormack scheme with Jameson artificial viscosity using a grid of quadrilateral cells. Due to the motion of the grid, the basic system of conservation laws is considered in the Arbitrary Lagrangian–Eulerian (ALE) form. The authors present the numerical simulations of flow fields in the channel, acquired from a program developed exclusively for this purpose. The numerical results for unsteady flows in the channel are presented for inlet Mach number M = 0.012, Reynolds number Re = 4.5 × 10^3 and wall motion frequencies of 20 and 100 Hz.
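
To illustrate the predictor–corrector idea named in this abstract, here is a minimal sketch of one MacCormack step for the 1D linear advection equation u_t + a u_x = 0 on a periodic grid. It is not the authors' solver: the 2D Navier–Stokes system, the ALE formulation, the moving grid, and the Jameson artificial viscosity are all omitted, and the function name and parameters are hypothetical.

```python
def maccormack_step(u, a, dt, dx):
    """One MacCormack step for u_t + a*u_x = 0 on a periodic 1D grid.

    Illustrative only: scalar linear advection, no artificial viscosity.
    """
    n = len(u)
    c = a * dt / dx  # Courant number

    # Predictor: forward difference in space.
    up = [u[i] - c * (u[(i + 1) % n] - u[i]) for i in range(n)]

    # Corrector: backward difference on the predicted values,
    # averaged with the old solution. up[i - 1] wraps around at i = 0.
    return [0.5 * (u[i] + up[i] - c * (up[i] - up[i - 1])) for i in range(n)]


if __name__ == "__main__":
    profile = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(maccormack_step(profile, a=1.0, dt=1.0, dx=1.0))
```

For this linear problem the scheme conserves the sum of the cell values, and at a Courant number of exactly 1 it shifts the profile by one cell per step.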

2.
《Advanced Robotics》2013,27(1-2):105-120
We developed a three-dimensional mechanical vocal cord model for Waseda Talker No. 7 (WT-7), an anthropomorphic talking robot, for generating speech sounds with various voice qualities. The vocal cord model is a cover model that has two thin folds made of thermoplastic material. The model self-oscillates by airflow exhausted from the lung model and generates the glottal sound source, which is fed into the vocal tract for generating the speech sound. Using the vocal cord model, breathy and creaky voices, as well as the modal (normal) voice, were produced in a manner similar to the human laryngeal control. The breathy voice is characterized by a noisy component mixed with the periodic glottal sound source and the creaky voice is characterized by an extremely low-pitch vibration. The breathy voice was produced by adjusting the glottal opening and generating the turbulence noise by the airflow just above the glottis. The creaky voice was produced by adjusting the vocal cord tension, the sub-glottal pressure and the vibration mass so as to generate a double-pitch vibration with a long pitch interval. The vocal cord model used to produce these voice qualities was evaluated in terms of the vibration pattern as measured by a high-speed camera, the glottal airflow and the acoustic characteristics of the glottal sound source, as compared to the data for a human.

3.
The great majority of current voice technology applications rely on acoustic features, such as the widely used MFCC or LP parameters, which characterize the vocal tract response. Nonetheless, the major source of excitation, namely the glottal flow, is expected to convey useful complementary information. The glottal flow is the airflow passing through the vocal folds at the glottis. Unfortunately, glottal flow analysis from speech recordings requires specific and complex processing operations, which explains why it has been generally avoided. This paper gives a comprehensive overview of techniques for glottal source processing. Starting from analysis tools for pitch tracking, detection of glottal closure instants, and estimation and modeling of glottal flow, this paper discusses how these tools and techniques might be properly integrated in various voice technology applications.

4.
This paper presents a new glottal inverse filtering (GIF) method that utilizes a Markov chain Monte Carlo (MCMC) algorithm. First, initial estimates of the vocal tract and glottal flow are evaluated by an existing GIF method, iterative adaptive inverse filtering (IAIF). Simultaneously, the initially estimated glottal flow is synthesized using the Rosenberg–Klatt (RK) model and filtered with the estimated vocal tract filter to create a synthetic speech frame. In the MCMC estimation process, the first few poles of the initial vocal tract model and the RK excitation parameter are refined in order to minimize the error between the synthetic and original speech signals in the time and frequency domain. MCMC approximates the posterior distribution of the parameters, and the final estimate of the vocal tract is found by averaging the parameter values of the Markov chain. Experiments with synthetic vowels produced by a physical modeling approach show that the MCMC-based GIF method gives more accurate results compared to two known reference methods.

5.
This paper describes a robust glottal source estimation method based on a joint source-filter separation technique. In this method, the Liljencrants-Fant (LF) model, which models the glottal flow derivative, is integrated into a time-varying ARX speech production model. These two models are estimated in a joint optimization procedure, in which a Kalman filtering process is embedded for adaptively identifying the vocal tract parameters. Since the formulated joint estimation problem is a multiparameter nonlinear optimization procedure, we separate the optimization procedure into two passes. The first pass initializes the glottal source and vocal tract models by solving a quasi-convex approximate optimization problem. Having robust initial values, the joint estimation procedure determines the accuracy of model estimation implemented with a trust-region descent optimization algorithm. Experiments with synthetic and real voice signals show that the proposed method is a robust glottal source parameter estimation method with a high degree of accuracy.

6.
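The glottal excitation signal is the source signal of speech and can be used for the effective extraction of speech feature parameters. Two methods for obtaining the glottal excitation from observed speech are studied: linear prediction and the cepstrum method. Computer simulation experiments on recorded speech compare the performance and characteristics of the two methods. The results show that the cepstrum method obtains the glottal excitation, and excitation feature parameters extracted from it such as the pitch period, with high accuracy, but at a relatively high computational cost. The linear prediction method uses an efficient algorithm, so it not only obtains the glottal excitation quickly but also yields the vocal tract model parameters, the speech power spectrum, and other important parameters at the same time; it is the commonly used method for obtaining the glottal excitation.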
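
As an illustration of the linear prediction route to an excitation estimate, the following minimal sketch fits an all-pole model with the autocorrelation method and the Levinson-Durbin recursion, then inverse-filters the signal to obtain the LP residual. The abstract does not say which LP algorithm the authors used, so this choice and all function names are assumptions; windowing and pre-emphasis are left out.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no window)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    is the prediction-error filter and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Inverse-filter x with A(z); samples before n = 0 are taken as zero."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    # Toy signal: impulse response of a one-pole filter with pole at 0.9.
    signal = [0.9 ** n for n in range(200)]
    coeffs, energy = levinson_durbin(autocorr(signal, 1), 1)
    print(coeffs, lp_residual(signal, coeffs)[:3])
```

On the toy signal the residual collapses to roughly a single impulse, which is the sense in which the residual approximates the excitation; the same coefficients also describe the vocal tract model, matching the abstract's remark that LP yields both at once.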

7.
A numerical model for the three-dimensional starting jet flow in a channel with a static larynx-shaped constriction is presented. Detailed resolution of this kind of jet flow is necessary in order to understand the complex coupling between flow and acoustics in the process of human phonation. The numerical model is based on the equation of continuity and the Navier–Stokes equations. The investigations are done with the open source CFD package OpenFOAM. Numerical simulations are performed for a square-sectioned channel geometry, which is constricted with a fixed shape conforming to the fully opened human glottis. Time-dependent inflow boundary conditions are applied in order to model transient glottal flow rates. The setup of the numerical simulations corresponds to the configuration of a model experiment in order to allow detailed validation. The numerical results are in good agreement with the experimental data, when the near-wall region in the glottal gap is adequately resolved by the numerical grid. The results illustrate the complex interactions between the jet flow and the surrounding vortices.

8.
Glottal stop sounds in Amharic are produced due to abrupt closure of the glottis without any significant gesture in the accompanying articulatory organs in the vocal tract system. It is difficult to observe the features of the glottal stop through spectral analysis, as the spectral features emphasize mostly the features of the vocal tract system. In order to spot the glottal stop sounds in continuous speech, it is necessary to extract the features of the source of excitation also, which may require some non-spectral methods for analysis. In this paper the linear prediction (LP) residual is used as an approximation to the excitation source signal, and the excitation features are extracted from the LP residual using zero frequency filtering (ZFF). The glottal closure instants (GCIs), or epochs, are identified from the ZFF signal. At each GCI, the cross-correlation coefficients of successive glottal cycles of the LP residual, the normalized jitter and the logarithm of the peak normalized excitation strength (LPNES) are calculated. Further, the parameters of Gaussian approximation models are derived from the distributions of the excitation parameters. These model parameters are used to identify the regions of the glottal stop sounds in continuous speech. For the database used in this study 92.89% of the glottal stop regions are identified correctly, with 8.50% false indications.

9.
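Building on a study of the problems with the traditional analysis approach that combines inverse filtering with voice source modeling, an improved voice source analysis algorithm is proposed, together with a method that uses this algorithm to analyze natural continuous speech automatically. The method considerably improves on the low robustness, imprecise analysis, and inability to automatically analyze large-scale natural continuous speech that affect the traditional approach. Experimental results on synthetic vowels and natural continuous speech confirm the effectiveness of the proposed method.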

10.
Vocal cord diseases can cause irregular vibration of the vocal cords, resulting in abnormalities. Therefore, it is necessary to study abnormal vocal cords in a vocal cord model. Research that focuses on vocal cord diseases mainly combines acoustic parameters and pattern recognition. However, it is also important to study the causes of vocal abnormalities in vocal cord diseases. In this paper, a bionic vocal system is modeled, and the influence of pulmonary airflow changes on glottic vibration excitation is analyzed. The effects of asymmetric vocal polyps on changes to the vocal airflow and flow field are studied, showing that the proposed model can assist in the detection of abnormal voice.

11.
Acoustic transmission in the vocal tract may be simulated in the time domain using the model of Kelly and Lochbaum. A disadvantage of this simulation is that a fixed number of fixed length sections must be used, so that it cannot be used to model variability in vocal tract length, caused by lip protrusion or larynx lowering. This paper describes a simple modification in which digital filters, derived from transmission line T-sections and including glottal and lip impedance models, are appended at each end of a Kelly–Lochbaum filter. The lengths of the sections may be made continuously variable, allowing the lip and larynx segments of the model to be varied, while maintaining a fixed sampling rate. This new technique is compared with the earlier method due to Strube and is found capable of longer extensions and reduced spectral amplitude distortion.
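
For readers unfamiliar with the Kelly–Lochbaum model, the sketch below shows its basic building block: reflection coefficients computed from adjacent tube cross-sections, and the scattering of forward and backward pressure waves at one junction. It assumes pressure-wave variables and the sign convention r = (A_k - A_{k+1}) / (A_k + A_{k+1}); other texts use the opposite sign or volume-velocity waves. The variable-length end sections proposed in the paper are not modeled, and the function names are hypothetical.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube sections.

    Pressure-wave convention: r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}).
    """
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


def scatter(f_in, b_in, r):
    """Scatter pressure waves at one junction.

    f_in: forward wave arriving from the section on the glottis side.
    b_in: backward wave arriving from the section on the lip side.
    Returns (f_out, b_out): the wave passed on toward the lips and the
    wave sent back toward the glottis.
    """
    f_out = (1.0 + r) * f_in - r * b_in
    b_out = r * f_in + (1.0 - r) * b_in
    return f_out, b_out


if __name__ == "__main__":
    # Hypothetical three-section area function (arbitrary units).
    r = reflection_coefficients([1.0, 1.0, 3.0])
    print(r, scatter(1.0, 0.25, r[1]))
```

With this convention the total pressure is continuous across the junction (f_in + b_out equals f_out + b_in), and a junction between equal areas passes both waves through unchanged.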

12.
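The FM-AM model of the vocal tract and its application in speech analysis   Total citations: 3, self-citations: 0, citations by others: 3
Traditional linear acoustic theory rests on the assumption that the airflow passing through the vocal folds propagates in the vocal tract as a plane wave. According to Teager's findings, however, this assumption does not hold because vortices are distributed in the vocal tract. Based on the existence of this nonlinear phenomenon, Maragos proposed a frequency-modulation and amplitude-modulation (FM-AM) model to represent the speech production process, and in recent years this model has been applied successfully in many areas of speech processing. This paper introduces the background of the FM-AM model and its main theoretical core, and focuses on its application in speech analysis and its prospects for application in speech recognition.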
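
A tool often used alongside modulation models of speech is the discrete Teager-Kaiser energy operator, psi[x](n) = x(n)^2 - x(n-1) x(n+1). The abstract does not say which demodulation method the paper discusses, so the sketch below is only an assumed illustration of how such an operator tracks the amplitude and frequency of a sinusoid; the function name is hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy operator.

    Returns psi(n) = x[n]^2 - x[n-1] * x[n+1] for n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


if __name__ == "__main__":
    amplitude, omega = 2.0, 0.3  # omega in radians per sample
    tone = [amplitude * math.cos(omega * n + 0.5) for n in range(50)]
    psi = teager_energy(tone)
    # For a pure tone, psi is constant and equals (A * sin(omega))^2.
    print(psi[0], (amplitude * math.sin(omega)) ** 2)
```

For a constant-amplitude, constant-frequency cosine the operator output is exactly A^2 sin^2(omega) at every sample, which is why slow changes in its output can be read as changes in amplitude or frequency.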

13.
We propose a pitch synchronous approach to design the voice conversion system taking into account the correlation between the excitation signal and vocal tract system characteristics of speech production mechanism. The glottal closure instants (GCIs), also known as epochs, are used as anchor points for analysis and synthesis of the speech signal. The Gaussian mixture model (GMM) is considered to be the state-of-the-art method for vocal tract modification in a voice conversion framework. However, the GMM based models generate overly smooth utterances and need to be tuned according to the amount of available training data. In this paper, we propose the support vector machine multi-regressor (M-SVR) based model that requires fewer tuning parameters to capture a mapping function between the vocal tract characteristics of the source and the target speaker. The prosodic features are modified using epoch based method and compared with the baseline pitch synchronous overlap and add (PSOLA) based method for pitch and time scale modification. The linear prediction residual (LP residual) signal corresponding to each frame of the converted vocal tract transfer function is selected from the target residual codebook using a modified cost function. The cost function is calculated based on mapped vocal tract transfer function and its dynamics along with minimum residual phase, pitch period and energy differences with the codebook entries. The LP residual signal corresponding to the target speaker is generated by concatenating the selected frame and its previous frame so as to retain the maximum information around the GCIs. The proposed system is also tested using GMM based model for vocal tract modification. The average mean opinion score (MOS) and ABX test results are 3.95 and 85 for GMM based system and 3.98 and 86 for the M-SVR based system respectively. The subjective and objective evaluation results suggest that the proposed M-SVR based model for vocal tract modification combined with modified residual selection and epoch based model for prosody modification can provide a good quality synthesized target output. The results also suggest that the proposed integrated system performs slightly better than the GMM based baseline system designed using either epoch based or PSOLA based model for prosody modification.

14.
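For articulation disorders, this paper proposes using vocal tract simulation as an aid to therapy. The vocal tract is a curved, three-dimensional acoustic tube with slowly time-varying characteristics, and sound propagates in it as plane waves, so it can be treated as equivalent to a cylindrical or elliptical tube with varying cross-sections. Formants are obtained in pole form on the basis of Newton interpolation. The vocal tract is divided into 60 sections, and the area at each location is obtained from empirical formulas. Nine parameters describing the vocal tract characteristics are defined and then optimized with the Corana algorithm. A radiation model describes the characteristics of the sound after it radiates from the lips. Finally, the sound is synthesized, and this sound can be used for feedback therapy. Experiments show that this vocal tract simulation model can provide a reference for devising suitable treatment methods.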

15.
Vocal fry (also called creak, creaky voice, and pulse register phonation) is a voice quality that carries important linguistic or paralinguistic information, depending on the language. We propose a set of acoustic measures and a method for automatically detecting vocal fry segments in speech utterances. A glottal pulse-synchronized method is proposed to deal with the very low fundamental frequency properties of vocal fry segments, which cause problems in the classic short-term analysis methods. The proposed acoustic measures characterize power, aperiodicity, and similarity properties of vocal fry signals. The basic idea of the proposed method is to scan for local power peaks in a “very short-term” power contour for obtaining glottal pulse candidates, check for periodicity properties, and evaluate a similarity measure between neighboring glottal pulse candidates for deciding the possibility of being vocal fry pulses. In the periodicity analysis, autocorrelation peak properties are taken into account for avoiding misdetection of periodicity in vocal fry segments. Evaluation of the proposed acoustic measures in the automatic detection resulted in 74% correct detection, with an insertion error rate of 13%.

16.
Primary voice production occurs in the larynx through vibrational movements carried out by vocal folds. However, many problems can affect this complex system resulting in voice disorders. In this context, time–frequency–shape analysis based on embedding phase space plots and nonlinear dynamics methods have been used to evaluate the vocal fold dynamics during phonation. For this purpose, the present work used high-speed video to record the vocal fold movements of three subjects and extract the glottal area time series using an image segmentation algorithm. This signal is used for an optimization method which combines genetic algorithms and a quasi-Newton method to optimize the parameters of a biomechanical model of vocal folds based on lumped elements (masses, springs and dampers). After optimization, this model is capable of simulating the dynamics of recorded vocal folds and their glottal pulse. Bifurcation diagrams and phase space analysis were used to evaluate the behavior of this deterministic system in different circumstances. The results showed that this methodology can be used to extract some physiological parameters of vocal folds and reproduce some complex behaviors of these structures contributing to the scientific and clinical evaluation of voice production.

17.
The vocal source and the pulse shape of the glottal flow are determined through the regularized ratio of the speech signal spectra at the intervals of the open and closed vocal slit within each period of the fundamental tone. Three databases were used: Russian numerals for 216 men and 177 women, a database obtained by passing the Russian database through a 9.2 kbps codec, and the TIMIT database. The pitch period and 7 coefficients for the principal components of the glottal flow give an average recognition error for males below 8% for a sequence of 6 vowels. The minimum average recognition error is about 15% for females in the original Russian numerals database, about 15% for males in the codec database, and about 44% for males in TIMIT. The minimum average recognition error for males in the space of 7 coefficients for the principal components in the Russian database is about 26%, but about 27% of the speakers have an average error of less than 10%.

18.
Time domain articulatory vocal tract modeling in one dimension (1-D) is well established. Previous studies into two-dimensional (2-D) simulation of wave propagation in the vocal tract have shown it to present accurate static vowel synthesis. However, little has been done to demonstrate how such a model might accommodate the dynamic tract shape changes necessary in modeling speech. Two methods of applying the area function to the 2-D digital waveguide mesh vocal tract model are presented here. First, a method based on mapping the cross-sectional area onto the number of waveguides across the mesh, termed a widthwise mapping approach, is detailed. Discontinuity problems associated with the dynamic manipulation of the model are highlighted. Second, a new method is examined that uses a static-shaped rectangular mesh with the area function translated into an impedance map which is then applied to each waveguide. Two approaches for constructing such a map are demonstrated: one using a linear impedance increase to model a constriction to the tract and another using a raised cosine function. Recommendations are made towards the use of the cosine method as it allows for a wider central propagational channel. It is also shown that this impedance mapping approach allows for stable dynamic shape changes and also permits a reduction in sampling frequency leading to real-time interaction with the model.

19.
Speaker recognition is carried out in the space of the functional parameters of the area of the glottal cross-section, found by solving the inverse problem. This problem is solved in two stages: first, the signal obtained by inverse filtering is approximated using the vocal source model, and then the glottal area model parameters, which generate the calculated vocal source impulse, are computed. Speaker recognition is carried out on a database of Russian numerals from 0 to 9 separately for men (48 speakers) and women (37 speakers) at the segments of stressed vowels. Various methods of recognition are studied: the Gaussian mixture model (GMM), support vector machines (SVMs), discriminant analysis, naive Bayes classifier (NB), the method of classification trees (CTREE), and the Parzen window classifier. The best results were obtained using the method of SVMs and the Parzen method: the average total error of identification of men was 4.9% and 5.1%, and that of women was 8.2% and 8.8%, respectively.

20.
This paper presents a technique to transform high-effort voices into breathy voices using adaptive pre-emphasis linear prediction (APLP). The primary benefit of this technique is that it estimates a spectral emphasis filter that can be used to manipulate the perceived vocal effort. The other benefit of APLP is that it estimates a formant filter that is more consistent across varying voice qualities. This paper describes how constant pre-emphasis linear prediction (LP) estimates a voice source with a constant spectral envelope even though the spectral envelope of the true voice source varies over time. A listening experiment demonstrates how differences in vocal effort and breathiness are audible in the formant filter estimated by constant pre-emphasis LP. APLP is presented as a technique to estimate a spectral emphasis filter that captures the combined influence of the glottal source and the vocal tract upon the spectral envelope of the voice. A final listening experiment demonstrates how APLP can be used to effectively transform high-effort voices into breathy voices. The techniques presented here are relevant to researchers in voice conversion, voice quality, singing, and emotion.
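
The contrast between constant and signal-dependent pre-emphasis can be sketched with a first-order filter y[n] = x[n] - alpha x[n-1]. The constant version uses a fixed alpha (0.97 here is a common but arbitrary choice); the second function picks alpha per frame as the first-order LP coefficient r[1] / r[0]. That per-frame rule is only one simple, assumed way to make the emphasis adapt to the signal and is not claimed to be the APLP algorithm of the paper. Both function names are hypothetical.

```python
def pre_emphasize(x, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], treating x[-1] as zero."""
    return [x[n] - (alpha * x[n - 1] if n > 0 else 0.0)
            for n in range(len(x))]


def first_order_coefficient(frame):
    """Signal-dependent alpha: first-order LP coefficient r[1] / r[0].

    Returns 0.0 for an all-zero frame. Assumed illustration only.
    """
    r0 = sum(v * v for v in frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    return r1 / r0 if r0 > 0.0 else 0.0


if __name__ == "__main__":
    # Toy frame with a steep spectral tilt (one-pole decay at 0.9).
    frame = [0.9 ** n for n in range(300)]
    alpha = first_order_coefficient(frame)
    print(alpha, pre_emphasize(frame, alpha)[:3])
```

A frame with a steeper spectral tilt gives a larger alpha, so the emphasis filter removes more of the tilt before the formant filter is estimated, whereas a fixed alpha applies the same correction to every frame.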
