Similar literature
 20 similar documents found.
1.
The zero frequency resonator (ZFR) was proposed earlier for the extraction of glottal closure instants (GCIs) (Murty and Yegnanarayana 2008). The output of the ZFR is an exponentially growing/decaying signal. The trend of this signal can be removed to obtain the resolution required for detecting the relevant information. With a typical window size of 1–2 pitch periods, the trend-removed signal mainly exhibits information related to GCIs. This work proposes two methods for the detection of glottal opening instants (GOIs) using the ZFR. In the first method, the window size for trend removal is reduced (to, say, 0.33 × pitch period), and the possibility of hypothesizing GOIs is demonstrated. In the second method, the window size remains in the range of 1–2 pitch periods, but the input to the ZFR is modified to remove GCI information. The proposed methods are evaluated using the CMU-Arctic database and compared with existing methods for GOI detection. The GOI detection performance is comparable to that of GCI detection and to that of existing methods.
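The GCI pipeline summarized in this abstract (resonator, then trend removal over a window of 1–2 pitch periods) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the default pitch period, the three trend-removal passes, and the choice of positive zero crossings as candidate instants are assumptions made here for illustration.

```python
import numpy as np

def remove_trend(x, half_window):
    """Subtract the local mean taken over 2*half_window + 1 samples."""
    kernel = np.ones(2 * half_window + 1)
    local_sum = np.convolve(x, kernel, mode="same")
    # Number of samples actually covered by the window (smaller near the edges).
    counts = np.convolve(np.ones_like(x), kernel, mode="same")
    return x - local_sum / counts

def zero_frequency_filter(speech, fs, pitch_period_s=0.005,
                          window_factor=1.5, passes=3):
    """Sketch of zero frequency filtering (all parameter defaults are assumptions)."""
    speech = np.asarray(speech, dtype=float)
    # Difference the signal to suppress any DC offset.
    y = np.diff(speech, prepend=speech[0])
    # Two cascaded resonators with poles at z = 1, i.e. 1 / (1 - z^-1)^4,
    # which amounts to four running sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Trend removal window expressed as a multiple of the pitch period.
    half_window = max(1, int(round(window_factor * pitch_period_s * fs / 2)))
    for _ in range(passes):
        y = remove_trend(y, half_window)
    return y

def positive_zero_crossings(z):
    """Indices where the signal goes from negative to non-negative."""
    return np.flatnonzero((z[:-1] < 0) & (z[1:] >= 0)) + 1
```

On real speech the polarity of the recording decides which crossings line up with GCIs, and samples near the signal edges should be discarded because the trend is only partially removed there. The `window_factor` argument exposes the window size that the abstract varies between its two regimes.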

2.
The performance of automatic speech recognition (ASR) systems degrades under mismatched training and testing conditions. One reason for this degradation is the presence of emotion in the speech. The main objective of this work is to improve the performance of ASR under emotional conditions using prosody modification. The influence of different emotions on the prosody parameters is exploited in this work. Emotion conversion methods are employed to generate word-level, non-uniformly prosody-modified speech. Modification factors for prosodic components such as pitch, duration and energy are used. The prosody modification is done in two ways. Firstly, emotion conversion is done at the testing stage to generate neutral speech from the emotional speech. Secondly, the ASR is trained with emotional speech generated from the neutral speech. In this work, the presence of emotions in speech is studied for Telugu ASR systems. A new database, the IIIT-H Telugu speech corpus, is collected to build the large-vocabulary neutral Telugu ASR system. The emotional speech samples from the IITKGP-SESC Telugu corpus are used for testing it. The emotions of anger, happiness and compassion are considered during the evaluation. An improvement in the performance of ASR systems is observed with the prosody-modified speech.

3.
Automatic speech recognition (ASR) systems rely almost exclusively on short-term segment-level features (MFCCs), while ignoring higher-level suprasegmental cues that are characteristic of human speech. However, recent experiments have shown that categorical representations of prosody, such as those based on the Tones and Break Indices (ToBI) annotation standard, can be used to enhance speech recognizers. Yet categorical prosody models are severely limited in scope and coverage due to the lack of large corpora annotated with the relevant prosodic symbols (such as pitch accent, word prominence, and boundary tone labels). In this paper, we first present an architecture for augmenting a standard ASR with symbolic prosody. We then discuss two novel, unsupervised adaptation techniques for improving, respectively, the quality of the linguistic and acoustic components of our categorical prosody models. Finally, we implement the augmented ASR by enriching ASR lattices with the adapted categorical prosody models. Our experiments show that the proposed unsupervised adaptation techniques significantly improve the quality of the prosody models; the adapted prosodic language and acoustic models reduce binary pitch accent (presence versus absence) classification error rate by 13.8% and 4.3%, respectively (relative to the seed models) on the Boston University Radio News Corpus, while the prosody-enriched ASR exhibits a 3.1% relative reduction in word error rate (WER) over the baseline system.

4.
Inge Troch, Automatica, 1973, 9(1): 117-124
For linear multivariable control systems, the question of observing the state by means of sampling with an arbitrary but fixed choice of the sampling instants is discussed. The freedom due to this arbitrary choice is used to derive criteria for choosing the sampling instants so that the propagation of measuring errors is as favourable as possible. The extended principle of duality is formulated and the dual problem of control by means of step functions is covered. Further topics mentioned include the identification problem and the influence of parameter uncertainties.

5.
In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a system based on vocal tract information. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the LP residual. The speaker-specific information from each level is modeled independently using Gaussian mixture model-universal background model (GMM-UBM) modeling and then combined at the score level. The significance of the proposed speaker recognition system is demonstrated by conducting speaker verification experiments on the NIST-03 database. Two different tests, namely a Clean test and a Noisy test, are conducted. In the Clean test, the test speech signal is used as it is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. Although in the Clean test the proposed source-based speaker recognition system still performs worse than the vocal tract based system, its performance is better in the Noisy test. Finally, in both the clean and noisy cases, by providing different and robust speaker-specific evidence, the proposed system helps the vocal tract system to further improve the overall performance.

6.
This paper presents a new approach for solving optimal control problems for switched systems. We focus on problems in which a prespecified sequence of active subsystems is given. For such problems, we need to seek both the optimal switching instants and the optimal continuous inputs. In order to search for the optimal switching instants, the derivatives of the optimal cost with respect to the switching instants need to be known. The most important contribution of the paper is a method which first transcribes an optimal control problem into an equivalent problem parameterized by the switching instants and then obtains the values of the derivatives based on the solution of a two-point boundary value differential algebraic equation formed by the state, costate and stationarity equations, the boundary and continuity conditions, along with their differentiations. This method is applied to general switched linear quadratic problems, and an efficient method based on the solution of an initial value ordinary differential equation is developed. An extension of the method is also applied to problems with internally forced switching. Examples are shown to illustrate the results in the paper.

7.
The model of prosody used in the Aculab TTS system is unusual in several respects. Firstly, it is based firmly on current metrical theories of prosody. Secondly, it is entirely knowledge-based: there are no stochastic components in the model. Thirdly, it makes use of a quasi-random element to avoid the predictability of conventional synthetic prosody. Fourthly, it is specifically designed for multilingual use: it currently handles several Germanic and Romance languages.

8.
Prosody is the change of F0 and intensity over time, together with the speed of articulation. The presence or absence of the realization of word accent is also examined as an important feature in prosody generation. During verbal communication, various prosody forms contribute to the expression of the textual content of the message on the one hand and of the personal intention of the speaker on the other. In many cases in dialogues, the same text can be (and must be) pronounced with different intentions. Our goal was to find what kinds of prosody patterns and rules are characteristic of these utterance types and what the acoustic relationship among them is for Hungarian. In this article the prosody structures of the most important dialogue components are described, and invariant structures are derived and verified by speech synthesis. Rules are also stated as generalized function structures to show the acoustic relationship of the prosody of these expressions to the prosody of statements. Using these rules, it is possible to convert the prosody of a given utterance type to another one while preserving the naturalness of the speech. The rules can be used in text-to-speech (TTS) conversion to generate spoken dialogues.

9.
An important result in the robust adaptive control of continuous-time systems, using the persistent excitation of the reference input, was recently given by Narendra and Annaswamy (1986, IEEE Trans. Aut. Control, AC-31, 306–315). According to this result, the global boundedness of all the signals in the adaptive system can be assured if the degree of persistent excitation of the reference input is larger than an appropriate bound on the external disturbance. The main theorem in Narendra and Annaswamy (1986) is proved for a class of plants characterized by the property that the reference model used in the adaptive controller could be chosen to be strictly positive real, a condition which involves constraints on the relative degree of the plant. This paper presents a generalization of the above result to plants of arbitrary relative degree. Together with the work reported in the earlier paper, it demonstrates that the boundedness of all the signals in an adaptive system in the presence of bounded disturbances and arbitrary initial conditions can be assured by increasing the degree of persistent excitation of the reference input.

10.
Association rule mining is one of the data mining techniques used to discover information that represents associations among data. Some data appear infrequently in a database but are highly associated with specific other data. This paper proposes a technique for mining such significant rare data by introducing a second support when discovering their association rules. We show that the proposed approach provides better performance than standard association rule techniques.
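The abstract does not define how the second support is used, so the following Python sketch only illustrates one plausible reading: pairs whose support falls below the usual minimum but above a lower, second threshold are still considered, provided the rule confidence is high. The function names, the restriction to pairwise rules, and all thresholds are assumptions for illustration, not the paper's method.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rare_rules(transactions, min_support, second_support, min_confidence):
    """Pairwise rules lhs -> rhs whose joint support lies in
    [second_support, min_support). second_support is assumed to be > 0."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        # Keep only pairs that a standard miner would discard as infrequent.
        if not (second_support <= s_ab < min_support):
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s_ab, confidence))
    return rules
```

Here `transactions` is expected to be a list of `frozenset` objects. A pair that occurs in only a few transactions but almost always together would be reported, while frequent pairs are left to a standard miner.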

11.
12.
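How to better protect the copyright of quantum images is an important research topic in quantum watermarking. Based on the log-polar representation of quantum images, a novel quantum watermarking algorithm is proposed. According to the value of a set of keys shared by the two communicating parties, the sender selects one of the four most significant bits of a pixel's gray value in the quantum carrier image as the control bit. Then, according to the value of the selected control bit, the sender embeds the watermark information into either the least significant bit or the second least significant bit of the quantum carrier image. This key-based controlled least-significant-bit modification technique improves the transparency and robustness of the quantum watermarked image. Experimental simulation and performance analysis in MATLAB also show that the new algorithm performs well in terms of transparency, robustness and embedding capacity.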
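The key-controlled embedding rule can be mimicked on an ordinary 8-bit grayscale image. The sketch below is a classical analogue only, not the quantum circuit of the paper. It assumes that the key gives, for each pixel, the index (4 to 7) of the control bit, and that a control bit of 0 selects the least significant bit while 1 selects the second least significant bit; the abstract does not state this mapping.

```python
import numpy as np

def embed(carrier, bits, key):
    """Embed one watermark bit per pixel; key[i] in {4, 5, 6, 7} picks the control bit."""
    flat = np.asarray(carrier, dtype=np.uint8).ravel().copy()
    for i, (bit, k) in enumerate(zip(bits, key)):
        pixel = int(flat[i])
        # Assumed mapping: control bit 0 -> bit 0 (LSB), control bit 1 -> bit 1.
        target = (pixel >> k) & 1
        pixel = (pixel & ~(1 << target)) | (int(bit) << target)
        flat[i] = pixel
    return flat.reshape(np.shape(carrier))

def extract(marked, n_bits, key):
    """Recover the watermark bits using the same shared key."""
    flat = np.asarray(marked, dtype=np.uint8).ravel()
    bits = []
    for i in range(n_bits):
        pixel = int(flat[i])
        # The control bit sits in the high nibble, so embedding never altered it.
        target = (pixel >> key[i]) & 1
        bits.append((pixel >> target) & 1)
    return bits
```

Because only bit 0 or bit 1 of a pixel is ever modified, each gray value changes by at most 2, which is the classical counterpart of the transparency property mentioned in the abstract.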

13.
14.
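Automatic prediction of prosodic structure is a key component of highly natural text-to-speech (TTS) systems and directly affects the naturalness and expressiveness of the synthesized speech. This paper builds a Chinese corpus annotated with both syntactic information and prosodic structure. Based on this corpus, the composition of Chinese prosodic structure and the relationship between prosodic structure and syntax and semantics are analyzed, and prediction experiments are carried out. The study finds that although Chinese prosodic structure differs from syntactic structure, the two are closely related, and prosodic structure can be predicted from syntactic structure. Besides its relation to syntactic structure, prosodic structure is also constrained by sentence semantics.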

15.
This paper presents data-driven modelling and prediction of word-level prosody breaks for the Slovenian language. Automatic learning techniques depend on the construction of a large, appropriately labeled corpus. This labeling can be done either automatically or by hand. While automatic labeling can be less accurate than hand labeling, the latter is very time consuming and, in some cases, inconsistent. Therefore, a new interactive tool for word-level prosody labeling (major/minor breaks) is presented together with a new semi-automatic approach for determining prosody breaks. This interactive tool combines the advantages of hand labeling and automatic labeling by achieving high consistency in labeling and reducing the time needed for hand labeling. The labeled Slovenian corpus has been used to train our phrase break prediction module, which implements a neural network (NN) structure. Experiments for the data-driven prediction of major = minor and major/minor phrase breaks were performed. The prediction accuracy achieved marks the state of the art in word-level prosody break prediction for the Slovenian language and is comparable to the prediction accuracy of other approaches in which more complex NN structures (Müller et al., 2000) or other prediction methods (Black and Taylor, 1997) were applied, and a much larger corpus was used for training. The overall prediction accuracy achieved is 94% for major = minor breaks and over 98/92% for major/minor phrase breaks, respectively.

16.
The purpose of this study was to develop a textile component system that could be added to a compression garment to achieve a body posture that more closely resembles an ideal balanced posture. The approach of this study was to find a middle ground between posture correctors and compression garments by combining structural support elements with garment compression to achieve effective posture modification as well as comfort. To achieve this goal, a Posture Modification System using Soft materials structures (PMSS) was developed by experimenting with textile elastic bands to mimic the structure and placement of anatomical postural features (muscles and spinal column) of a woman's back torso. For prototype development, a bodysuit-type shapewear garment was used to incorporate the PMSS, and a wear test with female participants was conducted. To assess posture changes through body angles, participants were 3D scanned, and questionnaires were administered to determine wearer acceptability. Body angle assessment indicated that wearing the prototype positively affected posture, including more balanced shoulders, a more aligned lateral center of gravity, and a straighter spine. As assessed with a questionnaire, the prototype achieved higher wearer acceptability in terms of posture, body shape, and fit compared to the shapewear without the PMSS. This study shows the potential of developing the soft structural posture modification system for use beyond the lingerie category. Furthermore, the more aligned postures exhibited by participants wearing the PMSS-enhanced garment while carrying loads indicate potential for developing soft-structured posture support garments for load-bearing situations in industrial and military settings.

17.
Hierarchical Structure and Word Strength Prediction of Mandarin Prosody   Cited by: 1 (self-citations: 0; citations by others: 1)
We use Stem-ML to build an automatic learning system for Mandarin prosody that allows us to make quantitative measurements of prosodic strengths. Stem-ML is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds. Because Stem-ML describes the interactions between nearby tones or accents, we were able to use a highly constrained model with only one accent template for each lexical tone category, and a single prosodic strength per word. The model accurately reproduces the intonation of the speaker, capturing 87% of the variance of the speech's fundamental frequency, f0. The result reveals strong alternating metrical patterns in words, and suggests that the speaker uses word strength to mark a hierarchy of sentence, clause, phrase, and word boundaries.
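A figure such as "87% of the variance" is commonly read as the fraction of variance explained, one minus the ratio of residual to total sum of squares. The short sketch below computes that quantity for an observed and a modelled f0 contour; whether the paper uses exactly this definition is an assumption, and the function name is hypothetical.

```python
import numpy as np

def variance_explained(f0_observed, f0_model):
    """1 - SS_residual / SS_total for two equally long f0 contours."""
    observed = np.asarray(f0_observed, dtype=float)
    residual = observed - np.asarray(f0_model, dtype=float)
    total = observed - observed.mean()
    return 1.0 - np.sum(residual ** 2) / np.sum(total ** 2)
```

A model that reproduces the contour exactly scores 1, and a model that predicts only the mean f0 scores 0.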

18.
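Taking focus as it occurs in natural continuous speech as the object of study, the acoustic realization of focus in Chinese is investigated. The results show that: (1) The influence of focus on the prosodic features of a syllable is closely related to the higher-level prosodic environment (contextual information) in which the syllable is located. For syllables in different higher-level prosodic environments, the magnitude and direction of the change in prosodic features caused by focus differ. (2) The perceived strength of focus can, to some extent, be conveyed by linearly adjusting the increments of the acoustic parameters of speech. (3) In speech synthesis, the prosodic features of focus can be predicted in two steps. Experiments confirm that, when the focus position is known, this method can synthesize Chinese sentence focus with high naturalness.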

19.
Modifying prosody parameters such as pitch, duration and strength of excitation by a desired factor is termed prosody modification. The objective of this work is to develop a dynamic prosody modification method based on the zero frequency filtered signal (ZFFS), a byproduct of zero frequency filtering (ZFF). The existing epoch-based prosody modification techniques use epochs as pitch markers, and the required prosody modification is achieved by interpolating the epoch interval plot. Alternatively, this work proposes a method for prosody modification by resampling the ZFFS. The existing epoch-based prosody modification method is also refined to modify the prosodic parameters at the level of every epoch, thus providing more flexibility for prosody modification. The general framework for deriving the modified epoch locations can also be used for obtaining dynamic prosody modification from the existing PSOLA and epoch-based prosody modification methods. The quality of the prosody-modified speech is evaluated using waveforms, spectrograms and subjective studies. The usefulness of the proposed dynamic prosody modification is demonstrated for the neutral-to-emotional conversion task. The subjective evaluations performed for the emotion conversion indicate the effectiveness of dynamic prosody modification over fixed prosody modification for emotion conversion. The dynamic prosody-modified speech files synthesized using the proposed, epoch-based and TD-PSOLA methods are available at http://www.iitg.ac.in/eee/emstlab/demos/demo5.php.

20.
The perceived quality of synthetic speech strongly depends on its prosodic naturalness. Starting from earlier work by Mixdorff on a linguistically motivated model of German intonation based on the Fujisaki model, an integrated approach to predicting F0 along with syllable duration and energy was developed. The current paper first presents some statistical results concerning the relationship between the linguistic and phonetic information underlying an utterance and its prosodic features. These results were employed for training the MFN-based integrated prosodic model, which predicts syllable duration and energy along with syllable-aligned Fujisaki control parameters. The paper then focuses on the method of perceptual evaluation developed, comparing resynthesis stimuli created by controlled prosodic degradation of natural speech with stimuli created using the integrated model. The results indicate that the integrated model generally receives better ratings than degraded stimuli with comparable durational and F0 deviations from the original. An important outcome is the observation that the accuracy of the predicted syllable durations appears to be a stronger factor in the perceived quality than the accuracy of the predicted F0 contour.

