首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
This paper presents empirical results of an analysis on the role of prosody in the recognition of dialogue acts and utterance mood in a practical dialogue corpus in Mexican Spanish. The work is configured as a series of machine-learning experimental conditions in which models are created by using intonational and other data as predictors and dialogue act tagging data as targets. We show that utterance mood can be predicted from intonational information, and that this mood information can then be used to recognize the dialogue act.  相似文献   

2.
用于神经网络说话人识别的PCA-GA研究   总被引:1,自引:1,他引:0  
针对用于神经网络说话人识别的海量特征参数带来的识别率和网络训练稳定性的问题,提出了一种用于神经网络的基于语音特征参数的PCA新方法.该方法提取出的新特征参数在神经网络中的识别率和训练速度得到较大提高.结合GA能有效防止网络收敛于局部极小点,缩短训练时间,提高网络稳定性.从而全面提高了基于NN的说话人识别效果.  相似文献   

3.
Automatic Speaker Recognition (ASR) refers to the task of identifying a person based on his or her voice with the help of machines. ASR finds its potential applications in telephone based financial transactions, purchase of credit card and in forensic science and social anthropology for the study of different cultures and languages. Results of ASR are highly dependent on database, i.e., the results obtained in ASR are meaningless if recording conditions are not known. In this paper, a methodology and a typical experimental setup used for development of corpora for various tasks in the text-independent speaker identification in different Indian languages, viz., Marathi, Hindi, Urdu and Oriya have been described. Finally, an ASR system is presented to evaluate the corpora.  相似文献   

4.
Literature review on prosody reveals the lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct data-sets, a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 h were recorded by twenty eight speakers per language. Among these speakers, eight were professional (four radio news broadcasters and four advertising actors). The entire material presented here has been transcribed, aligned with the acoustic signal and prosodically annotated. Two major objectives have guided the design of this project: (i) to offer a wide coverage of representative real-life communicative situations which allow for the characterization of prosody in these two languages; and (ii) to conduct research studies which enable us to contrast the speakers different speaking styles and discursive practices. All material contained in the corpus is provided under a Creative Commons Attribution 3.0 Unported License.  相似文献   

5.
The use of quality information for multilevel speaker recognition systems is addressed in this contribution. From a definition of what constitutes a quality measure, two applications are proposed at different phases of the recognition process: scoring and multilevel fusion stages. The traditional likelihood scoring stage is further developed providing guidelines for the practical application of the proposed ideas. Conventional user-independent multilevel support vector machine (SVM) score fusion is also adapted for the inclusion of quality information in the fusion process. In particular, quality measures meeting three different goodness criteria: SNR, F0 deviations and the ITU P.563 objective speech quality assessment are used in the speaker recognition process. Experiments carried out in the Switchboard-I database assess the benefits of the proposed quality-guided recognition approach for both the score computation and score fusion stages.  相似文献   

6.
A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface is achieved by the interaction of speaker identification, speech recognition and speaker adaptation for a limited number of recurring users. Together with a technique for efficient information retrieval a compact modeling of speech and speaker characteristics is presented. Applying speaker specific profiles allows speech recognition to take individual speech characteristics into consideration to achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation combined with robust speaker identification. Different users can be tracked by the resulting self-learning speech controlled system. Only a very short enrollment of each speaker is required. Subsequent utterances are used for unsupervised adaptation resulting in continuously improved speech recognition rates. Additionally, the detection of unknown speakers is examined under the objective to avoid the requirement to train new speaker profiles explicitly. The speech controlled system presented here is suitable for in-car applications, e.g. speech controlled navigation, hands-free telephony or infotainment systems, on embedded devices. Results are presented for a subset of the SPEECON database. The results validate the benefit of the speaker adaptation scheme and the unified modeling in terms of speaker identification and speech recognition rates.  相似文献   

7.
8.
Gaussian mixture model (GMM) has been widely used for modeling speakers. In speaker identification, one major problem is how to generate a set of GMMs for identification purposes based upon the training data. Due to the hill-climbing characteristic of the maximum likelihood (ML) method, any arbitrary estimate of the initial model parameters will usually lead to a sub-optimal model in practice. To resolve this problem, this paper proposes a hybrid training method based on genetic algorithm (GA). It utilizes the global searching capability of GA and combines the effectiveness of the ML method. Experimental results based on TI46 and TIMIT showed that this hybrid GA could obtain more optimized GMMs and better results than the simple GA and the traditional ML method.  相似文献   

9.
Over the last two decades, automatic speaker recognition has been an interesting and challenging problem to speech researchers. It can be classified into two different categories, speaker identification and speaker verification. In this paper, a new classifier, extreme learning machine, is examined on the text-independent speaker verification task and compared with SVM classifier. Extreme learning machine (ELM) classifiers have been proposed for generalized single hidden layer feedforward networks with a wide variety of hidden nodes. They are extremely fast in learning and perform well on many artificial and real regression and classification applications. The database used to evaluate the ELM and SVM classifiers is ELSDSR corpus, and the Mel-frequency Cepstral Coefficients were extracted and used as the input to the classifiers. Empirical studies have shown that ELM classifiers and its variants could perform better than SVM classifiers on the dataset provided with less training time.  相似文献   

10.
A text-independent speaker recognition system based on multi-resolution singular value decomposition (MSVD) is proposed. The MSVD is applied to the speaker data compression and feature extraction not at the square matrix. Our results have shown that this MSVD introduced better performance than the other Karhunen-Loeve transform with respect to the percentages of recognition.  相似文献   

11.
In this letter, a two-stage approach based on adaptive fuzzy C-means and wavelet transform clustering is proposed for efficient feature extraction of speaker recognition. Besides, the investigation includes the development of objective function for minimizing under unsupervised mode of training. Experimental results show that the speaker recognition rate is 95%.  相似文献   

12.
This work describes the process of collection and organization of a multi-style database for speaker recognition. The multi-style database organization is based on three different categories of speaker recognition: voice-password, text-dependent and text-independent framework. Three Indian institutes collaborated for the collection of the database at respective sites. The database is collected over an online telephone network that is deployed for speech based student attendance system. This enables the collection of data for a longer period from different speakers having session variabilities, which is useful for speaker verification (SV) studies in practical scenario. The database contains data of 923 speakers for the three different modes of SV and hence termed as multi-style speaker recognition database. This database is useful for session variability, multi-style speaker recognition and short utterance based SV studies. Initial results are reported over the database for the three different modes of SV. A copy of the database can be obtained by contacting the authors.  相似文献   

13.
This research work is a part of a global project of speech indexing entitled ISDS and concerns more particularly two machine learning classifier types: Neural Networks (NN) and Support Vector Machines (SVM), which are used by that project. However, in the present paper, we will only deal with the problem of speaker discrimination using a new relative reduced modelization for the speaker, restricting then our analysis to the new relative speaker characteristic used as input feature of the learning machines (NN and SVM). Speaker discrimination consists in checking whether two speech signals belong to the same speaker or not, by using some features of the speaker directly from his own speech. Our new proposed feature is based on a relative characterization of the speaker, called Relative Speaker Characteristic (RSC) and is well adapted for NN and SVM trainings. RSC consists in modeling one speaker relatively to another one, meaning that each speaker model is determined from both its speech signal and its dual speech. This investigation shows that the relative model, used as input of the classifier, optimizes the training, by speeding up the learning time and enhancing the discrimination accuracy of that classifier.  相似文献   

14.
In this paper we describe the collection and organization of the speaker recognition database in Indian scenario named as IITG Multivariability Speaker Recognition Database. The database contains speech from 451 speakers speaking English and other Indian languages both in conversational and read speech styles recorded using various sensors in parallel under different environmental conditions. The database is organized into four phases on the basis of different conditions employed for the recording. The results of the initial studies conducted on a speaker verification system exploring the impact of mismatch in training and test conditions using the collected data are also included. A copy of this database can be obtained from the authors by contacting them.  相似文献   

15.
This article presents the Spanish Iarg-AnCora corpus (400 k-words, 13,883 sentences) annotated with the implicit arguments of deverbal nominalizations (18,397 occurrences). We describe the methodology used to create it, focusing on the annotation scheme and criteria adopted. The corpus was manually annotated and an interannotator agreement test was conducted (81 % observed agreement) in order to ensure the reliability of the final resource. The annotation of implicit arguments results in an important gain in argument and thematic role coverage (128 % on average). It is the first corpus annotated with implicit arguments for the Spanish language with a wide coverage that is freely available. This corpus can subsequently be used by machine learning-based semantic role labeling systems, and for the linguistic analysis of implicit arguments grounded on real data. Semantic analyzers are essential components of current language technology applications, which need to obtain a deeper understanding of the text in order to make inferences at the highest level to obtain qualitative improvements in the results.  相似文献   

16.
基于听觉模型的特性,仿照MFCC参数提取过程,提出了一种基于Gammatone滤波器组的说话人语音特征提取方法。该方法用Gammatone滤波器组代替三角滤波器组求得倒谱系数,并且可以调整Gammatone滤波器组的通道数和带宽。将该方法所求得的特征在高斯混合模型识别系统中进行仿真实验,实验结果表明,该特征在一定情况下优于MFCC特征在系统的识别率,同时在Gammatone滤波器组通道数较高或滤波器带宽较小的情况下,系统具有较高的识别率。  相似文献   

17.
In this paper, an in-depth analysis is undertaken into effective strategies for integrating the audio-visual speech modalities with respect to two major questions. Firstly, at what level should integration occur? Secondly, given a level of integration how should this integration be implemented? Our work is based around the well-known hidden Markov model (HMM) classifier framework for modeling speech. A novel framework for modeling the mismatch between train and test observation sets is proposed, so as to provide effective classifier combination performance between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, naturally emerge depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speech processing applications can be attributed to train/test mismatches we propose that the main impetus of practical audio-visual integration is to dampen the independent errors, resulting from the mismatch, rather than trying to model any bimodal speech dependencies. To this end a strategy is recommended, based on theory and empirical evidence, using a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise for the task of text-dependent speaker recognition.  相似文献   

18.
Statistical information on a substantial corpus of representative Spanish texts is needed in order to determine the significance of data about individual authors or texts by means of comparison. This study describes the organization and analysis of a 150,000-word corpus of 30 well-known twentieth-century Spanish authors. Tables show the computational results of analyses involving sentences, segments, quotations, and word length.The article explains the considerations that guided content, selection, and sample size, and describes special editing needed for the input of Spanish text. Separate sections highlight and comment upon some of the findings.The corpus and the tables provide objective data for studies of homogeneity and heterogeneity. The format of the tables permits others to add to the original 30 authors, organize the results by categories, or use the cumulative results for normative comparisons.Estelle Irizarry is Professor of Spanish at Georgetown University and author of 20 books and annotated editions dealing with Hispanic literature, art, and hoaxes. Her latest book, an edition of Infortunios de Alonso Ramirez, treats the disputed authorship of Spanish America's first novel. She is Courseware Editor of CHum.  相似文献   

19.
《微型机与应用》2016,(7):54-56
如今,说话人识别技术已经比较成熟,但依然有很多因素影响说话人识别系统的稳定性。本文针对说话速度对说话人识别的影响进行了一系列的研究工作。通过模型空间分布可视化和语音频谱观察两方面来分析不同语速语音的差距。然后,提出了最大似然线性回归(MLLR)和Constraint MLLR(CMLLR)的方法对模型和特征进行变换,使训练端和测试端的语音特征互相接近匹配。通过实验发现,MLLR和CMLLR能较好地提高说话人识别系统中语速鲁棒性。  相似文献   

20.
This article puts forward a new algorithm for voice conversion which not only removes the necessity of parallel corpus in the training phase but also resolves the issue of insufficiency of the target speaker’s corpus. The proposed approach is based on one of the new voice conversion models utilizing classical LPC analysis-synthesis model combined with GMM. Through this algorithm, the conversion functions among vowels and demi-syllables are derived. We assumed that these functions are rather the same for different speakers if their genders, accents, and languages are alike. Therefore, we will be able to produce the demi-syllables with just having access to few sentences from the target speaker and forming the GMM for one of his/her vowels. The results from the appraisal of the proposed method for voice conversion clarifies that this method has the ability to efficiently realize the speech features of the target speaker. It can also provide results comparable to the ones obtained through the parallel-corpus-based approaches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号