期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

MFCC-GMM based accent recognition system for Telugu speech signals

Kasiprasad Mannepalli Panyam Narahari Sastry Maloji Suman 《International Journal of Speech Technology》2016,19(1):87-93

Speech processing is very important research area where speaker recognition, speech synthesis, speech codec, speech noise reduction are some of the research areas. Many of the languages have different speaking styles called accents or dialects. Identification of the accent before the speech recognition can improve performance of the speech recognition systems. If the number of accents is more in a language, the accent recognition becomes crucial. Telugu is an Indian language which is widely spoken in Southern part of India. Telugu language has different accents. The main accents are coastal Andhra, Telangana, and Rayalaseema. In this present work the samples of speeches are collected from the native speakers of different accents of Telugu language for both training and testing. In this work, Mel frequency cepstral coefficients (MFCC) features are extracted for each speech of both training and test samples. In the next step Gaussian mixture model (GMM) is used for classification of the speech based on accent. The overall efficiency of the proposed system to recognize the speaker, about the region he belongs, based on accent is 91 %. 相似文献

2.

云南民族口音汉语普通话语音识别研究

普园媛杨鉴尉洪赵征鹏《计算机工程与应用》2005,41(11):45-47

该文根据云南境内少数民族同胞说普通话时明显带有民族口音的语言使用现状,介绍了一个以研究非母语说话人汉语连续语音识别为目的的云南少数民族口音汉语普通话语音数据库,并在其基础上开展了发音变异规律、说话人自适应和非母语说话人口音识别研究,是汉语语音识别中用户多样性研究的重要补充。相似文献

3.

Accent Issues in Large Vocabulary Continuous Speech Recognition

Chao Huang Tao Chen Eric Chang 《International Journal of Speech Technology》2004,7(2-3):141-153

This paper addresses accent¹ issues in large vocabulary continuous speech recognition. Cross-accent experiments show that the accent problem is very dominant in speech recognition. Analysis based on multivariate statistical tools (principal component analysis and independent component analysis) confirms that accent is one of the key factors in speaker variability. Considering different applications, we proposed two methods for accent adaptation. When a certain amount of adaptation data was available, pronunciation dictionary modeling was adopted to reduce recognition errors caused by pronunciation mistakes. When a large corpus was collected for each accent type, accent-dependent models were trained and a Gaussian mixture model-based accent identification system was developed for model selection. We report experimental results for the two schemes and verify their efficiency in each situation. 相似文献

4.

深度神经网络建模方法用于数据缺乏的带口音普通话语音识别的研究

谢旭荣隋相刘循英王岚《集成技术》2015,4(6):26-36

众所周知中文普通话被众多的地区口音强烈地影响着,然而带不同口音的普通话语音数据却十分缺乏。因此,普通话语音识别的一个重要目标是恰当地模拟口音带来的声学变化。文章给出了隐式和显式地使用口音信息的一系列基于深度神经网络的声学模型技术的研究。与此同时,包括混合条件训练,多口音决策树状态绑定,深度神经网络级联和多级自适应网络级联隐马尔可夫模型建模等的多口音建模方法在本文中被组合和比较。一个能显式地利用口音信息的改进多级自适应网络级联隐马尔可夫模型系统被提出,并应用于一个由四个地区口音组成的、数据缺乏的带口音普通话语音识别任务中。在经过序列区分性训练和自适应后,通过绝对上 0.8% 到 1.5%(相对上 6% 到 9%)的字错误率下降,该系统显著地优于基线的口音独立深度神经网络级联系统。相似文献

5.

基于高斯混合模型的非母语说话人口音识别

赵征鹏杨鉴《计算机工程》2005,31(6):148-150

针对云南境内白族、纳西族、傈僳族3个典型的少数民族及汉族普通话语音,采用了高斯混合模型来训练每个民族的口音模型,并用少量的测试语音来获得较为满意的口音分类识别率,目的在于探索降低非母语口音话者语音识别错误率的有效途径.该文通过实验给出,对云南民族口音汉语普通话口音识别,当混合数为16,语音特征采用39维MFCC及其一阶、二阶差分参数时,口音识别正确率可达90.83%. 相似文献

6.

一种用于方言口音语音识别的字典自适应技术 总被引：2，自引：1，他引：1

潘复平赵庆卫颜永红《计算机工程与应用》2005,41(23):4-6,9

基于标准普通话的语音识别系统在识别带有方言口音的普通话时,识别率会下降很多。针对这一问题,论文介绍了一种“字典自适应技术”。文中首先提出了一种自动标注算法,然后以此为基础,通过分析语音数据,统计出带有方言口音普通话的发音规律,然后把这个规律编码到标准普通话字典里,构造出体现这种方言发音特征的新字典,最后把新字典整合于搜索框架,用于识别带有该方言口音的普通话,使识别率得到显著提高。相似文献

7.

Human and computer recognition of regional accents and ethnic groups from British English speech

A. Hanani M.J. Russell M.J. Carey 《Computer Speech and Language》2013,27(1):59-74

The paralinguistic information in a speech signal includes clues to the geographical and social background of the speaker. This paper is concerned with automatic extraction of this information from a short segment of speech. A state-of-the-art language identification (LID) system is applied to the problems of regional accent recognition for British English, and ethnic group recognition within a particular accent. We compare the results with human performance and, for accent recognition, the ‘text dependent’ ACCDIST accent recognition measure. For the 14 regional accents of British English in the ABI-1 corpus (good quality read speech), our LID system achieves a recognition accuracy of 89.6%, compared with 95.18% for our best ACCDIST-based system and 58.24% for human listeners. The “Voices across Birmingham” corpus contains significant amounts of telephone conversational speech for the two largest ethnic groups in the city of Birmingham (UK), namely the ‘Asian’ and ‘White’ communities. Our LID system distinguishes between these two groups with an accuracy of 96.51% compared with 90.24% for human listeners. Although direct comparison is difficult, it seems that our LID system performs much better on the standard 12 class NIST 2003 Language Recognition Evaluation task or the two class ethnic group recognition task than on the 14 class regional accent recognition task. We conclude that automatic accent recognition is a challenging task for speech technology, and speculate that the use of natural conversational speech may be advantageous for these types of paralinguistic task. 相似文献

8.

The effect of code-mixing on accent identification accuracy

Thomas Niesler Febe de Wet 《Computer Speech and Language》2009,23(4):435-443

We investigate whether accent identification is more effective for English utterances embedded in a different language as part of a mixed code than for English utterances that are part of a monolingual dialogue. Our focus is on Xhosa and Zulu, two South African languages for which code-mixing with English is very common. In order to carry out our investigation, we extract English utterances from mixed-code Xhosa and Zulu speech corpora, as well as comparable utterances from an English-only corpus by Xhosa and Zulu mother-tongue speakers. Experiments using automatic accent identification systems show that identification is substantially more accurate for the utterances originating from the mixed-code speech. These findings are supported by a corresponding set of perceptual experiments in which human subjects were asked to identify the accents of recorded utterances. We conclude that accent identification is more successful for these utterances because accents are more pronounced for English embedded in mother-tongue speech than for English spoken as part of a monolingual dialogue by non-native speakers. Furthermore we find that this is true for human listeners as well as for automatic identification systems. 相似文献

9.

Advances in phone-based modeling for automatic accent classification

Angkititrakul P. Hansen J.H.L. 《IEEE transactions on audio, speech, and language processing》2006,14(2):634-646

It is suggested that algorithms capable of estimating and characterizing accent knowledge would provide valuable information in the development of more effective speech systems such as speech recognition, speaker identification, audio stream tagging in spoken document retrieval, channel monitoring, or voice conversion. Accent knowledge could be used for selection of alternative pronunciations in a lexicon, engage adaptation for acoustic modeling, or provide information for biasing a language model in large vocabulary speech recognition. In this paper, we propose a text-independent automatic accent classification system using phone-based models. Algorithm formulation begins with a series of experiments focused on capturing the spectral evolution information as potential accent sensitive cues. Alternative subspace representations using principal component analysis and linear discriminant analysis with projected trajectories are considered. Finally, an experimental study is performed to compare the spectral trajectory model framework to a traditional hidden Markov model recognition framework using an accent sensitive word corpus. System evaluation is performed using a corpus representing five English speaker groups with native American English, and English spoken with Mandarin Chinese, French, Thai, and Turkish accents for both male and female speakers. 相似文献

10.

An approach to the problem of regional accent in automatic speech recognition

W. J. Barry C. E. Hoequist F. J. Nolan 《Computer Speech and Language》1989,3(4)

An approach to the problem of inter-speaker variability in automatic speech recognition is described which exploits systematic vowel differences in a two-stage process of adaptation to individual speaker characteristics. In stage one, an accent identification procedure selects one of four gross regional English accents on the basis of vowel quality differences within four calibration sentences. In stage two, an adjustment procedure shifts the regional reference vowel space onto the speaker's vowel space as calculated from the accent identification data. Results for 58 speakers from the four regional accent areas are presented. 相似文献

11.

Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

G. Bouselmi D. Fohr I. Illina 《International Journal of Speech Technology》2012,15(2):203-213

This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but has difficulties dealing with pronunciation errors like phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach is based on acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. This paper presents a novel approach to introduce confusion rules in the recognition system which are automatically learned through pronunciation modelling. The modified HMM of a foreign spoken language phoneme includes its canonical pronunciation along with all the alternate non-native pronunciations, so that spoken language phonemes pronounced correctly by a non-native speaker could be recognized. We evaluate our approaches on the European project HIWIRE non-native corpus which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. Our approach gives better recognition results than the classical acoustic adaptation of HMM when the foreign origin of the speaker is known. We obtain 22% WER reduction compared to the reference system. 相似文献

12.

文本无关发音质量评估系统中声学模型的若干研究和改进

蒋同海齐耀辉葛凤培颜永红《微计算机应用》2012,1(2):47-53

在无关的发音质量评估系统中,需要先识别出待测语音的说话内容,才能进行准确评估。真实的评测数据往往有很多不利的因素影响识别正确率,包括噪声、方言口音、信道噪声、说话随意性等。针对这些不利因素,本文对声学模型进行了深入的研究,包括:在训练数据中加入背景噪声,增强了模型的抗噪声能力;采用基于说话人的倒谱均值方差规整(SCMVN),降低信道及说话人个体特性的影响;用和待测语音相同地域的朗读数据做最大后验概率(MAP)自适应,使模型带有当地方言口音的发音特点;用自然口语数据做MAP自适应,使模型较好地描述自然口语中比较随意的发音现象。实验结果表明,使用这些措施之后,使待测语音的识别正确率相对提高了44.1%,从而使机器评分和专家评分的相关系数相对提高了6.3%。相似文献

13.

Prior knowledge guided maximum expected likelihood based model selection and adaptation for nonnative speech recognition

《Computer Speech and Language》2007,21(2):247-265

In this paper, an improved method of model complexity selection for nonnative speech recognition is proposed by using maximum a posteriori (MAP) estimation of bias distributions. An algorithm is described for estimating hyper-parameters of the priors of the bias distributions, and an automatic accent classification algorithm is also proposed for integration with dynamic model selection and adaptation. Experiments were performed on the WSJ1 task with American English speech, British accented speech, and mandarin Chinese accented speech. Results show that the use of prior knowledge of accents enabled more reliable estimation of bias distributions with very small amounts of adaptation speech, or without adaptation speech. Recognition results show that the new approach is superior to the previous maximum expected likelihood (MEL) method, especially when adaptation data are very limited. 相似文献

14.

Modeling and synthesis of English regional accents with pitch and duration correlates

Qin Yan Saeed Vaseghi 《Computer Speech and Language》2010,24(4):711-725

This paper provides an introduction to the acoustic–phonetic structure of English regional accents and presents a signal processing method for the modeling and transformation of the acoustic correlates of English accents for example from British English to American English. The focus of this paper is on the modeling of intonation and duration correlates of accents as the modeling of formants is described in previous papers (Yan et al., 2007, Vaseghi et al., 2009). The intonation correlates of accents are modeled with the statistics of a set of broad features of the pitch contour. The statistical models of phoneme durations and word speaking rates are obtained from automatic segmentation of word/phoneme boundaries of speech databases. A contribution of this paper is the use of accent synthesis for comparative evaluation of the causal effects of the acoustic correlates of accent. The differences between the acoustics–phonetic realizations of British Received Pronunciation (RP), Broad Australian (BAU) and General American (GenAm) English accents are modeled and used in an accent transformation and synthesis method for evaluation of the influence of formant, pitch and duration on conveying accents. 相似文献

15.

基于韵律特征辅助的端到端语音识别方法

刘聪万根顺高建清付中华《计算机应用》2023,43(2):380-384

针对传统的语音识别系统采用数据驱动并利用语言模型来决策最优的解码路径,导致在部分场景下的解码结果存在明显的音对字错的问题,提出一种基于韵律特征辅助的端到端语音识别方法,利用语音中的韵律信息辅助增强正确汉字组合在语言模型中的概率。在基于注意力机制的编码-解码语音识别框架的基础上,首先利用注意力机制的系数分布提取发音间隔、发音能量等韵律特征;然后将韵律特征与解码端结合,从而显著提升了发音相同或相近、语义歧义情况下的语音识别准确率。实验结果表明,该方法在1 000 h及10 000 h级别的语音识别任务上分别较端到端语音识别基线方法在准确率上相对提升了5.2%和5.0%,进一步改善了语音识别结果的可懂度。相似文献

16.

Modeling of Vocal Styles Using Portable Features and Placement Rules

Chilin Shih Greg Kochanski 《International Journal of Speech Technology》2003,6(4):393-408

相似文献

17.

Speaker-independent ASR for Modern Standard Arabic: effect of regional accents

Ghania Droua-Hamdani Sid-Ahmed Selouani Malika Boudraa 《International Journal of Speech Technology》2012,15(4):487-493

This paper deals with speaker-independent Automatic Speech Recognition (ASR) system for continuous speech. This ASR system has been developed for Modern Standard Arabic (MSA) using recordings of six regions taken from ALGerian Arabic Speech Database (ALGASD), and has been designed by using Hidden Markov Models. The main purpose of this study is to investigate the effect of regional accent on speech recognition rates. First, the experiment assessed the general performance of the model for the data speech of six regions, details of the recognition results are performed to observe the deterioration of the performance of the ASR according to the regional variation included in the speech material. The results have shown that the ASR performance is clearly impacted by the regional accents of the speakers. 相似文献

18.

支持重音合成的汉语语音合成系统 总被引：1，自引：1，他引：1

朱维彬《中文信息学报》2007,21(3):122-128

针对基于单元挑选的汉语语音合成系统中重音预测及实现,本文采用了知识指导下的数据驱动建模策略。首先,采用经过感知结果优化的重音检测器,实现了语音数据库的自动标注;其次,利用重音标注数据库,训练得到支持重音预测的韵律预测模型;用重音韵律预测模型替代原语音合成系统中的相应模型,从而构成了支持重音合成的语音合成系统。实验结果分析表明,基于感知结果优化的重音检测器的标注结果是可靠的;支持重音的韵律声学预测模型是合理的;新的合成系统能够合成出带有轻重变化的语音。相似文献

19.

Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach

Dia AbuZeina Wasfi Al-Khatib Moustafa Elshafei Husni Al-Muhtaseb 《International Journal of Speech Technology》2012,15(2):65-75

Pronunciation variation is a major obstacle in improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation spelling of words beyond their listed forms in the pronunciation dictionary, leading to a number of out of vocabulary word forms. This paper presents a direct data-driven approach to model within-word pronunciation variations, in which the pronunciation variants are distilled from the training speech corpus. The proposed method consists of performing phoneme recognition, followed by a sequence alignment between the observation phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to dictionary as well as to the language model. We started with a Baseline Arabic speech recognition system based on Sphinx3 engine. The Baseline system is based on a 5.4 hours speech corpus of modern standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations. The Baseline system achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvements, the word error rate is significantly reduced by 2.22% when the variants are represented within the language model. 相似文献

20.

MLN-based Bangla ASR using context sensitive triphone HMM

Foyzul Hassan Mohammed Rokibul Alam Kotwal Ghulam Muhammad Mohammad Nurul Huda 《International Journal of Speech Technology》2011,14(3):183-191

Building a continuous speech recognizer for the Bangla (widely used as Bengali) language is a challenging task due to the unique inherent features of the language like long and short vowels and many instances of allophones. Stress and accent vary in spoken Bangla language from region to region. But in formal read Bangla speech, stress and accents are ignored. There are three approaches to continuous speech recognition (CSR) based on the sub-word unit viz. word, phoneme and syllable. Pronunciation of words and sentences are strictly governed by set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks. However, medium and large vocabulary CSR for Bangla is relatively new and not explored. In this paper, the authors have attempted for building automatic speech recognition (ASR) method based on context sensitive triphone acoustic models. The method comprises three stages, where the first stage extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second stage designs triphone models to catch context of both sides and the final stage generates word strings based on triphone hidden Markov models (HMMs). The objective of this research is to build a medium vocabulary triphone based continuous speech recognizer for Bangla language. In this experimentation using Bangla speech corpus prepared by us, the recognizer provides higher word accuracy as well as word correct rate for trained and tested sentences with fewer mixture components in HMMs. 相似文献