Similar Articles
20 similar articles found (search time: 218 ms)
1.
This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but has difficulty dealing with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach is based on acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. This paper presents a novel way to introduce these automatically learned confusion rules into the recognition system. The modified HMM of a phoneme of the foreign spoken language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the European project HIWIRE non-native corpus, which contains English sentences pronounced by French, Italian, Greek, and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. Our approach gives better recognition results than classical acoustic adaptation of HMMs when the foreign origin of the speaker is known. We obtain a 22% WER reduction compared to the reference system.
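Below is a minimal sketch, not the authors' code, of how confusion rules of the kind described (a canonical phoneme plus its plausible non-native realizations) can expand a pronunciation lexicon; the rule table, phoneme symbols, and function names are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code): expanding a pronunciation
# lexicon with confusion rules of the form "canonical phoneme -> list of
# plausible non-native realizations". All names are illustrative.

from itertools import product

# Hypothetical confusion rules learned from a non-native corpus:
# e.g. French speakers may realize English /TH/ as /S/ or /Z/.
CONFUSION_RULES = {
    "TH": ["TH", "S", "Z"],   # canonical form is kept alongside alternates
    "IH": ["IH", "IY"],
}

def expand_pronunciations(canonical):
    """Generate all alternate pronunciations for one canonical
    phoneme sequence by applying the confusion rules position-wise."""
    options = [CONFUSION_RULES.get(ph, [ph]) for ph in canonical]
    return [list(variant) for variant in product(*options)]

# "think" /TH IH NG K/ -> 6 variants, canonical one included.
for pron in expand_pronunciations(["TH", "IH", "NG", "K"]):
    print(" ".join(pron))
```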

2.
李婧  黄双  张波 《计算机工程》2008,34(22):207-209
This work introduces the Gaussian mixture model and universal background model (UBM), which have been successfully applied to speaker identification/verification, into the field of pronunciation quality assessment, and proposes a new algorithm for evaluating English pronunciation quality. The algorithm trains a universal background model of standard pronunciation; the UBM describes a phoneme-independent feature distribution. The duration-normalized log-likelihood ratio of a segment is defined as the pronunciation quality score of a phoneme, and the phoneme scores are combined into a score for the whole utterance. Experiments show that, on a non-native speech database collected in our laboratory, the correlation between the algorithm's scores and expert scores reaches 0.700, outperforming other scoring algorithms.
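As a rough illustration of the score defined above, the following hedged sketch computes a duration-normalized log-likelihood ratio from per-frame log-likelihoods; it assumes those frame scores were already produced by a phoneme model and the UBM, and the sentence-level averaging is a plausible choice rather than the paper's exact combination rule.

```python
# A minimal sketch of the duration-normalized log-likelihood-ratio score
# described above, assuming frame log-likelihoods have already been
# computed by a phoneme GMM and by the UBM. All names are illustrative.

import numpy as np

def phoneme_score(ll_phone_frames, ll_ubm_frames):
    """Duration-normalized log ratio of phoneme-model likelihood to UBM
    likelihood over the frames of one phoneme segment."""
    ll_phone = np.asarray(ll_phone_frames)
    ll_ubm = np.asarray(ll_ubm_frames)
    return float(np.mean(ll_phone - ll_ubm))  # (1/T) * sum over frames

def sentence_score(segment_scores):
    """Combine per-phoneme scores into a whole-utterance score; simple
    averaging is one plausible choice, not necessarily the paper's."""
    return float(np.mean(segment_scores))
```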

3.
4.
马继涌  高文 《计算机学报》1999,22(11):1127-1132
This paper studies several key issues in speaker verification with randomly prompted text, including the training and recognition speed of the verification algorithm, the choice of verification text and speaking style, the choice of test-text length, threshold setting, and adaptive handling of medium- and long-term variation in a speaker's voice. To improve training and recognition speed, the paper proposes a fast dynamic Gaussian mixture speaker model, and discusses how phonemes and test-text length affect the performance of the verification system. It also proposes an adaptive incremental learning method for the medium- and long-term variation of speaker voice characteristics, and analyzes in detail the false rejection of speakers in single and repeated tests.

5.
In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.
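For orientation, the following LaTeX sketch states the transform families the analysis compares; it is a common textbook formulation, not the paper's exact derivation, and the structural-MAP prior is described only in the comments.

```latex
% A sketch of the transform families compared above (not the paper's
% exact derivation). Mean-only linear regression (MLLR):
\[
  \hat{\mu} = A\mu + b
\]
% The constrained variant ties mean and covariance to one transform,
% which is the form CSMAPLR then regularizes with a structural MAP
% prior propagated down a tree of regression classes:
\[
  \hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = A\,\Sigma\,A^{\top}
\]
```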

6.
This paper focuses on the problem of language transfer in foreign language learning. The transfer often leads to a communicative gap caused by the difference between a learner's mother language (ML) and the target language (TL). This paper first analyzes the semantic relations between the ML and the TL. It then proposes a Communicative Gap Model (CGM) to capture the meaning differences between the two languages. We have developed a computer-assisted language-learning system called Neclle (Network-based Communicative Language-Learning Environment) to support foreign language learning through communication using a text-based chat tool. Neclle has a software agent called Ankle (Agent for Kanji Learning), which observes the conversation between a learner and a native speaker, automatically detects communicative gaps in the learner's utterances according to the CGM, the student model, and the word dictionaries of both languages, intervenes in the conversation, and gives instruction for bridging the gap. The learner can then not only become aware of the gap but also acquire its cultural background from the native speaker. In our case study, Chinese students used Neclle for Japanese language learning. Japanese writing incorporates Chinese kanji, but the meaning of a kanji sometimes differs between the two languages. Therefore, Chinese learners of Japanese must pay close attention to the meaning gaps between Chinese and Japanese. In the evaluation of Neclle, nine Chinese students talked with Japanese students about three topics for 1 h 30 min. The results of the experiment showed that Neclle was very useful for Japanese language learning.

7.
In this paper, we investigate speaker discrimination using multi-classifier fusion, with a focus on the effects of feature reduction. Speaker discrimination consists in the automatic distinction between two speakers using the vocal characteristics of their speech. A number of features are extracted using Mel Frequency Spectral Coefficients and then reduced using Relative Speaker Characteristic (RSC) along with Principal Components Analysis (PCA). Several classification methods are implemented to carry out the discrimination task. Since different classifiers are employed, two fusion algorithms at the decision level, referred to as Weighted Fusion and Fuzzy Fusion, are proposed to boost classification performance. These algorithms are based on weighting the outputs of the different classifiers. Furthermore, the effects of speaker gender and feature reduction on the speaker discrimination task were also examined. The evaluation of our approaches was conducted on a subset of Hub-4 Broadcast News. The experimental results show that speaker discrimination accuracy improves by 5–15% with the RSC–PCA feature reduction. In addition, the proposed fusion methods recorded an improvement of about 10% over the individual scores of the classifiers. Finally, we observed that gender has an important impact on discrimination performance.
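The following is a minimal, hedged sketch of decision-level weighted fusion as described (the fuzzy variant is omitted); the score ranges, the weighting scheme, and all names are assumptions for illustration.

```python
# A minimal sketch of decision-level weighted fusion as described above.
# Classifier outputs are assumed to be per-class scores in [0, 1];
# weights reflect each classifier's reliability. Names are illustrative.

import numpy as np

def weighted_fusion(classifier_scores, weights):
    """classifier_scores: (n_classifiers, n_classes) array of scores.
    weights: per-classifier reliability weights, summing to 1.
    Returns the index of the winning class."""
    scores = np.asarray(classifier_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = w @ scores          # weighted sum of the classifiers' outputs
    return int(np.argmax(fused))

# Two classifiers voting over "speaker A" vs "speaker B":
print(weighted_fusion([[0.6, 0.4], [0.3, 0.7]], [0.7, 0.3]))  # -> 0
```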

8.
Text classification, a core problem in text mining, has become an important topic in natural language processing. Short-text classification in particular, owing to sparsity, real-time requirements, and irregular writing, has become one of the pressing problems in text classification. In certain scenarios, short texts carry a large amount of implicit semantics, which makes mining implicit semantic features from such limited text challenging. Existing approaches to short-text classification mainly rely on traditional machine learning or deep learning algorithms, but these models are complex to build, labor-intensive, and inefficient. Moreover, short texts contain little effective information and are highly colloquial, placing high demands on a model's feature learning ability. To address these problems, this paper proposes the KAeRCNN model, which builds on TextRCNN and integrates knowledge awareness with a dual attention mechanism. Knowledge awareness comprises knowledge-graph entity linking and knowledge-graph embedding, introducing external knowledge to obtain semantic features, while the dual attention mechanism improves the efficiency with which the model extracts effective information from short texts. Experimental results show that KAeRCNN significantly outperforms traditional machine learning algorithms in classification accuracy, F1 score, and practical effectiveness. We validated the performance and adaptability of the algorithm: accuracy reaches 95.54% and F1 reaches 0.901; compared with four traditional machine learning algorithms, accuracy improves by about 14% on average and F1 by about 13%. Compared with TextRCNN, KAeRCNN improves accuracy by about 3%. Comparative experiments against deep learning algorithms also show that our model performs well on short-text classification in other domains. Both theory and experiments demonstrate that the proposed KAeRCNN model is more effective for short-text classification.
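As a generic illustration of the kind of attention mechanism KAeRCNN builds on, the sketch below pools token vectors by softmax-weighted relevance to a query vector; it is not the paper's architecture, and all shapes and names are assumptions.

```python
# A minimal sketch of attention-weighted pooling over token vectors, the
# kind of mechanism a dual-attention model builds on (this is a generic
# illustration, not the paper's architecture).

import numpy as np

def attention_pool(token_vecs, query):
    """token_vecs: (n_tokens, dim); query: (dim,). Softmax-weighted sum
    of token vectors, weights given by dot-product relevance to query."""
    scores = token_vecs @ query                      # relevance per token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax
    return weights @ token_vecs                      # (dim,) pooled vector

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))                     # 5 tokens, dim 8
print(attention_pool(tokens, rng.normal(size=8)).shape)  # (8,)
```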

9.
A trustworthy protocol is essential to evaluate a text detection algorithm in order, first, to measure its efficiency and adjust its parameters and, second, to compare its performance with that of other algorithms. However, current protocols do not give precise enough evaluations because they use coarse evaluation metrics and deal with inconsistent matchings between the output of detection algorithms and the ground truth, both often limited to rectangular shapes. In this paper, we propose a new evaluation protocol, named EvaLTex, that solves some of the problems associated with classical metrics and matching strategies. Our system deals with different kinds of annotations and detection shapes. It also considers different kinds of granularity between detections and ground-truth objects and hence provides more realistic and accurate evaluation measures. We use this protocol to evaluate text detection algorithms and highlight some key examples showing that the scores it provides are more relevant than those of currently used evaluation protocols.
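To make the baseline being improved upon concrete, here is a minimal sketch of rectangle-based intersection-over-union matching, the classical ingredient whose coarseness EvaLTex addresses; EvaLTex itself handles arbitrary shapes and granularities, which this sketch does not.

```python
# A minimal sketch of one ingredient of such protocols: intersection-
# over-union matching between a detected rectangle and a ground-truth
# rectangle. This only illustrates the classical rectangular baseline.

def iou(box_a, box_b):
    """Boxes as (x1, y1, x2, y2). Returns intersection-over-union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~ 0.143
```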

10.
We present a novel evolutionary model for knowledge discovery from texts (KDT), which deals in an integrated way with issues concerning shallow text representation and processing for mining purposes. Its aim is to discover novel and interesting explanatory knowledge across text documents. The approach uses natural language technology and genetic algorithms to produce explanatory novel hypotheses. The proposed approach is interdisciplinary, involving concepts not only from evolutionary algorithms but also from many kinds of text mining methods. Accordingly, new kinds of genetic operations suitable for text mining are proposed. The principles behind the representation and a new proposal for using multiobjective evaluation at the semantic level are described. Some promising results and their assessment by human experts are also discussed, indicating the plausibility of the model for effective KDT.

11.
A novel approach is introduced in this paper for the implementation of a question–answering based tool for the extraction of information and knowledge from texts. This effort resulted in the computer implementation of a system answering bilingual questions directly from a text using Natural Language Processing. The system uses domain knowledge concerning categories of actions and implicit semantic relations. The present state of the art in information extraction is based on the template approach which relies on a predefined user model. The model guides the extraction of information and the instantiation of a template that is similar to a frame or set of attribute value pairs as the result of the extraction process. Our question–answering based approach aims to create flexible information extraction tools accepting natural language questions and generating answers that contain information extracted from text either directly or after applying deductive inference. Our approach also addresses the problem of implicit semantic relations occurring either in the questions or in the texts from which information is extracted. These relations are made explicit with the use of domain knowledge. Examples of application of our methods are presented in this paper concerning four domains of quite different nature. These domains are: oceanography, medical physiology, aspirin pharmacology and ancient Greek law. Questions are expressed both in Greek and English. Another important point of our method is to process text directly avoiding any kind of formal representation when inference is required for the extraction of facts not mentioned explicitly in the text. This idea of using text as knowledge base was first presented in Kontos [7] and further elaborated in [9,11,12] as the ARISTA method. This is a new method for knowledge acquisition from texts that is based on using natural language itself for knowledge representation.

12.
Language models trained on large-scale corpora achieve outstanding performance on text generation tasks. However, research has found that such models may produce offensive text when perturbed. This unpredictable offensiveness complicates both research on and practical use of language models; to avoid risk, researchers have had to withhold the language models behind their papers. How to automatically evaluate the offensiveness of language models has therefore become a pressing problem. To address it, this paper proposes a...

13.
In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation in which we employ a HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches as well as an end-to-end speaker adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios and our proposed algorithms seem to generalise well for several language pairs. We also discuss important future directions including the need for better evaluation metrics.

14.
During pronunciation, the position and movement properties of articulators such as the tongue, jaw, and lips are mainly captured by articulatory movement features (AMFs). This paper investigates the use of AMFs for short-duration text-dependent speaker verification. AMFs directly characterize the relative motion trajectories of individual speakers' articulators, which are rarely affected by the external environment. We therefore expect AMFs to be superior to traditional acoustic features, such as mel-frequency cepstral coefficients (MFCC), for characterizing identity differences between speakers. Speaker similarity scores measured by the dynamic time warping (DTW) algorithm are used to make the verification decisions. Experimental results show that AMFs bring significant performance gains over traditional MFCC features for the short-duration text-dependent speaker verification task.
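Since the decision rule is DTW over feature trajectories, a minimal DTW sketch follows; the Euclidean frame distance and the random stand-in trajectories are assumptions for illustration, not the paper's choices.

```python
# A minimal dynamic time warping (DTW) sketch of the kind used above to
# compare two articulatory-feature trajectories; Euclidean frame
# distance is an assumption, not the paper's choice.

import numpy as np

def dtw_distance(seq_a, seq_b):
    """seq_a: (n, dim), seq_b: (m, dim). Classic DTW with unit steps."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

rng = np.random.default_rng(1)
enroll, test = rng.normal(size=(40, 6)), rng.normal(size=(50, 6))
print(dtw_distance(enroll, test))  # lower = more similar trajectories
```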

15.
High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have the amount of data needed to sufficiently characterize a speaker. Although a significant amount of work has been done in finding novel high-level features, less work has been done on modeling them. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to counts and then to n-gram frequencies for a given conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log-likelihood-ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood-ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cepstral speaker recognition systems.
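The sketch below illustrates the pipeline from token sequences to scaled n-gram frequency vectors and a linear kernel. The inverse-background scaling shown is one commonly cited linearization of a log-likelihood-ratio score; treat it as an assumption rather than the paper's exact kernel.

```python
# A minimal sketch of turning token sequences into n-gram frequency
# vectors and scoring them with a linear kernel. The background scaling
# is an assumed, commonly cited form, not necessarily the paper's.

from collections import Counter

def ngram_freqs(tokens, n=2):
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def llr_kernel(freq_a, freq_b, background):
    """Dot product of frequency vectors, each component scaled by
    1/sqrt(background frequency), i.e. K = sum_g f_a(g) f_b(g) / bg(g)."""
    return sum(fa * freq_b.get(g, 0.0) / background[g]
               for g, fa in freq_a.items() if g in background)

# The scaled vectors would then be fed to a linear SVM for verification.
```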

16.
A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries become ever bigger, manual alignment becomes less and less tenable yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to learn alignments of English text and phonemes, starting from a variety of initializations. We use the British English Example Pronunciation (BEEP) dictionary of almost 200,000 words in this work. The quality of alignment is difficult to determine quantitatively since no ‘gold standard’ correct alignment exists. We evaluate the success of our algorithm indirectly from the performance of a pronunciation by analogy system using the aligned dictionary data as a knowledge base for inferring pronunciations. We find excellent performance—the best so far reported in the literature. There is very little dependence on the start point for alignment, indicating that the EM search space is strongly convex. Since the aligned BEEP dictionary is a potentially valuable resource, it is made freely available for research use.
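As a hedged illustration of EM-based alignment learning, the sketch below estimates letter-to-phoneme association probabilities from (spelling, pronunciation) pairs in the style of IBM Model 1; the real algorithm over the BEEP dictionary handles nulls and produces explicit alignments, which this toy version does not.

```python
# A minimal EM sketch in the spirit described above: learn letter-to-
# phoneme association probabilities from (spelling, pronunciation)
# pairs, IBM-Model-1 style. A simplified illustration, not the paper's
# exact algorithm.

from collections import defaultdict

def em_letter_phoneme(pairs, iterations=10):
    """pairs: list of (letters, phonemes), each a list of symbols.
    Returns t[(letter, phoneme)] ~ P(phoneme | letter)."""
    prob = defaultdict(lambda: 1.0)          # uniform (unnormalized) start
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for letters, phones in pairs:
            for p in phones:                 # E-step: fractional counts
                z = sum(prob[(l, p)] for l in letters)
                for l in letters:
                    c = prob[(l, p)] / z
                    counts[(l, p)] += c
                    totals[l] += c
        prob = defaultdict(float, {(l, p): c / totals[l]   # M-step
                                   for (l, p), c in counts.items()})
    return prob

data = [(["p", "h"], ["F"]), (["p"], ["P"])]
t = em_letter_phoneme(data, iterations=20)
print(round(t[("h", "F")], 3), round(t[("p", "P")], 3))  # both approach 1.0
```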

17.
Sentiment lexicons and word embeddings constitute well-established sources of information for sentiment analysis in online social media. Although their effectiveness has been demonstrated in state-of-the-art sentiment analysis and related tasks in the English language, such publicly available resources are much less developed and evaluated for the Greek language. In this paper, we tackle the problems arising when analyzing text in such an under-resourced language. We present and make publicly available a rich set of such resources, ranging from a manually annotated lexicon, to semi-supervised word embedding vectors and annotated datasets for different tasks. Our experiments using different algorithms and parameters on our resources show promising results over standard baselines; on average, we achieve a 24.9% relative improvement in F-score on the cross-domain sentiment analysis task when training the same algorithms with our resources, compared to training them on more traditional feature sources, such as n-grams. Importantly, while our resources were built with the primary focus on the cross-domain sentiment analysis task, they also show promising results in related tasks, such as emotion analysis and sarcasm detection.

18.
A sliding-window-based algorithm for microblog timeline summarization   (Cited: 1; self-citations: 0; citations by others: 1)
Timeline summarization is a technique for organizing text content and generating summaries along the time dimension. Traditional timeline summarization mainly targets long texts such as news; this paper studies timeline summarization for short microblog texts. Because the content features of short microblog posts are limited, a summary cannot be generated from text content alone, so this paper evaluates timeline summaries with three metrics: content coverage, temporal distribution, and propagation influence, and proposes MTSW (Microblog timeline summarization based on sliding window). The algorithm first uses term strength and entropy to identify representative terms; it then builds a composite evaluation metric for timeline summaries from the three metrics above; finally, it traverses the sequence of microblog messages along the timeline with a sliding window to generate the timeline summary. Experiments on a real microblog dataset show that the timeline summaries generated by MTSW effectively reflect how hot events develop and evolve.
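A minimal sketch of the sliding-window idea follows: walk a fixed time window over timestamped posts and keep the best-scoring post per window. The scoring used here (a sum of term strengths) is a stand-in; MTSW's composite metric combines coverage, temporal distribution, and propagation influence.

```python
# A minimal sketch of the sliding-window idea described above. The
# per-post scoring is a stand-in for MTSW's composite metric; all
# names are illustrative.

def timeline_summary(posts, window_seconds, term_strength):
    """posts: list of (timestamp, tokens) sorted by timestamp.
    Returns one representative post per window."""
    summary, window_start, best = [], None, None
    for ts, tokens in posts:
        if window_start is None or ts - window_start >= window_seconds:
            if best is not None:
                summary.append(best[1])      # flush the previous window
            window_start, best = ts, None
        score = sum(term_strength.get(t, 0.0) for t in tokens)
        if best is None or score > best[0]:
            best = (score, (ts, tokens))     # keep the top post so far
    if best is not None:
        summary.append(best[1])
    return summary
```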

19.
An English TTS (text to speech) system must synthesize speech from the pronunciation of every word in the text. In processing real text, no dictionary, however large, can cover every word it contains, so an algorithm is needed to predict the pronunciation of out-of-vocabulary words. This paper presents an instance-based learning method and evaluates its performance on a large-scale English dictionary. Results show that the method reaches a word pronunciation accuracy of 70.1%, significantly exceeding other automatic prediction methods reported previously.
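As a hedged illustration of instance-based pronunciation prediction, the sketch below copies the pronunciation of the nearest dictionary neighbour under edit distance; real pronunciation-by-analogy recombines partial matches, so this nearest-neighbour shortcut is only a simplification.

```python
# A minimal instance-based sketch in the spirit described above: predict
# an unseen word's pronunciation from its nearest dictionary neighbour.
# A simplification for illustration, not the paper's method.

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[-1] + 1,               # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def predict_pronunciation(word, lexicon):
    """lexicon: dict mapping spelling -> phoneme string."""
    nearest = min(lexicon, key=lambda w: edit_distance(word, w))
    return lexicon[nearest]

lex = {"nation": "N EY SH AH N", "station": "S T EY SH AH N"}
print(predict_pronunciation("ration", lex))  # borrows from "nation"
```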

20.
《Digital Signal Processing》2000,10(1-3):93-112
Dunn, Robert B., Reynolds, Douglas A., and Quatieri, Thomas F., Approaches to Speaker Detection and Tracking in Conversational Speech, Digital Signal Processing 10 (2000), 93–112. Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model–universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the GMM-UBM system, are used first to partition the speech file into speaker-homogeneous regions and then to create scores for these regions. We refer to this approach as internal segmentation. The other approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker-homogeneous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance.
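The sketch below illustrates the frame-level log-likelihood-ratio scores that drive the internal-segmentation approach, using scikit-learn GMMs as stand-ins for the adapted GMM-UBM system (MAP adaptation is not shown, and all data here is synthetic).

```python
# A minimal sketch of per-frame log-likelihood-ratio scoring, using
# plain scikit-learn GMMs in place of an adapted GMM-UBM system; the
# synthetic features and all parameters are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
ubm_feats = rng.normal(0, 1, size=(2000, 12))    # stand-in background data
spk_feats = rng.normal(0.5, 1, size=(400, 12))   # stand-in target speaker

ubm = GaussianMixture(n_components=8, random_state=0).fit(ubm_feats)
spk = GaussianMixture(n_components=8, random_state=0).fit(spk_feats)

test = rng.normal(0.5, 1, size=(300, 12))
llr = spk.score_samples(test) - ubm.score_samples(test)  # per-frame LLR

# Smoothing the frame scores and thresholding them yields candidate
# speaker-homogeneous regions, which are then scored as in detection.
window = 50
smoothed = np.convolve(llr, np.ones(window) / window, mode="same")
print((smoothed > 0).mean())  # fraction of frames attributed to the speaker
```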
