Similar Documents
20 similar documents retrieved.
1.
2.
This paper describes the development of LSESpeak, a spoken Spanish generator for Deaf people. This system integrates two main tools: a sign-language-into-speech translation system and an SMS (Short Message Service) into speech translation system. The first tool is made up of three modules: an advanced visual interface (where a deaf person can specify a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, an emotional text-to-speech (TTS) converter to generate spoken Spanish. The visual interface allows a sign sequence to be defined using several utilities. The emotional TTS converter is based on Hidden Semi-Markov Models (HSMMs), permitting voice gender, type of emotion, and emotional strength to be controlled. The second tool is made up of an SMS message editor, a language translator, and the same emotional text-to-speech converter. Both translation tools use a phrase-based translation strategy in which the translation and target language models are trained from parallel corpora. In the experiments carried out to evaluate translation performance, the sign-language-to-speech translation system achieved a 96.45 BLEU score and the SMS-to-speech system a 44.36 BLEU score in a specific domain: the renewal of the Identity Document and Driving License. In the evaluation of the emotional TTS, it is worth highlighting the improvement in naturalness obtained thanks to the morpho-syntactic features, and the high flexibility provided by HSMMs when generating different emotional strengths.
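Phrase-based translation of the kind used here is typically scored log-linearly, combining phrase-table translation probabilities with a target language model trained on the sign side of the parallel corpus. The sketch below illustrates that scoring on a toy hypothesis; the phrase table, bigram LM, and weights are invented for illustration, not taken from the paper.

```python
import math

# Hypothetical phrase table: source phrase -> {target sign sequence: log P(t|s)}
PHRASE_TABLE = {
    "renovar": {"RENEW": math.log(0.8)},
    "el carnet de conducir": {"DRIVING-LICENSE": math.log(0.9)},
}

# Hypothetical target-side bigram LM: (prev, sign) -> log P(sign|prev)
BIGRAM_LM = {
    ("<s>", "DRIVING-LICENSE"): math.log(0.4),
    ("DRIVING-LICENSE", "RENEW"): math.log(0.6),
}

def score_hypothesis(phrase_pairs, w_tm=1.0, w_lm=1.0, oov_logp=-10.0):
    """Log-linear score of one segmentation: translation model + language model."""
    tm = sum(PHRASE_TABLE.get(s, {}).get(t, oov_logp) for s, t in phrase_pairs)
    signs = [t for _, t in phrase_pairs]
    lm = sum(BIGRAM_LM.get(bg, oov_logp)
             for bg in zip(["<s>"] + signs[:-1], signs))
    return w_tm * tm + w_lm * lm

# Example: "renovar el carnet de conducir" -> "DRIVING-LICENSE RENEW" (topic first)
print(score_hypothesis([("el carnet de conducir", "DRIVING-LICENSE"),
                        ("renovar", "RENEW")]))
```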

3.
4.
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list of Spanish words (vocabulary) and associated tags used by this module is computed automatically by considering, for every Spanish word, the sign with the highest probability of being its translation. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied; non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). The system has been developed for a specific application domain: the renewal of Identity Documents and Driver's Licenses. In order to evaluate the system, a parallel corpus made up of 4,080 Spanish sentences and their LSE translations has been used. The evaluation results revealed a significant performance improvement when this preprocessing module is included. In the phrase-based system, the proposed module raised BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and the human evaluation score from 0.64 to 0.83. In the case of the SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
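The core of the module — pick, for each Spanish word, the sign it most probably translates to, and treat words with no reliable sign as non-relevant — can be sketched roughly as follows. The alignment input format, the probability threshold, and the choice to drop non-relevant words are assumptions for illustration, not details from the paper.

```python
from collections import Counter, defaultdict

def extract_tags(aligned_pairs, min_prob=0.5):
    """For each Spanish word, keep the sign it most often aligns to.

    aligned_pairs: iterable of (spanish_word, sign) alignments taken from a
    parallel corpus (hypothetical input format). Words whose best sign falls
    below `min_prob` are treated as non-relevant.
    """
    counts = defaultdict(Counter)
    for word, sign in aligned_pairs:
        counts[word][sign] += 1
    tags, non_relevant = {}, set()
    for word, signs in counts.items():
        sign, n = signs.most_common(1)[0]
        if n / sum(signs.values()) >= min_prob:
            tags[word] = sign
        else:
            non_relevant.add(word)
    return tags, non_relevant

def preprocess(sentence, tags, drop_non_relevant=True):
    """Replace Spanish words with their tags; optionally drop non-relevant words."""
    out = []
    for w in sentence.split():
        if w in tags:
            out.append(tags[w])
        elif not drop_non_relevant:
            out.append(w)
    return out
```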

5.
To help teachers teach Chinese language classes to deaf students, a teaching system that translates text into sign language was developed. An improved jieba ("结巴") word segmenter splits lesson text into word sequences; the system's editing functions then let these sequences be edited so that they conform to the grammar of signed Chinese. A virtual human is built, sign-language animations are produced with keyframe techniques, and the Unity3D game engine handles the synthesis of sign animations and the transitions between them, yielding a computer-aided teaching system that automatically translates lesson content into sign language. The work is of particular significance for teaching Chinese to deaf students and has practical value.
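The segmentation step relies on jieba ("结巴") word segmentation. The paper's improved variant is not public, so the sketch below shows only a plain jieba baseline with a small domain user dictionary (the added dictionary entries are assumed examples):

```python
import jieba  # pip install jieba

# Keep domain terms as single segmentation units (assumed entries).
jieba.add_word("聋哑学生")
jieba.add_word("手语")

def sentence_to_word_sequence(sentence):
    """Segment a textbook sentence into a word sequence that a teacher
    can then edit into grammatical sign-language order."""
    return [w for w in jieba.lcut(sentence) if w.strip()]

print(sentence_to_word_sequence("老师教聋哑学生学习手语"))
```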

6.
To address the problems that sign language translation methods face in motion feature extraction and temporal translation, an AGCN-T sign language translation network is proposed that combines adaptive graph convolution (AGCN) with a Transformer temporal model. The adaptive graph convolutional network learns the interactive spatial dependencies between skeleton joints in signing motions, while the Transformer temporal module captures the temporal relationships in sign motion sequences and translates them into intelligible sign language content. In addition, for preprocessing, a moving-window keyframe extraction algorithm is proposed, and the MediaPipe pose estimation algorithm is used to extract skeletons from the keyframe image sequences. Experiments show that the method achieves a word error rate of 3.75% and an accuracy of 97.87% on the large Chinese continuous sign language dataset CCSL, outperforming other state-of-the-art sign language translation methods.
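A rough sketch of the preprocessing stage follows: a moving-window keyframe pick (the exact selection criterion is not given in the abstract, so maximum inter-frame difference per window is assumed here) followed by MediaPipe pose estimation on the chosen frames.

```python
import cv2                    # pip install opencv-python
import mediapipe as mp        # pip install mediapipe
import numpy as np

def keyframes_by_window(frames, win=8):
    """Moving-window keyframe pick: within each window, keep the frame that
    differs most from its predecessor (one plausible reading of the paper's
    algorithm; the actual criterion may differ)."""
    diffs = [np.mean(cv2.absdiff(a, b)) for a, b in zip(frames, frames[1:])]
    keys = []
    for start in range(0, len(diffs), win):
        chunk = diffs[start:start + win]
        if chunk:
            keys.append(frames[start + int(np.argmax(chunk)) + 1])
    return keys

def skeletons(frames):
    """Extract pose landmarks for each keyframe with MediaPipe."""
    out = []
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        for f in frames:
            res = pose.process(cv2.cvtColor(f, cv2.COLOR_BGR2RGB))
            if res.pose_landmarks:
                out.append([(p.x, p.y, p.z) for p in res.pose_landmarks.landmark])
    return out
```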

7.
This article discusses how to make the cartoon avatar in sign-language news broadcasts sign according to the grammar of natural sign language rather than the grammar of spoken Chinese. The rules of natural Chinese Sign Language are first compiled and formalized, and a library of formal rules for converting standard Chinese into natural Chinese Sign Language is built, enabling the automatic generation of natural sign-language action sequences from modern Chinese text. Finally, this is embedded into a sign-language news broadcasting system based on sign synthesis technology and cartoon animation, so that its online output is natural sign language that matches the habits of deaf viewers.
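A formal rule base of this kind can be approximated as ordered rewriting rules over word sequences. The two rules below (object fronting and clause-final negation) are illustrative tendencies often cited for natural Chinese Sign Language, not the paper's actual rule library, and the toy verb list is an assumption:

```python
def rule_negation_final(words):
    """Move negation to the end of the clause (illustrative rule)."""
    if "不" in words:
        words = [w for w in words if w != "不"] + ["不"]
    return words

def rule_object_fronting(words, verbs={"买", "吃", "看"}):
    """Naive object fronting: [subj, verb, obj] -> [subj, obj, verb]."""
    if len(words) == 3 and words[1] in verbs:
        return [words[0], words[2], words[1]]
    return words

def to_natural_sign_order(words):
    """Apply the rules in a fixed order, as a rule library would."""
    for rule in (rule_object_fronting, rule_negation_final):
        words = rule(words)
    return words

print(to_natural_sign_order(["我", "买", "书"]))  # -> ['我', '书', '买']
```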

8.
Waibel, A. 《Computer》, 1996, 29(7): 41-48
As communication becomes increasingly automated and transnational, the need for rapid, computer-aided speech translation grows. The Janus-II system uses paraphrasing and interactive error correction to boost performance. Janus-II operates on spontaneous conversational human dialogue in limited domains with vocabularies of 3,000 or more words; current experiments involve 10,000 to 40,000 word vocabularies. It now accepts English, German, Japanese, Spanish, and Korean input, which it translates into any of the other languages. Beyond translating syntactically well-formed speech or carefully structured human-to-machine utterances, Janus-II research has focused on the more difficult task of translating spontaneous conversational speech between humans, which naturally requires a suitable database and task domain.

9.
Sign language (SL) is a kind of natural language for the deaf. Chinese Sign Language (CSL) synthesis aims to translate text into virtual human animation, making information and services accessible to the deaf. Generally, sign language animation based on key frames is realized by concatenating sign words captured independently. This means a sign word has the same pattern in every context, which differs from realistic sign language expression. This paper studies the effect of context on manual and non-manual gestures, and presents a method for generating stylized manual and non-manual gestures according to the context. Experimental results show that synthesized sign language animation that takes context into account, based on the proposed method, is more accurate and intelligible than animation that ignores context.

10.
This paper proposes a new technique to increase the robustness of spoken dialogue systems, employing an automatic procedure that aims to correct frames incorrectly generated by the system's spoken language understanding component. To do this, the technique carries out training that takes into account knowledge of previous system misunderstandings. The correction is transparent to users, who are not aware of some of the mistakes made by the speech recogniser, and thus interaction with the system can proceed more naturally. Experiments have been carried out using two spoken dialogue systems previously developed in our lab, Saplen and Viajero, which employ prompt-dependent and prompt-independent language models for speech recognition. The results obtained from 10,000 simulated dialogues show that the technique improves the performance of the two systems for both kinds of language modelling, especially for the prompt-independent language model. Using this type of model, the Saplen system increases sentence understanding by 19.54%, task completion by 26.25%, word accuracy by 7.53%, and implicit recovery of speech recognition errors by 20.3%, whereas for the Viajero system these figures increase by 14.93%, 18.06%, 6.98% and 15.63%, respectively.
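One way to realise "training on previous misunderstandings" is a correction table that maps frames the understanding component tends to get wrong to the frames that were intended. The sketch below is a schematic reconstruction under assumed data formats, not the authors' exact procedure:

```python
from collections import Counter, defaultdict

class FrameCorrector:
    """Learns, from annotated training dialogues, which frames the
    understanding module tends to get wrong and what was meant instead."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, dialogues):
        # dialogues: iterable of (prompt_type, hypothesised_frame, correct_frame),
        # an assumed annotation format.
        for prompt, hyp, ref in dialogues:
            if hyp != ref:
                self.table[(prompt, hyp)][ref] += 1

    def correct(self, prompt, hyp, min_count=3):
        """Silently replace a frame if it was frequently misunderstood."""
        fixes = self.table.get((prompt, hyp))
        if fixes:
            ref, n = fixes.most_common(1)[0]
            if n >= min_count:
                return ref   # correction is transparent to the user
        return hyp
```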

11.
A virtual-human sign language video display platform based on Chinese Sign Language synthesis is a brand-new topic. To meet the gesture-motion smoothness requirements of broadcast news programs, a context-dependent gesture motion smoothing algorithm was implemented; it exploits the difference between two adjacent frames to produce smooth transitions between gestures, with a visual effect smoother and more natural than traditional interpolation algorithms. A gesture-motion retargeting algorithm combining statistics and rules is also proposed: on top of a statistical method, rule constraints for different skeleton sizes and motion characteristics allow motion data from the standard model to be applied to a new model without losing accuracy. Finally, by extending the expressive forms of basic sign words and using alpha blending, a virtual-human sign language synthesis and display platform for broadcast news programs was implemented with good results.
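The context-dependent idea — let the difference between the last frame of one sign and the first frame of the next govern the transition — can be sketched as below. The frame counts and the smoothstep easing are assumptions, and joint rotations are treated as plain vectors for brevity:

```python
import numpy as np

def transition(pose_a, pose_b, base_frames=6, frames_per_unit=20):
    """Blend from the last frame of clip A to the first frame of clip B.

    The transition length grows with the pose difference between the two
    frames (the context-dependent idea); blending uses a smoothstep curve,
    which looks less mechanical than plain linear interpolation.
    """
    a, b = np.asarray(pose_a, float), np.asarray(pose_b, float)
    diff = np.linalg.norm(b - a)                # larger difference -> longer blend
    n = base_frames + int(frames_per_unit * diff)
    ts = np.linspace(0.0, 1.0, n)
    ts = ts * ts * (3 - 2 * ts)                 # smoothstep easing
    return [(1 - t) * a + t * b for t in ts]
```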

12.
13.
《Knowledge》, 2006, 19(3): 153-163
Spoken dialogue systems can be considered knowledge-based systems designed to interact with users using speech in order to provide information or carry out simple tasks. Current systems are restricted to well-known domains that provide knowledge about the words and sentences the users are likely to utter. Basically, these systems rely on an input interface comprising a speech recogniser and semantic analyser, a dialogue manager, and an output interface comprising a response generator and speech synthesiser. As an attempt to enhance the performance of the input interface, this paper proposes a technique based on a new type of speech recogniser comprising two modules. The first is a standard speech recogniser that receives the sentence uttered by the user and generates a graph of words. The second module analyses the graph and produces the recognised sentence using the context knowledge provided by the current prompt of the system. We evaluated the performance of two input interfaces working in a previously developed dialogue system: the original interface of the system and a new one that features the proposed technique. The experimental results show that when the sentences uttered by the users are analysed out of context by the new interface, the word accuracy and sentence understanding rates increase by 93.71% and 77.42% absolute, respectively, with regard to the original interface. The price to pay for this clear enhancement is a small reduction in the scores when the new interface analyses sentences in context, as they decrease by 2.05% and 3.41% absolute, respectively, in comparison with the original interface. Given that in real dialogues sentences may be analysed out of context, especially when they are uttered by inexperienced users, the technique can be very useful for enhancing system performance.
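The second module can be pictured as a search over the word graph scored with a language model tied to the system's current prompt. The sketch below simplifies the graph to a "sausage" of alternatives per position (real word graphs are general lattices) and uses invented scores:

```python
def best_sentence(word_graph, prompt_lm, lm_weight=1.0, floor=-10.0):
    """Viterbi search over a simplified, sausage-shaped word graph.

    word_graph: list of slots, each a list of (word, acoustic_logprob).
    prompt_lm:  dict (prev_word, word) -> bigram log-prob under the LM
                tied to the system's current prompt (assumed format).
    """
    beams = {"<s>": (0.0, [])}
    for slot in word_graph:
        nxt = {}
        for prev, (score, path) in beams.items():
            for w, ac in slot:
                s = score + ac + lm_weight * prompt_lm.get((prev, w), floor)
                if w not in nxt or s > nxt[w][0]:
                    nxt[w] = (s, path + [w])
        beams = nxt
    return max(beams.values(), key=lambda v: v[0])

# Toy example: the prompt asks for a destination, so "Sevilla" beats "semilla".
graph = [[("a", -1.0)], [("Sevilla", -2.5), ("semilla", -2.0)]]
lm = {("<s>", "a"): -0.5, ("a", "Sevilla"): -0.5, ("a", "semilla"): -6.0}
print(best_sentence(graph, lm))
```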

14.
System Combination for Machine Translation of Spoken and Written Language
This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations rather than a single sentence is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different project partners were combined in the experiments. Significant improvements in translation quality from Spanish to English and from English to Spanish in comparison with the best of the individual MT systems were achieved under official evaluation conditions.
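The voting step itself is simple once the confusion network is built: each slot holds competing words (or a deletion arc) from the different systems, and the weighted majority wins. A minimal sketch with an invented two-slot network and invented system weights:

```python
from collections import defaultdict

def consensus(confusion_network, system_weights):
    """Weighted majority vote over a confusion network, ROVER-style.

    confusion_network: list of slots; each slot is a list of
    (word_or_None, system_id) arcs, None marking a deletion.
    system_weights: system_id -> weight (e.g. tuned on a dev set).
    """
    out = []
    for slot in confusion_network:
        votes = defaultdict(float)
        for word, sys_id in slot:
            votes[word] += system_weights[sys_id]
        best = max(votes, key=votes.get)
        if best is not None:          # skip epsilon/deletion arcs
            out.append(best)
    return out

# Three MT systems disagree on the first slot; the weighted vote settles it.
cn = [[("the", "A"), ("the", "B"), ("a", "C")],
      [("house", "A"), (None, "B"), ("house", "C")]]
print(consensus(cn, {"A": 0.4, "B": 0.3, "C": 0.3}))  # ['the', 'house']
```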

15.
This article presents statistical language translation models, called dependency transduction models, based on collections of head transducers. Head transducers are middle-out finite-state transducers which translate a head word in a source string into its corresponding head in the target language, and further translate sequences of dependents of the source head into sequences of dependents of the target head. The models are intended to capture the lexical sensitivity of direct statistical translation models, while at the same time taking account of the hierarchical phrasal structure of language. Head transducers are suitable for direct recursive lexical translation, and are simple enough to be trained fully automatically. We present a method for fully automatic training of dependency transduction models for which the only input is transcribed and translated speech utterances. The method has been applied to create English–Spanish and English–Japanese translation models for speech translation applications. The dependency transduction model gives around 75% accuracy for an English–Spanish translation task (using a simple string edit-distance measure) and 70% for an English–Japanese translation task. Enhanced with target n-grams and a case-based component, English–Spanish accuracy is over 76%; for English–Japanese it is 73% for transcribed speech, and 60% for translation from recognition word lattices.
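The "simple string edit-distance measure" is plausibly word-level accuracy derived from Levenshtein distance; one common normalisation (an assumption, since the article does not spell it out) is sketched below:

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance via dynamic programming."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                          # deletion
                          d[i][j - 1] + 1,                          # insertion
                          d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]))  # sub
    return d[m][n]

def accuracy(hyp, ref):
    """1 minus edit distance normalised by reference length (assumed form)."""
    return 1.0 - edit_distance(hyp, ref) / max(len(ref), 1)

print(accuracy("quiero un billete".split(), "quiero dos billetes".split()))
```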

16.
17.
18.
One of the aims of Assistive Technologies is to help people with disabilities to communicate with others and to provide means of access to information. As an aid to Deaf people, we present in this work a production-quality rule-based machine translation system from Spanish to Spanish Sign Language (LSE) glosses, which is a necessary precursor to building a full machine translation system that eventually produces animation output. The system implements a transfer-based architecture working from the syntactic functions of dependency analyses. A sketch of LSE is also presented. Several topics regarding translation into sign languages are addressed: the lexical gap, the bootstrapping of a bilingual lexicon, the generation of word order for topic-oriented languages, and the treatment of classifier predicates and classifier names. The system has been evaluated on an open-domain testbed, reporting a 0.30 BLEU (BiLingual Evaluation Understudy) score and 42% TER (Translation Error Rate). These results show consistent improvements over a statistical machine translation baseline, and some improvements over the same system preserving the word order of the source sentence. Finally, a linguistic analysis of the errors has identified some differences due to a certain degree of structural variation in LSE.

19.
Computer animation and visualization can facilitate communication between the hearing impaired and those with normal speaking capabilities. This paper presents a model of a system that is capable of translating text from a natural language into animated sign language. Techniques have been developed to analyse language and transform it into sign language in a systematic way. A hand-motion coding method, as applied to hand-motion representation and control, has also been investigated. Two translation examples are given to demonstrate the practicality of the system.

20.
Speech-Driven Articulatory Movement Synthesis Based on Deep Neural Networks
唐郅, 侯进. 《自动化学报》 (Acta Automatica Sinica), 2016, 42(6): 923-930
A method for speech-driven articulator movement synthesis based on deep neural networks is implemented and applied to speech-driven talking-avatar animation synthesis. A deep neural network (DNN) learns the mapping between acoustic features and articulator position information; given input speech data, the system estimates the movement trajectories of the articulators and renders them on a three-dimensional virtual human. First, artificial neural networks (ANNs) and DNNs are compared experimentally over a range of parameters to obtain the best network; second, different context lengths of acoustic features are tried and the number of hidden units adjusted to find the best length; finally, with the best network structure selected, the articulator trajectories output by the DNN drive the articulator motion synthesis to animate the virtual human. Experiments show that the animation synthesis method implemented here is efficient and lifelike.
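A schematic stand-in for such a network — a feed-forward regressor from a window of acoustic frames to articulator coordinates — is sketched below in PyTorch. The layer sizes, context length, and feature dimensions are illustrative guesses, not the paper's tuned values:

```python
import torch
import torch.nn as nn

# Assumed dimensions: 11 context frames of 40-dim acoustic features in,
# 18 articulator (e.g. EMA sensor) coordinates out.
CONTEXT, ACOUSTIC_DIM, ARTIC_DIM = 11, 40, 18

model = nn.Sequential(
    nn.Linear(CONTEXT * ACOUSTIC_DIM, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, ARTIC_DIM),      # articulator positions (regression output)
)

def train_step(batch_x, batch_y, opt, loss_fn=nn.MSELoss()):
    """One gradient step on (windowed acoustic features, articulator targets)."""
    opt.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    opt.step()
    return loss.item()

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(32, CONTEXT * ACOUSTIC_DIM)   # dummy batch for illustration
y = torch.randn(32, ARTIC_DIM)
print(train_step(x, y, opt))
```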

