Similar Literature
1.
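A Technical Review of Automatic Spoken-Language Translation Systems   Cited by: 1 (self-citations: 1, citations by others: 0)
In recent years, with the development of information technology, automatic spoken-language translation has become a new research hotspot. Well-known universities, research institutions and even companies around the world have joined this high-technology competition, and productive research on related technologies has also been carried out in China. This paper surveys and analyzes the current state of the art in automatic spoken-language translation and examines several specific problems in depth. The authors hope that the analysis and the issues discussed will serve as a useful reference for research on automatic spoken-language translation in China.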

2.
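This paper describes the design and implementation of a simultaneous interpretation platform based on service aggregation. Built on the idea of service aggregation and using modular development, the software combines existing speech recognition, machine translation and TTS (Text to Speech) services to provide simultaneous interpretation. Its main distinguishing feature is that it achieves this with the currently popular service aggregation approach rather than with a purpose-built monolithic system.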
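The aggregation idea in this abstract can be illustrated with a minimal sketch. It only shows how three independent services might be chained into one interpretation function; it is not the platform's actual design. The `aggregate` helper and the three `fake_*` stages are hypothetical stand-ins, where a real system would call remote ASR, MT and TTS services.

```python
from typing import Callable, Sequence


def aggregate(stages: Sequence[Callable]) -> Callable:
    """Chain independent services so the output of one feeds the next."""
    def pipeline(data):
        for stage in stages:
            data = stage(data)
        return data
    return pipeline


# Hypothetical stand-ins for remote services (assumptions, not real APIs).
def fake_asr(audio: bytes) -> str:
    # Pretend the audio bytes are already a UTF-8 transcript.
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    # Toy word-by-word glossary lookup; unknown words pass through.
    glossary = {"你好": "hello", "世界": "world"}
    return " ".join(glossary.get(token, token) for token in text.split())


def fake_tts(text: str) -> bytes:
    # Pretend the synthesized speech is just the encoded text.
    return text.encode("utf-8")


interpret = aggregate([fake_asr, fake_mt, fake_tts])
result = interpret("你好 世界".encode("utf-8"))
```

Because each stage is just a callable, any one service can be swapped for another provider without touching the other two, which is the modularity the abstract emphasizes.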

3.
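A Medium-Vocabulary Chinese-English Speech Translation System   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents the components of a Chinese-English speech translation system and describes its continuous Chinese speech recognition and Chinese-English machine translation modules. Under a restricted topic and a medium-sized vocabulary, we have achieved speaker-independent continuous speech recognition and built an experimental demonstration system for Chinese-English speech translation.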

4.
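Zhang Lingxiao (张凌霄), Chen Yifei (陈亦菲), Guo Yu (郭瑜). 《软件》 (Software), 2020, (4): 244-246
Globalization has driven the demand for cross-lingual communication; at both the individual and the national level, translation plays an indispensable role. The emergence and development of machine translation has transformed communication, and machine translation is gradually becoming an important part of daily life. Given the uneven quality of the software on the market, we chose "real-time speech translation", which has so far received little research attention, and surveyed "interpreting apps" with a focus on key academic disciplines, in order to explore their current state and prospects.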

5.
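In pipeline systems for machine simultaneous interpretation (MSI), feeding the output of automatic speech recognition (ASR) directly into neural machine translation (NMT) causes semantic incompleteness. To address this problem, a model based on BERT (Bidirectional Encoder Representation from Transformers) and Focal Loss is proposed. First, several segments produced by the ASR system are cached and combined into a word string. Then, a BERT-based sequence labeling model restores the punctuation of the word string, with Focal Loss used as the training loss to alleviate the class imbalance caused by unpunctuated samples outnumbering punctuated ones. Finally, the punctuation-restored word string is fed into NMT. Experimental results on English-German and Chinese-English translation show that, in terms of translation quality, MSI using the proposed punctuation restoration model improves by 8.19 BLEU and 4.24 BLEU, respectively, over MSI that feeds ASR output directly into NMT, and by 2.28 BLEU and 3.66 BLEU, respectively, over MSI using a punctuation restoration model based on an attention-based bidirectional recurrent neural network. The proposed model can therefore be applied effectively in MSI.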
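To make the role of Focal Loss concrete, the sketch below uses its general form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), and compares it with plain cross-entropy. The abstract does not report the values of gamma or alpha, so the defaults here are assumptions, and the two-class "no punctuation / comma" setup is a hypothetical simplification of the labeling task.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one token: -log p_t."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one token: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are assumed values; gamma=0 and alpha=None
    reduce this to ordinary cross-entropy.
    """
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical label set: index 0 = no punctuation, index 1 = comma.
easy_token = ([0.95, 0.05], 0)   # confidently correct majority-class token
hard_token = ([0.70, 0.30], 1)   # poorly predicted minority-class token

# Fraction of the cross-entropy loss that survives the focal down-weighting.
easy_ratio = focal_loss(*easy_token) / cross_entropy(*easy_token)
hard_ratio = focal_loss(*hard_token) / cross_entropy(*hard_token)
```

The well-classified unpunctuated token keeps far less of its loss than the hard punctuated one, which is how the loss shifts training effort toward the rarer punctuation labels.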

6.
Statistical machine translation (SMT) has proven to be an interesting pattern recognition framework for automatically building machine translation systems from available parallel corpora. In the last few years, research in SMT has been characterized by two significant advances. First, the popularization of so-called phrase-based statistical translation models, which allow local contextual information to be incorporated into the translation models. Second, the availability of larger and larger parallel corpora, which are composed of millions of sentence pairs and tens of millions of running words. Since phrase-based models basically consist of statistical dictionaries of phrase pairs, their estimation from very large corpora is a very costly task that yields a huge number of parameters which must be stored in memory. The handling of millions of model parameters and a similar number of training samples has become a bottleneck in the field of SMT, as well as in other well-known pattern recognition tasks such as speech recognition or handwriting recognition, to name just a few. In this paper, we propose a general framework that deals with the scaling problem in SMT without introducing significant time overhead by combining different scaling techniques. This new framework is based on the use of counts instead of probabilities, and on the concept of cache memory.
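The two ideas named at the end of this abstract, storing counts rather than probabilities and keeping only recently used entries in a cache, can be sketched as follows. This is an illustrative toy, not the authors' framework: the class name, the in-memory `Counter` standing in for a large external store, and the LRU eviction policy are all assumptions.

```python
from collections import Counter, OrderedDict


class CountPhraseTable:
    """Toy phrase table that stores raw counts and derives probabilities on demand."""

    def __init__(self, cache_size=2):
        self.pair_counts = Counter()   # stands in for a large external store
        self.src_counts = Counter()
        self.cache = OrderedDict()     # source phrase -> {target: count}, LRU order
        self.cache_size = cache_size
        self.misses = 0

    def add(self, src, tgt, n=1):
        # New training data only increments counts; nothing is renormalized.
        self.pair_counts[(src, tgt)] += n
        self.src_counts[src] += n
        self.cache.pop(src, None)      # drop the stale cached entry, if any

    def _load(self, src):
        if src in self.cache:
            self.cache.move_to_end(src)            # mark as most recently used
            return self.cache[src]
        self.misses += 1
        entry = {t: c for (s, t), c in self.pair_counts.items() if s == src}
        self.cache[src] = entry
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)         # evict least recently used
        return entry

    def prob(self, src, tgt):
        """Relative-frequency estimate count(src, tgt) / count(src)."""
        total = self.src_counts[src]
        return self._load(src).get(tgt, 0) / total if total else 0.0


table = CountPhraseTable()
table.add("la casa", "the house", 3)
table.add("la casa", "the home", 1)
p_before = table.prob("la casa", "the house")
table.add("la casa", "the home", 4)    # more data arrives later
p_after = table.prob("la casa", "the house")
```

Keeping counts means the table can absorb additional training data incrementally, while the cache bounds how many entries have to be resident at once.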

7.
This paper addresses modeling user behavior in interactions between two people who do not share a common spoken language and communicate with the aid of an automated bidirectional speech translation system. These interaction settings are complex. The translation machine attempts to bridge the language gap by mediating the verbal communication, although the technology may not always be perfect. In a step toward understanding user behavior in this mediated communication scenario, usability data from doctor-patient dialogs involving a two-way English-Persian speech translation system are analyzed. We specifically consider user behavior in light of potential uncertainty in the communication between the interlocutors. We analyze the Retry (Repeat and Rephrase) versus Accept behaviors in the mediated verbal channel, identify three user types (Accommodating, Normal and Picky), and propose a dynamic Bayesian network model of user behavior. To validate the model, we performed offline and online experiments. The experimental results using offline data show that the correct user type is clearly identified when a user behaves consistently in a given interaction condition. In the online experiment, agent feedback was presented to users according to their user types. We show high user satisfaction and interaction efficiency in the analysis of user interviews, video data, questionnaires and log data.
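As a rough illustration of inferring a user type from Retry/Accept behavior, the sketch below applies a plain sequential Bayes update. It is far simpler than the dynamic Bayesian network the abstract proposes, and the per-type retry probabilities are invented for the example, not taken from the paper.

```python
# Hypothetical probability that each user type retries after a translation;
# these numbers are assumptions made only for this illustration.
RETRY_PROB = {"Accommodating": 0.1, "Normal": 0.3, "Picky": 0.6}


def update(belief, action):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    posterior = {}
    for user_type, prior in belief.items():
        p_retry = RETRY_PROB[user_type]
        likelihood = p_retry if action == "retry" else 1.0 - p_retry
        posterior[user_type] = prior * likelihood
    norm = sum(posterior.values())
    return {user_type: value / norm for user_type, value in posterior.items()}


# Start from a uniform prior and fold in an observed action sequence.
belief = {user_type: 1.0 / len(RETRY_PROB) for user_type in RETRY_PROB}
for action in ["retry", "retry", "accept", "retry"]:
    belief = update(belief, action)

most_likely = max(belief, key=belief.get)
```

A user who retries often is quickly assigned to the Picky type, which matches the abstract's observation that consistent behavior makes the type easy to identify.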

8.
Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consensus on which expressions count as discourse markers, we have to reconsider how to set up a framework for annotation, and, in order to better understand what we gain by introducing a discourse marker category, we have to analyse their characteristics and functions in discourse. This is especially important for languages such as Slovenian, where little or no research on discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers based on the analysis of a corpus of telephone conversations in the tourism domain in the Slovenian language, and to give some additional arguments, based on the characteristics and functions of discourse markers, that confirm their special status in conversation.

9.
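Sign language research is a typical interdisciplinary topic spanning computer vision, natural language processing, cross-media computing, human-computer interaction and other fields. It mainly covers isolated sign language recognition, continuous sign language translation and sign language video generation. Sign language recognition and translation aim to convert sign language videos into text words or sentences, whereas sign language generation synthesizes sign language videos from spoken or written sentences. In other words, recognition and translation on the one hand and generation on the other can be viewed as inverse processes. This paper surveys the latest progress in sign language research, ...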

10.
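A Study of the Features of Computer-Assisted Pronunciation Learning Systems for Chinese   Cited by: 2 (self-citations: 0, citations by others: 2)
Based on an analysis of the importance and characteristics of language and pronunciation learning and teaching, this paper discusses the many issues involved in applying speech processing technology to computer-assisted language and pronunciation learning or teaching. In view of the characteristics of Chinese speech, it studies the features that a CALL system for Chinese learning should have and the principles that should be followed in its design and implementation. Finally, it reports an attempt at Chinese pronunciation learning using the general-purpose speech analysis tool 'Speech Analyzer'.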

11.
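Research and Progress in Deep Learning   Cited by: 2 (self-citations: 0, citations by others: 2)
Deep learning is an emerging research direction in machine learning. By imitating the structure of the human brain, it processes complex input data efficiently, learns different kinds of knowledge intelligently, and can effectively solve many classes of complex intelligent problems. In recent years, with the emergence of efficient learning algorithms for deep models, the machine learning community has seen a surge of research on deep learning theory and applications. Practice shows that deep learning is an efficient feature extraction method: it extracts more abstract features from data and captures the data's essential characteristics, while deep models have stronger modeling and generalization ability. In view of these advantages and its wide application, this paper gives a fairly systematic introduction to deep learning, covering its background, theoretical basis, typical deep learning models, representative fast learning algorithms, recent progress and practical applications, and finally discusses directions worth studying in the future.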

12.
This paper proposes a new technique to test the performance of spoken dialogue systems by artificially simulating the behaviour of three types of user (very cooperative, cooperative and not very cooperative) interacting with a system by means of spoken dialogues. Experiments using the technique were carried out to test the performance of a previously developed dialogue system designed for the fast-food domain and working with two kinds of language model for automatic speech recognition: one based on 17 prompt-dependent language models, and the other based on one prompt-independent language model. The use of the simulated user enables the identification of problems relating to the speech recognition, spoken language understanding, and dialogue management components of the system. In particular, in these experiments problems were encountered with the recognition and understanding of postal codes and addresses and with the lengthy sequences of repetitive confirmation turns required to correct these errors. By employing a simulated user in a range of different experimental conditions sufficient data can be generated to support a systematic analysis of potential problems and to enable fine-grained tuning of the system.
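A simulated user of the kind described here can be approximated, very loosely, by a random process whose cooperativeness controls how often it supplies the requested information. The sketch below is an assumption-laden toy: the cooperation probabilities, the slot names and the single-slot-per-turn dialogue policy are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical chance that each user type answers the system's prompt directly.
COOPERATION = {
    "very cooperative": 1.0,
    "cooperative": 0.7,
    "not very cooperative": 0.4,
}


def simulate_dialogue(user_type, slots, rng, max_turns=100):
    """Return the number of system turns needed to fill every slot."""
    p_answer = COOPERATION[user_type]
    filled = 0
    turns = 0
    while filled < len(slots) and turns < max_turns:
        turns += 1
        if rng.random() < p_answer:
            filled += 1      # the user supplied the requested slot
        # otherwise the system has to re-prompt for the same slot
    return turns


slots = ["food", "postal_code", "address"]   # illustrative fast-food slots
turn_counts = {
    user_type: simulate_dialogue(user_type, slots, random.Random(0))
    for user_type in COOPERATION
}
```

Running many such dialogues per user type would yield the kind of bulk data the abstract mentions for analysing where a system needs repeated confirmation turns.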

13.
Monaural speech separation and recognition challenge   Cited by: 2 (self-citations: 1, citations by others: 1)
Robust speech recognition in everyday conditions requires the solution to a number of challenging problems, not least the ability to handle multiple sound sources. The specific case of speech recognition in the presence of a competing talker has been studied for several decades, resulting in a number of quite distinct algorithmic solutions whose focus ranges from modeling both target and competing speech to speech separation using auditory grouping principles. The purpose of the monaural speech separation and recognition challenge was to permit a large-scale comparison of techniques for the competing talker problem. The task was to identify keywords in sentences spoken by a target talker when mixed into a single channel with a background talker speaking similar sentences. Ten independent sets of results were contributed, alongside a baseline recognition system. Performance was evaluated using common training and test data and common metrics. Listeners' performance in the same task was also measured. This paper describes the challenge problem, compares the performance of the contributed algorithms, and discusses the factors which distinguish the systems. One highlight of the comparison was the finding that several systems achieved near-human performance in some conditions, and one out-performed listeners overall.

14.
This paper addresses the problem of recognising speech in the presence of a competing speaker. We review a speech fragment decoding technique that treats segregation and recognition as coupled problems. Data-driven techniques are used to segment a spectro-temporal representation into a set of fragments, such that each fragment is dominated by one or other of the speech sources. A speech fragment decoder is used which employs missing data techniques and clean speech models to simultaneously search for the set of fragments and the word sequence that best matches the target speaker model. The paper investigates the performance of the system on a recognition task employing artificially mixed target and masker speech utterances. The fragment decoder produces significantly lower error rates than a conventional recogniser, and mimics the pattern of human performance that is produced by the interplay between energetic and informational masking. However, at around 0 dB the performance is generally quite poor. An analysis of the errors shows that a large number of target/masker confusions are being made. The paper presents a novel fragment-based speaker identification approach that allows the target speaker to be reliably identified across a wide range of SNRs. This component is combined with the recognition system to produce significant improvements. When the target and masker utterances have the same gender, the recognition system has a performance at 0 dB equal to that of humans; in other conditions the error rate is roughly twice the human error rate.

15.
A Semantic Syntax-Directed Translation is presented. Its rules are used to segment continuous speech and, at the same time, to produce phonetic interpretations.

16.
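Spoken dialogue systems have long been a focus of human language technology in computer science; they can be applied in many domains and have broad prospects. This paper analyzes three representative conversational systems developed abroad for different domains: CommandTalk, ITSPOKE and NICE. It examines them in terms of scope of use and interaction mode, speech recognition, dialogue management and speech synthesis, offers observations on each, and provides reference points and suggestions for the research and development of spoken dialogue systems in China.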

17.
This paper concerns the treatment, in the context of machine translation, of English complex nominal groups which can be considered as nominalizations of verb phrases. We discuss the fact that many styles of English prose which are suitable for translation by machine typically favor the use of nominal rather than verbal syntagms. But such constructions when translated literally are often considered unnatural. The general problem is described in detail, with examples. The more specific problem of recognizing nominalizations and analyzing their structure is considered. How and where to achieve the required syntactic transformation is discussed, and exemplified. Author's note: on leave of absence from the Centre for Computational Linguistics, University of Manchester Institute of Science and Technology, England.

18.
The recognition of facial gestures and expressions in image sequences is an important and challenging problem. Most of the existing methods adopt the following paradigm. First, facial actions/features are retrieved from the images, then the facial expression is recognized based on the retrieved temporal parameters. In contrast to this mainstream approach, this paper introduces a new approach allowing the simultaneous retrieval of facial actions and expression using a particle filter adopting multi-class dynamics that are conditioned on the expression. For each frame in the video sequence, our approach is split into two consecutive stages. In the first stage, the 3D head pose is retrieved using a deterministic registration technique based on Online Appearance Models. In the second stage, the facial actions as well as the facial expression are simultaneously retrieved using a stochastic framework based on second-order Markov chains. The proposed fast scheme is at least as robust as existing ones in a number of respects. We describe extensive experiments and provide evaluations of performance to show the feasibility and robustness of the proposed approach.

19.
The kernel function is the core of the Support Vector Machine (SVM), and its selection directly affects the performance of the SVM. There has been no theoretical basis for choosing a kernel function for speech recognition. In order to improve the learning ability and generalization ability of SVMs for speech recognition, this paper presents the Optimal Relaxation Factor (ORF) kernel function, which is a set of new SVM kernel functions for speech recognition, and proves that the ORF function is a Mercer kernel function. The experiments show the ORF kernel function's effectiveness on mapping trend, bi-spiral, and speech recognition problems. The paper concludes that the ORF kernel function performs better than the Radial Basis Function (RBF), the Exponential Radial Basis Function (ERBF) and the Kernel with Moderate Decreasing (KMOD). Furthermore, speech recognition with the ORF kernel function achieves higher recognition accuracy.
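For orientation, the sketch below implements two of the baseline kernels named in this abstract, RBF and ERBF, and runs a simple numerical sanity check related to the Mercer condition. The ORF kernel itself is not defined in the abstract, so it is deliberately not reproduced. The `2 * sigma**2` scaling is one common parametrization and is an assumption here, as are the sample points.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf(x, y, sigma=1.0):
    """Radial Basis Function kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))


def erbf(x, y, sigma=1.0):
    """Exponential RBF kernel: exp(-||x - y|| / (2 sigma^2))."""
    return math.exp(-math.sqrt(sq_dist(x, y)) / (2.0 * sigma ** 2))


def gram(kernel, points):
    """Gram matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, c):
    """c^T K c, which is non-negative for every c when K is positive semidefinite."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # arbitrary sample points
K_rbf = gram(rbf, points)
K_erbf = gram(erbf, points)
q_rbf = quadratic_form(K_rbf, [1.0, -1.0, 0.5])
q_erbf = quadratic_form(K_erbf, [1.0, -1.0, 0.5])
```

A Mercer kernel must give a symmetric, positive semidefinite Gram matrix on any finite point set; checking a single quadratic form as above is only a necessary spot check, not a proof of the kind the paper gives for ORF.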

20.
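In language information processing research, corpora (especially bilingual corpora) play an increasingly prominent role. Machine translation, as a branch of language information processing, has improved translation accuracy and readability by adopting corpus techniques. The construction and application of standard corpora therefore occupy an important place in the field. This paper mainly studies the construction of a Chinese-English parallel corpus for a specialized domain (such as automation or computing), and concludes with a brief account of the use of corpora in statistical machine translation systems.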
