Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
This paper presents the design and implementation of a simultaneous interpretation platform based on service aggregation. Built on the idea of service aggregation and using modular development techniques, the software effectively combines existing speech recognition, machine translation, and TTS (Text-to-Speech) services to achieve simultaneous interpretation. Its most distinctive feature is the use of currently popular service-aggregation technology to realize this function.

2.
Current machine translation systems are far from perfect. However, such systems can be used in computer-assisted translation to increase the productivity of the (human) translation process. The idea is to use a text-to-text translation system to produce portions of target-language text that can be accepted or amended by a human translator using text or speech. These user-validated portions are then used by the text-to-text translation system to produce further, hopefully improved, suggestions. There are several alternatives for using speech in a computer-assisted translation system, from pure dictated translation to simply marking acceptable partial translations by reading parts of the suggestions made by the system. In all cases, information from the text to be translated can be used to constrain the speech-decoding search space. While pure dictation seems to be among the most attractive settings, perfect speech decoding is unfortunately not possible with current speech processing technology, and human error correction would still be required. Therefore, approaches that achieve higher speech recognition accuracy by using increasingly constrained models in the recognition process are explored here. All these approaches are presented within the statistical framework. Empirical results support the potential usefulness of speech within the computer-assisted translation paradigm.

3.
We compared the performance of an automatic speech recognition system using n-gram language models, HMM acoustic models, and combinations of the two with the word recognition performance of human subjects who had access to only acoustic information, only local linguistic context, or a combination of both. All speech recordings were taken from Japanese narration and spontaneous speech corpora. Humans have difficulty recognizing isolated words taken out of context, especially words from spontaneous speech, partly because of word-boundary coarticulation. Human recognition performance improves dramatically when one or two preceding words are provided. Short words in Japanese are mainly post-positional particles (i.e., wa, ga, wo, ni, etc.), function words located just after content words such as nouns and verbs. Their predictability is therefore very high given the one or two preceding words, and their recognition improves drastically. Providing even more context further improves human prediction performance under text-only conditions (without acoustic signals); it also improves speech recognition, but the improvement is relatively small. Recognition experiments with an automatic speech recognizer were conducted under conditions almost identical to those of the human experiments. The performance of the acoustic models without any language model, or with only a unigram language model, was greatly inferior to human recognition performance with no context. In contrast, prediction performance using a trigram language model was superior or comparable to human performance given one preceding and one succeeding word. These results suggest that we must improve our acoustic models rather than our language models to make automatic speech recognizers comparable to humans under conditions of limited linguistic context.

4.
This paper sketches research in nine areas related to spoken language translation: interactive disambiguation (two demonstrations of highly interactive, broad-coverage speech translation are reported); system architecture; data structures; the interface between speech recognition and analysis; the use of natural pauses for segmenting utterances; example-based machine translation; dialogue acts; the tracking of lexical co-occurrences; and the resolution of translation mismatches.

5.
Application of noise-robust speech recognition and a speech enhancement algorithm   (Cited: 1; self-citations: 0; citations by others: 1)
汤玲  戴斌 《计算机仿真》2006,23(9):80-82,143
Improving the robustness of speech recognition systems is an important research topic. Recognition performance often degrades because the data in the training environment do not match the data in the recognition environment. To obtain satisfactory performance in noisy environments, this paper proposes a robust speech feature extraction method based on the auditory characteristics of the human ear. Before MFCC feature extraction, the noisy speech features are processed using auditory masking, combined with speech enhancement, to yield robust features. Analysis of four different experiments shows that this method improves the system's noise robustness, and that the resulting features adapt well to different noises at different signal-to-noise ratios.

6.
This paper describes our work on continuous speech recognition and speech-based machine translation. We have achieved speaker-dependent recognition of continuous utterances with a medium-sized vocabulary on a restricted topic, and have implemented an experimental English-Chinese speech translation demonstration system.

7.
Speech translation converts source-language speech into target-language text. When the traditional sequence-to-sequence model is applied to speech translation, it is sensitive to sequence length, which puts heavy pressure on the encoder for feature extraction and local-dependency modeling. To address this, we build a speech translation model on the Transformer network and pre-encode the audio spectrogram features with a deep convolutional network: by downsampling the audio sequence, the network models local dependencies in the time-frequency information of the spectrogram and extracts deep features, relieving the encoder's modeling burden. The model performs speech-to-text translation in both directions between Chinese and Vietnamese. Experimental results show that the proposed method is effective, improving performance by about 19% over the baseline system.
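The convolutional pre-encoding described above shortens the audio sequence before it reaches the Transformer encoder. A minimal sketch of the idea (not the authors' model: the two-layer, stride-2 configuration and the frame-averaging stand-in for a learned convolution are assumptions) shows how a 1000-frame spectrogram shrinks to 250 frames:

```python
import numpy as np

def conv_downsample(features: np.ndarray, n_layers: int = 2) -> np.ndarray:
    """Apply n_layers of stride-2 downsampling (here a stand-in that
    averages adjacent frame pairs) to a (time, freq) spectrogram,
    halving the time axis at each layer."""
    out = features
    for _ in range(n_layers):
        t = out.shape[0] // 2 * 2          # drop a trailing odd frame
        out = (out[0:t:2] + out[1:t:2]) / 2.0
    return out

spec = np.random.rand(1000, 80)            # 1000 frames, 80 mel bins
shrunk = conv_downsample(spec)
print(spec.shape, "->", shrunk.shape)      # (1000, 80) -> (250, 80)
```

A 4x reduction in time steps cuts the encoder's self-attention cost, which is quadratic in sequence length, by roughly 16x.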

8.
刘宇宸  宗成庆 《软件学报》2023,34(4):1837-1849
Speech translation aims to translate speech in one language into speech or text in another. Compared with cascaded systems, end-to-end speech translation offers lower latency, less error accumulation, and smaller storage requirements, and has therefore attracted increasing attention from researchers. However, an end-to-end model must both process long speech sequences to extract acoustic information and learn the alignment between source-language speech and target-language text, which makes modeling difficult and hurts performance. This paper proposes an end-to-end speech translation method with cross-modal information fusion that deeply integrates a text machine translation model into the speech translation model. To handle the length mismatch between speech and text sequences, redundant information is filtered from the acoustic representation so that the filtered acoustic state sequence is as close as possible in length to the corresponding text sequence. To ease the learning of alignments, the text machine translation model is embedded into the speech translation model through parameter sharing, and the alignment between source-language speech and target-language text is learned via multi-task training. Experiments on public speech translation datasets show that the proposed method significantly improves translation performance.
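The abstract does not specify the filtering mechanism; one common realization in end-to-end speech translation (assumed here, not taken from the paper) is CTC-style shrinking: drop blank frames and merge consecutive frames that share a label, so the acoustic state sequence approaches the length of the text sequence:

```python
import numpy as np

BLANK = 0  # conventional CTC blank label index (assumption)

def shrink_acoustic_states(frame_labels, frame_states):
    """Collapse consecutive frames that share a CTC label and drop blank
    frames, averaging the hidden states inside each collapsed segment.
    Returns a shorter state sequence roughly aligned to the token sequence."""
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            if frame_labels[start] != BLANK:
                segments.append(frame_states[start:i].mean(axis=0))
            start = i
    if not segments:
        return np.empty((0, frame_states.shape[1]))
    return np.stack(segments)

labels = np.array([0, 3, 3, 0, 0, 5, 5, 5, 0, 7])  # per-frame CTC argmax
states = np.random.rand(10, 4)                     # 10 frames, dim 4
shrunk = shrink_acoustic_states(labels, states)
print(shrunk.shape)   # (3, 4): one state each for labels 3, 5, 7
```

After shrinking, the 10-frame acoustic sequence has the same length as the 3-token text sequence, so a shared text-translation decoder can consume either modality.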

9.
Due to increasing globalization, communication between different countries has become more and more frequent, and language barriers are among its most important obstacles. Machine translation is limited to text and cannot adequately substitute for oral communication. In this study, a speech recognition and translation system based on embedded technology was developed for English speech recognition and translation. The system adopted the Hidden Markov Model (HMM) and the Windows CE operating system. Experiments involving English speech recognition and English-Chinese translation found that the system identified English speech with about 88% accuracy and translated English to Chinese with over 85% accuracy. The embedded English speech recognition and translation system thus demonstrated high accuracy in both speech identification and translation, showing its value as a practical application and meriting further research and development.

10.
Research and design of a multilingual comprehensive information service system   (Cited: 1; self-citations: 0; citations by others: 1)
肖荣  吴英姿 《计算机工程》2009,35(2):263-264
Multilingual comprehensive information services are becoming an important direction in the information service field. This paper proposes an overall architecture for multilingual comprehensive information service applications oriented toward the 2010 Shanghai World Expo and urban information services. Built on next-generation network technology, the system is logically divided into seven layers. Through several types of portals, it integrates key technologies including multilingual speech recognition, speech synthesis, and machine translation, and consolidates existing urban information service resources to provide users with convenient, fast multilingual information services.

11.
Speech translation is a technology that helps people communicate across different languages. The most commonly used speech translation model is composed of automatic speech recognition, machine translation, and text-to-speech synthesis components, which share information only at the text level. However, spoken communication differs from written communication in that it uses rich acoustic cues, such as prosody, to transmit additional information through non-verbal channels. This paper is concerned with speech-to-speech translation that is sensitive to this paralinguistic information. Our long-term goal is a system that allows users to speak a foreign language with the same expressiveness as if they were speaking their own language. Our method works by reconstructing input acoustic features in the target language. Of the many possible paralinguistic features, we choose duration and power as a first step, proposing a method that translates these features from the input speech to the output speech in continuous space. This is done in a simple, language-independent fashion by training an end-to-end model that maps source-language duration and power information into the target language. Two approaches are investigated: linear regression and neural network models. We evaluate the proposed methods and show that paralinguistic information in the source-language input speech can be reflected in the target-language output speech.
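The linear-regression variant can be sketched as a least-squares map from source-word (duration, power) pairs to target-word pairs. The numbers below are invented for illustration; the paper's actual features and fitting setup may differ:

```python
import numpy as np

# Toy parallel data: per-word (duration in seconds, power in dB) in the
# source utterance and in reference target-language utterances.
src = np.array([[0.30, 62.0], [0.45, 65.0], [0.20, 58.0], [0.50, 67.0]])
tgt = np.array([[0.35, 61.0], [0.50, 64.5], [0.25, 57.0], [0.55, 66.0]])

# Fit a linear map tgt ~= X @ W via least squares, with a bias column.
X = np.hstack([src, np.ones((len(src), 1))])
W, *_ = np.linalg.lstsq(X, tgt, rcond=None)

def translate_paralinguistics(duration, power):
    """Predict target-language duration/power for one source word."""
    return np.array([duration, power, 1.0]) @ W

pred = translate_paralinguistics(0.40, 63.0)
print(pred.round(2))  # predicted (duration, power) for the target word
```

The predicted continuous values would then be imposed on the synthesized target speech, e.g. by scaling phone durations and frame energies in the TTS component.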

12.
We present MARS (Multilingual Automatic tRanslation System), a research prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken language translation between English and Mandarin Chinese for limited domains, such as air travel reservations. In MARS, machine translation is embedded within a complex speech processing task, and translation performance is strongly affected by the performance of other components, such as the recognizer and semantic parser. All components in the proposed system are statistically trained using an appropriate training corpus. The speech signal is first recognized by an automatic speech recognizer (ASR). Next, the ASR-transcribed text is analyzed by a semantic parser, which uses a statistical decision-tree model that requires no hand-crafted grammars or rules. Furthermore, the parser provides semantic information that helps re-score the speech recognition hypotheses. The semantic content extracted by the parser is formatted into a language-independent tree structure, which is used for interlingua-based translation. A Maximum Entropy based sentence-level natural language generation (NLG) approach generates sentences in the target language from the semantic tree representations. Finally, the generated target sentence is synthesized into speech by a speech synthesizer.
Many new features and innovations have been incorporated into MARS: the translation is based on understanding the meaning of the sentence; the semantic parser uses a statistical model trained from a semantically annotated corpus; the output of the semantic parser selects a more specific language model to refine speech recognition; and the NLG component uses a statistical model trained from the same annotated corpus. These features give MARS robustness to speech disfluencies and recognition errors, tighter integration of semantic information into speech recognition, and portability to new languages and domains. These advantages are verified by our experimental results.

13.
In recent years, with the continuing development of science and technology in China, neural networks have become increasingly closely tied to speech recognition. Traditional speech recognition relied mainly on template matching, whereas modern systems are trending toward neural networks. Neural network techniques model the behavior of biological neurons, bringing human-like autonomous learning and generalization into speech recognition systems and opening a new path for the field. This paper briefly analyzes, with concrete examples, the combination of deep-learning neural networks and speech recognition systems.

14.
In speech recognition research, because of the variety of languages, a corresponding recognition system must be constructed for each language. This is especially difficult for dialects, which have many special words and oral-language features and for which speech data is very scarce. This paper constructs a speech recognition system for the Sichuan dialect by combining a hidden Markov model (HMM) with a deep long short-term memory (LSTM) network. Using this HMM-LSTM architecture, we created a Sichuan dialect dataset and implemented a speech recognition system for it. Compared with a deep neural network (DNN), the LSTM network overcomes the DNN's limitation of capturing only a fixed window of context. Moreover, to accurately identify polyphones and special pronunciation vocabulary in the Sichuan dialect, we collected all the characters in the dataset together with their common phoneme sequences to form a lexicon. The system yields an 11.34% character error rate on the Sichuan dialect evaluation set which, as far as we know, is the best performance reported for this corpus to date.
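The character-to-phoneme lexicon can be sketched as a mapping from each character to every phoneme sequence observed for it, so that polyphones retain all their pronunciations. The example characters and phoneme strings below are hypothetical, not taken from the paper's corpus:

```python
def build_lexicon(char_phone_pairs):
    """Map each character to the set of phoneme sequences observed for it,
    so polyphones keep every attested pronunciation."""
    lexicon = {}
    for char, phones in char_phone_pairs:
        lexicon.setdefault(char, set()).add(tuple(phones))
    return lexicon

# Toy entries: 行 is a polyphone with two readings (illustrative phonemes).
pairs = [("行", ["x", "ing2"]), ("行", ["h", "ang2"]), ("好", ["h", "ao3"])]
lex = build_lexicon(pairs)
print(sorted(lex["行"]))   # both pronunciations survive
```

At decoding time, the recognizer can then score every phoneme sequence listed for a character rather than assuming a single canonical reading.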

15.
Speech recognition for low-resource Mongolian   (Cited: 1; self-citations: 1; citations by others: 0)
张爱英  倪崇嘉 《计算机科学》2017,44(10):318-322
As speech recognition technology advances, recognition systems for low-resource languages have attracted wider attention. Taking Mongolian as the target language, this paper studies how to use information from other languages to improve recognition when resources are scarce (e.g., only 10 hours of transcribed speech). Cross-lingual transfer learning based on multilingual deep neural networks, and features extracted with multilingual deep bottleneck networks, yield more discriminative acoustic models. Large amounts of web-page data obtained through search engines and targeted web crawling strengthen the language model. Fusing the outputs of several different recognizers further improves accuracy: compared with the baseline, system fusion reduces the absolute recognition error rate by 12%.

16.
杨全  王民 《微计算机信息》2007,23(24):219-221
This paper analyzes the feasibility and significance of applying sign language recognition and synthesis in intelligent buildings. It introduces the main techniques of the recognition and synthesis components, proposes a sign language recognition and synthesis system architecture suited to intelligent buildings, and gives a feasibility example of a bidirectional sign-language/speech translation system applied in a barrier-free intelligent residential community.

17.
Spelling speech recognition can be applied for several purposes, including enhancement of speech recognition systems and implementation of name retrieval systems. This paper presents a Thai spelling analysis used to develop a Thai spelling speech recognizer. The Thai phonetic characteristics, alphabet system, and spelling methods are analyzed. As training resources, two alternative corpora, a small spelling speech corpus and an existing large continuous speech corpus, are used to train hidden Markov models (HMMs), and their recognition results are compared. To handle the difference in speaking rate between spelling utterances and continuous speech, utterance speed adjustment is taken into account. Two alternative language models, bigram and trigram, are used to investigate spelling recognition performance. Our approach achieves up to a 98.0% letter correction rate, 97.9% letter accuracy, and an 82.8% utterance correction rate when the language model is a trigram and the acoustic model is trained from the small spelling speech corpus with eight Gaussian mixtures.
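A trigram letter language model of the kind used above can be sketched from counts over spelled-out training utterances. Add-one smoothing is an assumption here; the paper does not state its smoothing method:

```python
from collections import defaultdict

def train_trigram(spelling_utterances):
    """Estimate trigram letter probabilities with add-one smoothing
    from spelled-out training utterances (lists of letters)."""
    tri = defaultdict(int)
    bi = defaultdict(int)
    vocab = set()
    for letters in spelling_utterances:
        seq = ["<s>", "<s>"] + letters + ["</s>"]  # pad with sentence markers
        vocab.update(seq)
        for i in range(2, len(seq)):
            tri[(seq[i-2], seq[i-1], seq[i])] += 1
            bi[(seq[i-2], seq[i-1])] += 1
    V = len(vocab)
    def prob(w1, w2, w3):
        return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + V)
    return prob

prob = train_trigram([list("THAI"), list("THAN")])
print(prob("T", "H", "A"), ">", prob("T", "H", "Z"))  # seen > unseen
```

During decoding, the recognizer combines these letter probabilities with the HMM acoustic scores to rank competing spelling hypotheses.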

18.
The TV monitoring operations of SARFT (China's State Administration of Radio, Film and Television) have already automated equipment control and made satellite signal acquisition digital, information-based, and networked, but content-based detection of abnormal events and the associated information processing still depend entirely on manual work. Advances in speech processing, speech recognition, and associative retrieval make intelligent TV monitoring possible. This paper describes the architecture of an intelligent assistance system for TV monitoring that automatically locates TV programs, transcribes TV news speech into text, raises alerts on sensitive language content, and clusters related information to facilitate subsequent manual processing.

19.
Intelligent speech technology comprises speech recognition, natural language processing, and speech synthesis; among these, speech recognition is the key technology for human-computer interaction, and a recognition system usually requires an acoustic model and a language model. The rise of neural networks has sharply increased the number of acoustic models, and combining neural-network acoustic models with traditional recognition models has greatly advanced speech recognition. As the front end of human-computer interaction, speech recognition spans many research directions. This paper surveys the state of acoustic model research in three of them: text recognition, speaker recognition, and emotion recognition, tracing the evolution of speech recognition technology in as much detail as possible to provide a useful reference for future work. It also summarizes and compares current mainstream methods, presents the advantages of end-to-end speech recognition models, analyzes development trends, and finally discusses the challenges facing current speech recognition tasks.

20.
To convert an utterance into the correct character sequence when its language is unknown, language identification (LID) is integrated with speech recognition to build a Chinese-English large vocabulary continuous speech recognition (LVCSR) system. So that the language of the speech can be decided as early as possible during recognition, thereby reducing decoding computation, language pruning during the LID process is studied. The results show that a reasonable language pruning threshold effectively reduces computation and recognition time without degrading system performance.
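Language pruning during decoding can be sketched as a beam-style test: once one language's running score leads the others by more than a threshold, the trailing languages are dropped from the search. The scoring scheme and numbers below are illustrative assumptions, not the paper's actual implementation:

```python
def prune_languages(lang_scores, threshold):
    """Keep only languages whose running log-score is within `threshold`
    of the current best; decoding continues only for survivors, so an
    early confident LID decision cuts the search space."""
    best = max(lang_scores.values())
    return {lang for lang, s in lang_scores.items() if best - s <= threshold}

# Running log-scores after a few frames of a Mandarin utterance (toy numbers).
scores = {"zh": -120.5, "en": -141.0}
print(prune_languages(scores, threshold=15.0))   # {'zh'}: English is pruned
```

A tight threshold prunes earlier and saves more computation, but risks discarding the correct language before enough acoustic evidence has accumulated; the paper's finding is that a reasonable threshold avoids this trade-off in practice.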


Copyright©北京勤云科技发展有限公司  京ICP备09084417号