1.

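Wang Yajuan, Li Xiao, Yang Yating, Mi Chenggang 《Computer Engineering》2019,45(4)

To address the problem that, in Uyghur-Chinese machine translation, a single translation model performs poorly and the outputs of different translation models differ considerably, a system combination method based on paraphrase information is proposed. Paraphrase information on the Chinese side is extracted to word-align the Chinese translation hypotheses, and the word alignments are used to build and decode a confusion network, yielding the combined output of the Uyghur-Chinese machine translation systems. Experimental results show that, compared with the single translation system HPSTW, the method effectively improves translation quality.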

2.

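Statistical machine translation based on translation rules

Liu Ying, Jiang Wei 《Computer Science》2013,40(2):214-217

An extended HMM model can resolve conflicts between word alignment results and syntactic constraints, and thus produces better word alignments. On the basis of phrase alignment, translation rules are extracted using the phrase-structure trees of the target language. An extended CYK algorithm, CYKA+, serves as the system's decoder; it can handle translation rules that are not in Chomsky normal form. A two-pass decoding algorithm integrates the language model during decoding. Experiments show that, compared with traditional word alignment models, the improved HMM word alignment model achieves higher alignment accuracy and higher BLEU scores for the translation output. The system using translation rules produces more stable translation results across different data sets. The two-pass decoding algorithm reaches decoding quality similar to that of cube pruning but decodes faster.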

3.

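System combination based on WordNet word sense disambiguation (cited by: 3; self-citations: 3; citations by others: 0)

Liu Yupeng, Li Sheng, Zhao Tiejun 《Acta Automatica Sinica》2010,36(11):1575-1580

Confusion networks have recently shown good performance in combining the outputs of multiple machine translation systems. However, because different translation systems produce different word orders, hypothesis alignment remains an important problem in building the confusion network, and previous alignment methods have not taken semantic information into account. To improve system combination performance, this paper proposes using word sense disambiguation (WSD) to guide alignment in the confusion network. The skeleton translation is selected by computing the similarity between sentences, where sentence similarity is computed with a maximum matching algorithm on bipartite graphs. To incorporate the WordNet-based WSD method into the system, the translation error rate (TER) algorithm is modified; experimental results show that the method outperforms the classical TER algorithm.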

4.

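Generation and rescoring of Chinese syllable confusion networks

Yin Mingming, Li Bicheng, Qu Dan, Niu Tong 《Journal of Chinese Computer Systems》2012,33(6):1385-1388

To address the inability of current confusion network generation algorithms to balance speed and accuracy, a new method for generating Chinese syllable confusion networks is proposed. Similar to the pivot alignment algorithm, the algorithm extracts one local path from the syllable lattice at a time and aligns it with a reference path; depending on how the length of each aligned path compares with that of the reference path, it applies a different strategy to build the confusion network. After the network is generated, a new decoding framework is applied to rescore it. Experiments show that the generated confusion networks are fairly accurate, that the time complexity is better than that of the pivot alignment algorithm, and that the recognition rate improves significantly after rescoring.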

5.

A joint multi-engine Uyghur-Chinese machine translation system (cited by: 1; self-citations: 0; citations by others: 1)

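Su Jianjun, Zhang Xiaoyan, Turghun Osman (吐尔洪·吾司曼), Li Xiao 《Computer Engineering》2011,37(16):179-181

Given the rich morphological variation of Uyghur, a Factored-based Uyghur-Chinese machine translation system is built, and the Factored system is combined with the hierarchical phrase-based Joshua translation system and the syntax-based translation model in Moses to construct a confusion network. A Uyghur-Chinese machine translation method that combines systems jointly at the word level and the sentence level is proposed: a consensus network is used for word-level combination, and a minimum Bayes algorithm is used for sentence-level combination. Experimental results show that the joint multi-engine method raises the BLEU-SBP score by 1.72%.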

6.

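Translation rule pruning and model training based on semi-forced decoding and variational Bayesian inference

Gao Enting, Duan Xiangyu, Chao Jiayuan, Zhang Min 《Journal of Chinese Information Processing》2014,28(5):141-147

Statistical machine translation generally trains translation models with heuristic methods. Because heuristic methods lack a sound theoretical foundation, they lead to very large translation models and imprecise model parameters. To address these two problems, this paper proposes a model training method based on variational Bayesian inference that yields a more accurate and more compact translation model. The method first aligns the corpus by forced decoding and then obtains the model parameters with the variational Bayesian EM algorithm. The experimental corpus is the NIST Chinese-English translation task data; the results show that in syntax-based (phrase-based) statistical machine translation, more than 95% (76%) of the rules are pruned and BLEU improves significantly.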

7.

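Dynamic combination of statistical machine translation and translation memory

Wang Kun, Zong Chengqing, Su Keyi 《Journal of Chinese Information Processing》2015,29(2):87-94

Building on an integrated model that combines translation memory and statistical machine translation, this paper proposes dynamically adding phrase pairs newly discovered in the translation memory during decoding. During machine translation decoding, translation memory segments are added dynamically as candidates, and information from the translation memory guides the phrase-based translation model in decoding. Experimental results show that the method significantly improves translation quality: compared with the translation memory system, it gains 21.15 BLEU points and reduces TER by 21.47 points; compared with the phrase-based translation system, it gains 5.16 BLEU points and reduces TER by 4.05 points.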

8.

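A continuous naive Bayesian network classifier for remote sensing imagery based on Gaussian mixture models (cited by: 1; self-citations: 0; citations by others: 1)

Tao Jianbin, Shu Ning, Shen Zhaoqing 《Remote Sensing Information》2010,(2):18-24,29

A new naive Bayesian network model for remote sensing imagery with an embedded Gaussian mixture model (GMM), GMM-NBC (GMM-based Naïve Bayesian Classifier), is proposed. To overcome the drawback that continuous naive Bayesian network classifiers assume each land-cover class follows a single Gaussian distribution, the method models the distribution of land-cover classes in feature space with a Gaussian mixture model and obtains the GMM parameters automatically with an improved EM algorithm. Each Gaussian mixture model as a whole is embedded in the naive Bayesian network as a child node, its output serves as the intermediate class posterior probability of that node (feature), and these outputs are fused within the naive Bayesian network framework to obtain the final class posterior probability. Classification experiments on multispectral and hyperspectral data show that the method classifies better than the traditional Bayesian classifier and is fairly robust.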
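The core idea in this abstract, replacing the single Gaussian per feature with a mixture, can be illustrated with a minimal sketch. It is not the paper's GMM-NBC: the mixture parameters are assumed to be given (no EM step is shown), each feature gets its own one-dimensional mixture, and the names `gmm_pdf`, `class_posteriors`, and the class labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One mixture component: (weight, mean, variance). Parameters are assumed
# to be already estimated; the EM fitting step is not part of this sketch.
Component = Tuple[float, float, float]


def gmm_pdf(x: float, components: List[Component]) -> float:
    """Density of a one-dimensional Gaussian mixture at x."""
    return sum(
        w * math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in components
    )


def class_posteriors(
    x: List[float],
    priors: Dict[str, float],
    models: Dict[str, List[List[Component]]],
) -> Dict[str, float]:
    """Naive Bayes posterior where each feature likelihood is a GMM.

    models[label][k] is the mixture for feature k under class `label`.
    """
    joint = {}
    for label, prior in priors.items():
        value = prior
        for feature, components in zip(x, models[label]):
            value *= gmm_pdf(feature, components)  # independence assumption
        joint[label] = value
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


# Hypothetical two-class example with a single feature: "water" is bimodal,
# which a single Gaussian could not represent.
priors = {"water": 0.5, "soil": 0.5}
models = {
    "water": [[(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]],
    "soil": [[(1.0, 5.0, 1.0)]],
}
print(class_posteriors([10.0], priors, models))
```

With many features the product of densities can underflow; a practical version would sum log-densities instead.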

9.

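Statistical translation based on hierarchical chunk analysis (cited by: 1; self-citations: 0; citations by others: 1)

Wei Wei, Du Jinhua, Xu Bo 《Journal of Chinese Information Processing》2007,21(5):87-90

This paper describes a statistical translation model based on hierarchical chunk analysis. Formally, the model conforms to a synchronous context-free grammar, and it also incorporates English chunking knowledge based on conditional random fields, so it effectively combines the syntactic translation model with the phrase translation model. The system's decoding algorithm improves on the CKY chart-parsing algorithm and integrates a linear N-gram language model. So far, experiments have focused on Chinese-English spoken language translation, using the International Workshop on Spoken Language Translation (IWSLT) evaluation as the benchmark; on the 2005 evaluation test set, both BLEU and NIST scores improve over a statistical phrase-based translation system.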

10.

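A Chinese character confusion network algorithm

Wu Bin, Sun Chengli, Liu Gang, Guo Jun 《Journal of Data Acquisition and Processing》2008,23(4)

In Mandarin large-vocabulary continuous speech recognition, decoding with the maximum a posteriori decision rule yields the result with minimum sentence error rate, yet character error rate is the usual evaluation metric for recognition results. To obtain recognition results with minimum character error rate, a Chinese character confusion network algorithm is proposed that takes full account of the characteristics of the Chinese language. The algorithm effectively converts the word lattices output by a Mandarin large-vocabulary continuous speech recognition system into Chinese character confusion networks. The theory of the minimum Bayes risk decision rule and the decoding process using Chinese character confusion networks are discussed in detail. Experiments on the 2005 HTRDP (863) evaluation data set show that the method effectively reduces the character error rate of Mandarin large-vocabulary continuous speech recognition results.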

11.

Minimum Bayes Risk decoding and system combination based on a recursion for edit distance

Haihua Xu Daniel Povey Lidia Mangu Jie Zhu 《Computer Speech and Language》2011,25(4):802-828

In this paper we describe a method that can be used for Minimum Bayes Risk (MBR) decoding for speech recognition. Our algorithm can take as input either a single lattice, or multiple lattices for system combination. It has similar functionality to the widely used Consensus method, but has a clearer theoretical basis and appears to give better results both for MBR decoding and system combination. Many different approximations have been described to solve the MBR decoding problem, which is very difficult from an optimization point of view. Our proposed method solves the problem through a novel forward–backward recursion on the lattice, not requiring time markings. We prove that our algorithm iteratively improves a bound on the Bayes risk.
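As a rough illustration of what MBR decoding optimizes, the sketch below picks, from an N-best list, the hypothesis with the lowest expected word-level edit distance to all other hypotheses. This is a deliberate simplification and not the lattice-based forward-backward recursion of the paper; the function names `edit_distance` and `mbr_decode` and the toy posteriors are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (rolling-row DP)."""
    prev = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        curr = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_decode(nbest: List[Tuple[List[str], float]]) -> List[str]:
    """Return the N-best entry with minimum expected edit distance.

    `nbest` holds (token sequence, posterior) pairs; posteriors are
    renormalized over the list, which is itself an approximation.
    """
    total = sum(p for _, p in nbest)
    best, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum(p / total * edit_distance(hyp, other) for other, p in nbest)
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best


# Toy N-best list: the most probable entry is "a b c", but "a b d" is
# closer on average to the whole distribution.
nbest = [("a b c".split(), 0.4), ("a b d".split(), 0.3), ("a x d".split(), 0.3)]
print(mbr_decode(nbest))  # ['a', 'b', 'd']
```

In this toy list the expected risks are 0.9, 0.7, and 1.1, so the MBR choice differs from the maximum a posteriori choice.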

12.

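A survey of system combination techniques for machine translation (cited by: 4; self-citations: 1; citations by others: 3)

Li Maoxi, Zong Chengqing 《Journal of Chinese Information Processing》2010,24(4):74-85

This paper gives a comprehensive survey and analysis of system combination methods in machine translation research. According to the level at which the outputs of multiple systems are combined, system combination methods are divided into three categories: sentence-level, phrase-level, and word-level system combination. For each of the three, the paper presents representative work, including implementation methods, confidence estimation, and decoding algorithms, and it emphasizes the word alignment techniques used to build confusion networks in word-level system combination, which has been widely used in recent years. Finally, the paper compares and summarizes the three categories and discusses future prospects.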

13.

System Combination for Machine Translation of Spoken and Written Language (cited by: 1; self-citations: 0; citations by others: 1)

《IEEE transactions on audio, speech, and language processing》2008,16(7):1222-1237

This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations rather than a single sentence is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different project partners were combined in the experiments. Significant improvements in translation quality from Spanish to English and from English to Spanish in comparison with the best of the individual MT systems were achieved under official evaluation conditions.
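The voting step described here can be sketched in a few lines, assuming the hard part (aligning the hypotheses into slots) has already been done. The sketch below takes hypotheses padded to equal length with an assumed epsilon marker and picks the highest-weighted word in each slot; the alignment model and the language-model rescoring from the paper are omitted, and the name `consensus`, the `EPS` marker, and the system weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

EPS = ""  # assumed marker for an empty (epsilon) arc in a slot


def consensus(aligned: List[List[str]], weights: List[float]) -> List[str]:
    """Weighted majority vote over the slots of a confusion network.

    aligned[s][k] is the word system s contributes to slot k, or EPS.
    Ties go to the word seen first in that slot.
    """
    n_slots = len(aligned[0])
    if any(len(hyp) != n_slots for hyp in aligned):
        raise ValueError("all aligned hypotheses must have the same length")
    if len(weights) != len(aligned):
        raise ValueError("one weight per system is required")

    output = []
    for k in range(n_slots):
        votes: Dict[str, float] = defaultdict(float)
        for hyp, weight in zip(aligned, weights):
            votes[hyp[k]] += weight
        winner = max(votes, key=votes.get)
        if winner != EPS:  # an epsilon winner means the slot stays empty
            output.append(winner)
    return output


# Three toy system outputs, already aligned into four slots.
aligned = [
    ["the", "cat", "sat", EPS],
    ["the", "cat", "sits", "down"],
    ["a", "cat", "sat", "down"],
]
print(consensus(aligned, [0.4, 0.3, 0.3]))  # ['the', 'cat', 'sat', 'down']
```

Note that the result matches none of the three inputs exactly, which is the point of word-level combination.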

14.

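A comparative analysis of consensus decoding methods in statistical machine translation

Duan Nan, Li Mu, Zhou Ming 《Journal of Chinese Information Processing》2013,27(1):64-72

This paper compares and analyzes the consensus decoding methods that have appeared in statistical machine translation research in recent years. According to how existing consensus decoding methods use the outputs of one or more statistical machine translation systems, they are first grouped into two categories: consensus decoding based on re-ranking translation hypotheses and consensus decoding based on recombining translation hypotheses. For each category, the most representative work is then reviewed. Finally, comparative experiments on large-scale Chinese-English machine translation evaluation data compare the methods introduced in the paper, and future research directions are discussed.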

15.

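A watermark synchronization algorithm for ratio dither modulation based on a phase-locked loop

Yan Bin, Wang Xiaoming, Hao Jianjun, Zhang Renyan 《Journal of Computer Applications》2011,31(10):2674-2677

To address desynchronization attacks on digital watermarks, a ratio dither modulation watermarking algorithm based on a phase-locked loop is proposed. The method considers two desynchronization attacks, fractional shift and amplitude scaling. The watermark embedder uses ratio dither modulation, and the watermark detector performs joint channel parameter estimation and watermark message decoding: the detector adjusts its estimate of the desynchronization attack parameters according to the decoding result of the ratio dither modulation watermark, and then uses that estimate to assist watermark decoding and improve decoding performance. Experiments test how data-aided versus non-data-aided modes and different interpolation methods affect the watermarking system. The results show that the algorithm effectively resists both fractional shift and amplitude scaling desynchronization attacks.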

16.

Dynamic registration selection for fingerprint verification

Neil Yager Adnan Amin 《Pattern recognition》2006,39(11):2141-2148

Information fusion is a powerful approach to increasing the accuracy of biometric authentication systems, and is currently an active area of research. The majority of studies focus on combining the results from multiple verification systems at the match score level using either a classification or combination scheme. However, there are advantages to performing the fusion at an earlier stage of processing. Fingerprint registration involves finding the translation and rotation parameters that align two fingerprints; a challenging problem that can be approached in a number of ways. The fusion of fingerprint alignment algorithms is introduced in the form of dynamic registration selection. A Bayesian statistical framework is used to select the most probable alignment produced by competing algorithms. The results of the proposed technique are tested on multiple FVC 2002 databases, and are shown to outperform methods based on match score combination.

17.

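Translation model combination based on hypergraphs

Liu Yupeng, Li Sheng, Zhao Tiejun 《Journal of Software》2012,23(9):2347-2357

Currently, system combination is carried out as post-processing of machine translation. This paper proposes combining translation models during decoding, and combines the translation models of two mainstream translation systems (the hierarchical phrase-based grammar Hiero and the bracketing transduction grammar BTG). The two currently mainstream decoding methods are explored from both theoretical and practical perspectives. The proposed decoding method also resolves the spurious ambiguity, or consistency, problem. The experimental results show that the combined multi-grammar model performs better than its member translation models, and that the new decoding method performs better than the traditional decoding method (Viterbi decoding).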

18.

Substring-based machine translation

Graham Neubig Taro Watanabe Shinsuke Mori Tatsuya Kawahara 《Machine Translation》2013,27(2):139-166

Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Portland, Oregon, pp. 632–641, 2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.