Similar literature
20 similar documents found
1.
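A Chinese-Uyghur machine translation system based on phrase-based statistical translation   (total citations: 1; self-citations: 0; citations by others: 1)
杨攀, 李淼, 张建. 《计算机应用》 (Journal of Computer Applications), 2009, 29(7): 2022-2025
This paper describes a Chinese-Uyghur machine translation system based on phrase-based statistical translation. A language model and a translation model are first trained on a Chinese-Uyghur corpus; the trained models are then used to decode source sentences and obtain the best translation. The core decoding algorithm is beam search. The Uyghur corpus is written in Latin-script Uyghur. Experimental results show that the phrase-based statistical machine translation approach makes it possible to build a Chinese-Uyghur machine translation platform quickly and effectively.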
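The beam search decoding mentioned in this abstract can be illustrated with a minimal sketch. It assumes a monotone decoder (no reordering), no language model, and a made-up phrase table; the names `beam_search_decode` and `toy_table` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: source phrase -> [(target phrase, log-probability)].
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]


def beam_search_decode(source: List[str], table: PhraseTable,
                       beam_size: int = 3, max_phrase_len: int = 3) -> Tuple[str, float]:
    """Monotone phrase-based decoding with one stack per number of covered source words."""
    # stacks[i] holds (score, output phrases) for hypotheses covering source[:i].
    stacks: List[List[Tuple[float, List[str]]]] = [[] for _ in range(len(source) + 1)]
    stacks[0].append((0.0, []))
    for i in range(len(source)):
        # Prune: keep only the beam_size best-scoring hypotheses in this stack.
        stacks[i] = sorted(stacks[i], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, output in stacks[i]:
            # Extend each surviving hypothesis with every phrase starting at position i.
            for j in range(i + 1, min(i + max_phrase_len, len(source)) + 1):
                for target, logp in table.get(tuple(source[i:j]), []):
                    stacks[j].append((score + logp, output + [target]))
    if not stacks[-1]:
        raise ValueError("no complete translation found")
    best_score, best_output = max(stacks[-1], key=lambda h: h[0])
    return " ".join(best_output), best_score


# Made-up example data, for illustration only.
toy_table: PhraseTable = {
    ("a",): [("x", math.log(0.6)), ("y", math.log(0.4))],
    ("b",): [("z", math.log(0.5))],
    ("a", "b"): [("w", math.log(0.7))],
}
translation, score = beam_search_decode(["a", "b"], toy_table)
print(translation, score)
```

A full decoder would also add a language model score and a reordering model to each hypothesis; the sketch keeps only the translation model score so that the pruning step stays visible.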

2.
This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language. To translate between languages L_s and L_t with limited bilingual resources, we bring in a third language, L_p, called the pivot language. For the language pairs L_s-L_p and L_p-L_t, there exist large bilingual corpora. Using only L_s-L_p and L_p-L_t bilingual corpora, we can build a translation model for L_s-L_t. The advantage of this method lies in the fact that we can perform translation between L_s and L_t even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, with a small L_s-L_t bilingual corpus available, our method can further improve translation quality by using the additional L_s-L_p and L_p-L_t bilingual corpora.
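One common way to build a source-target model from two pivot models is to marginalize over pivot phrases, p(t | s) = Σ_p p(p | s) · p(t | p). The sketch below illustrates that idea only; the abstract does not spell out the authors' exact formulation, and the function name `triangulate` and the probabilities are made up.

```python
from collections import defaultdict
from typing import Dict

ProbTable = Dict[str, Dict[str, float]]


def triangulate(src_to_pivot: ProbTable, pivot_to_tgt: ProbTable) -> ProbTable:
    """Combine p(pivot | source) and p(target | pivot) into p(target | source)."""
    src_to_tgt: ProbTable = {}
    for s, pivots in src_to_pivot.items():
        scores: Dict[str, float] = defaultdict(float)
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases with no entry on the pivot-target side contribute nothing.
            for t, prob_t_given_p in pivot_to_tgt.get(p, {}).items():
                scores[t] += prob_p_given_s * prob_t_given_p
        src_to_tgt[s] = dict(scores)
    return src_to_tgt


# Made-up probabilities, for illustration only.
s_to_p = {"s1": {"p1": 0.5, "p2": 0.5}}
p_to_t = {"p1": {"t1": 1.0}, "p2": {"t1": 0.4, "t2": 0.6}}
s_to_t = triangulate(s_to_p, p_to_t)
print(s_to_t)
```

The sum treats the target phrase as independent of the source phrase once the pivot phrase is known, which is the simplifying assumption behind this kind of triangulation.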

3.
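罗毅, 李淼, 张建. 《计算机应用》 (Journal of Computer Applications), 2007, 27(8): 1973-1975
This paper describes a beam search decoder for phrase-based statistical machine translation. The efficiency of the search algorithm is the key to decoding. Building on the traditional beam search decoding algorithm, several improvements to search efficiency are proposed. A dynamic pruning strategy addresses the problem that fixed pruning responds poorly to the current state of the search, and improves pruning precision. A pre-pruning strategy restricts poor expansions, which reduces unnecessary expansions and increases search speed. After a study of the main current reordering constraints, a fast reordering constraint strategy is proposed that speeds up decoding when reordering is used. In addition, a dedicated treatment is proposed for the problem of unique translation of domain terms, in order to improve translation accuracy. A comparative analysis of experimental results demonstrates the effectiveness of the algorithm.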

4.
5.
In this paper, we propose and evaluate a novel dynamic feature function for log-linear model combinations in phrase-based statistical machine translation. The feature function is inspired by the well-known vector-space model, which is typically used in information retrieval and text mining applications, and it aims at improving translation unit selection at decoding time by incorporating context information from the source language. Significant improvements on an English-Spanish experimental corpus are presented and discussed.

6.
In this article we present two novel enhancements for the cube pruning and cube growing algorithms, two of the most widely applied methods when using the hierarchical approach to statistical machine translation. Cube pruning is the de facto standard search algorithm for the hierarchical model. We propose to adapt concepts of the source cardinality synchronous search organization as used for standard phrase-based translation to the characteristics of cube pruning. In this way we will be able to improve the performance of the generation process and reduce the average translation time per sentence to approximately one quarter. We will also investigate the cube growing algorithm, a reformulation of cube pruning with on-demand computation. This algorithm depends on a heuristic for the language model, but this issue is barely discussed in the original work. We analyze the behaviour of this heuristic and propose a new one which greatly reduces memory consumption without costs in runtime or translation performance. Results are reported on the German–English Europarl corpus.

7.
The direct simulation Monte Carlo (DSMC) method is a widely used approach for flow simulations having rarefied or nonequilibrium effects. It relies heavily on sampling instantaneous values from prescribed distributions using random numbers. In this note, we briefly review the sampling techniques typically employed in the DSMC method and present two techniques to speed up related sampling processes. One technique is very efficient for sampling geometric locations of new particles and the other is useful for the Larsen-Borgnakke energy distribution.

8.
We present a phrase-based statistical machine translation approach which uses linguistic analysis in the preprocessing phase. The linguistic analysis includes morphological transformation and syntactic transformation. Since the word-order problem is solved using syntactic transformation, there is no reordering in the decoding phase. For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on a probabilistic context-free grammar. This model is trained using a bilingual corpus and a broad-coverage parser of the source language. This approach is applicable to language pairs in which the target language is poor in resources. We considered translation from English to Vietnamese and from English to French. Our experiments showed significant BLEU-score improvements in comparison with Pharaoh, a state-of-the-art phrase-based SMT system.

9.
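Unknown words and word segmentation granularity are two major issues in Chinese-Japanese and Japanese-Chinese machine translation research. Unlike English and other Western languages, Chinese and Japanese have no spaces between words, so word segmentation is an important task in Chinese-Japanese bilingual processing. Because of differences in part-of-speech tagging schemes, grammar, and semantic expression, the granularity of the segmentation results needs further adjustment to improve the performance of statistical machine translation systems. This paper proposes a method for adjusting Chinese and Japanese word segmentation granularity for statistical machine translation, based on a Chinese-Japanese character correspondence table and Japanese-Chinese dictionary information. Experimental results show that the method effectively adjusts segmentation granularity on both the source and target language sides and improves the performance of the statistical machine translation system. Comparative experiments are used to analyze and discuss the effect of segmentation granularity on the performance of Chinese-Japanese statistical translation systems.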

10.
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.

11.
The work here explores new numerical methods for supporting a Bayesian approach to parameter estimation of dynamic systems. This is primarily motivated by the goal of providing accurate quantification of estimation error that is valid for arbitrary, and hence even very short length data records. The main innovation is the employment of the Metropolis-Hastings algorithm to construct an ergodic Markov chain with invariant density equal to the required posterior density. Monte Carlo analysis of samples from this chain then provides a means for efficiently and accurately computing posteriors for model parameters and arbitrary functions of them.
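The idea of drawing posterior samples with the Metropolis-Hastings algorithm can be sketched on a toy problem. Everything below is an assumption made for illustration and not taken from the paper: a first-order model y[t] = a·y[t-1] + noise, a known noise standard deviation `sigma`, a flat prior on `a`, a Gaussian random-walk proposal, and a made-up data record `y`.

```python
import math
import random
from typing import Callable, List


def metropolis_hastings(log_post: Callable[[float], float], start: float,
                        n_samples: int, step: float, seed: int = 0) -> List[float]:
    """Random-walk Metropolis-Hastings for a scalar parameter."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain: List[float] = []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, step)          # symmetric Gaussian proposal
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        chain.append(x)
    return chain


# Hypothetical short data record and noise level, for illustration only.
y = [1.0, 0.8, 0.5, 0.45, 0.3, 0.2]
sigma = 0.1


def log_posterior(a: float) -> float:
    """Unnormalized log posterior of a under a flat prior and Gaussian noise."""
    residuals = [y[t] - a * y[t - 1] for t in range(1, len(y))]
    return -sum(r * r for r in residuals) / (2.0 * sigma ** 2)


chain = metropolis_hastings(log_posterior, start=0.0, n_samples=20000, step=0.1)
kept = chain[2000:]                               # discard burn-in
posterior_mean = sum(kept) / len(kept)

# With a flat prior the exact posterior mean is the least-squares estimate.
least_squares = (sum(y[t] * y[t - 1] for t in range(1, len(y)))
                 / sum(y[t - 1] ** 2 for t in range(1, len(y))))
print(posterior_mean, least_squares)
```

In this linear-Gaussian toy case the posterior is available in closed form, which makes it a convenient check on the sampler; the sampling approach matters when no closed form exists.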

12.
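This paper proposes an improvement to the phrase extraction algorithm proposed by Och, which is currently in common use in phrase-based statistical machine translation. Building on the original algorithm, the improved algorithm extracts more accurate alignment information. This matters greatly for statistical machine translation between Chinese and minority languages, where corpora are small: additional correct alignment information reduces the number of unknown words and improves translation accuracy. Experiments on corpora of different sizes show a clear increase in the number of extracted phrase pairs.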

13.
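The decoder is a key part of statistical machine translation research. Building on phrase-based statistical machine translation and adding multiple feature models in the spirit of the log-linear model, this paper studies a beam search decoding algorithm based on dynamic programming. The implementation of the algorithm in the decoder is described in detail, and translation speed and accuracy are analyzed.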

14.
Modern computers produce large volumes of simulation results so quickly that their management becomes a formidable task. We describe interactive computer software for replicating simulation models with different parameters. A single simulation run then produces results for hundreds of models with different parameter values without the loop overhead imposed by repeated simulations. One can
  • 1. arrange model-parameter values and the matching model performance measures in corresponding arrays suitable for use in commercially available spreadsheet and relational-database programs for further processing and archival storage, or
  • 2. produce a Monte Carlo sample of model runs with random parameter values and compute statistics such as various averages or probability estimates as functions of simulation time in a single simulation run.

15.
In complex systems with many degrees of freedom such as spin glass and biomolecular systems, conventional simulations in canonical ensemble suffer from the quasi-ergodicity problem. A simulation in generalized ensemble performs a random walk in potential energy space and overcomes this difficulty. From only one simulation run, one can obtain canonical ensemble averages of physical quantities as functions of temperature by the single-histogram and/or multiple-histogram reweighting techniques. In this article we review the generalized ensemble algorithms. Three well-known methods, namely, multicanonical algorithm (MUCA), simulated tempering (ST), and replica-exchange method (REM), are described first. Both Monte Carlo (MC) and molecular dynamics (MD) versions of the algorithms are given. We then present five new generalized-ensemble algorithms which are extensions of the above methods.
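The single-histogram reweighting idea mentioned in this abstract can be sketched in its sample-based form: configurations sampled at inverse temperature β are reweighted by exp(-(β' - β)·E) to estimate a canonical average at β'. The function name `reweight_average` and the two-level toy data are assumptions made for illustration; the review itself covers more general ensembles.

```python
import math
from typing import Sequence


def reweight_average(energies: Sequence[float], observable: Sequence[float],
                     beta_sim: float, beta_target: float) -> float:
    """Estimate the canonical average of an observable at beta_target from samples at beta_sim."""
    exponents = [-(beta_target - beta_sim) * e for e in energies]
    shift = max(exponents)                       # subtract the max for numerical stability
    weights = [math.exp(x - shift) for x in exponents]
    return sum(w * a for w, a in zip(weights, observable)) / sum(weights)


# Toy two-level system (E = 0 or 1) sampled at beta = 0, where both levels are equally likely.
energies = [0.0, 1.0] * 50
mean_energy = reweight_average(energies, energies, beta_sim=0.0, beta_target=1.0)

# Exact canonical mean energy of the two-level system at beta = 1.
expected = math.exp(-1.0) / (1.0 + math.exp(-1.0))
print(mean_energy, expected)
```

Reweighting is only reliable when the sampled energies overlap with those that dominate at the target temperature, which is one motivation for the generalized-ensemble random walk in energy space.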

16.
We present a syntax-based reordering model (RM) for hierarchical phrase-based statistical machine translation (HPB-SMT) enriched with semantic features. Our model brings a number of novel contributions: (i) while the previous dependency-based RM is limited to the reordering of head and dependant constituent pairs, we also model the reordering of pairs of dependants; (ii) our model is enriched with semantic features (Wordnet synsets) in order to allow the reordering model to generalize to pairs not seen in training but with equivalent meaning; (iii) we evaluate our model on two language directions: English-to-Farsi and English-to-Turkish. These language pairs are particularly challenging due to the free word order, rich morphology and lack of resources of the target languages. We evaluate our RM both intrinsically (accuracy of the RM classifier) and extrinsically (MT). Our best configuration outperforms the baseline classifier by 5–29% on pairs of dependants and by 12–30% on head and dependant pairs while the improvement on MT ranges between 1.6% and 5.5% relative in terms of BLEU depending on language pair and domain. We also analyze the value of the feature weights to obtain further insights on the impact of the reordering-related features in the HPB-SMT model. We observe that the features of our RM are assigned significant weights and that our features are complementary to the reordering feature included by default in the HPB-SMT model.

17.
Phrase-based translation models, with sequences of words (phrases) as translation units, achieve state-of-the-art translation performance. However, phrase reordering is a major challenge for this model. Recently, researchers have focused on utilizing syntax to improve phrase reordering. One way of adding syntactic knowledge to the phrase reordering model, using handcrafted or probabilistic syntactic rules to reorder the source language so that it approximates the target-language word order, has been successful in improving translation quality. However, it suffers from propagating the pre-ordering errors to the later translation step (e.g. decoding). In this paper, we propose a novel framework to uniformly represent the handcrafted and probabilistic syntactic rules and integrate them more effectively into phrase-based translation. In the translation phase, for a source sentence to be translated, handcrafted or probabilistic syntactic rules are first acquired from the source parse tree prior to translation, and then instead of reordering the source sentence directly, we input these rules into the decoder and design a new algorithm to apply these rules during decoding. In order to attach more importance to the syntactic rules and to distinguish syntactic from non-syntactic unit reordering, we propose to design a syntactic reordering model and a non-syntactic reordering model separately. The syntactic rules will guide phrase reordering in decoding within the syntactic reordering model. Extensive experiments on Chinese-to-English translation show that our approach, whether incorporating handcrafted or probabilistic syntactic rules, significantly outperforms the previous methods.

18.
The Barcelonagram is a Monte Carlo simulator recently designed in order to take account of the behaviour of living systems. In this paper we apply this technique to real bacterial growth in different and significant experimental conditions, namely (i) the growth of the Serratia marcescens in a minimal glucose-limited medium, (ii) the temperature effect on the anaerobic growth of the same strain, (iii) the growth of the Escherichia coli in a minimal medium and (iv) the normal specific growth rate of bacterial populations against the available substrate concentration. In the context of these different cases we discuss the diverse contributions of these simulated results to the understanding of the microbiological processes and the general reliability of the simulation considered as a third alternative besides both (and together with!) experience and mathematical modelling.

19.
Bilingual termbanks are important for many natural language processing applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. The initial candidate terminology list is prepared by taking all arbitrary n-gram word sequences from the corpus. Then, a well-known statistical measure (the Dice coefficient) is employed in order to remove any multi-word terms with weak associations from the candidate term list. Thereafter, the log-likelihood comparison method is applied to rank the phrasal candidate term list. Then, using a phrase-based statistical machine translation model, we create a bilingual terminology with the extracted monolingual term lists. We integrate an external knowledge source (the Wikipedia cross-language link databases) into the terminology extraction (TE) model to assist two processes: (a) the ranking of the extracted terminology list, and (b) the selection of appropriate target terms for a source term. First, we report the performance of our monolingual TE model compared to a number of state-of-the-art TE models on English-to-Turkish and English-to-Hindi data sets. Then, we evaluate our novel bilingual TE model on an English-to-Turkish data set, and report the automatic evaluation results. We also manually evaluate our novel TE model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains.
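The Dice-coefficient filtering step can be illustrated for two-word candidates, where Dice(x, y) = 2·f(x y) / (f(x) + f(y)). This is a minimal sketch restricted to bigrams; the function name `dice_scores`, the sample sentence, and any threshold applied afterwards are assumptions, since the abstract does not give those details.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dice_scores(tokens: List[str]) -> Dict[Tuple[str, str], float]:
    """Dice coefficient for every adjacent word pair in a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: 2 * count / (unigrams[pair[0]] + unigrams[pair[1]])
        for pair, count in bigrams.items()
    }


# Made-up sample text, for illustration only.
tokens = "machine translation is hard machine translation is fun".split()
scores = dice_scores(tokens)
print(scores[("machine", "translation")], scores[("is", "fun")])
```

Pairs whose words almost always occur together score close to 1, so a candidate list can be pruned by dropping pairs below a chosen cutoff.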

20.
The fast radiosity-type methods for very complex diffuse environments, introduced herein, present a nearly linear-time solution. The outlined procedures rely on recursive algorithms with stochastic convergence for solving the radiosity equation system. Approximations of gathering and shooting at very low computational cost (rather than the exact matrix of a single reflection) are used. The efficiency of the methods will be increased by applying variance reduction techniques.
