Similar Articles
1.
A Phrase-Based Arabic-to-Chinese Machine Translation System (total citations: 1; self-citations: 1; citations by others: 0)
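Using a phrase-based statistical translation approach, we built a simple Arabic-to-Chinese translation system. The core decoder was developed on a log-linear direct translation model, and the system uses a large amount of open-source software for corpus preprocessing. We also discuss unsolved problems in this area and future development trends.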
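A log-linear direct translation model scores a candidate translation e of a source sentence f as a weighted sum of feature functions and picks ê = argmax_e Σ_m λ_m h_m(e, f). The Python sketch below shows only that scoring and selection step, under stated assumptions: the feature names (tm, lm, word_penalty), their values, and the weights are hypothetical, and a real decoder searches over phrase segmentations and reorderings rather than ranking a fixed list of candidates.

```python
import math

def log_linear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted feature sum: sum over m of lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def posteriors(candidates: list[dict], weights: dict[str, float]) -> list[float]:
    """Normalize scores into p(e | f) = exp(score) / Z over the candidate list."""
    scores = [log_linear_score(c["features"], weights) for c in candidates]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def best_candidate(candidates: list[dict], weights: dict[str, float]) -> dict:
    """Decision rule: the candidate with the highest weighted score wins."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))

# Hypothetical weights and feature values (log-probabilities and a word count).
WEIGHTS = {"tm": 1.0, "lm": 0.5, "word_penalty": -0.2}
CANDIDATES = [
    {"text": "candidate A",
     "features": {"tm": math.log(0.4), "lm": math.log(0.1), "word_penalty": 3.0}},
    {"text": "candidate B",
     "features": {"tm": math.log(0.3), "lm": math.log(0.3), "word_penalty": 3.0}},
]

if __name__ == "__main__":
    print(best_candidate(CANDIDATES, WEIGHTS)["text"])  # candidate B
    print(posteriors(CANDIDATES, WEIGHTS))
```

Because the normalizer Z is shared by all candidates, ranking by the unnormalized weighted sum gives the same winner; the normalized posteriors are only needed when actual probabilities are wanted.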

2.
The interlingual approach to machine translation (MT) is used successfully in multilingual translation. It performs the translation task in two independent steps. First, the meanings of source-language sentences are represented in an intermediate, language-independent representation (an Interlingua). Then, target-language sentences are generated from those meaning representations. Arabic natural language processing in general is still underdeveloped, and Arabic natural language generation (NLG) is even less developed. In particular, Arabic NLG from Interlinguas had only been investigated using template-based approaches. Moreover, tools built for other languages are not easily adapted to Arabic because of the language's complexity at both the morphological and syntactic levels. In this paper, we describe a rule-based generation approach for task-oriented, Interlingua-based spoken dialogue that transforms a relatively shallow semantic interlingual representation, called interchange format (IF), into Arabic text that corresponds to the intentions underlying the speaker’s utterances. This approach addresses the problems of determining Arabic syntactic structure and of Arabic morphological and syntactic generation within the interlingual MT approach. The generation approach was developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted evaluation experiments using the input and output of the English analyzer developed by the NESPOLE! team at Carnegie Mellon University. The results of these experiments were promising and confirmed the ability of the rule-based approach to generate Arabic translations from Interlingua representations in the travel and tourism domain.

3.
4.
5.

Question answering is a subfield of information retrieval: the task of answering a question posed in natural language. A question answering system (QAS) may be considered a good alternative to search engines, which return a set of related documents. A QAS is composed of three main modules: question analysis, passage retrieval, and answer extraction. Over the years, numerous QASs have been presented for different languages. However, the development of Arabic QASs has been slowed by linguistic challenges and by the lack of resources and tools available to researchers. In this survey, we start with the challenges posed by the language and how they make the development of new Arabic QASs more difficult. Next, we review several Arabic QASs in detail. This is followed by an in-depth analysis of the techniques and approaches used in the three modules of a QAS. We present an overview of important and recent tools developed to help researchers in this field. We also cover the available Arabic and multilingual datasets and the different measures used to evaluate QASs. Finally, the survey discusses future directions for Arabic QASs based on the current state-of-the-art techniques developed for question answering in other languages.

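The three-module architecture described in this survey abstract (question analysis, passage retrieval, answer extraction) can be illustrated with a deliberately tiny Python pipeline. Everything here is a hypothetical toy: the stopword list, the wh-word to answer-type map, keyword-overlap retrieval, and the capitalization heuristic for answer extraction are assumptions for illustration only. The example is in English because the capitalization cue does not exist in Arabic script, which has no letter case; this is one reason techniques from other languages do not transfer directly.

```python
import re
from typing import Optional

# Hypothetical stopword list and wh-word -> expected answer type (toy assumptions).
STOPWORDS = {"what", "which", "who", "where", "when", "is", "are", "the", "of", "a", "an", "in"}
ANSWER_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text)

def analyze_question(question: str) -> tuple[str, list[str]]:
    """Module 1, question analysis: expected answer type plus content keywords."""
    tokens = [t.lower() for t in tokenize(question)]
    answer_type = ANSWER_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"
    keywords = [t for t in tokens if t not in STOPWORDS]
    return answer_type, keywords

def retrieve_passage(keywords: list[str], passages: list[str]) -> str:
    """Module 2, passage retrieval: the passage sharing the most keywords."""
    def overlap(passage: str) -> int:
        words = {t.lower() for t in tokenize(passage)}
        return sum(1 for kw in keywords if kw in words)
    return max(passages, key=overlap)

def extract_answer(keywords: list[str], passage: str) -> Optional[str]:
    """Module 3, answer extraction: first capitalized non-stopword not in the question."""
    for token in tokenize(passage):
        low = token.lower()
        if token[0].isupper() and low not in keywords and low not in STOPWORDS:
            return token
    return None

def answer(question: str, passages: list[str]) -> tuple[str, Optional[str]]:
    answer_type, keywords = analyze_question(question)
    passage = retrieve_passage(keywords, passages)
    return answer_type, extract_answer(keywords, passage)

PASSAGES = ["Cairo is the capital of Egypt.", "The Nile flows through Sudan and Egypt."]

if __name__ == "__main__":
    print(answer("What is the capital of Egypt?", PASSAGES))  # ('OTHER', 'Cairo')
    print(answer("Where does the Nile flow?", PASSAGES))      # ('LOCATION', 'Sudan')
```

Real systems replace each module with much stronger components (morphological analysis for keywords, ranked retrieval, type-aware extraction), but the module boundaries stay the same.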

6.
7.
Morphologically rich languages pose a challenge for statistical machine translation (SMT). This challenge is magnified when translating into a morphologically rich language. In this work, we address this challenge in the framework of a broad-coverage English-to-Arabic phrase-based statistical machine translation (PBSMT) system. We explore the largest-to-date set of Arabic segmentation schemes, ranging from full word forms to fully segmented forms, and examine their effects on system performance. Our results show a difference of 2.31 BLEU points, averaged over all test sets, between the best and worst segmentation schemes, indicating that the choice of segmentation scheme has a significant effect on the performance of an English-to-Arabic PBSMT system in a large-data scenario. We show that a simple segmentation scheme can perform as well as the best, more complicated one. An in-depth analysis of the effect of segmentation choices on the components of a PBSMT system reveals that text fragmentation has a negative effect on the perplexity of the language models and that aggressive segmentation can significantly increase the size of the phrase table and the uncertainty in choosing candidate translation phrases during decoding. An investigation of the output of the different systems reveals the complementary nature of their outputs and the great potential of combining them.
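To make the idea of a segmentation scheme concrete, the following Python sketch applies three hypothetical schemes to words in Buckwalter transliteration: no segmentation (the full word form), splitting off only the conjunction proclitics w+ and f+, and a more aggressive scheme that also splits the prepositions b+ and l+, the article Al+, and a few pronominal enclitics. The clitic inventories and the minimum-stem-length rule are assumptions, not the schemes evaluated in the paper. Blind affix stripping like this mis-segments words whose stems happen to begin or end with the same letters, which is why practical segmenters rely on morphological analysis and context.

```python
# Toy clitic inventories in Buckwalter transliteration (assumed, not exhaustive).
CONJ = ("w", "f")                          # w+ 'and', f+ 'so/then'
PREP = ("b", "l")                          # b+ 'with/in', l+ 'for/to'
ART = ("Al",)                              # Al+ definite article
PRON = ("hmA", "hm", "hA", "h", "k", "y")  # pronominal enclitics, longest first
MIN_STEM = 3                               # keep at least 3 letters as the stem
SCHEMES = {"none", "conj", "full"}

def segment_clitics(word: str, scheme: str) -> list[str]:
    """Split clitics off `word` according to one of the toy schemes."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    if scheme == "none":
        return [word]
    groups = [CONJ] if scheme == "conj" else [CONJ, PREP, ART]
    front, stem = [], word
    for options in groups:                 # at most one clitic per group, in order
        for p in options:
            if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
                front.append(p + "+")
                stem = stem[len(p):]
                break
    back = []
    if scheme == "full":
        for s in PRON:
            if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
                back.append("+" + s)
                stem = stem[: -len(s)]
                break
    return front + [stem] + back

if __name__ == "__main__":
    for scheme in ("none", "conj", "full"):
        print(scheme, segment_clitics("wbAlktAb", scheme))  # 'and with the book'
    print(segment_clitics("ktAbhA", "full"))                # 'her book'
```

For "wbAlktAb" the three schemes yield one, two, and four tokens respectively. The more aggressive the scheme, the more and shorter the tokens, which is the kind of fragmentation the abstract links to language-model perplexity and phrase-table size.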

8.
9.
In this article, we first investigate different methodologies of Arabic segmentation for statistical machine translation by comparing a rule-based segmenter with several statistical segmenters, and we present a segmentation method that serves the needs of a real-time translation system without impairing translation accuracy. Second, we report on extended lexicon models based on triplets that incorporate sentence-level context during decoding. Results on different translation tasks show improvements in both BLEU and TER scores.

10.
In this paper, we present a system that automatically translates Arabic text embedded in images into English. The system consists of three components: text detection in images, character recognition, and machine translation. We formulate text detection as a binary classification problem and apply gradient boosting trees (GBT), support vector machines (SVM), and location-based prior knowledge, improving the F1 score of text detection from 78.95% to 87.05%. The detected text images are processed by off-the-shelf optical character recognition (OCR) software. We employ an error correction model to post-process the noisy OCR output and apply a bigram language model to reduce word segmentation errors. The translation module uses compact data structures tailored for hand-held devices. The experimental results show substantial improvements in both word recognition accuracy and translation quality. For instance, in the experiment with the Arabic transparent font, the BLEU score increases from 18.70 to 33.47 when the error correction module is used.
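One common way a bigram language model can choose word boundaries in noisy OCR output is dynamic programming over split points, keeping the best-scoring segmentation for each (position, last word) pair. The Python sketch below illustrates that general idea and is not the paper's implementation. The bigram table, the flat penalty for unseen bigrams, and the maximum word length are hypothetical, and the toy input is English for readability.

```python
# Hypothetical bigram log-probabilities; "<s>" marks the start of the line.
BIGRAM_LOGP = {
    ("<s>", "the"): -1.0,
    ("the", "cat"): -2.0,
    ("cat", "sat"): -2.5,
    ("the", "cats"): -3.0,
    ("cats", "at"): -3.0,
}

def bigram_segment(text: str, logp: dict[tuple[str, str], float],
                   max_len: int = 10, unseen: float = -20.0) -> list[str]:
    """Return the word sequence for `text` with the highest total bigram score."""
    # chart[i] maps the last word of a segmentation of text[:i] to (score, words).
    chart: list[dict[str, tuple[float, list[str]]]] = [{} for _ in range(len(text) + 1)]
    chart[0]["<s>"] = (0.0, [])
    for i in range(len(text)):
        for prev, (score, words) in chart[i].items():
            for j in range(i + 1, min(len(text), i + max_len) + 1):
                word = text[i:j]
                new_score = score + logp.get((prev, word), unseen)
                # Keep only the best path ending in `word` at position j.
                if word not in chart[j] or new_score > chart[j][word][0]:
                    chart[j][word] = (new_score, words + [word])
    return max(chart[len(text)].values(), key=lambda entry: entry[0])[1]

if __name__ == "__main__":
    print(bigram_segment("thecatsat", BIGRAM_LOGP))  # ['the', 'cat', 'sat']
```

Here "the cat sat" scores -5.5 and beats "the cats at" at -7.0; any split that uses an unseen bigram pays the -20 penalty. The flat penalty stands in for proper smoothing such as backoff, which a real model would need.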

11.
Hebrew and Arabic are related but mutually incomprehensible languages with complex morphology and scarce parallel corpora. Machine translation between the two languages is therefore interesting and challenging. We discuss similarities and differences between Hebrew and Arabic, the benefits and challenges they induce, respectively, and their implications for machine translation. We highlight the shortcomings of using English as a pivot language and advocate a direct, transfer-based, and linguistically informed (but still statistical, and hence scalable) approach. We report preliminary results of the two systems we are currently developing, for translation in both directions.

12.
In the last decade the dominant models of MT have been data-driven or corpus-based. Of the two main trends, statistical machine translation and example-based machine translation (EBMT), the latter is much less clearly defined. In a review of the recently published collection edited by Michael Carl and Andy Way, this essay surveys the basic processes, methods, main problems and tasks of EBMT, and attempts to provide a definition of the essence of EBMT in comparison with statistical MT and traditional rule-based MT. Recent Advances in Example-based Machine Translation. Edited by Michael Carl and Andy Way. Dordrecht: Kluwer Academic Publishers, 2003. xxxi, 482pp. (Text, Speech and Language Technology, vol. 21) ISBN: 1-4020-1400-7 (hardback), 1-4020-1401-5 (paperback).

13.
Universal Access in the Information Society - Arabic sign language (ArSL) is a full natural language that is used by the deaf in Arab countries to communicate in their community. Unfamiliarity with...

14.
This paper provides an overview of the KBMT-89 project at Carnegie Mellon University's Center for Machine Translation, and thus also of the special issue of this journal that reports on the project. The knowledge-based approach to machine translation is presented and defended in a historical context. Various components of the system, key parts of which are described in subsequent papers in the issue, are introduced and paired with their computational motivations.

15.
Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Portland, Oregon, pp. 632–641, 2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.

16.
METIS-II was an EU-FET MT project running from October 2004 to September 2007 that aimed to translate free-text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their “home” languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experience obtained, we believe that the approach is promising and offers potential for development in various directions.

17.
NEC has been developing a Japanese-English bidirectional machine translation system called VENUS (Vehicle for Natural language Understanding & Synthesis) to reduce the increasing cost of manually translating vast amounts of in-house technical documents. In addition, a translation support subsystem has been developed on the basis of VENUS and extended with the facilities needed to prepare translated documents, such as document entry, editing, translation, printing, and management. This paper briefly introduces the current status of the VENUS translation system and the basic ideas behind its development.

18.
Due to the rapid advancement of both computer technology and linguistic theory, machine translation systems are now coming into practical use. Fujitsu has two machine translation systems: ATLAS-I, a syntax-based system that translates English into Japanese, and ATLAS II, a semantics-based system that aims at high-quality multilingual translation. In this paper, the translation mechanisms of both ATLAS-I and ATLAS II are explained.

19.
As the cognitive processes of natural language understanding and generation become better understood, machine translation is becoming easier to perform. In this paper, we present our work on machine translation from Arabic to English and French and illustrate it with a fully operational system that runs on PC compatibles with an Arabic/Latin interface. This system is an extension of an earlier system whose task was the analysis of natural-language Arabic. Thanks to the regularity of its phrase structures and word patterns, Arabic lends itself quite naturally to a Fillmore-like analysis. The meaning of a phrase is stored in a star-like data structure, in which the verb occupies the center of the star and the various noun phrases occupy specific peripheral nodes. The data structure is then translated into an internal representation in the target language, which is then mapped into the target text.
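The star-like representation described in this abstract can be sketched as a small Python data structure: the verb is the center, and each case role (in the spirit of Fillmore's case grammar) is a peripheral node. The role names, the transliterated Arabic example, the word-for-word lexicons, and the role-order lists used for generation are hypothetical simplifications; the system described maps the structure through an internal target-language representation rather than a direct lexicon lookup.

```python
from dataclasses import dataclass, field

@dataclass
class Star:
    """Verb at the center; case roles (e.g. AGENT, OBJECT) as peripheral nodes."""
    verb: str
    roles: dict[str, str] = field(default_factory=dict)

def generate(star: Star, lexicon: dict[str, str], order: list[str]) -> str:
    """Translate each node with a toy lexicon and linearize in target word order."""
    nodes = {"VERB": lexicon[star.verb]}
    nodes.update({role: lexicon[phrase] for role, phrase in star.roles.items()})
    return " ".join(nodes[slot] for slot in order if slot in nodes)

# Transliterated Arabic verb-initial sentence: kataba al-waladu ad-darsa
# ("the boy wrote the lesson").
sentence = Star(verb="kataba", roles={"AGENT": "al-waladu", "OBJECT": "ad-darsa"})

# Hypothetical word-for-word lexicons for the two target languages.
EN = {"kataba": "wrote", "al-waladu": "the boy", "ad-darsa": "the lesson"}
FR = {"kataba": "a écrit", "al-waladu": "le garçon", "ad-darsa": "la leçon"}
SVO = ["AGENT", "VERB", "OBJECT"]  # subject-verb-object order for this example

if __name__ == "__main__":
    print(generate(sentence, EN, SVO))  # the boy wrote the lesson
    print(generate(sentence, FR, SVO))  # le garçon a écrit la leçon
```

Because the roles are stored independently of surface order, the same star can be linearized in Arabic's verb-first order or in subject-first English or French order simply by changing the order list.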

20.
