Similar Documents
20 similar documents found (search time: 15 ms)
1.
Acquiring Translation Templates from a Bilingual Corpus (total citations: 1; self-citations: 0; other citations: 1)
Automatic acquisition of translation templates is a key factor in improving both the output quality and the domain adaptability of machine translation (MT). We extract equivalence pairs with a tree-to-string method, then acquire translation templates from them with error-driven learning and optimize the result. The optimized templates are applied in a transfer-based machine translation system and evaluated on the "863" dialogue corpus. Experimental results show that, when translating with the automatically acquired and optimized templates, the evaluation scores on the open test corpus improve to a certain extent.

2.
The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, the building of such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work, and, as a consequence, bilingual resources are usually more difficult to find than "shallow" monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology to build automatically both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments for Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.

3.
4.
In current statistical approaches to machine translation, the size of the bilingual corpus and the accuracy of word alignment strongly affect system performance. Although a large-scale corpus can improve word-alignment accuracy and thus system performance, it does so at the cost of a heavier system load; research on statistical machine translation therefore seeks, beyond the use of large corpora, other ways to improve performance. To address this, we propose a method that incorporates a bilingual dictionary into statistical machine translation. It not only improves word-alignment accuracy but also yields higher-quality translations, and alleviates the data-sparseness problem to some extent.
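One simple way to realize the idea in this abstract is to interpolate the translation model's lexical probability with a 0/1 dictionary indicator, so that dictionary-attested pairs win ties during alignment. This is a hedged sketch of the general technique, not the paper's actual formula; the interpolation weight and the toy probabilities are our own assumptions.

```python
def dictionary_boosted_score(t_prob, src, tgt, bilingual_dict, weight=0.5):
    """Interpolate a translation-model probability t(tgt|src) with a
    0/1 bilingual-dictionary indicator. Dictionary pairs get a boost,
    which sharpens word alignment when corpus statistics are sparse.
    The weight of 0.5 is an illustrative default, not a tuned value."""
    in_dict = 1.0 if tgt in bilingual_dict.get(src, ()) else 0.0
    return (1.0 - weight) * t_prob + weight * in_dict

# Example: the dictionary pair ("maison", "house") outranks a
# non-dictionary candidate with the same corpus probability.
d = {"maison": {"house"}}
boosted = dictionary_boosted_score(0.2, "maison", "house", d)
plain = dictionary_boosted_score(0.2, "maison", "car", d)
```

In a real system the boosted score would replace the plain lexical probability inside the alignment model's E-step or inside phrase scoring.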

5.
Incorporating valuable syntax-level linguistic knowledge into statistical machine translation is of great theoretical and practical importance for advancing the field. We propose three strategies, from simple to complex, for integrating bilingual maximal noun phrases into the statistical translation model; overall translation performance improves step by step. Method-III adopts a "divide and conquer" strategy that integrates maximal noun phrases into statistical machine translation as hard constraints and, at the level of bilingual maximal noun phrases, combines the phrase-based translation model with the hierarchical phrase model; it improves the translation system the most. The proposed strategies significantly improve the quality of the phrase-based translation model: on complex long sentences, Method-III raises the BLEU score by 3.03% over the phrase-based baseline model.

6.
Universal Access in the Information Society - An equal opportunity for all is the basic right of every human being. The deaf society of the world needs to have access to all the information just...

7.
Bilingual termbanks are important for many natural language processing applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. The initial candidate terminology list is prepared by taking all arbitrary n-gram word sequences from the corpus. Then, a well-known statistical measure (the Dice coefficient) is employed in order to remove any multi-word terms with weak associations from the candidate term list. Thereafter, the log-likelihood comparison method is applied to rank the phrasal candidate term list. Then, using a phrase-based statistical machine translation model, we create a bilingual terminology with the extracted monolingual term lists. We integrate an external knowledge source—the Wikipedia cross-language link databases—into the terminology extraction (TE) model to assist two processes: (a) the ranking of the extracted terminology list, and (b) the selection of appropriate target terms for a source term. First, we report the performance of our monolingual TE model compared to a number of the state-of-the-art TE models on English-to-Turkish and English-to-Hindi data sets. Then, we evaluate our novel bilingual TE model on an English-to-Turkish data set, and report the automatic evaluation results. We also manually evaluate our novel TE model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains.
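The two statistical filters this abstract relies on can be sketched as follows: the Dice coefficient prunes weakly associated multi-word candidates, and a log-likelihood comparison of domain-corpus vs. reference-corpus frequencies ranks the survivors. All counts below are illustrative assumptions; the paper's corpora, thresholds, and exact ranking formula may differ.

```python
import math

def dice(count_xy, count_x, count_y):
    """Dice coefficient: association strength of the two-word term (x, y)."""
    return 2.0 * count_xy / (count_x + count_y)

def log_likelihood_ratio(k_dom, n_dom, k_ref, n_ref):
    """Compare a term's rate in a domain corpus (k_dom hits in n_dom
    tokens) against a reference corpus; a higher score suggests the
    term is more domain-specific."""
    def ll(k, n, p):
        if p <= 0.0 or p >= 1.0:  # guard log(0) at the extremes
            return 0.0
        return k * math.log(p) + (n - k) * math.log(1.0 - p)
    p_dom, p_ref = k_dom / n_dom, k_ref / n_ref
    p_all = (k_dom + k_ref) / (n_dom + n_ref)
    return 2.0 * (ll(k_dom, n_dom, p_dom) + ll(k_ref, n_ref, p_ref)
                  - ll(k_dom, n_dom, p_all) - ll(k_ref, n_ref, p_all))
```

In use, a candidate bigram would be dropped when `dice` falls below a chosen threshold, and the remaining phrasal candidates ranked by `log_likelihood_ratio`.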

8.
The paper discusses a methodology of machine-assisted translation (MAT) as an alternative to fully automatic MT. A prototype MAT system is described, which integrates a dictionary database system and a text editor. The functional requirements for such a system from the linguistic point of view, and some general problems of designing and implementing such systems, are presented. Special attention is given to language dependence, and to the completeness and efficiency of the linguistic data representation for a very simple system. A statistical analysis of English inflexion and word-derivation patterns is presented.

9.
10.
Key concept extraction is a major step for ontology learning, which aims to build an ontology by identifying relevant domain concepts and their semantic relationships from a text corpus. The success of ontology development using key concept extraction strongly relies on the degree of relevance of the key concepts identified: if the identified key concepts are not closely relevant to the domain, the constructed ontology will not be able to correctly and fully represent the domain knowledge. In this paper, we propose a novel method, named CFinder, for key concept extraction. Given a text corpus in the target domain, CFinder first extracts noun phrases as candidate key concepts, using linguistic patterns over Part-Of-Speech (POS) tags. To calculate the weights (importance) of these candidates within the domain, CFinder combines their statistical knowledge with domain-specific knowledge indicating their relative importance within the domain. The calculated weights are further enhanced by considering an inner structural pattern of the candidates. The effectiveness of CFinder is evaluated, using a recently developed ontology for the domain of 'emergency management for mass gatherings', against state-of-the-art methods for key concept extraction: Text2Onto, KP-Miner, and Moki. The comparative evaluation results show that CFinder statistically significantly outperforms all three methods in terms of F-measure and average precision.
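The candidate-extraction step can be sketched as a pass over (word, POS-tag) pairs that collects runs of adjectives and nouns ending in a noun. The Penn-style tag set (`JJ`, `NN`, `NNS`) and the exact pattern are our simplifying assumptions; CFinder's actual linguistic patterns and weighting formula are more elaborate.

```python
def noun_phrases(tagged):
    """Collect candidate key concepts from (word, tag) pairs: runs of
    adjectives (JJ) and nouns (NN/NNS), emitted at every noun ending.
    A minimal sketch of POS-pattern candidate extraction, not
    CFinder's full pattern set."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("JJ", "NN", "NNS"):
            current.append(word)
            if tag in ("NN", "NNS"):  # a candidate must end on a noun
                phrases.append(" ".join(current))
        else:
            current = []  # any other tag breaks the run
    return phrases

# Example input, pre-tagged (tags assumed for illustration):
tagged = [("emergency", "NN"), ("management", "NN"),
          ("for", "IN"), ("mass", "JJ"), ("gatherings", "NNS")]
candidates = noun_phrases(tagged)
```

The candidates would then be weighted (e.g., by frequency combined with domain-specific cues) and ranked; only that ranking step is CFinder-specific.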

11.
12.
In recent years, several authors have presented algorithms that locate instances of a given string, or set of strings, within a text. Less consideration has been given to the complementary problem: processing a text to find out what strings appear in it, without any preconceived notion of what strings might be present. PATRICIA, a system developed two decades ago, implements a solution to this problem, but its design is tightly bound to the assumptions that individual string elements are bits and that the user can provide complete lists of starting and stopping places for strings. This paper presents an approach that drops these assumptions: our method allows different definitions of indivisible string elements for different applications, and the only information the user provides for determining the beginnings and ends of strings is a maximum length for output strings. The paper also describes a portable C implementation of the method, called PORTREP, whose primary data structure is a trie represented as a ternary tree. PORTREP has a method for eliminating redundancy from the output, and it can function with a bounded number of nodes by employing a heuristic process that reuses seldom-visited nodes. Theoretical analysis and empirical studies, reported here, give confidence in the efficiency of the algorithms. PORTREP can form the basis for a variety of text-analysis applications, and the paper considers one such application: automatic document indexing.
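The trie-represented-as-a-ternary-tree idea can be sketched in a few lines (in Python here, although PORTREP itself is a C program; treating single characters as the indivisible string elements is our simplifying assumption, since PORTREP lets each application choose its own element type):

```python
class TSTNode:
    """One node of a ternary search trie: a single element plus
    lower/equal/higher children and an occurrence count."""
    __slots__ = ("elem", "lo", "eq", "hi", "n")
    def __init__(self, elem):
        self.elem, self.lo, self.eq, self.hi, self.n = elem, None, None, None, 0

class TernaryTrie:
    """Counts occurrences of strings, in the spirit of PORTREP's
    primary data structure (a trie stored as a ternary tree)."""
    def __init__(self):
        self.root = None

    def insert(self, s):
        self.root = self._insert(self.root, s, 0)

    def _insert(self, node, s, i):
        c = s[i]
        if node is None:
            node = TSTNode(c)
        if c < node.elem:
            node.lo = self._insert(node.lo, s, i)
        elif c > node.elem:
            node.hi = self._insert(node.hi, s, i)
        elif i + 1 < len(s):
            node.eq = self._insert(node.eq, s, i + 1)
        else:
            node.n += 1  # string ends here: bump its count
        return node

    def lookup(self, s):
        node, i = self.root, 0
        while node is not None:
            c = s[i]
            if c < node.elem:
                node = node.lo
            elif c > node.elem:
                node = node.hi
            elif i + 1 < len(s):
                node, i = node.eq, i + 1
            else:
                return node.n
        return 0
```

PORTREP's node-bounding heuristic (reclaiming seldom-visited nodes) would sit on top of this structure; it is omitted from the sketch.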

13.
Bilingual multiword expression extraction is a significant problem in extracting meaning from free text, and it involves analyzing large amounts of textual information. In this paper we propose a text mining approach to extracting bilingual multiword expressions that employs both statistical and rule-based methods. The extraction process has two phases. In the first phase, candidates are extracted from the corpus by statistical methods; a multiple sequence alignment algorithm makes the extraction sensitive to flexible (gapped) multiword expressions. In the second phase, error-driven rules and patterns are extracted from the corpus; to acquire high-quality instances, manual annotation with active learning is also performed during sample selection. The trained rules are then used to filter the candidates. Bilingual comparisons are made over a parallel corpus, and some of the bilingual syntactic patterns are obtained from a bilingual phrase dictionary. Because the system has many parameters, a set of experiments was designed to find the best-performing configuration. Experimental results show that our approach achieves good performance.

14.
A fish finder is an intelligent fishing-aid device; mounting its transducer vertically enables effective detection of fish activity in the water below. We propose a new fish-finder design based on the AA32416 logarithmic amplifier. Detection is implemented by a main control module, a transmitter module, a receiver module, and an echo-signal processing module, after which the data are processed and analyzed in software. Field experiments show that the designed fish finder meets the expected specifications and has practical value and research significance.

15.
As the cognitive processes of natural language understanding and generation become better understood, machine translation is becoming easier to perform. In this paper we present our work on machine translation from Arabic to English and French, and illustrate it with a fully operational system that runs on PC compatibles with an Arabic/Latin interface. This system extends an earlier system whose task was the analysis of natural-language Arabic. Thanks to the regularity of its phrase structures and word patterns, Arabic lends itself quite naturally to a Fillmore-like (case-grammar) analysis. The meaning of a phrase is stored in a star-like data structure, where the verb occupies the center of the star and the various noun phrases occupy specific peripheral nodes. This data structure is then translated into an internal representation in the target language, which is in turn mapped onto the target text.

16.
Although no machine learning technique fully meets human requirements, finding a quick and efficient translation mechanism has become an urgent necessity, given the differences among the languages spoken in the world's communities and the vast development that has occurred worldwide; each technique demonstrates its own advantages and disadvantages. The purpose of this paper is therefore to shed light on some of the machine translation techniques available in the literature, and to encourage researchers to study them. We discuss some linguistic characteristics of the Arabic language. Features of Arabic that are relevant to machine translation are discussed in detail, along with the difficulties they may present. The paper summarizes the major techniques used in machine translation from Arabic into English, and discusses their strengths and weaknesses.

17.
As data exploration has grown rapidly in recent years, data storage and data processing have received increasing attention as means of extracting important information. Finding a scalable way to process large-scale data is a critical issue for both relational database systems and the emerging NoSQL databases. With the inherent scalability and fault tolerance of Hadoop, MapReduce is attractive for processing massive data in parallel. Most previous research has focused on developing SQL or SQL-like query translators over the Hadoop distributed file system; however, updating data frequently in such a file system can be difficult. We therefore need a flexible datastore such as HBase, not only to place data over a scale-out storage system but also to manipulate changeable data in a transparent way. The HBase interface, however, is not friendly enough for most users; a GUI composed of a SQL client application and a database connection to HBase eases the learning curve. In this paper, we propose the JackHare framework, comprising a SQL query compiler, a JDBC driver, and a systematic method that uses the MapReduce framework to process unstructured data in HBase. After importing the JDBC driver into a SQL client GUI, we can exploit HBase as the underlying datastore to execute ANSI-SQL queries. Experimental results show that our approach performs well in terms of efficiency and scalability.
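The core compiler idea (translating a SQL query into a scan that a MapReduce mapper can apply row by row) can be illustrated with a toy example. The restricted grammar, the scan-spec format, and the helper names below are our own simplifications for illustration, not JackHare's actual compiler or HBase's API.

```python
import re

# Recognize only "SELECT cols FROM table [WHERE col = 'val']".
SQL = re.compile(r"SELECT\s+(?P<cols>[\w\s,*]+)\s+FROM\s+(?P<table>\w+)"
                 r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?",
                 re.IGNORECASE)

def compile_select(sql):
    """Compile a restricted ANSI-SQL SELECT into a scan specification."""
    m = SQL.fullmatch(sql.strip())
    if not m:
        raise ValueError("unsupported query")
    spec = {"table": m.group("table"),
            "columns": [c.strip() for c in m.group("cols").split(",")]}
    if m.group("col"):
        spec["filter"] = (m.group("col"), m.group("val"))
    return spec

def scan(rows, spec):
    """Mapper-side evaluation of the spec over row dictionaries,
    standing in for an HBase table scan."""
    col_filter = spec.get("filter")
    for row in rows:
        if col_filter and row.get(col_filter[0]) != col_filter[1]:
            continue
        if spec["columns"] == ["*"]:
            yield row
        else:
            yield {c: row.get(c) for c in spec["columns"]}
```

A real compiler would additionally handle joins, aggregates, and HBase-specific details such as column families and row-key design.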

18.
In this paper, we present a novel face detection approach based on a convolutional neural architecture, designed to robustly detect highly variable face patterns, rotated up to ±20 degrees in the image plane and turned up to ±60 degrees, in complex real-world images. The proposed system automatically synthesizes simple problem-specific feature extractors from a training set of face and nonface patterns, without making any assumptions or using any hand-made design concerning the features to extract or the areas of the face pattern to analyze. The face detection procedure acts like a pipeline of simple convolution and subsampling modules that treat the raw input image as a whole. We therefore show that an efficient face detection system does not require any costly local preprocessing before classification of image areas. The proposed scheme provides a very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases. We present extensive experimental results illustrating the efficiency of the proposed approach on difficult test sets, including an in-depth sensitivity analysis with respect to the degrees of variability of the face patterns.

19.
In a bilingual civil society, such as that in Wales, language and the use of language can be a highly political issue. Within this context, web sites may act as a beneficial influence on the maintenance and revitalisation of the minority language, or may serve to exclude and marginalise that language. Through a study of existing web sites, this paper examines the extent to which the Welsh language is being presented as a usable tool through which individuals may be informed about and participate in civil society in Wales. While this work specifically considers Wales, the issues faced are similar to those of many other bilingual communities.

20.
In a heterogeneous database system, a query for one type of database system (i.e., a source query) may have to be translated to an equivalent query (or queries) for execution in a different type of database system (i.e., a target query). Usually, for a given source query, there is more than one possible target query translation, and some translations can be executed more efficiently than others by the receiving database system. Developing a translation procedure for each type of database system is time-consuming and expensive. We abstract a generic hierarchical database system (GHDBS) which has properties common to database systems whose schema contains hierarchical structures (e.g., System 2000, IMS, and some object-oriented database systems). We develop principles of query translation with GHDBS as the receiving database system; translation into any specific system can then be accomplished by a translation into the general system, with refinements to reflect the characteristics of the specific system. We develop rules that guarantee correctness of the target queries, where correctness means that the target query is equivalent to the source query. We also provide rules that can guarantee a minimum number of target queries in cases when one source query needs to be translated to multiple target queries. Since the minimum number of target queries implies the minimum number of times the underlying system is invoked, efficiency is thereby taken into consideration.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), 京ICP备09084417号