共查询到20条相似文献,搜索用时 295 毫秒
1.
关晓薇 《计算机工程与应用》2008,44(5):22-24
提出了汉语时间信息的新分类和时间模式的概念,基于时间模式对汉语句子的时间信息进行形式化,构建汉语句子的词汇信息和语法信息时间模式库;提出多策略汉语句子时间分析和英译方法,将汉语单句时间分析算法、汉语关联词语标记句时间分析算法、类虚拟语气句时间分析算法和篇章信息识别规则相结合。实验表明该方法能有效解决汉英机器翻译中汉语句子时间分析和英译问题。 相似文献
2.
3.
4.
5.
基于概率统计技术和规则方法的新词发现 总被引:9,自引:1,他引:8
新词/短语的识别是自然语言处理、信息检索和机器翻译等领域的一项基础研究。该文分析了已有短语抽取技术,并结合汉语特点,提出了基于概率统计技术和规则方法相结合的概念抽取方法。该方法包括高效的“二元语法”统计模型、统计算法、统计选词策略、丰富的规则知识和规则过滤算法。实验证明该方法适用于从大规模语料库中自动高效地发现新词/短语。 相似文献
6.
该研究以型式语法为理论基础,通过链语法形式化语法体系对动词型式进行了形式化,并对链语法动词词典进行了重构,旨在构建一个更好的面向中国学生的英语书面语动词形式错误检查系统。测试结果显示,重构后链语法词典的查错性能和句法分析能力得到提高。对错句检查的召回率比原词典提高了4.5%,准确率提高了15.7%;对本族者正确分析句子的准确率提高了12.2%。研究表明,该研究所基于的语言学理论(动词型式语法)和形式模型(链语法)可以较好地适用于中国学生书面英语动词形式错误检查系统的构建。 相似文献
7.
8.
基于词类串的汉语句子结构相似度计算方法 总被引:9,自引:1,他引:9
句子相似度的衡量是基于实例机器翻译研究中最重要的一个内容。对于基于实例的汉英机器翻译研究,汉语句子相似度衡量的准确性,直接影响到最后翻译结果的输出。本文提出了一种汉语句子结构相似性的计算方法。该方法比较两个句子的词类信息串,进行最优匹配,得到一个结构相似性的值。在小句子集上的初步实验结果表明,该方法可行,有效,符合人的直观判断。 相似文献
9.
10.
11.
Toward a lexicalized grammar for interlinguas 总被引:1,自引:0,他引:1
In this paper we present one aspect of our research on machine translation (MT): capturing the grammatical and computational relation between (i) the interlingua (IL) as defined declaratively in the lexicon and (ii) the IL as defined procedurally by way of algorithms that compose and decompose pivot IL forms. We begin by examining the interlinguas in the lexicons of a variety of current IL-based approaches to MT. This brief survey makes it clear that no consensus exists among MT researchers on the level of representation for defining the IL. In the section that follows, we explore the consequences of this missing formal framework for MT system builders who develop their own lexical-IL entries. The lack of software tools to support rapid IL respecification and testing greatly hampers their ability to modify representations to handle new data and new domains. Our view is that IL-based MT research needs both (a) the formal framework to specify possible IL grammars and (b) the software support tools to implement and test these grammars. With respect to (a), we propose adopting a lexicalized grammar approach, tapping research results from the study oftree grammars for natural language syntax. With respect to (b), we sketch the design and functional specifications for parts of ILustrate, the set of software tools that we need to implement and test the various IL formalisms that meet the requirements of a lexicalized grammar. In this way, we begin to address a basic issue in MT research, how to define and test an interlingua as a computational language — without building a full MT system for each possible IL formalism that might be proposed. 相似文献
12.
Stavroula-Evita Fotinea Eleni Efthimiou George Caridakis Kostas Karpouzis 《Universal Access in the Information Society》2008,6(4):405-418
This paper presents the modules that comprise a knowledge-based sign synthesis architecture for Greek sign language (GSL).
Such systems combine natural language (NL) knowledge, machine translation (MT) techniques and avatar technology in order to
allow for dynamic generation of sign utterances. The NL knowledge of the system consists of a sign lexicon and a set of GSL
structure rules, and is exploited in the context of typical natural language processing (NLP) procedures, which involve syntactic
parsing of linguistic input as well as structure and lexicon mapping according to standard MT practices. The coding on linguistic
strings which are relevant to GSL provide instructions for the motion of a virtual signer that performs the corresponding
signing sequences. Dynamic synthesis of GSL linguistic units is achieved by mapping written Greek structures to GSL, based
on a computational grammar of GSL and a lexicon that contains lemmas coded as features of GSL phonology. This approach allows
for robust conversion of written Greek to GSL, which is an essential prerequisite for access to e-content by the community
of native GSL signers. The developed system is sublanguage oriented and performs satisfactorily as regards its linguistic
coverage, allowing for easy extensibility to other language domains. However, its overall performance is subject to current
well known MT limitations. 相似文献
13.
One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts — newspaper reports — illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish. 相似文献
14.
This paper presents a methodology for evaluating Arabic Machine Translation (MT) systems. We are specifically interested in evaluating lexical coverage, grammatical coverage, semantic correctness and pronoun resolution correctness. The methodology presented is statistical and is based on earlier work on evaluating MT lexicons in which the idea of the importance of a specific word sense to a given application domain and how its presence or absence in the lexicon affects the MT system’s lexical quality, which in turn will affect the overall system output quality. The same idea is used in this paper and generalized so as to apply to grammatical coverage, semantic correctness and correctness of pronoun resolution. The approach adopted in this paper has been implemented and applied to evaluating four English-Arabic commercial MT systems. The results of the evaluation of these systems are presented for the domain of the Internet and Arabization. 相似文献
15.
蜕变测试技术综述 总被引:4,自引:0,他引:4
软件测试是一种重要的、不可缺少的软件质量保证技术,用于发现和纠正软件中存在的缺陷和错误,但在很多情况下待测程序的预期输出难以确定。蜕变测试技术通过检查程序的多个执行结果之间的关系来测试程序,可以有效地解决上述问题。经过近十年的研究,蜕变测试技术已经在测试过程的优化、与其他验证或测试方法的结合等方面取得了巨大的进展,并被广泛地应用于各个领域中。对当前蜕变测试技术的研究进行了综述,针对已有方法的不足之处,对未来的研究方向进行了展望,包括蜕变测试充分性研究、实用蜕变关系构造技术、实用原始测试用例选取技术、新型软件中蜕变测试技术的研究、蜕变测试工具的开发等。 相似文献
16.
Necessary and sufficient conditions are derived for the weights of a generalized correlation matrix of a bidirectional associative memory (BAM) which guarantee the recall of all training pairs. A linear programming/multiple training (LP/MT) method that determines weights which satisfy the conditions when a solution is feasible is presented. The sequential multiple training (SMT) method is shown to yield integers for the weights, which are multiplicities of the training pairs. Computer simulation results, including capacity comparisons of BAM, LP/MT BAM, and SMT BAM, are presented. 相似文献
17.
Gregor Thurmair 《Computers and the Humanities》1991,25(2-3):115-128
This paper describes developments in the area of machine translation (MT). First, the paper gives an overview of developments in Germany in general; then, special problems are discussed. The system taken as an example is METAL (Machine Translation and Analysis of Natural Language), where recent development work has centered around two main topics. (i) Efforts have been made to make the system really multilingual. The German-to-English prototype had to be expanded, some system components had to be readjusted, and additional problems had to be solved. Currently, analysis and synthesis components for German, English, French, Spanish, and Dutch are under development. All these languages use a common system kernel and a standard interface structure. (ii) The system had to be made user-friendly. This was an even more important task as, up to now, MT systems have not been well accepted by users. METAL tries to be more realistic, and also tries to support the main user interfaces in a much better way than has been done before. This is based on the conviction that there are several parameters which determine the real success of an MT system. It is not just translation quality which is decisive, it is also the integration of an MT system into the whole process of preparing and translating documents.Gregor Thurmair is head of the Linguistics Department at Siemens Nixdorf Information Systems and project leader of the machine translation group, METAL. He is involved in projects in information retrieval (morphological analysis), speech understanding (parsing, semantics) and machine translation (METAL system). He has presented papers on morphology, semantics in speech understanding, transfer problems in MT, and grammar checking. 相似文献
18.
MT信号现场处理的实现技术研究 总被引:2,自引:0,他引:2
为满足大地电磁测探(MT)信号现场处理的要求,本文在分析与优化MT信号处理算法的基础上,提出以TMS320C30高速浮点数字信号处理器构成主从式信号处理系统,其运算速度比用Fortran及其他语言编写的程序在IBMPC机上运行要快几十倍,运算精度比有定点数字信号处理器TMS32020高,文中讨论了软件设计方法并给测试结果,本文对解决MT信号野外采集的现场处理的计算瓶颈提供了一种有效的途径。 相似文献
19.
Mark Przybocki Kay Peterson Sébastien Bronsart Gregory Sanders 《Machine Translation》2009,23(2-3):71-103
This paper discusses the evaluation of automated metrics developed for the purpose of evaluating machine translation (MT) technology. A general discussion of the usefulness of automated metrics is offered. The NIST MetricsMATR evaluation of MT metrology is described, including its objectives, protocols, participants, and test data. The methodology employed to evaluate the submitted metrics is reviewed. A summary is provided for the general classes of evaluated metrics. Overall results of this evaluation are presented, primarily by means of correlation statistics, showing the degree of agreement between the automated metric scores and the scores of human judgments. Metrics are analyzed at the sentence, document, and system level with results conditioned by various properties of the test data. This paper concludes with some perspective on the improvements that should be incorporated into future evaluations of metrics for MT evaluation. 相似文献
20.
基于WordNet词义消歧的系统融合 总被引:3,自引:3,他引:0
最近混淆网络在融合多个机器翻译结果中展示很好的性能. 然而为了克服在不同的翻译系统中不同的词序, 假设对齐在混淆网络的构建上仍然是一个重要的问题. 但以往的对齐方法都没有考虑到语义信息. 本文为了更好地改进系统融合的性能, 提出了用词义消歧(Word sense disambiguation, WSD)来指导混淆网络中的对齐. 同时骨架翻译的选择也是通过计算句子间的相似度来获得的, 句子的相似性计算使用了二分图的最大匹配算法. 为了使得基于WordNet词义消歧方法融入到系统中, 本文将翻译错误率(Translation error rate, TER)算法进行了改进, 实验结果显示本方法的性能好于经典的TER算法的性能. 相似文献