Similar literature
20 similar articles found
1.
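Li Yujian (李玉鉴). 《计算机科学》 (Computer Science), 2004, 31(5): 172-175
This paper proposes a new machine translation method, UAMRT-based machine translation. The basic idea is simple: first design UAMRT, a general-purpose template matching and replacement algorithm, then use UAMRT to match the source-language templates in a sentence and replace them with the corresponding target-language templates, thereby translating the sentence. Combined with a sentence-pattern analysis algorithm and a clause analysis algorithm, a heuristic search mechanism further improves translation speed and quality. Speed tests show that an English-Chinese translation system built with this method translates about 1,300 words per second on a P-IV 1.7G computer. Quality tests show that the system's performance keeps improving during development simply by adding more templates, and that in use it reaches a mid-level standing compared with several commercial systems.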
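The match-and-replace idea in this abstract can be illustrated with a minimal sketch. This is not the UAMRT algorithm itself, whose details the abstract does not give; the templates, the slot names `X` and `Y`, the lexicon, and the helper names are all hypothetical, and the heuristic search and clause analysis are left out.

```python
import re

# Hypothetical slot names and source-to-target templates (not from the paper).
SLOTS = ("X", "Y")
TEMPLATES = [
    ("X is a Y", "X 是一个 Y"),
    ("X likes Y", "X 喜欢 Y"),
]


def template_to_regex(source_template):
    """Turn a template such as 'X is a Y' into a regex with one named group per slot."""
    parts = [
        f"(?P<{tok}>.+?)" if tok in SLOTS else re.escape(tok)
        for tok in source_template.split()
    ]
    return re.compile("^" + r"\s+".join(parts) + "$")


def translate(text, lexicon):
    """Match the first fitting source template and rewrite it as its target template.

    Slot contents are translated recursively; text that matches no template
    falls back to a plain lexicon lookup (or is returned unchanged).
    """
    for source_template, target_template in TEMPLATES:
        match = template_to_regex(source_template).match(text)
        if match:
            return " ".join(
                translate(match.group(tok), lexicon) if tok in SLOTS else tok
                for tok in target_template.split()
            )
    return lexicon.get(text, text)


# Hypothetical toy lexicon for the demonstration.
LEXICON = {"Tom": "汤姆", "Mary": "玛丽", "student": "学生"}
print(translate("Tom is a student", LEXICON))
print(translate("Mary likes Tom", LEXICON))
```

In this sketch, adding coverage means appending entries to `TEMPLATES`, which mirrors the abstract's remark that quality improves as templates are added.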

2.
The paper addresses the problem of generating sentences from logical formulae. It describes a simple and efficient algorithm for generating text which has been developed for use in machine translation, but will have wider application in natural language processing. An important property of the algorithm is that the logical form used to generate a sentence need not be one which could have been produced by parsing the sentence: formal equivalence between logical forms is allowed for. This is necessary for a machine translation system, such as the one envisaged in this paper, which uses single declarative grammars of individual languages, and declarative statements of translation equivalences for transfer. In such a system, it cannot be guaranteed that transfer will produce a logical form in the same order as would have been produced by parsing some target-language sentence, and it is not practicable to define a normal form for the logical forms. The algorithm is demonstrated using a categorial grammar and a simple indexed logic, as this allows a particularly clear and elegant formulation. It is shown that the algorithm can be adapted to phrase-structure grammars, and to more complex semantic representations than that used here.

3.
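A multi-channel natural human-computer dialogue system requires the computer to respond intelligently to the user's utterances. Because of knowledge-base limits and the unpredictability of user utterances, a traditional dialogue system either cannot respond or gives an answer that does not match the user's expectations once the dialogue goes beyond the scope of its knowledge base, which to some extent degrades the user experience. To address this problem, the paper proposes a method for generating an optimal answer sentence that fuses multimodal interaction history with the data-oriented parsing (DOP) model. First, context-free grammar rules are extracted from a large-scale syntactic treebank. Then, drawing on multimodal interaction history such as the facial expressions and postures the user shows during the dialogue, the DOP model is used to filter the Chinese sentences generated by the context-free grammar. Finally, an answer sentence that is both grammatical and semantically appropriate is returned to the user. This lets the computer generate an utterance for the current dialogue from the interaction history when no knowledge-base support is available, effectively improving the user experience of multi-channel natural human-computer interaction systems. The method has been applied in a multi-topic, multimodal free-dialogue system covering traffic information queries and a café scenario. User experience shows that the method effectively improves the naturalness of interaction and the user experience.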

4.
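The first problem in discourse-level machine translation is determining the translation unit. Based on linguistic knowledge of Chinese and English and on English-Chinese translation practice, this paper proposes a two-level system of basic units and compound units for discourse machine translation, discusses the properties the two kinds of unit must satisfy to support discourse translation, and on that basis outlines a three-step model of partitioning, translation, and assembly (the PTA model). The paper proposes that, for Chinese, the compound unit is the text block corresponding to a generalized topic structure, and the basic unit is the topic-self-sufficient sentence obtained from the stream model of generalized topic structure. For English, the compound unit is the period-delimited sentence, and the basic unit is the naming-telling clause (NT clause), that is, a clause formed by a referential component plus a statement about it or a post-modifier of it. The paper presents a worked example of English-Chinese translation using the PTA model under this unit system, plans the construction of an English-Chinese clause-aligned corpus for discourse translation, and discusses the feasibility of the PTA model.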

5.
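The sequence-to-sequence (seq2seq) framework can be applied to abstract meaning representation (AMR) parsing by treating it as a translation task from a source sentence to a target AMR graph. Previous work, however, usually represents the source sentence as a word sequence, ignoring the syntactic and semantic role information latent in the sentence. Building on the seq2seq framework, this paper proposes a direct and effective AMR parsing method that incorporates syntactic and semantic role information. Experiments show that the method achieves a significant improvement of 6.7% on the standard English AMR dataset. Finally, the paper analyzes from several angles how source-side syntactic and semantic role information helps AMR parsing. The analysis shows that part-of-speech information and subword techniques contribute most to the performance gain, followed by higher-level syntactic and semantic role information.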

6.
We show that vector space semantics and functional semantics in two-sorted first order logic are equivalent for pregroup grammars. We present an algorithm that translates functional expressions to vector expressions and vice-versa. The semantics is compositional, variable free and invariant under change of order or multiplicity. It includes the semantic vector models of Information Retrieval Systems and has an interior logic admitting a comprehension schema. A sentence is true in the interior logic if and only if the ‘usual’ first order formula translating the sentence holds. The examples include negation, universal quantifiers and relative pronouns.

7.
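Huang Heyan (黄河燕), Chen Zhaoxiong (陈肇雄). 《软件学报》 (Journal of Software), 1997, 8(9): 716-721
A major problem in target-text generation for Chinese-English machine translation is how to obtain information on tense, voice, sentence pattern, and subject-predicate agreement in gender, number, and case from the context of the sentence, so as to generate a translation with correct word morphology: the past tense, past participle, and present-tense forms of verbs; the possessive and plural forms of nouns; and the generation of auxiliary verbs and articles. This paper proposes an SC-grammar-based algorithm for generating word morphology in Chinese-English machine translation. By designing a generation-oriented system for describing linguistic features and adopting a language analysis technique that integrates target-text generation with source-text analysis, the method lets generation make full use of the knowledge used during source-text analysis, accurately forms the morphological features of each constituent in the sentence, and can effectively resolve, in Chinese-English target-text generation, auxiliary verbs …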

8.
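Because the accuracy of Kazakh syntactic parsing is currently low and neural-network-based Kazakh parsing has not been studied, this work addresses phrase-structure parsing of Kazakh with a shift-reduce approach that stores sentence spans on the stack instead of partial tree structures, so the syntax tree does not need to be binarized during parsing. For feature extraction, a bidirectional LSTM extracts span features, capturing each span's information in the context of the whole sentence. A multilayer perceptron is then used to train the parsing model, and at decoding time dynamic programming selects the best parse. The resulting Kazakh phrase-structure parsing accuracy reaches 76.92%. The work further improves Kazakh parsing accuracy and lays a good foundation for subsequent Kazakh machine translation and semantic analysis.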

9.
This paper presents a grammatically motivated sentiment classification model applied to a morphologically rich language, Urdu. The morphological complexity and flexibility in grammatical rules of this language require an improved or altogether different approach. We emphasize the identification of SentiUnits rather than the subjective words in the given text. SentiUnits are the sentiment-carrying expressions that reveal the inherent sentiments of the sentence for a specific target. The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and the target expressions through shallow-parsing-based chunking. The dependency parsing algorithm creates associations between these extracted expressions. For our system, we develop a sentiment-annotated lexicon of Urdu words. Each entry of the lexicon is marked with its orientation (positive or negative) and its intensity (force of orientation) score. For the evaluation of the system, two corpora of reviews, from the domains of movies and electronic appliances, are collected. The experimental results show that we achieve state-of-the-art performance in the sentiment analysis of Urdu text.

10.
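Word identification in source-language analysis for Chinese-English machine translation   Cited by: 1 (self-citations: 1, citations by others: 0)
The first problem in source-language analysis for Chinese-English MT is word identification. The "word" has no clear definition in Chinese, and there are no clear boundaries between morphemes and words, words and phrases, or phrases and sentences. Segmenting first and parsing afterwards runs into the difficulty that word-formation problems and syntactic problems are intertwined at segmentation time. The author argues that the character can be taken as the starting point of source-language syntactic analysis, so that words and phrases are identified at the same time as syntactic analysis proceeds. This paper describes this view and its implementation, and illustrates the basic identification method using the handling of separable words (离合词) as an example.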

11.
Sentence alignment is a basic task in natural language processing which aims to extract high-quality parallel sentences automatically. Motivated by the observation that aligned sentence pairs contain a larger number of aligned words than unaligned ones, we treat word translation as one of the most useful sources of external knowledge. In this paper, we show how to explicitly integrate word translation into neural sentence alignment. Specifically, this paper proposes three cross-lingual encoders to incorporate word translation: 1) Mixed Encoder, which learns words and their translation annotation vectors over sequences where words and their translations are mixed alternatively; 2) Factored Encoder, which views word translations as features and encodes words and their translations by concatenating their embeddings; and 3) Gated Encoder, which uses a gate mechanism to selectively control the amount of word translations moving forward. Experimentation on NIST MT and OpenSubtitles Chinese-English datasets in both non-monotonic and monotonic scenarios demonstrates that all the proposed encoders significantly improve sentence alignment performance.

12.
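Automatic identification of bilingual chunks for spoken-language translation   Cited by: 1 (self-citations: 0, citations by others: 1)
Cheng Wei (程葳), Zhao Jun (赵军), Liu Feifan (刘非凡), Xu Bo (徐波). 《计算机学报》 (Chinese Journal of Computers), 2004, 27(8): 1016-1020
Chunk identification is the basis of chunk-based processing. Many results exist on monolingual chunks, but machine translation needs bilingually related chunk analysis. In light of the practical needs of spoken-language translation, this paper proposes the concept of the "bilingual chunk" and, on that basis, implements a new method for automatically identifying bilingual chunks in a parallel corpus. The method combines statistics with rules and preserves both the semantic properties and the syntactic well-formedness of bilingual chunks. Experiments on a 60,000-sentence spoken-language corpus in the hotel reservation domain show that the method identifies bilingual chunks in Chinese-English parallel text with an accuracy of about 80%.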

13.
This paper gives a simple method for providing categorial brands of feature-based unification grammars with a model-theoretic semantics. The key idea is to apply the paradigm of fibred semantics (or layered logics, see Gabbay (1990)) in order to combine the two components of a feature-based grammar logic. We demonstrate the method for the augmentation of Lambek categorial grammar with Kasper/Rounds-style feature logic. These are combined by replacing (or annotating) atomic formulas of the first logic, i.e. the basic syntactic types, by formulas of the second. Modelling such a combined logic is less trivial than one might expect. The direct application of the fibred semantics method where a combined atomic formula like np (num: sg & pers: 3rd) denotes those strings which have the indicated property and the categorial operators denote the usual left- and right-residuals of these string sets, does not match the intuitive, unification-based proof theory. Unification implements a global bookkeeping with respect to a proof whereas the direct fibring method restricts its view to the atoms of the grammar logic. The solution is to interpret the (embedded) feature terms as global feature constraints while maintaining the same kind of fibred structures. For this adjusted semantics, the anticipated proof system is sound and complete.

14.
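To translate sports-domain text effectively, long sentences in particular, this paper proposes a sentence-skeleton translation method for the sports domain. The method represents the sentence skeleton with templates and consists of three steps: syntactic skeleton analysis, template transfer, and generation of the skeleton translation. Templates were designed and acquired specifically for the linguistic characteristics of the sports domain. During generation, rules and templates are used respectively in a hybrid strategy that combines phrase-level full translation with sentence-level summary translation, and translation functions are introduced to handle morphological change. Experiments show that sentence-skeleton translation captures the key information of a sentence, is better than full translation in intelligibility, and is satisfactory in fidelity, making it an effective method for sports-domain text.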

15.
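Constantly changing network protocol standards and user-customized network service requirements call for greater flexibility in switch hardware. Against this background, the paper proposes a basic processing unit for the packet parser of an Ethernet switch chip whose protocol parsing rules can be defined through software programming. The unit offers both high performance and high flexibility: by flexibly configuring the hardware parsing logic and the lookup-table contents, it defines the extraction, lookup, matching, and action steps applied to packet header contents, and so supports parsing tasks for different kinds of protocols. It is built from series or parallel combinations of two basic structures, so hardware resources can be trimmed as needed. A reconfigurable packet parser can be built from this reconfigurable basic processing unit, supporting the parsing of user-defined and unknown protocols. The paper mainly describes the structure of the unit and how a parser architecture based on it is implemented. Evaluation after synthesis in a 40 nm process shows that the basic unit circuit reaches a maximum clock frequency of 240 MHz, and that a parser built on it supporting four layers of common Ethernet protocols can process 240 million packets per second. The unit uses 87.98 Kb of storage in total, with a design size of about 1.47 million gates.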

16.
Since Colmerauer's introduction of metamorphosis grammars (MGs), with their associated type-0-like grammar rules, there has been a desire to allow more general rule formats in logic grammars. Gap symbols were added to the MG rule by Pereira, resulting in extraposition grammars (XGs). Gaps, which are referenced by gap symbols, are sequences of zero or more unspecified symbols which may be present anywhere in a sentence or in a sentential form. However, XGs imposed restrictions on the position of gap symbols and on the contents of gaps. With the introduction of gapping grammars (GGs) by Dahl, these restrictions were removed but the rule was still required to possess a nonterminal symbol as the first symbol on the left-hand side. This restriction is removed with the introduction of unrestricted gapping grammars. FIGG, a flexible implementation of gapping grammars, possesses a bottom-up parser which can process a large subset of unrestricted gapping grammars. It can be used to examine the usefulness of unrestricted GGs for describing phenomena of natural languages such as free word order and partially free word/constituent order. Unrestricted gapping grammars, as implemented in FIGG, can also be used to describe grammars (or metagrammars) that utilize the gap concept, such as Gazdar's generalized phrase structure grammars.

17.
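Existing single-strategy machine translation systems can hardly solve all the problems machine translation faces. This paper proposes a design method for a multi-strategy machine translation system based on human-computer interaction. The method combines rule-based reasoning over an integrated description of multiple kinds of knowledge, analogical heuristic search based on remembered experience, probabilistic methods based on statistical knowledge, and an appropriate degree of human-computer interaction. It uses an existing rule-based intelligent machine translation system to automatically produce a library of feature case patterns carrying various kinds of feature knowledge. Analogical heuristic search against past translation examples then lets the system reuse its earlier successful sentence analyses for similar sentences, while sentences with no similar example in the library are translated with the original rule-based method and the statistical method. When the system's own knowledge is insufficient to resolve an ambiguity, a human steps in at that point. This greatly improves the system's translation speed and accuracy and makes it more practical.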

18.
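Identification of abbreviated words in a Tibetan automatic word segmentation system   Cited by: 9 (self-citations: 2, citations by others: 7)
In Tibetan information processing, anything involving syntax or semantics takes the word as its basic unit: syntactic parsing, sentence understanding, automatic summarization, automatic classification, machine translation, and so on all operate at the word level after segmentation. Tibetan word segmentation is therefore the foundation of Tibetan information processing. By studying abbreviated words (紧缩词) in Tibetan automatic segmentation, this paper proposes for the first time a scheme for identifying them, the restoration method, and gives the restoration algorithm. Its basic idea is to use the attachment rules of Tibetan abbreviated words to restore the original Tibetan text so that it can be segmented. The restoration algorithm has been applied in a State Language Commission project undertaken by the author. In tests on an 850,000-byte Tibetan corpus, abbreviated words were identified with an accuracy of 99.83%.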

19.
Extended context-free grammars allow regular expressions to appear in the right-hand sides of productions, and are a clear and natural way to describe the syntax of programming languages. In this paper an LR parsing technique for extended context-free grammars is presented, which is based on an underlying transformation of the grammar into an equivalent context-free one. The technique is suitable for inclusion in one-pass compilers: the implementation requires only small extensions to the algorithms that work for normal LR grammars. Besides describing the parsing method, the paper also shows the algorithms for deriving the parsing tables; table optimization is also discussed. Finally, this technique is compared with other proposals that have appeared in the literature.
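To make the idea of transforming an extended grammar into a plain context-free one concrete, here is a minimal sketch of one textbook way to eliminate regular operators from right-hand sides. It is not the transformation used in the paper, which the abstract does not spell out; the tuple encoding of regular expressions, the fresh names `_N1`, `_N2`, ..., and the function `to_cfg` are assumptions made for illustration.

```python
import itertools


def to_cfg(rules):
    """Rewrite an extended grammar into plain context-free productions.

    `rules` maps each nonterminal to a right-hand side that is either a symbol
    (a string) or a tuple: ("seq", a, b, ...), ("alt", a, b, ...), ("opt", a),
    or ("star", a). Returns a list of (lhs, [symbols]) productions, where an
    empty symbol list stands for the empty string.
    """
    fresh = (f"_N{i}" for i in itertools.count(1))  # hypothetical helper names
    productions = []

    def symbol_for(node):
        """Return one grammar symbol that derives exactly what `node` describes."""
        if isinstance(node, str):
            return node
        name = next(fresh)
        define(name, node)
        return name

    def define(lhs, node):
        """Add productions so that `lhs` derives `node`; `lhs` has no other rules."""
        if isinstance(node, str):
            productions.append((lhs, [node]))
        elif node[0] == "seq":
            productions.append((lhs, [symbol_for(n) for n in node[1:]]))
        elif node[0] == "alt":
            for n in node[1:]:
                productions.append((lhs, [symbol_for(n)]))
        elif node[0] == "opt":
            productions.append((lhs, []))
            productions.append((lhs, [symbol_for(node[1])]))
        elif node[0] == "star":
            # lhs -> (empty) | lhs body; left recursion is the usual choice for LR.
            productions.append((lhs, []))
            productions.append((lhs, [lhs, symbol_for(node[1])]))
        else:
            raise ValueError(f"unknown operator: {node[0]!r}")

    for lhs, node in rules.items():
        define(lhs, node)
    return productions


# list -> item ("," item)*
GRAMMAR = {"list": ("seq", "item", ("star", ("seq", ",", "item")))}
for lhs, rhs in to_cfg(GRAMMAR):
    print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
```

The sketch introduces one fresh nonterminal per nested operator, which keeps it short but produces more helper rules than a hand-tuned grammar would.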

20.
We present a very simple scheme for compiling boolean expressions in the short-circuit manner in one pass. The generated code is of very high quality and avoids most inefficiencies commonly associated with one-pass code generation. In particular, redundant conditional and unconditional branches are kept to a minimum. The scheme is general enough to compile the boolean expressions of a typical high-level language such as Pascal. It is presented in a format suited for syntax-directed translation and can be used with both top-down and bottom-up parsing.
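As an illustration of short-circuit compilation, the sketch below uses the textbook "jumping code with fall-through" technique, which also avoids many redundant branches. It is not the scheme from the paper, whose details the abstract does not give; the tuple encoding of expressions and the pseudo-instructions `if`, `ifnot`, and `goto` are assumptions.

```python
import itertools

FALL = None  # special target meaning "fall through to the next instruction"


def compile_bool(expr, true_label, false_label, code, labels):
    """Append jumping code for `expr` to `code`.

    After the emitted code runs, control is at `true_label` if `expr` is true
    and at `false_label` if it is false; a FALL target means execution simply
    continues with whatever instruction follows.
    """
    op = expr[0]
    if op == "var":
        name = expr[1]
        if true_label is not FALL and false_label is not FALL:
            code.append(f"if {name} goto {true_label}")
            code.append(f"goto {false_label}")
        elif true_label is not FALL:
            code.append(f"if {name} goto {true_label}")
        elif false_label is not FALL:
            code.append(f"ifnot {name} goto {false_label}")
        # Both targets FALL: nothing to emit.
    elif op == "not":
        # Negation emits no code; it only swaps the two targets.
        compile_bool(expr[1], false_label, true_label, code, labels)
    elif op == "and":
        # A false left operand decides the result, so it skips the right operand.
        left_false = false_label if false_label is not FALL else next(labels)
        compile_bool(expr[1], FALL, left_false, code, labels)
        compile_bool(expr[2], true_label, false_label, code, labels)
        if false_label is FALL:
            code.append(f"{left_false}:")
    elif op == "or":
        # A true left operand decides the result, so it skips the right operand.
        left_true = true_label if true_label is not FALL else next(labels)
        compile_bool(expr[1], left_true, FALL, code, labels)
        compile_bool(expr[2], true_label, false_label, code, labels)
        if true_label is FALL:
            code.append(f"{left_true}:")
    else:
        raise ValueError(f"unknown operator: {op!r}")


# Condition of `if a and (b or not c) then ... else ...`:
# true falls through into the then-branch, false jumps to the label "Lelse".
condition = ("and", ("var", "a"), ("or", ("var", "b"), ("not", ("var", "c"))))
code = []
compile_bool(condition, FALL, "Lelse", code, (f"L{i}" for i in itertools.count(1)))
print("\n".join(code))
```

Passing `FALL` for the target that coincides with the next instruction is what lets the sketch skip unconditional jumps that a naive one-pass generator would emit.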
