Related Articles
19 related articles found (search time: 160 ms)
1.
The study of lexical choice in generation is important for improving the translation quality of machine translation systems. The semantic-pattern-based approach is a common method for lexical choice and also plays an important role in hybrid word-selection models. Addressing the shortcomings of this approach, this paper proposes a scheme for automatically acquiring semantic patterns and introduces the concept of fuzzy semantic patterns. Automatic acquisition overcomes the enormous manual workload required by traditional methods, while fuzzy semantic patterns allow semantic patterns to express quantitative differences between linguistic phenomena. The paper first discusses the significance of this research, then introduces the concept of fuzzy semantic patterns, presents a training algorithm for constructing a fuzzy-semantic-pattern base, and finally gives a concrete algorithm for lexical choice with fuzzy semantic patterns, comparing it with the traditional algorithm.
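The abstract does not give the training or selection algorithms in detail; as a rough illustration of the idea only (data structures, contexts, and words below are all invented), a fuzzy semantic pattern can be modeled as a mapping from a semantic context to candidate target words with membership degrees in [0, 1], estimated from corpus counts rather than hand-written as crisp rules:

```python
from collections import defaultdict

class FuzzyPatternBase:
    """Toy fuzzy-semantic-pattern store: semantic context -> {word: degree}."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, examples):
        # examples: iterable of (semantic_context, chosen_word) pairs,
        # e.g. extracted automatically from an aligned corpus.
        for context, word in examples:
            self.counts[context][word] += 1

    def degree(self, context, word):
        # Membership degree: relative frequency of `word` under `context`,
        # a quantitative distinction a crisp (0/1) pattern cannot express.
        total = sum(self.counts[context].values())
        return self.counts[context][word] / total if total else 0.0

    def choose(self, context):
        # Lexical choice: pick the candidate with the highest degree.
        cands = self.counts[context]
        return max(cands, key=cands.get) if cands else None

base = FuzzyPatternBase()
base.train([("eat+LIQUID", "drink"), ("eat+LIQUID", "drink"),
            ("eat+LIQUID", "have"), ("eat+SOLID", "eat")])
print(base.choose("eat+LIQUID"))                     # drink
print(round(base.degree("eat+LIQUID", "drink"), 2))  # 0.67
```

The point of the sketch is only the shape of the data: the degree makes "drink" preferable to "have" by 2:1 rather than categorically.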

2.
Current research on image caption generation is largely limited to a single language (e.g., English), thanks to large-scale, manually annotated corpora of images paired with English captions. This paper explores Chinese image caption generation with zero annotated resources, using English as a pivot language. Specifically, leveraging neural machine translation, the paper proposes and compares two approaches: (1) a pipeline method, which first generates an English caption for the image and then translates it into Chinese; (2) a pseudo-training-corpus method, which first translates the English captions in the training set into Chinese, yielding a pseudo-annotated corpus of image-Chinese-caption pairs, and then trains a Chinese caption generation model on it. For the second approach, the paper further compares word-based and character-based Chinese caption generation models. Experimental results show that the pseudo-corpus method outperforms the pipeline method, and that the character-based model outperforms the word-based model, reaching a BLEU_4 score of 0.341.
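The word-based vs. character-based distinction comes down to the unit the model predicts and the unit BLEU is computed over. A minimal sketch of the two tokenizations (the caption, the toy lexicon, and the greedy segmenter are illustrative stand-ins, not from the paper):

```python
def char_tokens(caption):
    """Character-based units: every Chinese character is one token."""
    return list(caption)

def word_tokens(caption, lexicon):
    """Greedy forward maximum matching against a toy lexicon, as a
    stand-in for a real word segmenter."""
    tokens, i = [], 0
    while i < len(caption):
        for j in range(min(len(caption), i + 4), i, -1):  # longest match first
            if caption[i:j] in lexicon or j == i + 1:
                tokens.append(caption[i:j])
                i = j
                break
    return tokens

caption = "一只猫坐在椅子上"
lexicon = {"一只", "坐在", "椅子"}
print(char_tokens(caption))          # ['一', '只', '猫', '坐', '在', '椅', '子', '上']
print(word_tokens(caption, lexicon)) # ['一只', '猫', '坐在', '椅子', '上']
```

Character units sidestep segmentation errors and shrink the vocabulary, which is one plausible reason the character-based model scored higher here.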

3.
With the advent of generative adversarial networks, synthesizing images from text descriptions has recently become an active research area. However, the text descriptions in existing work are usually in English, and the generated objects are mostly faces, flowers, and birds; little work targets Chinese text and Chinese painting. Moreover, text-to-image generation typically requires large numbers of annotated image-text pairs, making dataset construction expensive. Advances in multimodal pre-training make it possible to guide a GAN's generation process in an optimization-based fashion, greatly reducing the demand for data and computational resources. This paper proposes a multi-domain VQGAN model that generates Chinese paintings in multiple domains simultaneously, and uses the multimodal pre-trained model WenLan to compute a distance loss between the generated image and the text description; by optimizing the latent variables fed into the multi-domain VQGAN, semantic consistency between image and text is achieved. Ablation experiments compare the FID and R-precision scores of multi-domain VQGAN variants with different structures, and a user study is conducted. The results show that the full multi-domain VQGAN model surpasses the original VQGAN in both image quality and image-text semantic consistency.

4.
In [1] we described the generation of a parser, and noted that by adding semantic-processing subroutines to the generated parser it can serve as a translator. This paper describes that generation method, along with how semantics is specified. The target computer involved is the DJS-6. 1. Semantic description. To generate semantic-processing programs automatically, the first task is deciding how to describe semantics. We describe it with quadruples, i.e., using

5.
Captions produced by existing video captioning models suffer from poor readability and low accuracy. This paper proposes a semantics-guided video captioning method based on the ViT model. Visual features are extracted from the video with ResNeXt and ECO networks; taking these visual features as input and the probability predictions of semantic labels as output, a semantic detection network (SDN) is trained. On this basis, the ViT model globally encodes the static and dynamic visual features, which are fused by attention with the semantic features extracted by the SDN, and a semantic LSTM decoder decodes the fused features to generate the video's caption. Introducing the video's semantic features guides the model toward captions that better match human conventions, making the generated captions more readable. On the MSR-VTT dataset, the model achieves BLEU-4, METEOR, ROUGE-L, and CIDEr scores of 44.8, 28.9, 62.8, and 51.1 respectively; compared with the mainstream video captioning models ADL and SBAT, the summed score improvements reach 16.6 and 16.8.

6.
Objective: For lack of paired data between images and the target-language domain, existing cross-lingual captioning methods all convert from a pivot (source) language into the target language; semantic noise introduced in this conversion makes the generated sentences disfluent and only weakly related to the image's visual content. This paper therefore proposes a cross-lingual image captioning model that incorporates semantic matching and language evaluation. Method: First, an encoder-decoder image captioning framework is chosen as the base network. Second, to exploit the semantic knowledge contained in the image and its pivot-language caption, a source-domain semantic matching module is built; to learn the conventions of the target language, a target-language-domain evaluation module is also built. Based on these two modules, the captioning model is constrained by semantic matching and guided on language: 1) the image & pivot-language semantic matching module measures the semantic consistency of each modality's feature representation by mapping the image, the pivot-language caption, and the target-language caption into a common embedding space; 2) the target-language-domain evaluation module scores generated caption sentences according to the style of the target language. Results: For cross-lingual English image captioning, the method is evaluated on the MS COCO (Microsoft common objects in context) dataset. Compared with strong baselines, the method improves on BLEU (bilingual evaluation understudy)-2, BLEU-3, BLEU-4 and METE...

7.
Descriptive Semantics of UML
This paper proposes a new approach to defining the formal semantics of UML. We distinguish two aspects of a modeling language's semantics: descriptive semantics and functional semantics. Descriptive semantics defines which systems satisfy a model; functional semantics defines the basic concepts in a model. The paper defines the descriptive semantics of UML class diagrams, interaction diagrams, and state diagrams in first-order logic, and presents LAMBDES, a software tool we implemented that translates UML models into logical systems; it integrates the theorem prover SPASS and can reason about models automatically. We have successfully applied this method and tool to model consistency checking.

8.
Image captioning is a multimodal information-processing task at the intersection of computer vision, natural language processing, and machine learning, requiring algorithms that can effectively handle the two different modalities of image and language. Because of the heterogeneous semantic gap, the task is quite challenging. Mainstream research still concentrates on English captioning; work on Chinese image captioning is relatively scarce. Moreover, visual information has not received enough attention in captioning algorithms, whose performance depends more on the language model. Addressing these two shortcomings, this paper proposes a Chinese image caption generation algorithm based on multi-level selective visual semantic attribute features. The algorithm combines object detection and attention mechanisms, fully considers the Chinese attribute information corresponding to high-level visual semantics in the image, and extracts attribute context representations at different scales and levels. To validate the algorithm, it is tested on AI Challenger 2017, currently the largest Chinese image captioning dataset, and on the Flick8k-CN Chinese captioning dataset. Experimental results show that the algorithm effectively establishes visual-semantic associations and generates accurate, content-rich captions. Compared with the performance of current mainstream captioning algorithms on Chinese sentences, it improves all evaluation metrics by roughly 3%-30%. To facilitate reproduction, the source code and models have been released on GitHub.

9.
Traditional text steganography schemes struggle to balance hiding capacity against imperceptibility. Exploiting the rich semantics and flexible syntax of Song Ci (Song-dynasty lyric poetry) as a carrier, this paper proposes an algorithm that generates steganographic Song Ci with a Seq2Seq model combining BERT (Bidirectional Encoder Representations from Transformers) word embeddings with an attention mechanism. The algorithm uses BERT embeddings as the semantic vector-conversion component of the generation model; their rich word-vector space ensures semantic coherence across generated sentences and improves the quality of the generated poems. In addition, the algorithm constrains the generation of stego sentences with metrical templates and mutual-information-based word selection, strengthening the security of the hiding algorithm. Comparative experiments and analysis of embedding rate against existing text-hiding algorithms show that the proposed algorithm improves the embedding rate by more than 7% over Ci-stega, with good performance in both security and robustness.

10.
Video captioning has broad application prospects in video recommendation, assistive vision, human-computer interaction, and other areas. Abundant methods and data already exist for English video captioning, and Chinese captions can be produced by machine-translating the English ones; however, cultural differences between China and the West, as well as the performance of translation algorithms, both affect the quality of the resulting Chinese captions. This paper therefore proposes a cross-lingual knowledge distillation method for Chinese video captioning. The method not only generates Chinese sentences directly from video content, but also makes full use of easily obtained English captions as privileged information to guide Chinese caption generation. Since the English and Chinese captions of the same video are semantically related, the method learns cross-lingual knowledge tied to the video content and uses knowledge distillation to inject the high-level semantic information carried by the English captions into Chinese caption generation. End-to-end training keeps the training objective consistent with the Chinese captioning task, effectively improving captioning performance. The paper also extends the English video captioning dataset MSVD, producing the bilingual (Chinese-English) video captioning dataset MSVD-CN.
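The abstract does not state the exact objective; a generic sketch of the distillation idea it describes (all numbers and the toy vocabulary are invented): the Chinese (student) branch is trained on both the ground-truth token and a soft distribution produced from the English (teacher) branch.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, gold_index, teacher_probs, alpha=0.5, T=2.0):
    """Hard cross-entropy on the gold token, plus soft cross-entropy toward
    the teacher's distribution (the privileged, English-side signal)."""
    hard = -math.log(softmax(student_logits)[gold_index])
    soft_p = softmax(student_logits, temperature=T)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_probs, soft_p))
    return alpha * hard + (1 - alpha) * soft

student = [2.0, 0.5, -1.0]           # unnormalized scores over a toy vocabulary
teacher = softmax([2.2, 0.4, -1.2])  # a teacher that agrees with the student
loss_agree = distill_loss(student, 0, teacher)
loss_disagree = distill_loss(student, 0, softmax([-1.0, 0.4, 2.2]))
print(loss_agree < loss_disagree)  # True: agreeing with the teacher lowers the loss
```

The mixing weight `alpha` and temperature `T` are the usual distillation knobs; their values in the paper are not given in the abstract.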

11.
Multilingual generation in machine translation (MT) requires a knowledge organization that facilitates the task of lexical choice, i.e. selection of lexical units to be used in the generation of a target-language sentence. This paper investigates the extent to which lexicalization patterns involving the lexical aspect feature [+telic] may be used for translating events and states among languages. Telicity has been correlated syntactically with both transitivity and unaccusativity, and semantically with Talmy's path of a motion event, the representation of which characterizes languages parametrically. Taking as our starting point the syntactic/semantic classification in Levin's English Verb Classes and Alternations, we examine the relation between telicity and the syntactic contexts, or alternations, outlined in this work, identifying systematic relations between the lexical aspect features and the semantic components that potentiate these alternations. Representing lexical aspect, particularly telicity, is therefore crucial for the tasks of lexical choice and syntactic realization. Having enriched the data in Levin (by correlating the syntactic alternations (Part I) and semantic verb classes (Part II) and marking them for telicity), we assign to verbs lexical semantic templates (LSTs). We then demonstrate that it is possible from these templates to build a large-scale repository for lexical conceptual structures which encode meaning components that correspond to different values of the telicity feature. The LST framework preserves both semantic content and semantic structure (following Grimshaw) during the processes of lexical choice and syntactic realization. Application of this model identifies precisely where the Knowledge Representation component may profitably augment our rules of composition, to identify cases where the interlingua underlying the source language sentence must be either reduced or modified in order to produce an appropriate target language sentence.

12.
This paper addresses the problem of automatic acquisition of lexical knowledge for rapid construction of engines for machine translation and embedded multilingual applications. We describe new techniques for large-scale construction of a Chinese–English verb lexicon and we evaluate the coverage and effectiveness of the resulting lexicon. Leveraging off an existing Chinese conceptual database called How Net and a large, semantically rich English verb database, we use thematic-role information to create links between Chinese concepts and English classes. We apply the metrics of recall and precision to evaluate the coverage and effectiveness of the linguistic resources. The results of this work indicate that: (a) we are able to obtain reliable Chinese–English entries both with and without pre-existing semantic links between the two languages; (b) if we have pre-existing semantic links, we are able to produce a more robust lexical resource by merging these with our semantically rich English database; (c) in our comparisons with manual lexicon creation, our automatic techniques were shown to achieve 62% precision, compared to a much lower precision of 10% for arbitrary assignment of semantic links. This revised version was published online in November 2006 with corrections to the Cover Date.
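The recall/precision evaluation the authors describe reduces to set arithmetic over proposed vs. gold concept-class links; a minimal sketch (the example pairs are invented, not from the lexicon):

```python
def precision_recall(proposed, gold):
    """proposed / gold: sets of (chinese_concept, english_class) links."""
    tp = len(proposed & gold)  # links that are both proposed and correct
    precision = tp / len(proposed) if proposed else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

proposed = {("打", "hit-verbs"), ("跑", "run-verbs"),
            ("吃", "eat-verbs"), ("看", "hit-verbs")}   # last link is wrong
gold = {("打", "hit-verbs"), ("跑", "run-verbs"),
        ("吃", "eat-verbs"), ("看", "see-verbs")}
p, r = precision_recall(proposed, gold)
print(p, r)  # 0.75 0.75
```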

13.
WNCT: An Automatic Translation Method for WordNet Concepts
WordNet is an English lexical knowledge base that plays an important role in natural language processing. This paper proposes a method for automatically translating the lexical concepts in WordNet into Chinese. First, electronic dictionaries and term-translation tools translate the English words into Chinese at sense granularity. Second, selecting the correct sense for the words in a given concept is treated as a classification problem: twelve features are derived, based on translation uniqueness, intra-concept and inter-concept translation intersections, Chinese phrase-structure rules, and PMI-based translation relatedness, and a classification model is trained to select the correct senses. Experimental results show that the method covers 85.21% of the concepts in WordNet 3.0 with an accuracy of 81.37%.

14.
A large number of wording choices naturally occurring in English sentences cannot be accounted for on semantic or syntactic grounds. They represent arbitrary word usages and are termed collocations. In this paper, we show how collocations can enhance the task of lexical selection in language generation. Previous language generation systems were not able to account for collocations for two reasons: they did not have the lexical information in compiled form and the lexicon formalisms available were not able to handle the variations in collocational knowledge. We describe an implemented generator, Cook, which uses a wide range of collocations to produce sentences in the stock market domain. Cook uses a flexible lexicon containing a range of collocations, from idiomatic phrases to word pairs that were compiled automatically from text corpora using a lexicographic tool, Xtract. We show how Cook is able to merge collocations of various types to produce a wide variety of sentences.

15.
This paper describes the design and implementation of a computational model for Arabic natural language semantics, a semantic parser for capturing the deep semantic representation of Arabic text. The parser represents a major part of an Interlingua-based machine translation system for translating Arabic text into Sign Language. The parser follows a frame-based analysis to capture the overall meaning of Arabic text in a formal representation suitable for NLP applications that need deep semantic representation, such as language generation and machine translation. We show the representational power of this theory for the semantic analysis of texts in Arabic, a language which differs substantially from English in several ways. We also show that the integration of WordNet and FrameNet in a single unified knowledge resource can improve disambiguation accuracy. Furthermore, we propose a rule-based algorithm to generate an equivalent Arabic FrameNet, using a lexical resource alignment of FrameNet 1.3 LUs and WordNet 3.0 synsets for the English language. A pilot study of motion and location verbs was carried out in order to test our system. Our corpus is made up of more than 2000 Arabic sentences in the domain of motion events, collected from Algerian first-level educational Arabic books and other relevant Arabic corpora.

16.
Unknown words are one of the key factors that greatly affect the translation quality. Traditionally, nearly all the related research focuses on obtaining the translation of the unknown words. However, these approaches have two disadvantages. On the one hand, they usually rely on many additional resources such as bilingual web data; on the other hand, they cannot guarantee good reordering and lexical selection of surrounding words. This paper gives a new perspective on handling unknown words in statistical machine translation (SMT). Instead of making great efforts to find the translation of unknown words, we focus on determining the semantic function of the unknown word in the test sentence and keeping the semantic function unchanged in the translation process. In this way, unknown words can help the phrase reordering and lexical selection of their surrounding words even though they still remain untranslated. In order to determine the semantic function of an unknown word, we employ the distributional semantic model and the bidirectional language model. Extensive experiments on both phrase-based and linguistically syntax-based SMT models in Chinese-to-English translation show that our method can substantially improve the translation quality.
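A minimal sketch of the distributional part of this idea (all context counts and words are invented): instead of translating an out-of-vocabulary word, find the known word whose context distribution is most similar, and let the unknown word inherit that word's reordering and lexical-selection behavior while staying untranslated.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors (dicts)."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_proxy(unknown_vec, known_vecs):
    """Return the known word distributionally closest to the unknown word;
    the unknown word is then treated like this word for phrase reordering
    and lexical selection of its neighbors, remaining untranslated."""
    return max(known_vecs, key=lambda w: cosine(unknown_vec, known_vecs[w]))

known = {
    "吃": {"饭": 9, "了": 5, "我": 3},    # eat-like contexts
    "去": {"北京": 7, "了": 4, "我": 2},  # go-like contexts
}
unknown = {"饭": 4, "了": 2, "我": 1}     # an OOV verb seen in eat-like contexts
print(semantic_proxy(unknown, known))    # 吃
```

The bidirectional language model the paper also uses would score the unknown word's left and right contexts; that part is omitted here.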

17.
Lexical semantic relations have been studied fairly thoroughly in English and other European languages. For example, EuroWordNet (Vossen 1998) is a database that characterizes word senses in terms of semantic relations; that is, a word's meaning is grasped through its semantic connections to other words. To ensure the quality and consistency of database construction, the EuroWordNet project proposed, for each language it processed, linguistic tests for whether a given sense relation holds between word senses. Practical experience shows that with these tests people can identify whether a pair of senses really stands in a given relation more easily and more consistently, and every user of the database can verify the correctness of its relational links. In other words, these tests provide a cornerstone for a verifiable, language-independent theory of lexical semantics. In this paper, we explore the possibility of establishing Chinese linguistic tests for Chinese sense relations, proposing test sentence patterns and rules for several important semantic relations and evaluating their feasibility. Besides laying a theoretical foundation for Chinese lexical semantics, this work also provides strong support for Miller's WordNet framework (WordNet, Fellbaum 1998), which pioneered the relation-based approach to lexical representation and ontology research.

18.
19.

Copyright©北京勤云科技发展有限公司  京ICP备09084417号