Similar Documents
20 similar documents found.
1.
Liu Zhanyi, Li Sheng, Liu Ting, Wang Haifeng. Journal of Software (软件学报), 2012, 23(6): 1472-1485
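Example-based machine translation (EBMT) uses preprocessed bilingual example sentences as its main translation resource and generates translations by editing the translation examples that match the input sentence. In an EBMT system, the selection of translation examples and the selection of target words strongly affect system performance. This paper proposes using a statistical collocation model to strengthen example selection and translation selection in EBMT and thereby improve translation quality. First, the statistical collocation model is trained on monolingual corpora using monolingual statistical word alignment. The model is then used to improve EBMT in three ways: (1) it estimates the degree of match between the input sentence and a translation example, strengthening example selection; (2) it estimates the collocation strength between each candidate translation and its context, improving translation selection; and (3) it detects the collocates of words that are replaced in a translation example and corrects those collocates according to the new replacement words and their context, further improving translation quality. To validate the approach, English-to-Chinese translation quality was evaluated on a word-based EBMT system. Compared with the baseline, the proposed method raised BLEU scores by 4.73 to 6.48 percentage points. The collocation-based translation selection method was further tested on a semi-structured EBMT system, where it raised the BLEU score by 1.82 percentage points. Human evaluation also showed that the improved semi-structured EBMT system's translations convey most of the source information and are highly fluent.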
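A minimal sketch of idea (2), scoring candidate translations by their collocation strength with context words. It assumes pointwise mutual information (PMI) over sentence-level co-occurrence as the collocation measure; the paper trains its model with monolingual statistical word alignment instead, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def train_collocations(sentences):
    """Count words and unordered word pairs that co-occur in a sentence.

    Assumption: sentence-level co-occurrence stands in for the paper's
    monolingual statistical word alignment."""
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        words = sorted(set(sent))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # tuples come out sorted
    return word_counts, pair_counts, len(sentences)


def pmi(w1, w2, model):
    """Pointwise mutual information of a word pair; 0.0 for unseen pairs."""
    word_counts, pair_counts, n = model
    joint = pair_counts.get(tuple(sorted((w1, w2))), 0)
    if joint == 0:
        return 0.0
    return math.log(joint * n / (word_counts[w1] * word_counts[w2]))


def select_translation(candidates, context, model):
    """Pick the candidate with the highest summed collocation strength."""
    return max(candidates, key=lambda c: sum(pmi(c, w, model) for w in context))


# Toy target-language corpus (hypothetical).
corpus = [["strong", "tea"], ["strong", "tea", "hot"],
          ["powerful", "computer"], ["powerful", "engine"], ["hot", "tea"]]
model = train_collocations(corpus)
print(select_translation(["strong", "powerful"], ["tea"], model))  # strong
```

Unseen pairs score 0 rather than negative infinity, so a single missing collocation does not veto a candidate; a real system would smooth these counts.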

2.
INTERACTIVE SEMANTIC ANALYSIS OF TECHNICAL TEXTS   Total citations: 4, self-citations: 0, citations by others: 4
Sentence syntax is the basis for organizing semantic relations in TANKA, a project that aims to acquire knowledge from technical text. Other hallmarks include an absence of precoded domain-specific knowledge; significant use of public-domain generic linguistic information sources; involvement of the user as a judge and source of expertise; and learning from the meaning representations produced during processing. These elements shape the realization of the TANKA project: implementing a trainable text processing system to propose correct semantic interpretations to the user. A three-level model of sentence semantics, including a comprehensive Case system, provides the framework for TANKA's representations. Text is first processed by the DIPETT parser, which can handle a wide variety of unedited sentences. The semantic analysis module HAIKU then semi-automatically extracts semantic patterns from the parse trees and composes them into domain knowledge representations. HAIKU's dictionaries and main algorithm are described with the aid of examples and traces of user interaction. Encouraging experimental results are described and evaluated.

3.
One of the primary motivations of text generation is the achievement of a very wide range of linguistic abilities coupled with functional control of that range. This control rests on the appropriate construction of abstract specifications of meaning that can guide the generation process to produce language that is textually, grammatically, and lexically appropriate. Such abstract semantic specifications, when constructed in the right way, preserve much of the meaning required in a translation without unduly constraining syntactic form. This is potentially of great value for machine translation since it opens up the possibility of domain-independent, constrained, meaning-based translation. This paper describes how the upper model of the PENMAN text generation system provides a level of semantic abstraction of this kind. It gives examples motivating broader sets of likely translational equivalents than are possible with transfer at lower levels of abstraction, and sets out types of constraints by which the set of likely translational equivalents may be reduced to high-quality renderings of the source text.

4.

Abstractive Text Summarization (ATS) is the task of constructing summary sentences by merging facts from different source sentences and condensing them into a shorter representation while preserving information content and overall meaning. Manually summarizing large documents is very difficult and time-consuming for humans. In this paper, we propose an LSTM-CNN based ATS framework (ATSDL) that can construct new sentences by exploring fragments more fine-grained than sentences, namely semantic phrases. Unlike existing abstraction-based approaches, ATSDL consists of two main stages: the first extracts phrases from source sentences, and the second generates text summaries using deep learning. Experimental results on the CNN and DailyMail datasets show that ATSDL outperforms state-of-the-art models in terms of both semantics and syntactic structure, and achieves competitive results in manual linguistic quality evaluation.

5.
This paper describes the design and implementation of a computational model for Arabic natural language semantics: a semantic parser that captures the deep semantic representation of Arabic text. The parser is a major component of an interlingua-based machine translation system that translates Arabic text into sign language. It follows a frame-based analysis to map the overall meaning of Arabic text into a formal representation suitable for NLP applications that need deep semantic representations, such as language generation and machine translation. We show the representational power of this theory for the semantic analysis of texts in Arabic, a language that differs substantially from English in several ways. We also show that integrating WordNet and FrameNet into a single unified knowledge resource can improve disambiguation accuracy. Furthermore, we propose a rule-based algorithm that generates an equivalent Arabic FrameNet, using a lexical resource alignment of FrameNet 1.3 lexical units (LUs) and WordNet 3.0 synsets for English. A pilot study of motion and location verbs was carried out to test the system. The corpus consists of more than 2,000 Arabic sentences in the domain of motion events, collected from Algerian first-level Arabic textbooks and other relevant Arabic corpora.
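The rule-based Arabic FrameNet generation step can be pictured as projecting frames across an LU-to-synset alignment. The sketch below assumes a single rule (an Arabic lemma evokes frame F if it lexicalizes a synset aligned with an English LU of F); the paper's actual rules are not given here, and the tiny resource dictionaries are hypothetical stand-ins for FrameNet 1.3, WordNet 3.0, and an Arabic lexicon linked to those synsets.

```python
from collections import defaultdict

# Hypothetical toy resources: (English LU, frame) -> aligned WordNet synsets,
# and WordNet synset -> Arabic lemmas that lexicalize it.
LU_TO_SYNSETS = {
    ("walk.v", "Self_motion"): ["walk.v.01"],
    ("arrive.v", "Arriving"): ["arrive.v.01"],
}
SYNSET_TO_ARABIC = {
    "walk.v.01": ["مشى"],
    "arrive.v.01": ["وصل"],
}


def project_frames(lu_to_synsets, synset_to_arabic):
    """Return {arabic_lemma: set_of_frames} by following LU -> synset -> lemma."""
    arabic_frames = defaultdict(set)
    for (_lu, frame), synsets in lu_to_synsets.items():
        for synset in synsets:
            for lemma in synset_to_arabic.get(synset, []):
                arabic_frames[lemma].add(frame)
    return dict(arabic_frames)


arabic_framenet = project_frames(LU_TO_SYNSETS, SYNSET_TO_ARABIC)
print(arabic_framenet)
```

Lemmas that end up with several frames are exactly the cases where the combined WordNet/FrameNet resource would be needed for disambiguation.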

6.
Identifying Deep Spatial Semantics in Unrestricted Text   Total citations: 1, self-citations: 0, citations by others: 1
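Spatial semantic roles in natural language are extended with relevant concepts from geographic spatial description models, and deep spatial semantics in unrestricted text are then identified through spatial semantic role labeling, phrase recognition, and syntactic pattern analysis. Experiments show that the method achieves good precision and recall, comparable to typical information extraction performance.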

7.
An Image Retrieval Method Based on Sparse Canonical Correlation Analysis   Total citations: 1, self-citations: 0, citations by others: 1
Zhuang Ling, Zhuang Yueting, Wu Jiangqin, Ye Zhenchao, Wu Fei. Journal of Software (软件学报), 2012, 23(5): 1295-1304
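A key problem in semantic image retrieval is finding the association between low-level image features and semantics. Because text is an effective means of expressing semantics, this paper proposes studying the relationship between the text and image modalities in order to build an effective model of the latent semantic correlation between them. With this model, a retrieval intent can be expressed in natural language (text sentences) and relevant images retrieved. The model is based on sparse canonical correlation analysis (sparse CCA) and is trained in three steps: first, a textual semantic space is constructed with latent semantic analysis; next, the images corresponding to the texts are represented as bags of visual words; finally, the sparse CCA algorithm finds a correlated semantic space that maps textual semantics to visual words. Using sparse correlation analysis improves the model's interpretability and keeps retrieval results stable. Experimental results confirm the effectiveness of sparse CCA and the feasibility of the proposed semantic image retrieval method.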
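For intuition about the final training step, here is a minimal pure-Python sketch of one common sparse CCA formulation: alternating soft-thresholded power iterations on the cross-covariance matrix X^T Y, which treats the within-set covariances as identity (the penalized-matrix-decomposition style). The paper's exact solver is not specified here; the thresholding rule (a fraction of the largest entry), function names, and toy matrices are assumptions.

```python
import math


def center(rows):
    """Subtract each column's mean (rows are samples)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def cross_cov(X, Y):
    """K = X^T Y for centered X (n x p) and Y (n x q)."""
    return [[sum(xr[a] * yr[b] for xr, yr in zip(X, Y)) for b in range(len(Y[0]))]
            for a in range(len(X[0]))]


def sparse_step(vec, frac):
    """Soft-threshold at frac * max|vec| (frac in [0, 1)), then rescale to unit length."""
    lam = frac * max(abs(x) for x in vec)
    out = [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]
    norm = math.sqrt(sum(x * x for x in out))
    return [x / norm for x in out] if norm > 0 else out


def sparse_cca(X, Y, frac_u=0.5, frac_v=0.5, iters=50):
    """First pair of sparse canonical weight vectors (u for X, v for Y)."""
    K = cross_cov(center(X), center(Y))
    p, q = len(K), len(K[0])
    v = [1.0 / math.sqrt(q)] * q
    u = [0.0] * p
    for _ in range(iters):
        u = sparse_step([sum(K[a][b] * v[b] for b in range(q)) for a in range(p)], frac_u)
        v = sparse_step([sum(K[a][b] * u[a] for a in range(p)) for b in range(q)], frac_v)
    return u, v


# Toy data: rows are documents; X = text (LSA-like) features, Y = visual-word counts.
X = [[1, 0, 0.1], [2, 0.1, 0], [3, 0, 0.2], [4, 0.2, 0.1], [5, 0.1, 0]]
Y = [[2, 0.1, 0], [4, 0, 0.1], [6, 0.1, 0], [8, 0, 0.2], [10, 0.2, 0.1]]
u, v = sparse_cca(X, Y)
print(u, v)
```

On this toy data only the first weight of each vector stays non-zero, which is the interpretability benefit the abstract attributes to sparsity: each canonical direction names a few text dimensions and a few visual words.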

8.
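Research on lexical selection in generation is important for improving the translation quality of machine translation systems. Selection based on semantic patterns is a common lexical selection method and also plays an important role in hybrid selection models. To address the shortcomings of this method, this paper improves it by proposing the automatic acquisition of semantic patterns and the concept of fuzzy semantic patterns. Automatic acquisition avoids the huge workload of traditional manual methods, while fuzzy semantic patterns allow the patterns to represent quantitative differences between linguistic phenomena. The paper first discusses the importance of this research, then introduces the concept of fuzzy semantic patterns, next presents a training algorithm used to build the fuzzy semantic pattern base, and finally gives a concrete lexical selection algorithm that uses fuzzy semantic patterns and compares it with the traditional algorithm.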
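A minimal sketch of how fuzzy semantic patterns might be trained and used for lexical selection. It assumes a pattern is a (source word, semantic class of its argument) pair and that a target word's membership degree is its count divided by the count of the most frequent target for that pattern; the paper's actual pattern format and training algorithm are not given here, and all names and examples are hypothetical.

```python
from collections import Counter, defaultdict


def train_fuzzy_patterns(examples):
    """examples: (source_word, argument_class, chosen_target) triples.

    Membership degree of each target = its count / count of the most
    frequent target for the same (source_word, argument_class) pattern."""
    counts = defaultdict(Counter)
    for src, arg_class, target in examples:
        counts[(src, arg_class)][target] += 1
    patterns = {}
    for key, ctr in counts.items():
        top = max(ctr.values())
        patterns[key] = {t: c / top for t, c in ctr.items()}
    return patterns


def select_word(src, arg_class, patterns, default=None):
    """Choose the target word with the highest membership degree."""
    memberships = patterns.get((src, arg_class))
    if not memberships:
        return default  # no pattern matched; fall back to another selector
    return max(memberships, key=memberships.get)


examples = [("打", "communication", "make"), ("打", "communication", "make"),
            ("打", "communication", "place"), ("打", "garment", "knit")]
patterns = train_fuzzy_patterns(examples)
print(patterns[("打", "communication")])  # {'make': 1.0, 'place': 0.5}
print(select_word("打", "garment", patterns))  # knit
```

Unlike a crisp pattern, which would record only that "make" and "place" are both possible, the membership degrees keep the quantitative preference between them.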

9.
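A Candidate Example Pattern Retrieval Algorithm for the Multi-Strategy Machine Translation System IHSMTS   Total citations: 2, self-citations: 0, citations by others: 2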
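Example-based machine translation (EBMT) systems all require a very large example pattern base, typically on the order of a million sentence pairs or more. Quickly selecting from it a number of candidate examples similar to the input sentence, to be passed on to later modules such as sentence similarity computation and analogical translation construction, is therefore a major problem that every EBMT system must solve. Based on the surface word features of sentences and information entropy, this paper proposes a multi-level candidate example pattern retrieval algorithm. Tests on the multi-strategy machine translation system IHSMTS show that the algorithm solves this problem well.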
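A minimal two-level sketch of candidate retrieval over surface words. It assumes an inverted index for the coarse filter and self-information weights (-log2 of the fraction of examples containing a word) as a stand-in for the paper's entropy-based weighting, so rarer shared words count more; the class name, parameters, and toy examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


class ExampleIndex:
    def __init__(self, examples):
        self.examples = [set(e) for e in examples]
        self.postings = defaultdict(set)  # word -> ids of examples containing it
        df = Counter()
        for i, words in enumerate(self.examples):
            for w in words:
                self.postings[w].add(i)
            df.update(words)
        n = len(self.examples)
        # Self-information of a word appearing in an example: rare words weigh more.
        self.weight = {w: -math.log2(c / n) for w, c in df.items()}

    def retrieve(self, sentence, k=3, min_shared=1):
        query = set(sentence)
        # Level 1: coarse filter through the inverted index (shared surface words).
        hits = Counter()
        for w in query:
            for i in self.postings.get(w, ()):
                hits[i] += 1
        candidates = [i for i, c in hits.items() if c >= min_shared]

        # Level 2: rank survivors by information-weighted word overlap.
        def score(i):
            return sum(self.weight[w] for w in query & self.examples[i])

        return sorted(candidates, key=lambda i: (-score(i), i))[:k]


index = ExampleIndex([["I", "like", "apples"], ["I", "like", "tea"],
                      ["she", "reads", "books"], ["I", "read", "books"]])
print(index.retrieve(["I", "like", "green", "tea"], k=2, min_shared=2))  # [1, 0]
```

The inverted index keeps the first level proportional to the number of examples that share a word with the input, rather than the size of a million-pair base.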

10.
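Semantic understanding of natural language involves several levels, including the basic propositional meaning centered on the predicate, conceptual meaning beyond the proposition, and supplementary logical meaning. Mainstream shallow semantic analysis focuses on propositional meaning and lacks support for conceptual and logical meaning, so it can do little to help computers deeply understand and reason over text. Drawing on linguistic theories such as argument structure theory and event semantics, this paper moves beyond the limits of shallow semantic analysis such as semantic role labeling and establishes a Chinese deep semantic description system that integrates concepts and logic. On the basis of this system, and using a layer-by-layer annotation strategy, it builds a large-scale Chinese deep semantic annotation corpus from real-world text, and validates the completeness and coverage of the description system through language engineering practice. This theoretical system and language resource are expected to advance innovation in Chinese automatic semantic analysis, artificial intelligence, and related work.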

11.
The potential and the limitations of current machine translation are discussed by comparing the output of human translation with that of virtual machine translation. Here, "virtual machine translation" means a kind of syntax-oriented literal translation, which may be regarded as an idealized competence of today's practical machine translation. The comparison shows that the main reason for the limitations, or incompleteness, of current practical machine translation systems is their insufficient ability to treat "structural idiosyncrasies" of sentences. Some translation examples also show that, without "understanding" the total meaning of the source sentence, it is quite difficult to manipulate idiosyncrasies in sentence structure. Idiosyncratic gaps between source and target sentence structure usually originate in cultural differences, so their computational treatment is a very difficult problem. But the translation examples also give some encouraging evidence that the principal technologies of today's not-yet-completed machine translation have sufficient potential to produce barely acceptable translations. Current practical efforts to treat such structural idiosyncrasies are also mentioned, together with some long-range, basic-research approaches.

12.
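A Generalized Matching Algorithm for Example Patterns in the Multi-Strategy Machine Translation System IHSMTS   Total citations: 1, self-citations: 1, citations by others: 1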
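Because its translation coverage is too low, EBMT based on exact matching is difficult to deploy at large scale. This paper proposes a generalized matching algorithm for example patterns that aims to improve EBMT coverage: guided by the input sentence to be translated, candidate translation examples are generalized in real time in a targeted way. The algorithm thus meets the speed requirements of real-time document translation while making full use of the translation knowledge that users add or modify as they use the system, improving overall translation coverage and quality. Experiments show that with a corpus of 160,000 sentence pairs, the system's translation coverage reaches about 75%, demonstrating the effectiveness of the algorithm.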
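A minimal sketch of input-guided generalization: positions where a candidate example differs from the input are turned into slots only if both words share a word class, and the slot is refilled from a bilingual lexicon through the example's word alignment. The equal-length, word-class condition and the toy class table and lexicon are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical toy resources.
WORD_CLASS = {"apples": "FRUIT", "oranges": "FRUIT", "tea": "DRINK"}
LEXICON = {"apples": "苹果", "oranges": "橙子", "tea": "茶"}


def generalize_and_translate(input_words, example):
    """example: (source_words, target_words, alignment), where alignment maps a
    source position to the target position that translates it.
    Returns a translation, or None if the example cannot be generalized."""
    src, tgt, align = example
    if len(src) != len(input_words):
        return None
    out = list(tgt)
    for i, (a, b) in enumerate(zip(src, input_words)):
        if a == b:
            continue  # exact match: keep the example's translation
        cls = WORD_CLASS.get(a)
        if cls is None or cls != WORD_CLASS.get(b):
            return None  # different classes: this example cannot cover the input
        if i not in align or b not in LEXICON:
            return None  # no alignment or no translation for the new word
        out[align[i]] = LEXICON[b]  # fill the generalized slot
    return out


example = (["I", "like", "apples"], ["我", "喜欢", "苹果"], {0: 0, 1: 1, 2: 2})
print(generalize_and_translate(["I", "like", "oranges"], example))  # ['我', '喜欢', '橙子']
print(generalize_and_translate(["I", "like", "tea"], example))      # None
```

Generalizing only when triggered by an input keeps the example base in its original form, so examples that users add or edit are usable immediately.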

13.
This paper describes a domain-limited system for speech understanding as well as for speech translation. An integrated semantic decoder directly converts the preprocessed speech signal into its semantic representation by maximum a posteriori classification. By combining probabilistic knowledge at the acoustic, phonetic, syntactic, and semantic levels, the semantic decoder extracts the most probable meaning of the utterance. No separate speech recognition stage is needed, because the Viterbi algorithm (which calculates acoustic probabilities using Hidden Markov Models) is integrated with a probabilistic chart parser (which calculates semantic and syntactic probabilities using special models). The semantic structure is introduced as a representation of an utterance's meaning. It can serve as an intermediate level for a subsequent intention decoder (within a speech understanding system that controls a running application by spoken input) as well as an interlingua level for a subsequent language production unit (within an automatic speech translation system that creates spoken output in another language). Following these principles and using the respective algorithms, speech understanding and speech translation front-ends were successfully realized for the domains 'graphic editor', 'service robot', 'medical image visualisation', and 'scheduling dialogues'.
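The Viterbi algorithm mentioned above finds the most probable hidden-state path of an HMM. The sketch below shows only that standard component in log space; the paper's integration with a probabilistic chart parser is not reproduced, and the two-state "sil"/"speech" model with discrete acoustic labels is a hypothetical toy.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path of an HMM, computed in log space."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[s]: best log-probability of any path that ends in state s
    delta = {s: lg(start_p[s]) + lg(emit_p[s].get(observations[0], 0.0)) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: delta[r] + lg(trans_p[r][s]))
            new_delta[s] = delta[prev] + lg(trans_p[prev][s]) + lg(emit_p[s].get(obs, 0.0))
            ptr[s] = prev
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=delta.get)
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


states = ["sil", "speech"]
start_p = {"sil": 0.6, "speech": 0.4}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3}, "speech": {"sil": 0.4, "speech": 0.6}}
emit_p = {"sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "speech": {"low": 0.1, "mid": 0.3, "high": 0.6}}
log_prob, path = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['sil', 'sil', 'speech']
```

Working with log-probabilities avoids numeric underflow on long utterances, where products of many small probabilities would otherwise round to zero.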

14.
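This paper discusses the description and study of noun meaning. It first introduces and reviews several major lexical semantic theories (structuralist semantics, generative semantics, conceptual semantics, and Natural Semantic Metalanguage theory) and points out their shortcomings in characterizing the semantics of nouns. It then introduces the qualia structure of Generative Lexicon theory as a means of description, explaining how it differs from the preceding theories and what its distinctive features are. Finally, building on Generative Lexicon theory, it presents four case studies of qualia knowledge in noun analysis (lexical omission, generation of metaphorical meaning, supply-use sentences, and middle constructions) and its possible applications in natural language processing.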
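Qualia structure is a small data structure with four roles (formal, constitutive, telic, agentive). The sketch below shows it in the spirit of the lexical-omission case: for an aspectual verb taking a noun object, as in "begin a book", the omitted event is read off the noun's telic role, falling back to the agentive role. The English example, the fallback order, and the names are illustrative assumptions, not the paper's analysis of Chinese.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Qualia:
    """The four qualia roles of Generative Lexicon theory."""
    formal: str                # what kind of thing it is
    constitutive: str          # what it is made of, its parts
    telic: Optional[str]       # its purpose or function
    agentive: Optional[str]    # how it comes into being


# Hypothetical toy lexicon.
LEXICON = {
    "book": Qualia(formal="physical_object/information", constitutive="pages",
                   telic="read", agentive="write"),
}


def recover_omitted_event(verb, noun, lexicon=LEXICON):
    """Reconstruct the event omitted after an aspectual verb ('begin a book')."""
    q = lexicon.get(noun)
    if q is None:
        return None
    event = q.telic or q.agentive  # prefer purpose, else mode of creation
    return f"{verb} to {event} the {noun}" if event else None


print(recover_omitted_event("begin", "book"))  # begin to read the book
```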

15.
Many studies show that the ability of independent, heterogeneous enterprise information systems (EISs) to interoperate is related to the challenge of making their semantics explicit and formal, so that messages are not merely exchanged but interpreted without ambiguity. In this paper, we present an approach to overcoming that challenge by developing a method for making the systems' implicit semantics explicit. We define and implement a method for generating local ontologies from the databases of these systems. In addition, we describe an associated method for translating between semantic and SQL queries, a process in which the implicit semantics of the EIS databases and the explicit semantics of the local ontologies become interrelated. Both methods are demonstrated by creating a local ontology for, and semantically querying, the OpenERP Enterprise Resource Planning system, for the benefit of collaborative supply chain planning.
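A minimal sketch of both methods under simple, assumed mapping rules: each table becomes an OWL class, each ordinary column a datatype property, and each foreign-key column an object property; a semantic query "instances of C whose property P equals V" is then translated back to SQL by inverting the same naming convention. The rules, the schema format, the lowercase table-name requirement, and the toy OpenERP-like tables are all hypothetical.

```python
def schema_to_ontology(schema, prefix="ex"):
    """schema: {table: {"columns": [...], "foreign_keys": {column: referenced_table}}}.

    Table names are assumed lowercase so the naming convention is reversible.
    Returns a sorted list of (subject, predicate, object) triples."""
    triples = []
    for table, spec in schema.items():
        cls = f"{prefix}:{table.capitalize()}"
        triples.append((cls, "rdf:type", "owl:Class"))
        fks = spec.get("foreign_keys", {})
        for col in spec["columns"]:
            prop = f"{prefix}:{table}_{col}"
            if col in fks:  # foreign key -> relation between two classes
                triples.append((prop, "rdf:type", "owl:ObjectProperty"))
                triples.append((prop, "rdfs:range", f"{prefix}:{fks[col].capitalize()}"))
            else:           # plain column -> literal-valued attribute
                triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
            triples.append((prop, "rdfs:domain", cls))
    return sorted(triples)


def semantic_query_to_sql(cls, prop, value):
    """Translate 'instances of cls whose prop equals value' into parameterized SQL.

    Uses the qmark placeholder style (as in Python's sqlite3)."""
    table = cls.split(":", 1)[1].lower()
    column = prop.split(":", 1)[1][len(table) + 1:]
    return f"SELECT * FROM {table} WHERE {column} = ?", (value,)


schema = {
    "partner": {"columns": ["id", "name"]},
    "sale_order": {"columns": ["id", "partner_id", "amount"],
                   "foreign_keys": {"partner_id": "partner"}},
}
ontology = schema_to_ontology(schema)
print(semantic_query_to_sql("ex:Sale_order", "ex:sale_order_amount", 100))
```

Returning the value separately from the SQL string keeps the translated query safe to pass to a DB-API `execute` call.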

16.
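Thanks to its strong performance, neural machine translation has become the mainstream approach to machine translation. However, whether the neural machine translation encoder learns sufficient semantic information has remained an open question in the field. To investigate this question, this paper uses the semantic features contained in abstract meaning representation (AMR) to analyze neural machine translation from two different perspectives, the word level and the sentence level...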

17.
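To address inconsistent user retrieval intents in semantic 3D model retrieval, a multi-granularity semantic retrieval framework is built so that the learned models can adapt effectively to users' different retrieval intents. First, model classification knowledge is divided hierarchically to form a multi-granularity structure of semantic concepts. Then, a multi-view feature is extracted to describe the shape of 3D models, and Gaussian process classifiers are used to build learning models at the different granularity levels, giving a semantically consistent description between low-level features and query concepts. Compared with existing work, the framework lets users set their retrieval intent by changing the semantic granularity level, so that the results match the user's semantics as closely as possible. In the experiments, the framework's algorithms were tested on a 3D model benchmark database. The results show that retrieval accuracy improves markedly and that the results are consistent with the characteristics of human thinking.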

18.
Ergonomics, 2012, 55(13-14): 1346-1360
This paper explores theoretical issues in ergonomics related to semantics and the emotional content of design. The aim is to find answers to the following questions: how to design products triggering ‘happiness’ in one's mind; which product attributes help in the communication of positive emotions; and finally, how to evoke such emotions through a product. In other words, this is an investigation of the ‘meaning’ that could be designed into a product in order to ‘communicate’ with the user at an emotional level. A literature survey of recent design trends, based on selected examples of product designs and semantic applications to design, including the results of recent design awards, was carried out in order to determine the common attributes of their design language. A review of Good Design Award winning products that are said to convey and/or evoke emotions in the users has been done in order to define good design criteria. These criteria have been discussed in relation to user emotional responses and a selection of these has been given as examples.

19.
Semantic entities carry the most important semantics of text data. Therefore, the identification and the relationship integration of semantic entities are very important for applications requiring semantics of text data. However, current strategies are still facing many problems such as semantic entity identification, new word identification and relationship integration among semantic entities. To address these problems, a two-phase framework for semantic entity identification with relationship integration in large scale text data is proposed in this paper. In the first semantic entities identification phase, we propose a novel strategy to extract unknown text semantic entities by integrating statistical features, Decision Tree (DT), and Support Vector Machine (SVM) algorithms. Compared with traditional approaches, our strategy is more effective in detecting semantic entities and more sensitive to new entities that just appear in the fresh data. After extracting the semantic entities, the second phase of our framework is for the integration of Semantic Entities Relationships (SER) which can help to cluster the semantic entities. A novel classification method using features such as similarity measures and co-occurrence probabilities is applied to tackle the clustering problem and discover the relationships among semantic entities. Comprehensive experimental results have shown that our framework can beat state-of-the-art strategies in semantic entity identification and discover over 80% relationship pairs among related semantic entities in large scale text data.
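The abstract does not list its statistical features, so the sketch below computes a common choice for new-word and entity detection that could feed a DT or SVM classifier: n-gram frequency, internal cohesion (the weakest PMI over all binary splits, approximating p(x) as count(x)/n), and left/right neighbour entropy. The feature set, function names, and toy string are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of a neighbour-frequency Counter."""
    total = sum(counter.values())
    probs = [c / total for c in counter.values()] if total else []
    return sum(-p * math.log2(p) for p in probs)


def candidate_features(text, max_len=3, min_freq=2):
    """Statistical features for character n-gram candidates of length >= 2."""
    n = len(text)
    ngram = Counter(text[i:i + k] for k in range(1, max_len + 1) for i in range(n - k + 1))
    left, right = defaultdict(Counter), defaultdict(Counter)
    for k in range(2, max_len + 1):
        for i in range(n - k + 1):
            g = text[i:i + k]
            if i > 0:
                left[g][text[i - 1]] += 1
            if i + k < n:
                right[g][text[i + k]] += 1
    feats = {}
    for g, f in ngram.items():
        if len(g) < 2 or f < min_freq:
            continue
        # Cohesion: weakest PMI over all binary splits, with p(x) ~ count(x) / n.
        pmi = min(math.log2(f * n / (ngram[g[:j]] * ngram[g[j:]])) for j in range(1, len(g)))
        feats[g] = {"freq": f, "pmi": pmi,
                    "left_entropy": entropy(left[g]), "right_entropy": entropy(right[g])}
    return feats


feats = candidate_features("abcxabcyabcz")
print(sorted(feats))  # ['ab', 'abc', 'bc']
```

Here "ab" is always followed by "c" (right entropy 0), so it looks like a fragment, while "abc" is followed by three different characters; low boundary entropy is the signal that a candidate is not yet a complete unit.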

20.
We propose a framework for abstractive summarization of multiple documents, which selects summary content not from the source document sentences but from the semantic representation of the source documents. In this framework, the contents of the source documents are represented as predicate argument structures obtained by semantic role labeling. Content is selected for the summary by ranking the predicate argument structures on optimized features, and language generation is then used to produce sentences from the selected predicate argument structures. Our framework differs from other abstractive summarization approaches in several respects. First, it employs semantic role labeling for the semantic representation of text. Second, it analyzes the source text semantically, using a semantic similarity measure to cluster semantically similar predicate argument structures across the text. Finally, it ranks the predicate argument structures on features weighted by a genetic algorithm (GA). Experiments were carried out on DUC-2002, a standard corpus for text summarization. The results indicate that the proposed approach performs better than other summarization systems.
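A minimal sketch of GA-weighted ranking of predicate argument structures (PAS). The abstract does not spell out its features or GA operators, so the three toy features, the fitness (fraction of the top-k ranked PAS found in a reference summary), tournament selection, uniform crossover, Gaussian mutation, and all names below are assumptions.

```python
import random


def rank(features, weights):
    """PAS indices sorted by weighted feature score, best first."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])


def fitness(weights, features, reference, k):
    """Fraction of the top-k ranked PAS that appear in the reference summary."""
    return len(set(rank(features, weights)[:k]) & reference) / k


def evolve_weights(features, reference, k, pop_size=20, generations=30, seed=0):
    """Simple GA: elitism, tournament selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)

    def fit(w):
        return fitness(w, features, reference, k)

    pop = [[rng.random() for _ in features[0]] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        next_pop = [list(w) for w in pop[:2]]  # keep the two best unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 3), key=fit)  # tournament of size 3
            b = max(rng.sample(pop, 3), key=fit)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Mutate each gene with probability 0.2, clamped to [0, 1].
            child = [min(1.0, max(0.0, x + rng.gauss(0, 0.1))) if rng.random() < 0.2 else x
                     for x in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)


# Hypothetical features per PAS: [position score, centrality, title overlap].
features = [[0.1, 0.5, 0.9], [0.9, 0.5, 0.1], [0.2, 0.4, 0.8], [0.8, 0.6, 0.2]]
reference = {0, 2}  # indices of PAS that appear in the reference summary
best = evolve_weights(features, reference, k=2)
print(best, fitness(best, features, reference, 2))
```

Because the fitness only looks at the top-k ranks, the GA needs no gradient and can optimize directly for summary overlap, which is the usual reason to reach for a GA here.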
