Similar literature
 Found 20 similar documents; search took 78 ms
1.
This paper describes a first attempt to base a paraphrase generation system upon Meľčuk and Žolkovskij's linguistic meaning-text (MT) model whose purpose is to establish correspondences between meanings, represented by networks, and (ideally) all synonymous texts having this meaning. The system described here contains a Prolog implementation of a small explanatory and combinatorial dictionary (the MT lexicon) and, using unification and backtracking, generates from a given network the sentences allowed by the dictionary and the lexical transformations of the model. The passage from a net to the final texts is done through a series of transformations of intermediary structures that closely correspond to MT utterance representations (semantic, deep-syntax, surface-syntax, and morphological representations). These are graphs and trees with labeled arcs. The Prolog unification (equality predicate) was extended to extract information from these representations and build new ones. The notion of utterance path, used by many authors, is replaced by that of covering by defining subnetworks.

2.
We propose a novel approach to embedding sentences into a high-dimensional space. Independent words in the sentence are located at points in the space, and the sentence is represented by a curve along these words. A set of functions that evaluates a sequence of words is designed over this space and is helpful for searching for words that are likely to follow the observed sentences. More generally, our approach generates sentences sequentially, depending on the context. We simplify Japanese grammar and implement it as a grammar that constrains the generated sentences to be simple sentences. In this study, we performed experiments in which we created a dictionary containing 2877 different independent words and constructed a semantic space from texts in eight digitally archived books, consisting of 8495 independent words and 161 paragraphs in total. It was demonstrated that several meaningful sentences can be generated that are likely to follow untrained input sentences.

3.
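The system for local automatic generation and semi-automatic editing of digital maps consists of a road editing module and a building editing module. The article describes in detail the algorithms behind the semi-automatic extraction functions of each module: road extraction uses the rolling-ball method, and building extraction uses an algorithm based on geometric relationships. All of the semi-automatic extraction algorithms operate on graphics, which ensures good generality: they can be used not only when producing digital maps from paper maps but also when producing digital maps from aerial images. The semi-automatic digital map editor can already serve as a standalone system that, under human-computer interaction, generates digital road maps and digital building maps automatically or semi-automatically.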

4.
This paper reports the results of seven different experiments, assessing the benefit that users gain from the inclusion of pictorial features such as pictorial metaphor, visual mnemonics or support for visual imagery in visual languages. The experiments are based on typical programming tasks such as problem solving, construction and interpretation. They employed a number of experimental languages, including both implicit pictorial representations and explicitly verbal metaphorical explanations. The results of these experiments indicate that special design considerations apply to visual languages. Direct application of Graphical User Interface metaphors does not result in automatic improvements in usability of visual languages for typical programming tasks. Visual languages can benefit from pictorial mnemonics, but systematic explanatory metaphors (whether visual or verbal) are less useful than consistent presentation of language abstractions.

5.
The paper addresses the problem of generating sentences from logical formulae. It describes a simple and efficient algorithm for generating text which has been developed for use in machine translation, but will have wider application in natural language processing. An important property of the algorithm is that the logical form used to generate a sentence need not be one which could have been produced by parsing the sentence: formal equivalence between logical forms is allowed for. This is necessary for a machine translation system, such as the one envisaged in this paper, which uses single declarative grammars of individual languages, and declarative statements of translation equivalences for transfer. In such a system, it cannot be guaranteed that transfer will produce a logical form in the same order as would have been produced by parsing some target-language sentence, and it is not practicable to define a normal form for the logical forms. The algorithm is demonstrated using a categorial grammar and a simple indexed logic, as this allows a particularly clear and elegant formulation. It is shown that the algorithm can be adapted to phrase-structure grammars, and to more complex semantic representations than that used here.

6.
This paper presents the online handwriting recognition system NPen++ developed at the University of Karlsruhe and Carnegie Mellon University. The NPen++ recognition engine is based on a multi-state time delay neural network and yields recognition rates from 96% for a 5,000 word dictionary to 93.4% on a 20,000 word dictionary and 91.2% for a 50,000 word dictionary. The proposed tree search and pruning technique reduces the search space considerably without losing too much recognition performance compared to an exhaustive search. This enables the NPen++ recognizer to be run in real-time with large dictionaries. Initial recognition rates for whole sentences are promising and show that the MS-TDNN architecture is suited to recognizing handwritten data ranging from single characters to whole sentences. Received September 3, 2000 / Revised October 9, 2000

7.
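A key problem in automatic summarization systems is finding the important sentences that can make up a summary. There are many ways to find such sentences, but few of them use machine learning. This paper proposes an automatic learning method for summary sentence patterns. The method takes a number of lightly preprocessed sentences as the training set, performs bottom-up generalization starting from the positive example sentences, abstracts general concepts about sentence patterns, and forms a set of sentence-pattern rules that serves as an effective means of deciding which sentences in a text can be used as summary sentences. This is the core part of the summarization system implementation.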

8.
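With the rapid growth of information, extracting summary information from large numbers of documents has become an important research direction in natural language processing. This paper proposes an automatic summarization method that depends neither on any training set nor on information about the natural language itself. The method uses modified PageRank and HITS formulas to score and rank all sentences in a document, and selects the highest-scoring sentences as the summary. Experiments show that the method is simple to apply, efficient, effective, and extensible.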
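
The idea of ranking sentences with a PageRank-style score can be illustrated with a minimal sketch. It uses the plain, textbook PageRank update over a sentence-similarity graph, not the modified formula from the paper (which the abstract does not give), and it leaves HITS out entirely. The bag-of-words cosine similarity and all function names (`cosine`, `rank_sentences`, `summarize`) are assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with plain PageRank over a similarity graph (assumed setup)."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(bags)
    # Edge weight = lexical similarity between two sentences; no self-loops.
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Sentences with no outgoing weight pass nothing on in this sketch.
            incoming = sum(scores[i] * w[i][j] / out[i]
                           for i in range(n) if out[i] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


example = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]
example_scores = rank_sentences(example)
example_summary = summarize(example, k=3)
```

In this toy input the last sentence shares no words with the others, so it receives only the base score and is left out of the three-sentence summary. Like the method in the abstract, the sketch needs no training data; unlike it, the tokenizer here assumes whitespace-delimited text.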

9.
Associative memories are conventionally used to represent data with very simple structure: sets of pairs of vectors. This paper describes a method for representing more complex compositional structure in distributed representations. The method uses circular convolution to associate items, which are represented by vectors. Arbitrary variable bindings, short sequences of various lengths, simple frame-like structures, and reduced representations can be represented in a fixed width vector. These representations are items in their own right and can be used in constructing compositional structures. The noisy reconstructions extracted from convolution memories can be cleaned up by using a separate associative memory that has good reconstructive properties.
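
As a rough illustration of binding by circular convolution, the sketch below associates a role vector with a filler vector, decodes the filler with the approximate inverse (the involution of the role), and cleans up the noisy result by nearest-neighbor lookup in an item memory. The vector dimension, the Gaussian initialization, the item names, and the helper names (`cconv`, `involution`, `cleanup`) are assumptions for illustration, not details taken from the paper.

```python
import random
from math import sqrt


def cconv(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a):
    """Approximate inverse for decoding: result[i] = a[(-i) mod n]."""
    n = len(a)
    return [a[(-i) % n] for i in range(n)]


def random_vector(n, rng):
    """Random item vector with elements drawn from N(0, 1/n) (assumed scheme)."""
    return [rng.gauss(0.0, 1.0 / sqrt(n)) for _ in range(n)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def cleanup(noisy, memory):
    """Clean-up memory: return the name of the stored item closest to `noisy`."""
    return max(memory, key=lambda name: cosine(noisy, memory[name]))


rng = random.Random(0)
n = 256  # hypothetical dimension
memory = {name: random_vector(n, rng) for name in ("role", "alice", "bob", "carol")}

# Bind role and filler into one fixed-width trace vector.
trace = cconv(memory["role"], memory["alice"])

# Decode: convolving with the involution of the role gives a noisy filler.
decoded = cconv(involution(memory["role"]), trace)
best = cleanup(decoded, memory)
```

The decoded vector is only a noisy copy of the filler, which is why the separate clean-up step is needed. The trace has the same width as its inputs, so it can itself be bound into larger structures.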

10.
A Trainable System for Object Detection   Total citations: 23, self-citations: 2, citations by others: 21
This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. This example-based learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. We present results on face, people, and car detection tasks using the same architecture. In addition, we quantify how the representation affects detection performance by considering several alternate representations including pixels and principal components. We also describe a real-time application of our person detection system as part of a driver assistance system.
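
The "oriented intensity differences between adjacent regions" can be sketched with a single level of the 2D Haar transform on 2x2 pixel blocks. This is a simplified assumption: it uses one scale and non-overlapping blocks, whereas the abstract describes an overcomplete, multiscale dictionary, and it omits the support vector machine classifier entirely. The function names `haar_2x2` and `haar_level` are hypothetical.

```python
def haar_2x2(block):
    """Haar coefficients of one 2x2 block given as ((a, b), (c, d))."""
    (a, b), (c, d) = block
    average = (a + b + c + d) / 4
    vertical_edge = (a + c - b - d) / 4    # left column minus right column
    horizontal_edge = (a + b - c - d) / 4  # top row minus bottom row
    diagonal = (a + d - b - c) / 4         # main diagonal minus anti-diagonal
    return average, vertical_edge, horizontal_edge, diagonal


def haar_level(image):
    """One transform level over non-overlapping 2x2 blocks.

    `image` is a list of rows with even height and width (an assumption
    made to keep the sketch short).
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            haar_2x2(((image[r][c], image[r][c + 1]),
                      (image[r + 1][c], image[r + 1][c + 1])))
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# A bright-to-dark vertical edge on the left, a flat region on the right.
coefficients = haar_level([[10, 0, 3, 3],
                           [10, 0, 3, 3]])
```

For the left block only the vertical-edge coefficient is non-zero besides the average; the flat right block has all three difference coefficients equal to zero. Difference coefficients of this kind would be the inputs to the classifier.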

11.
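Research and Implementation of Dictionary-Based Chinese-Tibetan Sentence Alignment   Total citations: 1, self-citations: 0, citations by others: 1
Alignment is one of the key techniques in processing bilingual corpora, and building sentence-level aligned text is the most basic task in corpus construction. Drawing on mature sentence alignment methods for other languages and taking into account the particular characteristics of Tibetan, this paper proposes dictionary-based Chinese-Tibetan sentence alignment. The bilingual dictionary used for alignment was compiled and its word coverage evaluated. During alignment, Chinese and Tibetan were found to differ in word segmentation granularity; to address this, words are further looked up in the Tibetan-Chinese dictionary and matched against the Chinese sentence, which raises the score of correct sentence pairs and thus improves alignment accuracy. The method achieves an accuracy of 81.11%.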

12.
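Automatic evaluation of machine translation output is an important task in machine translation. Current automatic evaluation methods completely ignore the source-language sentence and measure translation quality using only human reference translations. To address this shortcoming, this paper proposes an evaluation method that incorporates source-sentence information: a quality vector describing translation quality is extracted from the pair formed by the machine translation and its source sentence, and is fused, using a deep neural network, with an automatic evaluation method based on contextual word embeddings. Experiments on the dataset of the WMT-19 automatic translation evaluation task show that the proposed method effectively increases the correlation between automatic and human evaluation. Further experimental analysis shows that source-sentence information plays an important role in the automatic evaluation of translations.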

13.
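Fault Diagnosis System for Aircraft Autopilots   Total citations: 4, self-citations: 0, citations by others: 4
This paper introduces the basic principle of diagnosing aircraft autopilot faults using a fault dictionary, describes the software and hardware design of the fault diagnosis system with a worked example, and gives the hardware block diagram and the program flowchart.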

14.
This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the original analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people do), and can be used to induce a grammar capable of analysing such sentences. This article demonstrates these two applications using the Penn Treebank. In a robustness evaluation experiment, two state-of-the-art statistical parsers are evaluated on an ungrammatical version of Section 23 of the Wall Street Journal (WSJ) portion of the Penn Treebank. This experiment shows that the performance of both parsers degrades with grammatical noise. A breakdown by error type is provided for both parsers. A second experiment retrains both parsers using an ungrammatical version of WSJ Sections 2–21. This experiment indicates that an ungrammatical treebank is a useful resource in improving parser robustness to grammatical errors, but that the correct combination of grammatical and ungrammatical training data has yet to be determined.

15.
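To partition sentence structure more accurately and to judge the content a sentence expresses, a new weight-based calculation algorithm is proposed that performs sentiment analysis on sentences on top of improved Chinese word segmentation. First, a Chinese word segmentation algorithm splits the sentence structure; next, the lexicon is extended according to part of speech and used to filter interfering words out of the sentence; finally, the new weight calculation algorithm analyzes the sentiment of the sentence. Tests show that the sentiment analysis accuracy is relatively high and that the method is widely applicable to applications such as online public opinion analysis.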

16.
17.
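Since Vista, Microsoft operating systems have fully supported the input, editing, and typesetting of traditional Mongolian. Building on Microsoft's Mongolian input method and taking the characteristics of Mongolian into account, this paper proposes a new Mongolian input method algorithm. The algorithm supports automatic form-variation computation, automatic predictive input, automatic learning, and resource sharing. The paper gives the principle and detailed procedure of the automatic form-variation computation, discusses in detail how Mongolian dictionary data are stored and used, and finally proposes solutions for the automatic learning algorithm and the resource sharing technique.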

18.
Dictionary learning algorithms for sparse representation   Total citations: 11, self-citations: 0, citations by others: 11

19.
Online dictionaries can be important tools for research and application in natural language processing. This paper describes work with a machine-readable version of "Dorland's Illustrated Medical Dictionary". First the characteristics of the dictionary are briefly described, and then the complex process of converting the tape to an online interactive dictionary is discussed. The results of several experiments in automatically deriving information from the online dictionary are presented, and the paper ends with a discussion of the use of the online dictionary as a tool in the development of a natural language processing system designed for the biomedical domain.

20.
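This paper briefly analyzes the importance of automatic question-answering systems and the shortcomings of current ones, and designs a Lucene-based automatic question-answering system. The system makes full use of Lucene's powerful retrieval mechanism, includes a domain-specific dictionary designed for the system, and adopts the currently most popular two-level hash dictionary storage structure. It also proposes an optimized maximum-matching Chinese word segmentation algorithm and applies it within Lucene, compensating for the shortcomings of Lucene's built-in tokenizers.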
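
For orientation, the baseline that such work builds on can be sketched as plain forward maximum matching. This is the standard algorithm, not the optimized variant proposed in the paper (the abstract does not describe the optimization), and a Python `set` stands in for the two-level hash dictionary. The function name `forward_max_match` and the sample dictionary are assumptions for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily, always taking the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character
    tokens, so the loop always advances.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical domain dictionary for illustration only.
words = {"自动", "答疑", "系统", "自动答疑", "中文", "分词"}
segmented = forward_max_match("自动答疑系统", words)
fallback = forward_max_match("中文分词算法", words)
```

Here `segmented` is `["自动答疑", "系统"]`: the longer entry wins over `"自动"` followed by `"答疑"`. In `fallback`, the out-of-dictionary tail `"算法"` is split into single characters, which is the kind of weakness that domain dictionaries and optimized matching aim to reduce.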
