1.
This paper describes a first attempt to base a paraphrase generation system upon Mel'čuk and Žolkovskij's linguistic meaning-text (MT) model whose purpose is to establish correspondences between meanings, represented by networks, and (ideally) all synonymous texts having this meaning. The system described here contains a Prolog implementation of a small explanatory and combinatorial dictionary (the MT lexicon) and, using unification and backtracking, generates from a given network the sentences allowed by the dictionary and the lexical transformations of the model. The passage from a net to the final texts is done through a series of transformations of intermediary structures that closely correspond to MT utterance representations (semantic, deep-syntax, surface-syntax, and morphological representations). These are graphs and trees with labeled arcs. The Prolog unification (equality predicate) was extended to extract information from these representations and build new ones. The notion of utterance path, used by many authors, is replaced by that of covering by defining subnetworks.
2.
We propose a novel approach to embedding sentences into a high-dimensional space. Independent words in the sentence are located at points in the space, and the sentence is represented by a curve along these words. A set of functions that evaluates a sequence of words is designed over this space and is helpful for searching for words that are likely to follow the observed sentences. More generally, our approach generates sentences sequentially, depending on the context. We simplify Japanese grammar and subsequently implement it as a grammar that constrains simple sentences to be generated. In this study, we performed experiments in which we created a dictionary containing 2877 different independent words and constructed a semantic space from texts in eight digitally archived books, consisting of 8495 independent words and 161 paragraphs in total. It was demonstrated that several meaningful sentences can be generated that are likely to follow untrained input sentences.
3.
4.
《Journal of Visual Languages and Computing》2001,12(3):223-252
This paper reports the results of seven different experiments, assessing the benefit that users gain from the inclusion of pictorial features such as pictorial metaphor, visual mnemonics or support for visual imagery in visual languages. The experiments are based on typical programming tasks such as problem solving, construction and interpretation. They employed a number of experimental languages, including both implicit pictorial representations and explicitly verbal metaphorical explanations. The results of these experiments indicate that special design considerations apply to visual languages. Direct application of Graphical User Interface metaphors does not result in automatic improvements in usability of visual languages for typical programming tasks. Visual languages can benefit from pictorial mnemonics, but systematic explanatory metaphors (whether visual or verbal) are less useful than consistent presentation of language abstractions.
5.
John D. Phillips 《Machine Translation》1993,8(4):209-235
The paper addresses the problem of generating sentences from logical formulae. It describes a simple and efficient algorithm for generating text which has been developed for use in machine translation, but will have wider application in natural language processing. An important property of the algorithm is that the logical form used to generate a sentence need not be one which could have been produced by parsing the sentence: formal equivalence between logical forms is allowed for. This is necessary for a machine translation system, such as the one envisaged in this paper, which uses single declarative grammars of individual languages, and declarative statements of translation equivalences for transfer. In such a system, it cannot be guaranteed that transfer will produce a logical form in the same order as would have been produced by parsing some target-language sentence, and it is not practicable to define a normal form for the logical forms. The algorithm is demonstrated using a categorial grammar and a simple indexed logic, as this allows a particularly clear and elegant formulation. It is shown that the algorithm can be adapted to phrase-structure grammars, and to more complex semantic representations than that used here.
6.
S. Jaeger S. Manke J. Reichert A. Waibel 《International Journal on Document Analysis and Recognition》2001,3(3):169-180
This paper presents the online handwriting recognition system NPen++ developed at the University of Karlsruhe and Carnegie Mellon University. The NPen++ recognition engine is based on a multi-state time delay neural network and yields recognition rates from 96% for a 5,000 word dictionary to 93.4% on a 20,000 word dictionary and 91.2% for a 50,000 word dictionary. The proposed tree search and pruning technique reduces the search space considerably without losing too much recognition performance compared to an exhaustive search. This enables the NPen++ recognizer to be run in real-time with large dictionaries. Initial recognition rates for whole sentences are promising and show that the MS-TDNN architecture is suited to recognizing handwritten data ranging from single characters to whole sentences.
Received September 3, 2000 / Revised October 9, 2000
7.
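A key problem in automatic summarization systems is identifying the important sentences that can make up the summary. Many methods exist for finding such sentences, but few of them use machine learning. This paper proposes an automatic learning method for summary sentence patterns. The method takes a number of sentences that have undergone simple preprocessing as the training sample set and, starting from the positive example sentences, performs bottom-up generalization learning to abstract general concepts about sentence patterns. The resulting set of sentence-pattern rules serves as an effective means of judging which sentences in a text can be used as summary sentences. This is the core part of implementing a summarization system.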
8.
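With the rapid growth of information, extracting summary information from large numbers of documents has become an important research direction in natural language processing. This paper proposes an automatic summarization method that does not depend on any training set or on information about the natural language itself. The method uses modified PageRank and HITS formulas to score and rank all sentences in a document and selects the highest-scoring sentences as the summary. Experiments show that the method is simple to implement, efficient, effective, and extensible.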
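The graph-based sentence scoring mentioned in the abstract above can be illustrated with a minimal sketch. It uses plain, unmodified PageRank over a sentence-similarity graph; the paper's modified PageRank and HITS formulas are not given in the abstract and are not reproduced here. The Jaccard word-overlap similarity, whitespace tokenization, the function names `jaccard`, `rank_sentences`, and `summarize`, and the sample sentences are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (an assumed edge weight)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with standard weighted PageRank by power iteration."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weight between two different sentences is their similarity.
    weights = [[jaccard(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no similar neighbour spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0.0)
        new_scores = []
        for j in range(n):
            incoming = sum(scores[i] * weights[i][j] / out_sum[i]
                           for i in range(n) if out_sum[i] > 0.0)
            new_scores.append((1.0 - damping) / n
                              + damping * (incoming + dangling / n))
        scores = new_scores
    return scores


def summarize(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog chased the cat",
    "stock prices fell sharply today",
]
scores = rank_sentences(sentences)
summary = summarize(sentences, 2)
```

In this toy input the last sentence shares no words with the others, so it receives only the uniform baseline score and is never selected. A real system would need a proper tokenizer, especially for Chinese text.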
9.
Associative memories are conventionally used to represent data with very simple structure: sets of pairs of vectors. This paper describes a method for representing more complex compositional structure in distributed representations. The method uses circular convolution to associate items, which are represented by vectors. Arbitrary variable bindings, short sequences of various lengths, simple frame-like structures, and reduced representations can be represented in a fixed width vector. These representations are items in their own right and can be used in constructing compositional structures. The noisy reconstructions extracted from convolution memories can be cleaned up by using a separate associative memory that has good reconstructive properties.
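The binding and clean-up steps described in the abstract above can be sketched as follows. This is an illustrative example rather than the paper's code: the vector width of 256, the Gaussian elements with variance 1/n, the use of circular correlation as the approximate inverse, the dot-product clean-up, and every name in the code are assumptions chosen for the demonstration.

```python
import random


def circular_convolution(a, b):
    """Bind two equal-length vectors: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def circular_correlation(a, c):
    """Approximately unbind a from c: d[k] = sum_j a[j] * c[(k + j) mod n]."""
    n = len(a)
    return [sum(a[j] * c[(k + j) % n] for j in range(n)) for k in range(n)]


def random_vector(n, rng):
    """Random vector with element variance 1/n, so its expected length is 1."""
    sd = (1.0 / n) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def clean_up(noisy, memory):
    """Return the name of the stored item most similar to a noisy vector."""
    return max(memory, key=lambda name: dot(noisy, memory[name]))


rng = random.Random(0)
n = 256
# Item memory used for clean-up (hypothetical item names).
memory = {name: random_vector(n, rng)
          for name in ("red", "green", "blue", "circle")}
color_role = random_vector(n, rng)

# Bind the role to a filler; the result has the same fixed width n.
trace = circular_convolution(color_role, memory["red"])
# Unbinding yields a noisy copy of the filler, which clean-up then identifies.
noisy = circular_correlation(color_role, trace)
recovered = clean_up(noisy, memory)
```

The quadratic-time loops keep the sketch dependency-free. For large widths, the same convolution is usually computed with fast Fourier transforms instead.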
10.
A Trainable System for Object Detection (total citations: 23, self-citations: 2, citations by others: 21)
This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. This example-based learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. We present results on face, people, and car detection tasks using the same architecture. In addition, we quantify how the representation affects detection performance by considering several alternate representations including pixels and principal components. We also describe a real-time application of our person detection system as part of a driver assistance system.
11.
12.
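Automatic evaluation of machine translation output is an important task in machine translation. Current automatic evaluation methods completely ignore the source-language sentence and measure translation quality using only human reference translations. To address this shortcoming, this paper proposes an automatic evaluation method that incorporates source-sentence information: a quality vector describing translation quality is extracted from the pair formed by the machine translation and its source sentence, and a deep neural network fuses it with an automatic evaluation method based on contextual word embeddings. Experimental results on the dataset of the WMT-19 automatic translation evaluation task show that the proposed method effectively strengthens the correlation between automatic and human evaluation of machine translation. In-depth experimental analysis further reveals that source-sentence information plays an important role in automatic evaluation of translations.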
13.
Fault Diagnosis System for Aircraft Autopilots (total citations: 4, self-citations: 0, citations by others: 4)
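This paper introduces the basic principle of using a fault dictionary to diagnose faults in aircraft autopilots, describes the software and hardware design of the fault diagnosis system with a worked example, and gives the hardware block diagram and the program flowchart.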
14.
Jennifer Foster 《International Journal on Document Analysis and Recognition》2007,10(3-4):129-145
This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the original analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people do), and can be used to induce a grammar capable of analysing such sentences. This article demonstrates these two applications using the Penn Treebank. In a robustness evaluation experiment, two state-of-the-art statistical parsers are evaluated on an ungrammatical version of Sect. 23 of the Wall Street Journal (WSJ) portion of the Penn Treebank. This experiment shows that the performance of both parsers degrades with grammatical noise. A breakdown by error type is provided for both parsers. A second experiment retrains both parsers using an ungrammatical version of WSJ Sections 2–21. This experiment indicates that an ungrammatical treebank is a useful resource in improving parser robustness to grammatical errors, but that the correct combination of grammatical and ungrammatical training data has yet to be determined.
15.
16.
17.
18.
Dictionary learning algorithms for sparse representation (total citations: 11, self-citations: 0, citations by others: 11)
Kreutz-Delgado K Murray JF Rao BD Engan K Lee TW Sejnowski TJ 《Neural computation》2003,15(2):349-396
19.
Online dictionaries can be important tools for research and application in natural language processing. This paper describes work with a machine-readable version of "Dorland's Illustrated Medical Dictionary". First the characteristics of the dictionary are briefly described, and then the complex process of converting the tape to an online interactive dictionary is discussed. The results of several experiments in automatically deriving information from the online dictionary are presented, and the paper ends with a discussion of the use of the online dictionary as a tool in the development of a natural language processing system designed for the biomedical domain.
20.
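This paper briefly analyzes the shortcomings of current automatic question-answering systems and the importance of such systems, and designs an automatic question-answering system based on Lucene. The system makes full use of Lucene's powerful retrieval mechanism and includes a domain dictionary designed specifically for it, stored in the two-level hash dictionary structure that is currently the most popular. The paper also proposes an optimized maximum-matching Chinese word segmentation algorithm and applies it within Lucene, compensating for the shortcomings of Lucene's built-in tokenizer.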
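Maximum-matching segmentation over a two-level hash dictionary, as named in the abstract above, can be sketched roughly as follows. This shows only the basic forward maximum-matching algorithm, not the paper's optimized variant, and it does not use Lucene. The two-level layout (first character, then word length) is one common reading of "two-level hash" and is an assumption here, as are the function names and the sample dictionary.

```python
from collections import defaultdict


def build_two_level_index(words):
    """Index words by first character, then by word length (assumed layout)."""
    index = defaultdict(lambda: defaultdict(set))
    for word in words:
        if word:
            index[word[0]][len(word)].add(word)
    return index


def forward_max_match(text, index):
    """Segment text left to right, taking the longest dictionary word each time."""
    tokens = []
    i = 0
    while i < len(text):
        by_length = index.get(text[i], {})
        match = text[i]  # Fall back to a single character if nothing matches.
        for length in sorted(by_length, reverse=True):
            candidate = text[i:i + length]
            if candidate in by_length[length]:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical domain dictionary for the demonstration.
index = build_two_level_index(["自动", "答疑", "系统", "自动答疑", "分词"])
tokens = forward_max_match("自动答疑系统", index)
# tokens is ["自动答疑", "系统"]: the four-character entry wins over "自动".
```

Indexing by first character and length means only the word lengths that actually occur for a given first character are tried, which avoids scanning every possible window size.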