Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
A common practice in operational Machine Translation (MT) and Natural Language Processing (NLP) systems is to assume that a verb has a fixed number of senses and to rely on a precompiled lexicon to achieve large coverage. This paper demonstrates that this assumption is too weak to cope with the related problems of lexical divergences between languages and unexpected uses of words, which give rise to cases outside the precompiled lexicon's coverage. We first examine the lexical divergences between English verbs and Chinese verbs. We then focus on a specific lexical selection problem: translating English change-of-state verbs into Chinese verb compounds. We show that an accurate translation depends not only on information about the participants, but also on contextual information; selectional restrictions on verb arguments therefore lack the power needed for accurate lexical selection. Second, we examine verb representation theories and practices in MT systems and show that, under the fixed-sense assumption, existing representation schemes are inadequate for handling these lexical divergences and for extending existing verb senses to unexpected usages. We then propose a method of verb representation based on conceptual lattices which allows the similarities among different verbs in different languages to be quantitatively measured. A prototype system, UNICON, implements this theory and performs more accurate MT lexical selection for our chosen set of verbs. An additional lexical module for UNICON that handles sense extension is also provided.

This paper presents a lexical model dedicated to the semantic representation and interpretation of individual words in unrestricted text, where sense discrimination is difficult to assess. We discuss the need for a lexicon that includes local inference mechanisms and cooperates with as many other knowledge sources (about syntax, semantics and pragmatics) as possible. We suggest a minimal representation (that is, the smallest representation possible) acting as a bridge between a conceptual representation and the microscopic sense variations of lexical semantics. We describe an interpretation method that provides one or more alternative candidates for the word, as representatives of its meaning in the sentence (and text).

4.
In this paper, we discuss the use of binary decision diagrams (BDDs) to represent general matrices. We demonstrate that binary decision diagrams are an efficient representation for every special-case matrix in common use, notably sparse matrices. In particular, we demonstrate that for any matrix, the BDD representation can be no larger than the corresponding sparse-matrix representation. Further, the BDD representation is often smaller than any other conventional special-case representation: for the n×n Walsh matrix, for example, the BDD representation is of size O(log n). No other special-case representation in common use represents this matrix in space less than O(n²). We describe termwise, row, column, block, and diagonal selection over these matrices, standard and Strassen matrix multiplication, and LU factorization. We demonstrate that the complexity of each of these operations over the BDD representation is no greater than that over any standard representation. Further, we demonstrate that complete pivoting is no more difficult over these matrices than partial pivoting. Finally, we consider an example, the Walsh spectrum of a Boolean function.

5.
This article focuses on the variability of one of the subtypes of multi-word expressions, namely those consisting of a verb and a particle or a verb and its complement(s). We build on evidence from Estonian, an agglutinative language with free word order, analysing the behaviour of verbal multi-word expressions (opaque and transparent idioms, support verb constructions and particle verbs). Using this data we analyse such phenomena as the order of the components of a multi-word expression, lexical substitution and morphosyntactic flexibility.

6.
We develop a goodness-of-fit measure with desirable properties for use in the hierarchical logistic regression setting. The statistic is an unweighted sum of squares (USS) of the kernel-smoothed model residuals. We develop expressions for the moments of this statistic and create a standardized statistic with a hypothesized asymptotic standard normal distribution under the null hypothesis that the model is correctly specified. Extensive simulation studies demonstrate satisfactory adherence to Type I error rates of the kernel-smoothed USS statistic in a variety of likely data settings. Finally, we discuss issues of bandwidth selection for using our proposed statistic in practice and illustrate its use in an example.
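The core quantity, before the standardization step the abstract describes, is straightforward to compute: smooth the raw residuals with a kernel over the covariate, then sum the squares. A minimal sketch (Gaussian kernel and the single-covariate setup are assumptions; the paper's standardization via the statistic's moments is omitted):

```python
import math

def uss_statistic(x, y, p, bandwidth=0.5):
    """Unweighted sum of squares of kernel-smoothed residuals (sketch).

    x: covariate values, y: 0/1 outcomes, p: fitted probabilities.
    Each residual y_i - p_i is replaced by a Gaussian-kernel-weighted
    average of all residuals, then the squares are summed.
    """
    n = len(x)
    r = [yi - pi for yi, pi in zip(y, p)]
    smoothed = []
    for i in range(n):
        w = [math.exp(-0.5 * ((x[i] - x[j]) / bandwidth) ** 2)
             for j in range(n)]
        total = sum(w)
        smoothed.append(sum(wj * rj for wj, rj in zip(w, r)) / total)
    return sum(s * s for s in smoothed)

stat = uss_statistic([0.1, 0.4, 0.7, 0.9], [0, 0, 1, 1], [0.2, 0.3, 0.6, 0.8])
```

A perfectly fitted model (all residuals zero) gives a statistic of zero; systematic misfit inflates it, which is what the standardized version tests.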

7.
Facial expression recognition has recently become an important research area, and many efforts have been made in facial feature extraction and its classification to improve face recognition systems. Most researchers adopt a posed facial expression database in their experiments, but in a real-life situation the facial expressions may not be very obvious. This article describes the extraction of the minimum number of Gabor wavelet parameters for the recognition of natural facial expressions. The objective of our research was to investigate the performance of a facial expression recognition system with a minimum number of Gabor wavelet features. In this research, principal component analysis (PCA) is employed to compress the Gabor features. We also discuss the selection of the minimum number of Gabor features that perform best in a recognition task employing a multiclass support vector machine (SVM) classifier. The performance of facial expression recognition using our approach is compared with results obtained previously by other researchers using other approaches. Experimental results showed that our proposed technique successfully recognizes natural facial expressions using a small number of Gabor features, with an 81.7% recognition rate. In addition, we identify the relationship between human vision and computer vision in recognizing natural facial expressions.

8.
This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. These patterns can then be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, and is around 60–70% for the best combinations proposed.
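The starting point for this kind of system is matching lexical patterns against encyclopedia sentences to propose relation instances. A minimal sketch with one hand-written hypernymy pattern (the pattern and example sentences are assumptions; the paper learns and generalises its patterns automatically):

```python
import re

# Toy pattern-based relation extraction: "X is a/an Y" suggests that
# X is a hyponym of Y. The single hard-coded pattern stands in for the
# automatically generalised patterns the paper derives.
PATTERN = re.compile(r"(\w+) is an? (\w+)")

def extract_hyponymy(text):
    """Return (hyponym, hypernym) candidate pairs found in `text`."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PATTERN.finditer(text)]

pairs = extract_hyponymy("A dog is an animal. Paris is a city.")
```

Candidate pairs like these would then be filtered against WordNet, keeping only relations not already present, before being added to the semantic network.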

10.
This case study describes the specification and formal verification of the key part of SPaS, a development tool for the design of open-loop programmable controls developed at the University of Applied Sciences in Leipzig. SPaS translates the high-level representation of an open-loop programmable control into a machine-executable instruction list. The produced instruction list has to exhibit the same behaviour as suggested by the high-level representation. We discuss the following features of the case study: characterization of the correctness requirements, design of a verification strategy, the correctness proof, and the relation to the Common Criteria evaluation standard.

11.
This article describes a probabilistic approach for improving the accuracy of general object pose estimation algorithms. We propose a histogram filter variant that uses the exploration capabilities of robots, and supports active perception through a next-best-view proposal algorithm. For the histogram-based fusion method we focus on the orientation of the 6 degrees of freedom (DoF) pose, since the position can be processed with common filtering techniques. The detected orientations of the object, estimated with a pose estimator, are used to update the hypothesis of its actual orientation. We discuss the design of experiments to estimate the error model of a detection method, and describe a suitable representation of the orientation histograms. This allows us to consider priors about likely object poses or symmetries, and to use information-gain measures for view selection. The method is validated and compared to alternatives, based on the outputs of different 6 DoF pose estimators, using real-world depth images acquired with different sensors, and on a large synthetic dataset.
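The fusion step of a histogram filter is a discrete Bayes update: each orientation bin's prior mass is multiplied by the detection likelihood for that bin and the result is renormalized. A minimal sketch (the bin layout and likelihood values are illustrative assumptions, not the paper's estimated error model):

```python
# Discrete (histogram) Bayes update over orientation bins.
def histogram_update(belief, likelihood):
    """Multiply prior bin masses by detection likelihoods, renormalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]

# 8 yaw bins, uniform prior; a detection whose error model favours bin 2.
prior = [1 / 8] * 8
likelihood = [0.05, 0.10, 0.60, 0.10, 0.05, 0.04, 0.03, 0.03]
posterior = histogram_update(prior, likelihood)
```

Repeated detections from different viewpoints sharpen the posterior, and the entropy reduction expected from a candidate viewpoint is what an information-gain-based next-best-view criterion would score.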

13.
In this paper we develop a formalization of semantic relations that facilitates efficient implementations of relations in lexical databases or knowledge representation systems using bases. The formalization of relations is based on a modeling of hierarchical relations in Formal Concept Analysis. Further, relations are analyzed according to Relational Concept Analysis, which allows a representation of semantic relations consisting of relational components and quantificational tags. This representation utilizes mathematical properties of semantic relations. The quantificational tags imply inheritance rules among semantic relations that can be used to check the consistency of relations and to reduce the redundancy in implementations by storing only the basis elements of semantic relations. The research presented in this paper is an example of an application of Relational Concept Analysis to lexical databases and knowledge representation systems (cf. Priss 1996) which is part of a larger framework of research on natural language analysis and formalization.

14.
In a real-world language environment, connections between words are ubiquitous and intricate. To better integrate and use the semantic relations found in various semantic resources, and to build a computable Chinese lexical semantic resource, this paper proposes a method of integrating semantic resources by constructing a semantic relation graph, implemented on HowNet (《知网》). As a knowledge-base system, HowNet stores each word sense as a separate record, with the various lexical semantic relations left implicit in its dictionary files and sememe description files. To extract the semantic relations in HowNet, we first re-represent HowNet concepts as concept trees, then extract appropriate semantic relations from the concept trees to build the semantic relation graph. After processing, we obtain 589,984 instances of 88 types of semantic relations; the nodes of the graph are richly interconnected, laying a foundation for further analysis and computation based on the semantic relation graph.
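A semantic relation graph of this kind is a typed multigraph: nodes are word senses and each edge carries one of the relation types. A minimal sketch of such a structure (the relation names and entries are illustrative assumptions, not HowNet's actual inventory):

```python
from collections import defaultdict

# Minimal typed semantic-relation graph: each node maps to a set of
# (relation, target) pairs. Entries below are toy examples, not data
# extracted from HowNet.
edges = defaultdict(set)

def add_relation(head, relation, tail):
    """Record a directed, typed edge head --relation--> tail."""
    edges[head].add((relation, tail))

def relations_of(word):
    """All typed edges leaving `word`, in a stable order."""
    return sorted(edges[word])

add_relation("doctor", "hypernym", "human")
add_relation("doctor", "agent-of", "cure")
add_relation("cure", "patient", "disease")
```

With all relation instances loaded into one graph like this, queries that would require scanning separate dictionary and sememe files become simple adjacency lookups.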

15.
Design of a Word Selection Model for English Generation in a Chinese-to-English Machine Translation System
This paper describes the design of a word selection model for English generation based on example comparison, supplemented by semantic pattern matching. We first discuss the importance of word selection in English generation for a Chinese-to-English machine translation system, then compare several possible word selection strategies and propose our own word selection model. We then describe in some detail the structure of the generation dictionary and the word selection algorithm. The paper also briefly introduces the semantic knowledge resource we use, HowNet (《知网》).

16.
We introduce an exponential language model which models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a "bag of features", where features are arbitrary computable properties of the sentence. The new model is computationally more efficient, and more naturally suited to modeling global sentential phenomena, than the conditional exponential (e.g. maximum entropy) models proposed to date. Using the model is straightforward. Training the model requires sampling from an exponential distribution. We describe the challenge of applying Markov chain Monte Carlo and other sampling techniques to natural language, and discuss smoothing and step-size selection. We then present a novel procedure for feature selection, which exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analysing competitive models in the Switchboard and Broadcast News domains, incorporating lexical and syntactic information.
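A whole-sentence exponential model assigns log p(s) = log p0(s) + Σᵢ λᵢ fᵢ(s) − log Z, where p0 is a baseline model, the fᵢ are arbitrary sentence features, and Z is a normalizer that is intractable in general (hence the sampling machinery the abstract mentions). A minimal sketch of the unnormalized log-score, with toy features that are assumptions for illustration:

```python
# Unnormalized log-score of a whole-sentence exponential model:
# log p(s) = base_logprob(s) + sum_i lambda_i * f_i(s) - log Z,
# with the intractable log Z omitted. Features below are toy examples.
def log_score(sentence, base_logprob, features, lambdas):
    return base_logprob + sum(lam * f(sentence)
                              for f, lam in zip(features, lambdas))

# Two assumed global features: a lexical indicator and sentence length.
has_runs = lambda s: 1.0 if "runs" in s.split() else 0.0
length_feat = lambda s: float(len(s.split()))

score = log_score("the dog runs", -7.0, [has_runs, length_feat], [0.8, -0.1])
```

Because every feature sees the whole sentence at once, global properties (overall length, parseability, dialog-act markers) can be scored directly, which chain-rule conditional models handle only awkwardly.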

18.
This paper describes the lexical-semantic basis for UNITRAN, an implemented scheme for translating Spanish, English, and German bidirectionally. Two claims made here are that the current representation handles many distinctions (or divergences) across languages without recourse to language-specific rules, and that the lexical-semantic framework provides the basis for a systematic mapping between the interlingua and the syntactic structure. The representation adopted is an extended version of lexical conceptual structure, which is well suited to the task of translating between divergent structures for two reasons: (1) it provides an abstraction of language-independent properties from structural idiosyncrasies; and (2) it is compositional in nature. The lexical-semantic approach addresses the divergence problem by using a linguistically grounded mapping that has access to parameter settings in the lexicon. We examine a number of relevant issues, including the problem of defining primitives, the issue of interlinguality, the cross-linguistic coverage of the system, and the mapping between the syntactic structure and the interlingua. A detailed example of lexical-semantic composition is presented.

19.
Constructing word representations that capture semantic features is a key problem in natural language processing. This paper first introduces lexical semantic representation methods based on the distributional hypothesis and on predictive models, and surveys the current evaluation metrics for word representations. It then introduces new applications built on the semantic information carried by word representations. Finally, it analyzes the methods used in lexical semantic representation research and the problems they currently face, and offers an outlook.
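The distributional hypothesis mentioned in this abstract says that words occurring in similar contexts have similar meanings. A minimal sketch: build co-occurrence count vectors from a window around each word and compare them with cosine similarity (the toy corpus and window size are assumptions for illustration):

```python
import math
from collections import Counter, defaultdict

def cooc_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` tokens."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

vecs = cooc_vectors(["the cat drinks milk", "the dog drinks water"])
```

Here "cat" and "dog" end up similar because they share the contexts "the" and "drinks"; predictive models such as word embeddings learn dense vectors with the same goal.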

20.
Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is to use Wikipedia as a lexical knowledge base, an approach that has already shown promising results in many research studies. In this paper we propose three path-based measures for computing document relatedness in the conceptual space formed by the hierarchical organization of a Wikipedia Category Graph (WCG). We compare the proposed approaches with the standard Path Length method to establish the best relatedness measure for the WCG representation. To test overall WCG efficiency, we compare the proposed representations with the BoW method. The evaluation was performed with two different types of clustering algorithms (OPTICS and K-Means), used for categorization of keyword-based search results. The experiments have shown that our approach outperforms the standard Path Length approach, and that the WCG representation achieves better results than BoW.
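The Path Length baseline that the paper's three measures are compared against can be sketched directly: relatedness decreases with the shortest-path distance between two nodes in the category graph. A minimal sketch (the toy graph and the 1/(1+d) scoring are assumptions; the paper's proposed measures refine this baseline):

```python
from collections import deque

def shortest_path_len(graph, start, goal):
    """BFS shortest-path length in an adjacency-list graph, or None."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_relatedness(graph, a, b):
    """Path Length relatedness: 1 / (1 + shortest distance)."""
    d = shortest_path_len(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

# Toy stand-in for a Wikipedia Category Graph fragment.
wcg = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Quantum mechanics"],
    "Biology": ["Genetics"],
}
```

Two documents are then related to the degree that their mapped categories lie close together in this hierarchy, which is the conceptual space the clustering algorithms operate in.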


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)