Similar Documents
18 similar documents found (search time: 171 ms)
1.
Research on an Ontology-Based Natural Language Query Interface for Databases   Cited: 3 (self-citations: 1, by others: 2)
This paper proposes a system model and design framework for an Ontology-based natural language query interface to relational databases. WordNet serves as the base lexical database, with a domain lexicon defined on top of it to raise the recognition rate of syntactic analysis; the Ontology's knowledge-representation capability is used to store and extend the conceptual model of the relational database; and linking the Ontology to WordNet synsets improves semantic recognition. A user's query is parsed syntactically and semantically into an intermediate representation (DRS), which is converted into SQL via templates; the SQL is executed by the DBMS and the results are returned. Experiments show that the approach is practical and feasible, that the query hit rate can be greatly improved by incrementally refining the Ontology knowledge base, and that defining the domain lexicon and domain knowledge through WordNet and the Ontology improves portability, so the method can easily be transferred to other domains.
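As a concrete illustration of the lexical step sketched above, the following minimal Python snippet (a sketch under assumptions, not the paper's implementation) matches a user's query word to a database concept through shared WordNet synsets; the schema_terms domain lexicon is invented for the example, and NLTK's WordNet data is assumed to be installed.

```python
# A minimal sketch (not the paper's implementation) of the WordNet-assisted
# lexical step: normalizing a user's query word to a known database concept
# by checking synset overlap. Requires: pip install nltk, plus
# nltk.download('wordnet'). The schema_terms mapping is a made-up example.
from nltk.corpus import wordnet as wn

# Hypothetical domain lexicon: database concept -> surface words
schema_terms = {"employee": ["employee", "worker", "staff"],
                "salary": ["salary", "wage", "pay"]}

def map_to_concept(user_word):
    """Return the first schema concept whose terms share a synset
    (i.e., are WordNet synonyms) with the user's word."""
    user_synsets = set(wn.synsets(user_word))
    for concept, terms in schema_terms.items():
        for term in terms:
            if user_synsets & set(wn.synsets(term)):
                return concept
    return None

print(map_to_concept("wage"))   # -> 'salary'
```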

2.
In English and other European languages, lexical semantic relations have been studied quite thoroughly. For example, EuroWordNet (Vossen 1998) is a database that characterizes word senses in terms of semantic relations; that is, the meaning of a word is grasped through its semantic connections to other words. To ensure the quality and consistency of the database, the EuroWordNet project devised, for each language covered, linguistic tests for whether a given sense relation holds between a pair of word senses. Practical experience shows that with these tests people can decide more easily and more consistently whether two senses indeed stand in a given relation, and every user of the database can apply them to verify the correctness of its relational links. In other words, the tests provide a cornerstone for a verifiable, language-independent theory of lexical semantics. In this paper we explore the possibility of constructing Chinese linguistic tests for Chinese sense relations, providing test sentence frames and rules for several important semantic relations and evaluating their feasibility. Besides building a theoretical foundation for Chinese lexical semantics, this study also lends strong support to Miller's WordNet framework (WordNet, Fellbaum 1998), which pioneered a relation-based approach to lexical representation and ontology research.

3.
A Synonym-Set Mining Algorithm Based on Feature-Word Association   Cited: 2 (self-citations: 0, by others: 2)
Polysemy and synonymy are pervasive in language and cause many difficulties for natural language processing; an effective remedy is to build synonym sets that carry contextual information. This paper analyzes in depth the intrinsic relations among concepts, words, and feature words, and on that basis proposes an algorithm that mines synonym sets from text based on the association among the feature words of synonymous words. Exploiting the associations that exist among feature words, the algorithm builds on mature association-rule mining algorithms and achieves experimental results clearly better than comparable algorithms. The synonym sets it produces carry contextual information and can effectively resolve the polysemy and synonymy of words in text.
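The toy Python sketch below illustrates the underlying idea of grouping words by the association of their feature words; it is not the paper's association-rule algorithm, and the context sets are invented.

```python
# A minimal sketch of the underlying idea (feature-word association), not the
# paper's algorithm: words whose feature-word sets associate strongly are
# grouped as synonym candidates. The toy contexts are invented.
from itertools import combinations

contexts = {                      # word -> set of feature words from its contexts
    "car":  {"drive", "road", "engine", "wheel"},
    "auto": {"drive", "engine", "wheel", "repair"},
    "bank": {"money", "loan", "account"},
}

def assoc(w1, w2):
    """Jaccard association between two words' feature-word sets."""
    a, b = contexts[w1], contexts[w2]
    return len(a & b) / len(a | b)

THRESHOLD = 0.5                   # tuning parameter, chosen arbitrarily here
pairs = [(w1, w2) for w1, w2 in combinations(contexts, 2)
         if assoc(w1, w2) >= THRESHOLD]
print(pairs)                      # -> [('car', 'auto')]
```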

4.
WNCT: An Automatic Translation Method for WordNet Concepts   Cited: 2 (self-citations: 1, by others: 1)
WordNet is an English lexical knowledge base that plays an important role in natural language processing. This paper proposes a method for automatically translating the lexical concepts in WordNet into Chinese. First, electronic dictionaries and terminology translation tools are used to translate the English words into Chinese at sense granularity. Second, selecting the correct sense of a word within a given concept is cast as a classification problem: twelve features are derived, based on translation uniqueness, intra- and inter-concept translation intersections, Chinese phrase-structure rules, and PMI-based translation relatedness, and a classification model is trained to select the correct sense. Experimental results show that the method covers 85.21% of the concepts in WordNet 3.0 with a precision of 81.37%.
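As a hedged illustration of the PMI-based relatedness feature named in the abstract, the sketch below computes PMI from raw counts; all counts are invented placeholders that a real system would estimate from a large bilingual corpus.

```python
# A sketch of a PMI-based translation-relatedness feature; the counts here
# are invented placeholders.
import math

N = 1_000_000          # total number of observation windows (assumed)
count_x = 500          # occurrences of candidate translation x
count_y = 800          # occurrences of context word y
count_xy = 50          # co-occurrences of x and y

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information: log2( p(x,y) / (p(x) p(y)) )."""
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

print(f"PMI = {pmi(count_xy, count_x, count_y, N):.2f}")   # -> PMI = 6.97
```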

5.
In question answering systems, expressing the query intent only through the words contained in the question makes it hard to obtain satisfactory answers from the data source. This paper therefore proposes a semantic query expansion method for knowledge-graph-based question answering. WordNet is used to expand the query terms of the question triple along three semantic directions (synonyms, hypernyms, and hyponyms), and Microsoft Concept Graph is used to expand them along two (hypernyms and hyponyms). A separate filtering strategy is designed for the expansion results of each semantic direction, and the question triple is expanded according to the semantic expansions of its query terms. Experimental results show that the method achieves an average precision above 83% and expands question triples well along multiple semantic directions.
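A minimal sketch of the WordNet side of this expansion is shown below (synonyms, hypernyms, hyponyms via NLTK); the paper's filtering strategies and the Microsoft Concept Graph step are omitted.

```python
# Sketch of WordNet-based query-term expansion along three semantic
# directions. Requires nltk with the wordnet data downloaded.
from nltk.corpus import wordnet as wn

def expand_term(term):
    """Collect synonym, hypernym, and hyponym lemmas for a query term."""
    expansions = {"synonyms": set(), "hypernyms": set(), "hyponyms": set()}
    for syn in wn.synsets(term):
        expansions["synonyms"].update(l.name() for l in syn.lemmas())
        for h in syn.hypernyms():
            expansions["hypernyms"].update(l.name() for l in h.lemmas())
        for h in syn.hyponyms():
            expansions["hyponyms"].update(l.name() for l in h.lemmas())
    expansions["synonyms"].discard(term)
    return expansions

print(expand_term("car")["hypernyms"])   # e.g. motor_vehicle, compartment, ...
```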

6.
Targeting the characteristics of synonym occurrence in patent search logs, this paper improves the word co-occurrence similarity algorithm and proposes a synonym mining method based on patent search logs. Structural templates of synonym sets are mined from the regularities with which synonyms occur in the logs, candidate synonym sets are extracted according to these templates, and word similarity is computed with the improved co-occurrence method. For symmetrically co-occurring word pairs, precision reaches 85.66%, recall 78.98%, and F-measure 0.82. The method can be applied in patent search engines to improve retrieval efficiency.
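The sketch below shows plain word co-occurrence similarity (Dice coefficient) over invented search-log sessions; the paper's improved variant and its template-based candidate extraction are not reproduced.

```python
# Plain co-occurrence similarity over search-log sessions (Dice coefficient);
# the session data is invented for illustration.
from collections import defaultdict

sessions = [                       # each session: the query terms a user issued
    ["battery", "accumulator", "charger"],
    ["battery", "accumulator"],
    ["battery", "electrode"],
]

occur = defaultdict(int)           # term -> number of sessions containing it
cooccur = defaultdict(int)         # (a, b) -> number of sessions with both
for s in sessions:
    terms = set(s)
    for t in terms:
        occur[t] += 1
    for a in terms:
        for b in terms:
            if a < b:
                cooccur[(a, b)] += 1

def dice(a, b):
    key = (a, b) if a < b else (b, a)
    return 2 * cooccur[key] / (occur[a] + occur[b])

print(dice("battery", "accumulator"))   # -> 0.8
```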

7.
Because an ontology is expressed through natural-language vocabulary, its own representation may suffer from ambiguity and misinterpretation. The meanings of some ontology concepts can be inferred from their context, but others cannot be expressed clearly from the available information alone. To address this problem, this paper proposes a background-knowledge-based ontology annotation method that annotates and clarifies the ontology itself, combining WordNet-based and Web-search-based annotation: WordNet is used to look up the correct sense of an ontology concept, a Web search engine is used to retrieve snippets for the concept, and the sense and the snippets are attached to the ontology as properties. Experiments show an annotation rate of 99.12%, demonstrating that the method is feasible, and an annotation precision of 80.76%, higher than comparable methods.

8.
In natural-language text steganography based on synonym substitution, inaccurate selection of candidate synonyms often leaves the stego text with obvious errors or logical ambiguities after the secret message is embedded. This paper proposes a steganographic algorithm based on binary-dependency synonym substitution. The algorithm first obtains, from the WordNet lexicon, words with the same part of speech as and similar meaning to the target word; it then uses dependency parsing to extract the binary dependency relations of the synonyms in the target sentence, computes the vector distances of these dependency relations over a large-scale corpus, and derives the best set of substitution synonyms. Experimental results show that the stego text generated by the algorithm keeps its feature attributes unchanged after embedding; compared with current improved synonym-substitution algorithms, it better preserves grammatical correctness and semantic integrity, resists detection by synonym-pairing and relative-word-frequency statistical analysis more effectively, and improves the security of secret message transmission.
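The following sketch covers only the first step (same-POS, similar-sense candidates from WordNet); the dependency-parsing and corpus-distance filtering that the abstract describes are not shown.

```python
# Sketch of the candidate-pool step only: same-POS lemmas sharing a synset
# with the target word. Requires nltk with wordnet data.
from nltk.corpus import wordnet as wn

def candidate_synonyms(word, pos=wn.NOUN):
    """Lemmas sharing a synset with `word` under the given part of speech."""
    candidates = set()
    for syn in wn.synsets(word, pos=pos):
        candidates.update(l.name().replace("_", " ") for l in syn.lemmas())
    candidates.discard(word)
    return candidates

# Only candidates that preserve the sentence's dependency relations would be
# kept by the full algorithm; here we just list the raw pool.
print(candidate_synonyms("movie"))   # e.g. film, picture, moving picture, ...
```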

9.
Cross-lingual knowledge linking means establishing links between online encyclopedia articles in different languages that describe the same content; it can be divided into candidate selection and candidate ranking. First, candidate selection is cast as a cross-lingual information retrieval problem, and a query-generation method combining titles with keywords is proposed, which raises the recall of candidate selection substantially, to 93.8%. For candidate ranking, a ranking model fusing bilingual topic models and bilingual word embeddings is proposed, realizing cross-lingual knowledge linking in the military domain between English Wikipedia and Chinese Baidu Baike. Experimental results show that the model achieves 75% precision and significantly improves cross-lingual knowledge linking; since the method depends on neither language-specific nor domain-specific features, it can easily be extended to other languages and domains.

10.
周由, 戴牡红. 《计算机科学》, 2013, 40(Z11): 267-269, 300
Recommendation systems for news items commonly use TF-IDF weighting combined with cosine similarity, but this technique does not consider the actual semantics of the text itself. This paper therefore proposes a new method combining content with semantic analysis: it combines the inverse document frequency of synonym sets with semantic similarity, using WordNet synsets for the similarity computation. User profiles were built for experimental testing, verifying the method's effectiveness; the results show that the proposed semantic method outperforms the TF-IDF approach.
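For reference, the TF-IDF-plus-cosine baseline that the paper compares against can be reproduced in a few lines with scikit-learn; the synset-level IDF variant proposed in the paper is not shown here, and the news items are invented.

```python
# The TF-IDF + cosine-similarity baseline, as a runnable sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

news_items = [
    "central bank raises interest rates",
    "bank lifts rates to curb inflation",
    "local team wins championship final",
]

tfidf = TfidfVectorizer().fit_transform(news_items)
sims = cosine_similarity(tfidf[0], tfidf)   # item 0 vs. all items
print(sims.round(2))    # the two finance items score higher with each other
```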

11.
This paper presents an automatic construction of a Korean WordNet from pre-existing lexical resources. We develop a set of automatic word sense disambiguation techniques to link a Korean word sense collected from a bilingual machine-readable dictionary to a single corresponding English WordNet synset. We show how the individual links provided by each word sense disambiguation method can be non-linearly combined to produce a Korean WordNet for nouns from the existing English WordNet.

12.
The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing. There are now many wordnets under creation for languages worldwide. In this paper, we endeavor to construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to process the semantic information of PQAC. In previous work, most recently constructed wordnets have been established either manually by experts or automatically using resources from which translation pairs between English and the target language can be extracted. The former method, however, is time-consuming, and the latter method, owing to a lack of language resources, cannot be performed on PQAC. As a result, a method based on word definitions in a monolingual dictionary is proposed. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based Word Sense Disambiguation. Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and PWN synset. In this research, we obtain 66% of PQAC senses that can be shared with English and another 14% language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85%.
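As a much-simplified stand-in for the graph-based sense selection over kernel words, the snippet below uses NLTK's Lesk implementation, which picks a sense by gloss overlap; this only illustrates the general sense-selection step, not the paper's actual algorithm or its ancient-Chinese data.

```python
# Simplified WSD over a definition context via gloss overlap (Lesk), as an
# analogue of choosing kernel-word senses from a dictionary definition.
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

definition_context = "a stringed musical instrument played at court".split()
sense = lesk(definition_context, "instrument", pos=wn.NOUN)
if sense:
    print(sense, "-", sense.definition())
```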

13.
Our goal is to construct large-scale lexicons for interlingual MT of English, Arabic, Korean, and Spanish. We describe techniques that predict salient linguistic features of a non-English word using the features of its English gloss (i.e., translation) in a bilingual dictionary. While not exact, owing to inexact glosses and language-to-language variations, these techniques can augment an existing dictionary with reasonable accuracy, thus saving significant time. We have conducted two experiments that demonstrate the value of these techniques. The first tested the feasibility of building a database of thematic grids for over 6500 Arabic verbs based on a mapping between English glosses and the syntactic codes in Longman's Dictionary of Contemporary English (LDOCE) (Procter, 1978). We show that it is more efficient and less error-prone to hand-verify the automatically constructed grids than it would be to build the thematic grids by hand from scratch. The second experiment tested the automatic classification of verbs into a richer semantic typology based on (Levin, 1993), from which we can derive a more refined set of thematic grids. In this second experiment, we show that a brute-force, non-robust technique provides 72% accuracy for semantic classification of LDOCE verbs; we then show that it is possible to approach this yield with a more robust technique based on fine-tuned statistical correlations. We further suggest the possibility of raising this yield by taking into account linguistic factors such as polysemy and positive and negative constraints on the syntax-semantics relation. We conclude that, while human intervention will always be necessary for the construction of a semantic classification from LDOCE, such intervention is significantly minimized as more knowledge about the syntax-semantics relation is introduced.

14.
Wordnets are large-scale lexical databases of related words and concepts, useful for language-aware software applications. They have recently been built for many languages by using various approaches. The Finnish wordnet, FinnWordNet (FiWN), was created by translating the more than 200,000 word senses in the English Princeton WordNet (PWN) 3.0 in 100 days. To ensure quality, they were translated by professional translators. The direct translation approach was based on the assumption that most synsets in PWN represent language-independent real-world concepts. Thus also the semantic relations between synsets were assumed mostly language-independent, so the structure of PWN could be reused as well. This approach allowed the creation of an extensive Finnish wordnet directly aligned with PWN and also provided us with a translation relation and thus a bilingual wordnet usable as a dictionary. In this paper, we address several concerns raised with regard to our approach, many of them for the first time. We evaluate the craftsmanship of the translators by checking the spelling and translation quality, the viability of the approach by assessing the synonym quality both on the lexeme and concept level, as well as the usefulness of the resulting lexical resource both for humans and in a language-technological task. We discovered no new problems compared with those already known in PWN. As a whole, the paper contributes to the scientific discourse on what it takes to create a very large wordnet. As a side-effect of the evaluation, we extended FiWN to contain 208,645 word senses in 120,449 synsets, effectively making version 2.0 of FiWN currently the largest wordnet in the world by these statistics.

15.
In this paper, we describe both a multi-lingual, interlingual MT system (ULTRA) and a method of extracting lexical entries for it automatically from an existing machine-readable dictionary (LDOCE). We believe the latter is original, and the former, although by no means the first interlingual MT system, may be the first that is symmetrically multi-lingual. It translates between English, German, Chinese, Japanese and Spanish and has vocabularies in each language based on about 10,000 word senses. This research was supported by the New Mexico State University Computing Research Laboratory through NSF Grant IRI-9101232.

16.
Studies of lexical–semantic relations aim to understand the mechanism of semantic memory and the organization of the mental lexicon. However, standard paradigmatic relations such as “hypernym” and “hyponym” cannot capture connections among concepts from different parts of speech. WordNet, which organizes synsets (i.e., synonym sets) using these lexical–semantic relations, is rather sparse in its connectivity: according to WordNet statistics, the average number of outgoing/incoming arcs for the hypernym/hyponym relation per synset is 1.33. Evocation, defined as how much a concept (expressed by one or more words) brings to mind another, is proposed as a new directed and weighted measure of the semantic relatedness among concepts. Commonly applied semantic relations and relatedness measures do not seem to be fully compatible with data that reflect evocations among concepts; they are compatible, but evocation captures more. This work aims to provide a reliable and extendable dataset of concepts evoked by, and evoking, other concepts to enrich WordNet, the existing semantic network. We propose the use of disambiguated free word association data (first responses to verbal stimuli) to infer and collect evocation ratings. WordNet aims to represent the organization of the mental lexicon, and free word association, which psycholinguists have used to explore semantic organization, can contribute to that understanding. This work was carried out in two phases. In the first phase, it was confirmed that existing free word association norms can be converted into evocation data computationally. In the second phase, a two-stage association-annotation procedure for collecting evocation data from human judgment was compared to the state-of-the-art method, showing that introducing free association can greatly improve the quality of the evocation data generated. Evocation can be incorporated into WordNet as directed links with scales, and benefits various natural language processing applications.
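A connectivity statistic like the 1.33 hypernym arcs per synset cited above can be recomputed from WordNet with NLTK, as in the sketch below; the exact figure depends on the WordNet version and on which synsets and relations are counted.

```python
# Recompute an average-connectivity statistic over WordNet synsets.
from nltk.corpus import wordnet as wn

total_arcs = 0
total_synsets = 0
for syn in wn.all_synsets('n'):          # noun synsets only, as an example
    total_arcs += len(syn.hypernyms())
    total_synsets += 1

print(f"{total_arcs / total_synsets:.2f} hypernym arcs per noun synset")
```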

17.
This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, being around 60–70% for the best combinations proposed.
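In the spirit of the paper, the toy sketch below applies one hand-written hyponymy pattern ("X such as Y") to text; the paper instead learns and generalises its patterns automatically from Wikipedia entries.

```python
# Minimal lexical-pattern-based relation extraction with a single
# hand-written hyponymy pattern; the example text is invented.
import re

text = ("Musical instruments such as the violin and the guitar "
        "are studied here. Metals such as copper conduct electricity.")

pattern = re.compile(r"(\w+) such as (?:the )?(\w+)")
for hypernym, hyponym in pattern.findall(text):
    print(f"hyponym({hyponym}, {hypernym})")
# -> hyponym(violin, instruments), hyponym(copper, Metals)
```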

18.
This paper is a contribution to the discussion on compiling computational lexical resources from conventional dictionaries. It describes the theoretical as well as practical problems that are encountered when reusing a conventional dictionary for compiling a lexical-semantic resource in terms of a wordnet. More specifically, it describes the methodological issues of compiling a wordnet for Danish, DanNet, from a monolingual basis, and not—as is often seen—by applying the translational expansion method with Princeton WordNet as the English source. Thus, we apply as our basis a large, corpus-based printed dictionary of modern Danish. Using this approach, we discuss the issues of readjusting inconsistent and/or underspecified hyponymy hierarchies taken from the conventional dictionary, sense distinctions as opposed to the synonym sets of wordnets, generating semantic wordnet relations on the basis of sense definitions, and finally, supplementing missing or implicit information.
