Similar Literature
1.
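Yang Zhizhuo. Journal of Computer Applications, 2015, 35(4): 1006-1008
To address the data sparsity problem faced by traditional word sense disambiguation (WSD) methods, a context-based WSD method is proposed. The method assumes that sentences in the same article share some common topics. First, the sentences in the same article that contain the same ambiguous word are extracted; these sentences serve as context for an ambiguous sentence and supply disambiguation knowledge for it. Second, the ambiguous word is disambiguated with an unsupervised WSD method. Experiments on a real corpus show that, with 2 context sentences and a window size of 1, the method's disambiguation accuracy is 3.26% higher than that of the baseline method (OrigDisam).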
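A minimal Python sketch of the context-expansion idea, assuming each sentence is already tokenized and that "window size" means the number of tokens kept on each side of the ambiguous word (the abstract does not define it). The unsupervised disambiguator here is a simplified Lesk gloss-overlap scorer swapped in for illustration; the paper's own unsupervised method is not specified. The function names and the toy sense glosses are hypothetical.

```python
from typing import Dict, List, Set


def window_words(sentence: List[str], target: str, window: int) -> Set[str]:
    """Collect words within `window` tokens of each occurrence of `target`."""
    words: Set[str] = set()
    for i, tok in enumerate(sentence):
        if tok == target:
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            words.update(w for w in sentence[lo:hi] if w != target)
    return words


def expanded_context(doc: List[List[str]], idx: int, target: str,
                     n_context: int, window: int) -> Set[str]:
    """Pool the target sentence's window with the windows of up to `n_context`
    other sentences in the same document that contain the same ambiguous word."""
    context = window_words(doc[idx], target, window)
    others = [s for j, s in enumerate(doc) if j != idx and target in s]
    for sentence in others[:n_context]:
        context |= window_words(sentence, target, window)
    return context


def lesk(context: Set[str], glosses: Dict[str, Set[str]]) -> str:
    """Simplified Lesk: pick the sense whose gloss shares the most words with the context."""
    return max(glosses, key=lambda sense: len(glosses[sense] & context))
```

For example, if the first sentence of a document is only "he visited the bank", its own window (size 1) contributes just "the"; pooling two other sentences of the same document that contain "bank" adds words such as "loan" and "account", which tips the gloss overlap toward a finance sense.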

2.
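Word sense disambiguation has long been an important problem in natural language processing. This paper incorporates sememe information from HowNet, which describes the meanings of words, into the training of a language model. Words are represented by sememe vectors, so that semantic features are learned automatically and feature learning becomes more efficient. To disambiguate a polysemous word, its context is used to build a feature vector, and the sense is chosen by computing the similarity between the polysemous word's word vectors and that feature vector. As an unsupervised method, it greatly reduces the computational and time cost of WSD. On the SENSEVAL-3 test data it reaches an accuracy of 37.7%, slightly higher than that of other unsupervised WSD methods on the same test set.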
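The similarity step can be sketched as follows, assuming a sense vector is the average of its HowNet sememe vectors and the context feature vector is the average of the context word vectors. The abstract does not say how either vector is composed, and cosine similarity is an assumed choice of measure; all vectors and names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(sense_sememes: Dict[str, List[str]],
                 sememe_vec: Dict[str, Vector],
                 context: List[str],
                 word_vec: Dict[str, Vector]) -> str:
    """Pick the sense whose averaged sememe vector is closest to the
    averaged context feature vector."""
    ctx = mean([word_vec[w] for w in context if w in word_vec])
    scores = {
        sense: cosine(mean([sememe_vec[s] for s in sememes]), ctx)
        for sense, sememes in sense_sememes.items()
    }
    return max(scores, key=scores.get)
```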

3.
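Computer Engineering, 2017, (9): 210-213
Word sense disambiguation plays an important role in machine translation, information retrieval, speech and semantic recognition, and related areas. To improve disambiguation quality with finer-grained features, a multi-feature WSD scheme is proposed. Dependency parsing is used to extract, from the context, features of the polysemous word and its senses such as part of speech, dependency structure, and dependent words, which refines feature granularity. A weight function is then constructed from these features, and the sense with the largest weight is selected as the sense of the polysemous word. Experimental results show that, compared with single-feature WSD, the multi-feature scheme based on dependency parsing refines feature granularity and improves disambiguation accuracy.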
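The abstract does not give the weight function, so the sketch below assumes a simple linear one: each sense has a profile of expected feature values (part of speech, dependency relation, dependent word), and a sense's weight is the sum of hypothetical per-feature weights for the observed features that match its profile. Dependency parsing is assumed to have been done already, and the dependency labels are from a hypothetical tag set.

```python
from typing import Dict, Set

# Hypothetical per-feature weights; the abstract does not give the real weight function.
FEATURE_WEIGHTS: Dict[str, float] = {"pos": 1.0, "dep_relation": 2.0, "dep_word": 3.0}


def sense_weight(observed: Dict[str, str], profile: Dict[str, Set[str]]) -> float:
    """Sum the weights of observed features whose values appear in the sense profile."""
    return sum(FEATURE_WEIGHTS.get(name, 0.0)
               for name, value in observed.items()
               if value in profile.get(name, set()))


def choose_sense(observed: Dict[str, str],
                 profiles: Dict[str, Dict[str, Set[str]]]) -> str:
    """Return the sense with the largest weight."""
    return max(profiles, key=lambda sense: sense_weight(observed, profiles[sense]))
```

For a verb meaning either "hit" or "make a phone call" depending on its object, the part of speech and dependency relation match both senses equally, so the dependent word ("phone" vs "ball") is what separates them.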

4.
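Yang Zhizhuo. Computer Science, 2017, 44(4): 252-255, 280
To address the data sparsity problem of current supervised WSD methods, a WSD method based on context translation is proposed. The method assumes that the context formed by translations of an ambiguous word's context expresses a meaning similar to that of the original context. Based on this assumption, a large amount of pseudo training data is first generated from the translated contexts; a Bayesian disambiguation model is then trained on both the real and the pseudo training data; finally, this model decides the sense of the ambiguous word. Experimental results show that, compared with traditional disambiguation methods, the proposed method improves disambiguation accuracy by 4.35% and outperforms the best supervised system that took part in the SemEval-2007 evaluation.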
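A minimal sketch of the training step, assuming the Bayesian model is a multinomial naive Bayes classifier over bag-of-words contexts with add-one smoothing (the abstract only says "Bayesian"), and that pseudo examples derived from translated contexts are simply appended to the real training data. The example data and the class name `NaiveBayesWSD` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[List[str], str]  # (context words, sense label)


class NaiveBayesWSD:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Example]) -> "NaiveBayesWSD":
        self.sense_counts = Counter(sense for _, sense in examples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for words, sense in examples:
            self.word_counts[sense].update(words)
        self.vocab = {w for words, _ in examples for w in words}
        self.total = len(examples)
        return self

    def log_score(self, words: List[str], sense: str) -> float:
        """log P(sense) + sum of log P(word | sense) over the context words."""
        counts = self.word_counts[sense]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.sense_counts[sense] / self.total)
        for w in words:
            score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, words: List[str]) -> str:
        return max(self.sense_counts, key=lambda s: self.log_score(words, s))


real = [(["river", "water"], "shore"), (["money", "loan"], "finance")]
pseudo = [(["deposit", "money"], "finance"), (["stream", "water"], "shore")]
model = NaiveBayesWSD().fit(real + pseudo)
```

Here the context word "deposit" occurs only in the pseudo data, yet it is enough to push `model.predict(["deposit"])` toward the finance sense.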

5.
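Existing semantics-based WSD methods have two shortcomings: first, disambiguating with context words that are themselves partly ambiguous is unsound; second, they ignore how the distance of a context word from the target affects the semantic relatedness computation. An improved method is proposed that addresses these with a step-by-step strategy and distance weighting, respectively. Experimental results show that the improved method clearly improves disambiguation performance.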
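One way to combine the two improvements, sketched under assumptions: the step-by-step strategy is read as skipping context words that are themselves ambiguous, and distance weighting divides each context word's relatedness by its token distance from the target. The relatedness function and the ambiguity test are passed in as hypothetical callables, since the abstract does not specify them.

```python
from typing import Callable, List


def distance_weighted_sense(
    tokens: List[str],
    target_idx: int,
    senses: List[str],
    relatedness: Callable[[str, str], float],
    is_ambiguous: Callable[[str], bool],
) -> str:
    """Score each candidate sense by its relatedness to the context words,
    weighting each word by 1 / distance and skipping ambiguous words."""

    def score(sense: str) -> float:
        total = 0.0
        for i, word in enumerate(tokens):
            if i == target_idx or is_ambiguous(word):
                continue
            total += relatedness(sense, word) / abs(i - target_idx)
        return total

    return max(senses, key=score)
```

With the tokens "cash at the bank current", the nearby but ambiguous word "current" would otherwise favor the riverbank sense; skipping it leaves the more distant "cash" to decide in favor of the financial sense.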

6.
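To improve WSD quality, the structure of the ambiguous word's context is analyzed, and a method that uses syntactic knowledge to guide disambiguation is proposed. Syntactic information and part-of-speech information are extracted from the syntax tree of the ambiguous word's context as disambiguation features, and a naive Bayes model serves as the disambiguation classifier. The classifier's parameters are optimized on a sense-annotated corpus, and the ambiguous words in the test data are then disambiguated. Experimental results show that disambiguation accuracy improves, reaching 66.7%.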

7.
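A HowNet-Based Chinese Word Sense Disambiguation Algorithm
Word sense disambiguation has important theoretical and practical value for research on many problems in natural language processing. To address it, a HowNet-based Chinese WSD algorithm is proposed. To account for the different influence that context words have on disambiguation, three methods for computing semantic association strength are designed on the basis of semantic similarity computation, and four disambiguation rules are formulated to carry out Chinese WSD. Experimental data show that the method achieves a recall of about 65% and a precision of about 75%.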
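Recall and precision can differ when a rule-based disambiguator leaves some instances unanswered because no rule fires (a plausible reading; the abstract does not say so). Under the usual WSD scoring convention, precision divides correct answers by answered instances and recall divides them by all test instances. The counts below are hypothetical and only show how roughly 65% recall and 75% precision can coexist.

```python
from typing import Tuple


def precision_recall(correct: int, answered: int, total: int) -> Tuple[float, float]:
    """Precision = correct / answered; recall = correct / total test instances."""
    precision = correct / answered if answered else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# Hypothetical counts: 100 test instances, 87 answered, 65 answered correctly.
precision, recall = precision_recall(correct=65, answered=87, total=100)
print(f"precision={precision:.3f} recall={recall:.3f}")  # prints: precision=0.747 recall=0.650
```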

8.
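Word sense disambiguation has long been a key problem in natural language processing. To improve WSD accuracy, the semantic knowledge of the word units to the left and right of the target ambiguous word is mined. Building on a Bayesian model and incorporating the semantic information of these left and right word units, a new WSD method is proposed. SemEval-2007 Task #5 is used as both the training and the test corpus: the WSD classifier is optimized and the optimized classifier is then tested. Experimental results show that WSD accuracy improves.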

9.
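Software, 2019, (2): 11-15
In computational linguistics, word sense disambiguation is an important problem in natural language processing. WSD is the process of determining the meaning of a word from its context; the phenomenon of a word taking different meanings in different semantic contexts arises at the level of word senses, sentence meanings, and whole texts. This paper proposes a neural-network-based model for WSD: word vectors are fed into a neural network, and disambiguation is performed as classification. Experiments show that neural-network-based WSD achieves higher accuracy than traditional statistical methods.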

10.
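Word sense disambiguation is a key problem in natural language processing. To improve the accuracy of large-scale WSD, a template-based unsupervised WSD method is proposed. Each sense of a polysemous word is represented by monosemous words that are synonyms or near-synonyms of that sense; context templates are then built by jointly considering the position, context distance, and frequency of co-occurring words, which effectively resolves the difficulty of determining the sense of a polysemous word. Experimental results show that the proposed method clearly improves disambiguation performance.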
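A sketch of template construction and matching under stated assumptions: each sense is stood in for by hypothetical monosemous proxy words, a template maps (side, word) pairs to weights, and each co-occurrence within a window adds 1/distance, so repeated co-occurrences (frequency), closeness (distance), and left/right placement (position) all shape the template. The abstract does not give the actual template format or weighting.

```python
from collections import Counter
from typing import Dict, List


def build_template(corpus: List[List[str]], proxies: List[str], window: int = 3) -> Counter:
    """Accumulate weights for (side, word) pairs around monosemous proxy words;
    each co-occurrence adds 1 / distance."""
    template: Counter = Counter()
    for sentence in corpus:
        for i, tok in enumerate(sentence):
            if tok not in proxies:
                continue
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i:
                    side = "L" if j < i else "R"
                    template[(side, sentence[j])] += 1.0 / abs(j - i)
    return template


def template_sense(sentence: List[str], target_idx: int,
                   templates: Dict[str, Counter], window: int = 3) -> str:
    """Score each sense by how well the target's context matches its template."""

    def score(sense: str) -> float:
        total = 0.0
        lo = max(0, target_idx - window)
        hi = min(len(sentence), target_idx + window + 1)
        for j in range(lo, hi):
            if j != target_idx:
                side = "L" if j < target_idx else "R"
                total += templates[sense][(side, sentence[j])] / abs(j - target_idx)
        return total

    return max(templates, key=score)
```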

11.
Concrete concepts are often easier to understand than abstract concepts. The notion of abstractness is thus closely tied to the organisation of our semantic memory, and more specifically our internal lexicon, which underlies our word sense disambiguation (WSD) mechanisms. State-of-the-art automatic WSD systems often draw on a variety of contextual cues and assign word senses by an optimal combination of statistical classifiers. The validity of various lexico-semantic resources as models of our internal lexicon and the cognitive aspects pertinent to the lexical sensitivity of WSD are seldom questioned. We attempt to address these issues by examining psychological evidence of the internal lexicon and its compatibility with the information available from computational lexicons. In particular, we compare the responses from a word association task against existing lexical resources, WordNet and SUMO, to explore the relation between sense abstractness and semantic activation, and thus the implications on semantic network models and the lexical sensitivity of WSD. Our results suggest that concrete senses are more readily activated than abstract senses, and broad associations are more easily triggered than narrow paradigmatic associations. The results are expected to inform the construction of lexico-semantic resources and WSD strategies.

12.
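To solve the WSD problem, semantic relatedness computation is introduced. A word relatedness model is studied and designed: taking full account of the structural characteristics of concepts, the information content of concepts, and concept definitions in the semantic resource HowNet, it computes word relatedness from the strong associations between words that are expressed by collocations between concept words and instance words. Experimental results show that the relatedness values produced by this model provide good support for solving the WSD problem.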

13.
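Word sense disambiguation is a challenging problem in natural language processing. The genetic ant colony WSD algorithm, a well-performing semi-supervised disambiguation algorithm, can quickly disambiguate all words in a text. It represents semantic relations with a local-context graph model and disambiguates on that basis. During disambiguation, however, global semantic information is lost and conflicting disambiguation results arise, which lowers the algorithm's precision. To solve this problem, an improved graph model based on global domain information and a short-term memory factor is proposed for representing semantics. The model introduces global domain information, which strengthens its ability to handle global semantic information. In addition, following the principle of human short-term memory, a short-term memory factor is introduced into the model, which strengthens the linear relations between senses and prevents conflicting disambiguation results from affecting WSD. Extensive experiments show that, compared with classic WSD algorithms, the improved graph model increases disambiguation precision.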

14.
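Sequence labeling is a fundamental task in natural language processing. Most current sequence labeling methods use recurrent neural networks (RNNs) and their variants to extract contextual semantic information directly from the sequence. Although these methods effectively capture continuous dependencies between words and perform well, they are weak at capturing discrete (non-contiguous) dependencies in the sequence and ignore the connection between words and labels. Therefore, a multi-level semantic information fusion encoding scheme is proposed. First, a bidirectional long short-term memory (BiLSTM) network extracts contextual semantic information from the sequence; next, an attention mechanism adds label semantic information to the contextual information, yielding context information fused with label semantics; then, a self-attention mechanism captures discrete dependencies in the sequence, yielding context information that contains these dependencies; finally, a fusion mechanism combines the three kinds of semantic information into a new representation. Experimental results show that, compared with encoding the sequence directly with an RNN or one of its variants, the multi-level semantic information fusion encoding significantly improves model performance.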
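The encoding steps can be sketched in plain Python under several assumptions: the BiLSTM output is replaced by a given matrix `h` of per-token vectors, attention is unprojected scaled dot-product attention (no learned query, key, or value matrices), label information is added to the context by element-wise addition, and the final fusion is plain concatenation. The paper's actual fusion mechanism is not described in the abstract, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, memory: Matrix) -> Matrix:
    """Unprojected scaled dot-product attention: each query row becomes a
    softmax-weighted average of the memory rows."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in memory])
        out.append([sum(w * row[c] for w, row in zip(weights, memory))
                    for c in range(len(memory[0]))])
    return out


def encode(h: Matrix, labels: Matrix) -> Matrix:
    """h: per-token context vectors (stand-in for BiLSTM output);
    labels: label embeddings of the same width as h."""
    label_info = attention(h, labels)  # each token attends over the label embeddings
    h_label = [[a + b for a, b in zip(x, y)] for x, y in zip(h, label_info)]
    h_self = attention(h_label, h_label)  # self-attention over all positions
    return [x + y + z for x, y, z in zip(h, h_label, h_self)]  # fuse by concatenation
```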

15.
Word sense disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many relevant Web-based applications, such as Web information retrieval, improved access to Web services, information extraction, etc. Early approaches to WSD, based on knowledge representation techniques, have been replaced in the past few years by more robust machine learning and statistical techniques. The results of recent comparative evaluations of WSD systems, however, show that these methods have inherent limitations. On the other hand, the increasing availability of large-scale, rich lexical knowledge resources seems to provide new challenges to knowledge-based approaches. In this paper, we present a method, called structural semantic interconnections (SSI), which creates structural specifications of the possible senses for each word in a context and selects the best hypothesis according to a grammar G, describing relations between sense specifications. Sense specifications are created from several available lexical resources that we integrated in part manually, in part with the help of automatic procedures. The SSI algorithm has been applied to different semantic disambiguation problems, like automatic ontology population, disambiguation of sentences in generic texts, disambiguation of words in glossary definitions. Evaluation experiments have been performed on specific knowledge domains (e.g., tourism, computer networks, enterprise interoperability), as well as on standard disambiguation test sets.

16.
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used, or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way for making substantial progress in the field of WSD.
A.S. is also an Adjunct Professor at the Department of Computer Science and Engineering, University of New South Wales, and a Visiting Professor at the Computing Laboratory, University of Oxford.

17.
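Yan Rong, Zhang Lei. Microcomputer Development, 2006, 16(3): 22-25
Addressing the difficult problem of word sense disambiguation in natural language processing, a new Chinese WSD method is proposed. The method uses HowNet as its semantic resource and makes full use of the preferential combination relations between words. The preferential combination relations between each content word in the sentence and the ambiguous word are obtained from a preferential combination database; the content words are ranked by the strength of these relations; and the similarity between the concepts of each content word and the concepts of the ambiguous word is computed to determine the ambiguous word's sense. Experimental results show that the method is effective for disambiguating high-frequency polysemous words and can serve as a basis for further structural disambiguation.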
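A sketch of the three steps, assuming each concept is reduced to a single sememe in a toy hierarchy, that preferential combination strengths are already available as numbers, and that only the top-k content words are used. Concept similarity uses the form α/(d + α), where d is the path distance between two sememes, a form commonly used for HowNet sememe similarity; the paper's exact measure is not given, and α = 1.6, the hierarchy, and all names are assumptions.

```python
from typing import Dict, List, Optional

Parents = Dict[str, Optional[str]]


def path_to_root(node: str, parent: Parents) -> List[str]:
    """List of nodes from `node` up to the root of the hierarchy."""
    path = [node]
    nxt = parent.get(node)
    while nxt is not None:
        path.append(nxt)
        nxt = parent.get(nxt)
    return path


def sememe_distance(a: str, b: str, parent: Parents) -> int:
    """Number of edges between two sememes via their lowest common ancestor."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    depth_in_b = {node: i for i, node in enumerate(pb)}
    for i, node in enumerate(pa):
        if node in depth_in_b:
            return i + depth_in_b[node]
    return len(pa) + len(pb)  # no common ancestor: treat as far apart (assumption)


def similarity(a: str, b: str, parent: Parents, alpha: float = 1.6) -> float:
    return alpha / (sememe_distance(a, b, parent) + alpha)


def pick_sense(senses: Dict[str, str], content: Dict[str, str],
               preference: Dict[str, float], parent: Parents, top_k: int = 2) -> str:
    """senses: sense label -> sememe; content: content word -> sememe;
    preference: content word -> preferential combination strength."""
    ranked = sorted(content, key=lambda w: preference.get(w, 0.0), reverse=True)[:top_k]
    return max(senses, key=lambda s: sum(similarity(senses[s], content[w], parent)
                                         for w in ranked))
```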

18.
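Computer vision research on bolts with missing pins in power transmission lines is scarce, and traditional image processing methods recognize such bolts with low accuracy. To address these problems, this paper adopts a missing-pin bolt recognition method based on contextual semantic segmentation information. Building on the DeepLab v3+ network, the transmission line dataset is preprocessed by cropping images into tiles and applying adaptive gamma correction for enhancement, which raises the mIoU of missing-pin bolt recognition by about 17%. To handle normal bolts misrecognized as missing-pin bolts, a method that uses contextual semantic segmentation information is proposed: the topological relationship between each segmented missing-pin bolt region and several surrounding component regions is analyzed, and misrecognized normal bolts are excluded according to the category of that relationship. Results from multiple experiments show that the missing-pin bolt recognition method combining preprocessing with contextual semantic information outperforms the DeepLab v3+ algorithm.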
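Two of the quantities mentioned above, sketched with assumptions: mIoU is computed in the standard way (per-class intersection over union, averaged over the classes that occur), and the adaptive gamma correction uses one common heuristic that chooses γ so the image's current mean intensity maps to 0.5. The abstract does not state which adaptive gamma rule the authors used; images are simplified to flat lists, with intensities normalized to [0, 1].

```python
import math
from typing import List, Sequence


def mean_iou(pred: Sequence[int], truth: Sequence[int], num_classes: int) -> float:
    """Per-class IoU = |pred ∩ truth| / |pred ∪ truth|, averaged over classes that occur."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def adaptive_gamma(pixels: List[float]) -> List[float]:
    """Gamma-correct intensities in [0, 1], choosing gamma so that the current
    mean intensity is mapped to 0.5 (one common adaptive heuristic)."""
    mean = sum(pixels) / len(pixels)
    if not 0.0 < mean < 1.0:
        return list(pixels)  # gamma is undefined for an all-black or all-white image
    gamma = math.log(0.5) / math.log(mean)
    return [p ** gamma for p in pixels]
```

For a dark image the chosen γ is below 1, so every intensity in (0, 1) is brightened.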