1.

Liu Yaqing, Zhang Jin, Yu Chunyan. Microcomputer Development (微机发展), 2006, 16(5): 184-185.

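Word sense disambiguation plays an important role in natural language processing, and its precision depends on how complete the disambiguation knowledge is. However, neither the dictionary-based nor the corpus-based methods currently used to acquire this knowledge give satisfactory results. Using HowNet (《知网》), this paper takes a sememe co-occurrence frequency matrix as the disambiguation knowledge and, on that basis, designs and implements a Chinese word sense disambiguation system based on sememe co-occurrence frequency, substantially improving disambiguation precision.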
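
The scoring step can be sketched as follows, assuming each candidate sense is described by a set of HowNet-style sememes and the co-occurrence matrix is stored as a dictionary of pair counts. The sememe labels, counts, and the simple sum-of-counts rule are hypothetical illustrations, not the paper's exact system.

```python
from collections import defaultdict


def build_cooccurrence(triples):
    """Turn (sememe_a, sememe_b, count) triples into a symmetric count table."""
    table = defaultdict(int)
    for a, b, count in triples:
        table[(a, b)] += count
        if a != b:
            table[(b, a)] += count  # store both orders so lookups are symmetric
    return dict(table)


def score_sense(sense_sememes, context_sememes, table):
    """Sum co-occurrence counts between one sense's sememes and the context's sememes."""
    return sum(table.get((s, c), 0) for s in sense_sememes for c in context_sememes)


def disambiguate(senses, context_sememes, table):
    """Return the sense label (senses: label -> sememe set) with the highest score."""
    return max(senses, key=lambda label: score_sense(senses[label], context_sememes, table))


# Toy data: hypothetical sememe labels and counts.
table = build_cooccurrence([("buy", "money", 5), ("commerce", "food", 3), ("beat", "human", 4)])
senses = {"sense_hit": {"beat"}, "sense_buy": {"buy", "commerce"}}
print(disambiguate(senses, {"money", "food"}, table))  # sense_buy
```

In a real system the counts would come from a sense-tagged corpus; summing raw counts is the simplest choice, and normalized or log-scaled frequencies are common alternatives.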

2.

SYMCON—A Hybrid Symbolic/Connectionist System for Word Sense Disambiguation

Xinyu Wu, Michael McTear, Piyush Ojha. Applied Intelligence, 1997, 7(1): 5-26.

Connectionist methods and knowledge-based techniques are two largely complementary approaches to natural language processing (NLP). However, both have potential problems that prevent either from serving as a general-purpose processing method. Research reveals that a hybrid approach combining connectionist and symbolic techniques may be able to use the strengths of one processing paradigm to address the weaknesses of the other. Hence, a system that effectively combines the two approaches can be superior to either one in isolation. This paper describes a hybrid system, SYMCON (SYMbolic and CONnectionist), which integrates symbolic and connectionist techniques in an attempt to solve the problem of word sense disambiguation (WSD), arguably one of the most fundamental and difficult issues in NLP. It consists of three sub-systems. First, a distributed simple recurrent network (SRN) is trained with the standard back-propagation algorithm to learn the semantic relationships among concepts, thereby generating categorical constraints that are supplied to the other two sub-systems as the initial results of pre-processing. Second, a knowledge-based symbolic component consists of a knowledge base containing general inferencing rules for a given application domain. Third, a localist network selects the best interpretation among multiple alternatives and potentially ambiguous inference paths by spreading activation throughout the network. The structure, initial states, and connection weights of this network are determined by the processing outcomes of the other two sub-systems, so the localist network can be viewed as a medium between the distributed network and the symbolic sub-system. The resulting hybrid symbolic/connectionist system combines information from all three sources to select the most plausible interpretation for ambiguous words.
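
The localist-network step can be illustrated with a minimal spreading-activation sketch: context nodes are clamped to a fixed activation, activation flows along weighted links (negative weights let rival senses inhibit each other), and the most active candidate is chosen. The node names, weights, decay factor, clipping to [0, 1], and step count are all assumptions; in SYMCON the network's structure and weights come from the other two sub-systems.

```python
def spread_activation(weights, clamped, steps=10, decay=0.5):
    """Spread activation over a weighted graph for a fixed number of steps.

    weights: dict (source, target) -> link weight; negative weights inhibit.
    clamped: dict node -> activation held fixed (e.g. observed context words).
    Unclamped nodes start at 0, decay each step, and are clipped to [0, 1].
    """
    nodes = set(clamped) | {node for edge in weights for node in edge}
    act = {node: clamped.get(node, 0.0) for node in nodes}
    for _ in range(steps):
        # Synchronous update: every node reads the previous step's activations.
        incoming = dict.fromkeys(nodes, 0.0)
        for (src, dst), w in weights.items():
            incoming[dst] += w * act[src]
        act = {
            node: clamped[node] if node in clamped
            else min(1.0, max(0.0, decay * act[node] + incoming[node]))
            for node in nodes
        }
    return act


def best_interpretation(act, candidates):
    """Pick the candidate interpretation with the highest activation."""
    return max(candidates, key=lambda c: act[c])


# Hypothetical network for an ambiguous "bank" in the context of "river".
weights = {
    ("river", "bank/shore"): 0.6,
    ("money", "bank/finance"): 0.6,
    ("bank/shore", "bank/finance"): -0.5,  # rival senses inhibit each other
    ("bank/finance", "bank/shore"): -0.5,
}
act = spread_activation(weights, {"river": 1.0})
print(best_interpretation(act, ["bank/shore", "bank/finance"]))  # bank/shore
```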

3.

Combining Supervised and Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

E. Agirre, G. Rigau, L. Padró, J. Atserias. Computers and the Humanities, 2000, 34(1-2): 103-108.

This work combines a set of available techniques, which could be further extended, to perform noun sense disambiguation. We use several unsupervised techniques (Rigau et al., 1997) that draw knowledge from a variety of sources. In addition, we apply a supervised technique to show that supervised and unsupervised methods can be combined to obtain better results. This paper tries to show that, by using an appropriate method to combine these heuristics, we can disambiguate words in free running text with reasonable precision.
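
One simple way to combine several heuristics is a weighted vote over normalized sense scores, sketched below. The weighted-sum rule, the heuristic names, and the weights are assumptions for illustration; the abstract does not specify the paper's own combination method.

```python
def normalize(scores):
    """Scale one heuristic's sense scores to sum to 1 (all-zero scores stay zero)."""
    total = sum(scores.values())
    if total == 0:
        return {sense: 0.0 for sense in scores}
    return {sense: value / total for sense, value in scores.items()}


def combine(heuristic_scores, weights):
    """Weighted sum of normalized scores from several heuristics.

    heuristic_scores: heuristic name -> {sense: non-negative score}
    weights: heuristic name -> weight (missing names default to 1.0)
    Returns (best_sense, combined_scores).
    """
    combined = {}
    for name, scores in heuristic_scores.items():
        for sense, value in normalize(scores).items():
            combined[sense] = combined.get(sense, 0.0) + weights.get(name, 1.0) * value
    return max(combined, key=combined.get), combined


# Hypothetical heuristic names and scores for one noun occurrence.
scores = {
    "h_supervised": {"s1": 3, "s2": 1},
    "h_unsupervised": {"s1": 0, "s2": 2},
}
print(combine(scores, {"h_supervised": 1.0, "h_unsupervised": 1.0})[0])  # s2
print(combine(scores, {"h_supervised": 3.0, "h_unsupervised": 1.0})[0])  # s1
```

Normalizing first keeps a heuristic with large raw scores from dominating; the weights then express how much each knowledge source is trusted.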

4.

An Improved MM Word Segmentation Algorithm (total citations: 28; self-citations: 0; citations by others: 28)

Guo Hui, Su Zhongyi, Wang Wen, Cui Jun. Microcomputer Applications (微型电脑应用), 2002, 18(1): 13-15.

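This paper first proposes a preprocessing method for segmenting Chinese sentences in which all of the database accesses needed during segmentation are completed in the preprocessing stage. The method can be applied without modification to all mechanical (dictionary-matching) segmentation algorithms and to ambiguity resolution. Building on this preprocessing, the paper then implements an improved MM (maximum matching) method that follows the "longest word first" principle more strictly, so that the segmentation system performs better than the standard MM method in the mechanical segmentation stage.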
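
The two-stage idea can be sketched as follows, with an in-memory set standing in for the database: the preprocessing pass performs every dictionary lookup once and records which word lengths match at each position, and the forward maximum matching (MM) pass then consults only that table, always taking the longest match. The function names and the `max_len` parameter are hypothetical, and this shows plain MM rather than the paper's improved variant.

```python
def preprocess(sentence, dictionary, max_len=4):
    """Record, for every start index, the lengths of all dictionary words found there.

    All dictionary lookups (the "database accesses") happen in this single pass.
    """
    matches = []
    for i in range(len(sentence)):
        longest = min(max_len, len(sentence) - i)
        matches.append([n for n in range(1, longest + 1) if sentence[i:i + n] in dictionary])
    return matches


def forward_max_match(sentence, matches):
    """Forward maximum matching over the precomputed table: longest word first."""
    words, i = [], 0
    while i < len(sentence):
        n = max(matches[i], default=1)  # fall back to one character if nothing matched
        words.append(sentence[i:i + n])
        i += n
    return words


dictionary = {"研究", "研究生", "生命", "起源", "命"}
sentence = "研究生命起源"
print(forward_max_match(sentence, preprocess(sentence, dictionary)))
```

With this toy dictionary the result is 研究生 / 命 / 起源, the well-known forward-MM error (the intended reading is 研究 / 生命 / 起源), which is exactly the kind of case that stricter longest-word handling or bidirectional matching tries to address.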

5.

Hierarchical Decision Lists for Word Sense Disambiguation

David Yarowsky. Computers and the Humanities, 2000, 34(1-2): 179-186.

This paper describes a supervised algorithm for word sense disambiguation based on hierarchies of decision lists. The algorithm supports a useful degree of conditional branching while minimizing the training data fragmentation typical of decision trees. Classifications are based on a rich set of collocational, morphological and syntactic contextual features, extracted automatically from training data and weighted in a way that is sensitive to the nature of the feature and feature class. The algorithm is evaluated comprehensively in the SENSEVAL framework, achieving the top performance of all participating supervised systems on the 36 test words for which training data is available.
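
A flat decision list, the building block that the hierarchical version extends, can be sketched as below: each (feature, sense) pair is ranked by a smoothed log-likelihood ratio, and classification uses the single strongest rule whose feature is present. The hierarchical branching itself is not shown, and the smoothing constant `alpha` and the feature strings are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """examples: list of (feature_set, sense). Returns rules sorted strongest first.

    Each rule is (score, feature, sense), where score is the smoothed log ratio
    of how often the feature occurs with this sense versus with all other senses.
    """
    counts = defaultdict(Counter)  # feature -> Counter of senses
    senses = set()
    for features, sense in examples:
        senses.add(sense)
        for f in features:
            counts[f][sense] += 1
    rules = []
    for f, sense_counts in counts.items():
        total = sum(sense_counts.values())
        for sense in senses:
            hit = sense_counts[sense]
            rules.append((math.log((hit + alpha) / (total - hit + alpha)), f, sense))
    rules.sort(reverse=True)
    return rules


def classify(rules, features, default):
    """Apply the first (strongest) positive rule whose feature occurs in the input."""
    present = set(features)
    for score, f, sense in rules:
        if score > 0 and f in present:
            return sense
    return default


# Hypothetical collocational features for an ambiguous "bank".
examples = [
    ({"w-1=river"}, "shore"),
    ({"w-1=river", "w+1=water"}, "shore"),
    ({"w+1=account"}, "finance"),
    ({"w-1=savings"}, "finance"),
]
rules = train_decision_list(examples)
print(classify(rules, {"w-1=river", "w+1=account"}, default="finance"))  # shore
```

Because only one rule fires per decision, there is no need to split the training data at every branch, which is the fragmentation problem of decision trees that the abstract mentions.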

6.

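Using Contextual Information to Resolve Combinational Ambiguity in Chinese Automatic Word Segmentation (total citations: 15; self-citations: 2; citations by others: 15)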

Xiao Yun, Sun Maosong, Zou Jiayan. Computer Engineering and Applications (计算机工程与应用), 2001, 37(19): 87-89.

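Combinational ambiguous segmentation strings have long been a difficult problem in Chinese automatic word segmentation research. This paper treats the problem as equivalent to word sense disambiguation (WSD). Borrowing the vector space model widely used in WSD research, it examines 20 typical combinational ambiguities in detail. It proposes a "divide and conquer" strategy based on their distributions, and then determines experimentally the factors associated with the feature matrix, such as context window size, distinction of window positions, and weight estimation. To address data sparseness, it reduces the dimensionality of the feature matrix using the semantic codes of words, with good results. The authors believe the model is general for disambiguating combinational ambiguous strings.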
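
The vector-space idea can be sketched for a combinational ambiguity such as 才能 (one word, "talent", versus 才 / 能, "only then can"): each segmentation reading gets a context-word vector built from training examples within a fixed window, and a new occurrence is assigned the reading with the highest cosine similarity. The window size, raw-count weights, and toy examples are assumptions; the paper's tuned window positions, weighting, and semantic-code dimension reduction are not shown.

```python
import math
from collections import Counter


def context_vector(tokens, index, window=2):
    """Count words within `window` positions of tokens[index], excluding the target."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    return Counter(tokens[j] for j in range(lo, hi) if j != index)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples, window=2):
    """examples: (tokens, target_index, reading). Returns reading -> summed context vector."""
    profiles = {}
    for tokens, index, reading in examples:
        profiles.setdefault(reading, Counter()).update(context_vector(tokens, index, window))
    return profiles


def choose_reading(profiles, tokens, index, window=2):
    vec = context_vector(tokens, index, window)
    return max(profiles, key=lambda r: cosine(vec, profiles[r]))


# Toy training data: the ambiguous string is kept as one token at target_index.
profiles = train([
    (["他", "很", "有", "才能"], 3, "word"),
    (["发挥", "自己", "的", "才能"], 3, "word"),
    (["努力", "学习", "才能", "成功"], 2, "split"),
    (["只有", "坚持", "才能", "进步"], 2, "split"),
])
print(choose_reading(profiles, ["坚持", "学习", "才能", "成功"], 2))  # split
```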

7.

Research on Unsupervised Word Sense Disambiguation (total citations: 3; self-citations: 0; citations by others: 3)

Wang Ruiqin, Kong Fansheng. Journal of Software (软件学报), 2009, 20(8): 2138-2152.

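This study summarizes existing unsupervised word sense disambiguation techniques in order to point out directions for further research. It first introduces the significance of unsupervised WSD research. It then summarizes and analyzes the key techniques of the various unsupervised WSD studies in China and abroad, including the data sources used, the disambiguation methods adopted, the evaluation frameworks, and the disambiguation performance achieved. Finally, it reviews 14 distinctive unsupervised WSD methods and identifies the current research results and possible future directions.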

8.

Natural language understanding and Montague grammar

Per-Kristian Halvorsen. Computational Intelligence, 1986, 2(1): 54-62.

This paper reevaluates some of the contributions of Montague grammar in view of the increasing importance of computational considerations in linguistic theory and the demand for linguistic theories that can support the design of natural language systems. It also considers Montague grammar in relation to work on lexical semantics and semantic nets. From this perspective, Montague grammar's techniques for systematically linking syntactic form to a model-theoretic semantics emerge as the most significant feature of the theory, while a number of its specific semantic assumptions recede in importance. Yet, with different ways of thinking about the structure mapping between levels of linguistic form and interpretation (e.g., constraint systems), we can also implement this connection using techniques different from those Montague had at his disposal.

9.

A Disambiguation Algorithm for Chinese Ambiguous Strings Based on Semantic Computation

Deng Fan, Yu Bin. Microcomputer Development (微机发展), 2008, 18(6): 107-110.

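To address the many problems that ambiguous strings cause for Chinese language processing and understanding, this paper proposes an algorithm for resolving ambiguity in Chinese text based on natural language understanding. For both overlapping and combinational ambiguities, it uses HowNet as the main semantic resource and knowledge graphs as the knowledge representation; the proposed disambiguation algorithm computes the semantics of the ambiguous string and its context and thereby selects the correct segmentation of the sentence. Experimental data show that the algorithm improves the accuracy of segmenting ambiguous strings in Chinese.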

10.

An Improved MM Chinese Word Segmentation Algorithm

Shi Zhengxi, Zhang Handong, Zhao Liming, Chen Yuyan. Computer & Network (计算机与网络), 2009, 35(2): 48-50.

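This paper briefly introduces the characteristics of Chinese and the concept of word segmentation, describes the common segmentation algorithms in detail, and on this basis proposes an improved MM Chinese word segmentation algorithm. The algorithm combines the advantages of forward maximum matching (MM) and reverse maximum matching (RMM) while overcoming their shortcomings, clearly improving both segmentation accuracy and efficiency, which makes it a fairly practical segmentation algorithm. Experiments further confirm that the algorithm effectively improves segmentation accuracy and efficiency.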
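
Combining forward (MM) and reverse (RMM) maximum matching can be sketched as below. The tie-breaking heuristic (prefer the result with fewer words, then fewer single-character words, otherwise RMM) is a common bidirectional-matching rule and is an assumption here; the abstract does not state the paper's exact combination strategy.

```python
def forward_mm(text, dictionary, max_len=4):
    """Forward maximum matching: scan left to right, longest dictionary word first."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in dictionary:  # a single character always matches
                words.append(text[i:i + n])
                i += n
                break
    return words


def reverse_mm(text, dictionary, max_len=4):
    """Reverse maximum matching: scan right to left, longest dictionary word first."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            if n == 1 or text[j - n:j] in dictionary:
                words.append(text[j - n:j])
                j -= n
                break
    return words[::-1]


def bidirectional_mm(text, dictionary, max_len=4):
    """Run both directions and keep the result with the lower (assumed) cost."""
    fwd = forward_mm(text, dictionary, max_len)
    rev = reverse_mm(text, dictionary, max_len)
    if fwd == rev:
        return fwd

    def cost(words):
        # Fewer words first, then fewer single-character words.
        return (len(words), sum(1 for w in words if len(w) == 1))

    return fwd if cost(fwd) < cost(rev) else rev


dictionary = {"研究", "研究生", "生命", "起源", "命"}
print(bidirectional_mm("研究生命起源", dictionary))
```

On this toy input MM gives 研究生 / 命 / 起源 while RMM gives 研究 / 生命 / 起源; both have three words, so the single-character count decides in favor of the RMM result.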

11.

Research on and Improvement of the Chinese Automatic Word Segmentation Mechanism

Guo Yi. Digital Community & Smart Home (数字社区&智能家居), 2008, (7).

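This paper studies Chinese word segmentation techniques and improves the traditional whole-word binary-search segmentation mechanism. It designs a new dictionary structure that organizes words by their number of characters, which makes updating the dictionary and adding entries more convenient, and proposes a corresponding fast segmentation algorithm for this structure. Comparative experiments show that this method segments faster than the traditional whole-word binary search, character-by-character binary search, and trie index tree methods.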
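
A dictionary organized by word length can be sketched as a mapping from each length to a set of words: adding or removing a word touches only one bucket, and a longest-first lookup tries only lengths that actually occur. The class and method names are hypothetical, and Python hash sets stand in for whatever per-length structure the paper uses.

```python
class LengthIndexedDictionary:
    """Word dictionary grouped by number of characters (hypothetical structure)."""

    def __init__(self, words=()):
        self.buckets = {}  # length -> set of words with that many characters
        for w in words:
            self.add(w)

    def add(self, word):
        self.buckets.setdefault(len(word), set()).add(word)

    def remove(self, word):
        bucket = self.buckets.get(len(word))
        if bucket is not None:
            bucket.discard(word)
            if not bucket:
                del self.buckets[len(word)]  # drop empty length classes

    def lengths_desc(self):
        return sorted(self.buckets, reverse=True)

    def longest_match(self, text, start):
        """Return the longest dictionary word beginning at text[start], or None."""
        for n in self.lengths_desc():
            if start + n <= len(text) and text[start:start + n] in self.buckets[n]:
                return text[start:start + n]
        return None


def segment(text, dictionary):
    """Greedy longest-first segmentation using the length-indexed dictionary."""
    words, i = [], 0
    while i < len(text):
        word = dictionary.longest_match(text, i) or text[i]  # fall back to one character
        words.append(word)
        i += len(word)
    return words


d = LengthIndexedDictionary(["中文", "分词", "中文分词", "技术"])
print(segment("中文分词技术", d))  # ['中文分词', '技术']
d.remove("中文分词")  # updating touches only the length-4 bucket
print(segment("中文分词技术", d))  # ['中文', '分词', '技术']
```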

12.

A Matrix Factorization Word Embedding Learning Model Incorporating Semantic Information

Chen Pei, Jing Liping. CAAI Transactions on Intelligent Systems (智能系统学报), 2017, 12(5): 661-667.

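Word embeddings play an important role in natural language processing and have attracted growing attention from researchers in recent years. However, traditional word embedding methods typically rely on large unlabeled text corpora and ignore semantic information about words, such as the semantic relations between them. To make full use of existing domain knowledge bases, which contain rich lexical semantic information, this paper proposes a word embedding method that incorporates semantic information (KbEMF). The method adds a domain-knowledge constraint term to a matrix-factorization model for learning word embeddings, so that word pairs with strong semantic relations obtain relatively similar vectors. Results on word analogy reasoning and word similarity tasks on real data show that KbEMF clearly outperforms existing models.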
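
One plausible way to write a matrix-factorization objective with a knowledge-constraint term is shown below, as an illustration rather than the paper's exact KbEMF formulation. Here $X$ is a word-context co-occurrence statistic matrix, $\mathbf{w}_i$ and $\mathbf{c}_j$ are word and context vectors, $R$ is the set of word pairs with a strong semantic relation in the knowledge base, $s_{ik}$ is a relation strength, and $\lambda$ and $\mu$ are trade-off weights; all of these symbols are assumptions introduced here.

```latex
\min_{W, C} \; \sum_{i,j} \left( X_{ij} - \mathbf{w}_i^{\top} \mathbf{c}_j \right)^2
  + \lambda \sum_{(i,k) \in R} s_{ik} \left\lVert \mathbf{w}_i - \mathbf{w}_k \right\rVert_2^2
  + \mu \left( \lVert W \rVert_F^2 + \lVert C \rVert_F^2 \right)
```

The second term is what pulls strongly related words toward similar vectors; with $\lambda = 0$ the objective reduces to plain regularized matrix factorization.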

13.

An historical overview of natural language processing systems that learn

Robin Collier. Artificial Intelligence Review, 1994, 8(1): 17-54.

A fundamental issue in natural language processing is the need for an enormous quantity of preprogrammed knowledge about both the language and the domain under examination. Manual acquisition of this knowledge is tedious and error-prone, so an automated acquisition process would prove invaluable. This paper references and overviews a range of the systems that have been developed in the domain of machine learning and natural language processing. Each system is categorised as belonging to either the symbolic or the connectionist paradigm, and its characteristics and limitations are described.

14.

A BERT-Based Model for Short Text Similarity Discrimination

Fang Ziqing, Chen Yifei. Digital Community & Smart Home (数字社区&智能家居), 2021, (5): 14-18.

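Methods for representing short texts and extracting their features are an important direction of basic research in natural language processing, with broad application value. This paper proposes the BERT_BLSTM_TCNN model. This neural network uses transfer learning from BERT and introduces adversarial training in the word-vector encoding stage to train sentence features that capture both the semantics and the structure of sentences and generalize better; these features are then fed into a BLSTM_TCNN layer for feature extraction to complete...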

15.

A Method for Implementing Modeling in an Intelligent Tutoring System

Luo Yao, Zhao Ke, Li Yatao, Ding Jiaoteng. Microcomputer Development (微机发展), 2008, (9): 46-49.

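Targeting the characteristics of mathematics-domain natural language, this paper proposes a method for building models in an intelligent tutoring system. The method performs noun clustering analysis on the results of semantic understanding, invokes the corresponding intensional models in a static knowledge base, and, guided by the descriptions of those models in the knowledge base, collects the relevant information from the semantic understanding results to make abstract concepts concrete. On this basis, it mines the problem for information using domain knowledge and reduces that information according to the problem type, thereby building the model. This modeling method has been applied with good results in an intelligent tutoring system.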

16.

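Disambiguation of Chinese Polysemous Words Based on the AdaBoost.MH Algorithm (total citations: 1; self-citations: 1; citations by others: 0)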

Liu Fengcheng, Huang Degen, Jiang Peng. Journal of Chinese Information Processing (中文信息学报), 2006, 20(3): 8-15.

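This paper proposes a supervised method for disambiguating Chinese polysemous words based on the AdaBoost.MH algorithm. The method uses AdaBoost.MH to boost the weak rules produced by decision trees, obtaining after a number of iterations a final classification rule of higher accuracy, and gives a simple method for terminating the iterations. To obtain knowledge sources from the context of a polysemous word, it introduces a new knowledge source, semantic categories, in addition to traditional sources such as part-of-speech tags and local collocation sequences, which improves both the learning efficiency of the algorithm and the disambiguation accuracy. In disambiguation experiments on 6 typical polysemous words and on 20 polysemous words from the SENSEVAL-3 Chinese corpus, the AdaBoost.MH algorithm achieved a relatively high open-test accuracy (85.75%).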
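
The boosting loop can be sketched for a two-sense case, assuming binary context features and one-feature decision stumps as weak learners, with senses encoded as +1 and -1. This is plain binary AdaBoost, not the multi-class, multi-label AdaBoost.MH used in the paper; the feature names and round count are assumptions, and stopping when no stump beats chance or one is perfect is only a simple stand-in for the paper's termination method.

```python
import math


def stump(feature, polarity, feats):
    """Weak rule: predict `polarity` if the feature is present, else the opposite."""
    return polarity if feature in feats else -polarity


def train_adaboost(examples, rounds=10):
    """examples: list of (feature_set, label) with label in {+1, -1}.

    Returns the boosted model as a list of (alpha, feature, polarity).
    """
    n = len(examples)
    weights = [1.0 / n] * n
    features = sorted({f for feats, _ in examples for f in feats})
    model = []
    for _ in range(rounds):
        # Choose the stump with the lowest weighted error.
        best = None
        for f in features:
            for polarity in (1, -1):
                err = sum(w for w, (feats, y) in zip(weights, examples)
                          if stump(f, polarity, feats) != y)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        if err >= 0.5:
            break  # no stump beats chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, polarity))
        if err <= 1e-10:
            break  # a perfect weak rule: further rounds add nothing
        # Misclassified examples gain weight, correct ones lose weight; then renormalize.
        weights = [w * math.exp(-alpha * y * stump(f, polarity, feats))
                   for w, (feats, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, feats):
    score = sum(alpha * stump(f, polarity, feats) for alpha, f, polarity in model)
    return 1 if score >= 0 else -1


# Hypothetical POS and collocation features for a two-sense word.
examples = [
    ({"pos=v", "next=ball"}, 1),
    ({"next=ball"}, 1),
    ({"pos=n", "next=call"}, -1),
    ({"next=call"}, -1),
]
model = train_adaboost(examples)
print(predict(model, {"next=call"}))  # -1
```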

17.

Zhang Zhao-Bo, Zhong Zhi-Man, Yuan Ping-Peng, Jin Hai. Journal of Computer Science and Technology, 2023, 38(1): 196-210.

Entity linking refers to linking a string in a text to corresponding entities in a knowledge base through candidate entity generation and candidate...

18.

A Proposal for Implementing Multi-User Data Base (MUD) Technology in an Academic Library

Internet Reference Services Quarterly, 2013, 18(2): 75-96.

Multi-User Data Base (MUD) technology, and its object-oriented descendant (MOO), is one of the most exciting tools to surface on the Internet, and offers libraries and librarians a unique opportunity to participate in creating user-friendly standardized interfaces to many of our most frequently used resources. These resources include those we already access through the Internet using gopher, telnet, WWW, and FTP, as well as the proprietary databases we currently access through leased lines, such as OCLC FirstSearch, Prism, DIALOG, and many others. MOO technology is already being used successfully to create user-extensible collaborative professional environments for educators, astronomers, and computer network systems administrators. With the growing relevance of the Internet for libraries and other information professionals, it behooves the library community to engage with the emerging technologies that will allow us to interface most effectively with the Internet and its many resources, with the proprietary databases on which we depend, and with one another.

19.

A Semantic Relatedness Computation Method Based on HNC Theory (total citations: 8; self-citations: 0; citations by others: 8)

Zhang Yunliang, Zhang Quan. Computer Engineering and Applications (计算机工程与应用), 2005, 41(34): 1-3, 18.

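Semantic relatedness computation plays an important role in analyzing the semantic structure of sentences and can also be used for semantic processing in automatic text classification and information retrieval. Based on the concept primitive tree tables of HNC (Hierarchical Network of Concepts) theory and the method for mapping words to HNC symbols, this paper proposes and implements a method for computing semantic relatedness. The paper analyzes the advantages of this method and verifies its usefulness in analyzing the semantic structure of sentences.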

20.

Research on Text Classification Integrating Knowledge Graphs and Multimodal Information

Jing Li, Yao Ke. Computer Engineering and Applications (计算机工程与应用), 2023, 59(2): 102-109.

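Traditional text classification methods are mainly empirical statistical learning methods driven by unimodal data. They lack the ability to understand the data, are not robust, and, with only unimodal input, struggle to analyze the increasingly rich multimodal data on the Internet. To address this, the paper proposes two ways to improve classification: introducing multimodal information into the model input to compensate for the limitations of unimodal information, and introducing knowledge graph entity information into the model input to enrich the semantics of the text and improve the model's generalization. The model uses BERT to extract text features, an improved ResNet to extract image features, and TransE to extract features of the entities in the text; these are combined by early fusion and fed into a BERT model for classification. The model reaches an F1 score of 66.5% on MM-IMDB, a multi-label classification dataset, and an accuracy (ACC) of 71.1% on the sentiment analysis datasets Twitter15&17, outperforming the other models compared. The experimental results show that introducing multimodal information and entity information improves the model's text classification ability.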