Similar Documents
20 similar documents retrieved (search time: 218 ms)
1.
An LDA-Based Method for Computing Question Similarity in Community Question Answering   (Cited by: 2; self-citations: 0; citations by others: 2)
Traditional question answering (QA) systems simply return an answer to a question and offer no user interaction, whereas community question answering (CQA) systems contain large numbers of question-answer pairs that can be exploited. This paper proposes an LDA-based matching framework for the similar-question matching problem: question similarity is computed from three aspects (statistical, semantic, and topic information) and combined into an overall similarity score. Experiments on a manually annotated dataset extracted from Yahoo! Answers show that the proposed method achieves strong performance.
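A minimal Python sketch of the combination idea (statistical plus topic similarity), assuming scikit-learn is available; the toy questions, the weights, and the omission of the paper's separate semantic component are illustrative choices, not the authors' implementation.

```python
# Sketch: combine TF-IDF (statistical) and LDA topic similarity for question pairs.
# Assumptions: scikit-learn; w_stat/w_topic are illustrative weights, and the
# paper's separate "semantic" component is omitted here.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "how do I reset my router password",
    "what is the best way to change a router password",
    "how to bake a chocolate cake",
]

tfidf = TfidfVectorizer().fit(corpus)
counts = CountVectorizer().fit(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts.transform(corpus))

def question_similarity(q1, q2, w_stat=0.5, w_topic=0.5):
    """Weighted combination of statistical and topic similarity."""
    stat = cosine_similarity(tfidf.transform([q1]), tfidf.transform([q2]))[0, 0]
    t1 = lda.transform(counts.transform([q1]))
    t2 = lda.transform(counts.transform([q2]))
    topic = cosine_similarity(t1, t2)[0, 0]
    return w_stat * stat + w_topic * topic

print(question_similarity(corpus[0], corpus[1]))  # related questions -> higher score
print(question_similarity(corpus[0], corpus[2]))  # unrelated questions -> lower score
```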

2.
The lack of links between semantic datasets from heterogeneous sources severely hinders the construction and growth of the Web of Data. Discovering and building coreference relations between instances in semantic datasets enriches the links between datasets and thereby supports reasoning and querying across them. When coreference relations are built through similarity analysis, the weights of instance properties and the similarity of property values play a central role in computing instance similarity. This paper proposes a new model that computes property weights from dataset statistics, justifies it from a probabilistic perspective, and analyzes its advantages over traditional weighting methods. Based on the new weighting method, a coreference construction system is implemented and validated on open semantic datasets.

3.
This paper proposes a word-embedding-based representation of the multiple senses of an ambiguous word to perform unsupervised, knowledge-base-driven disambiguation of letter abbreviations. The method clusters in two steps. First, clustering by salient similarity yields high-confidence clusters, from which a document collection with semantic labels is constructed as training data. Multiple word-embedding models are trained on this data, and the mean cosine similarity represents the semantic relation between two words. In the second clustering step, feature-word expansion and semantic linear weighting are introduced to improve sense discrimination and disambiguation performance: the feature-word set of a document to be disambiguated is expanded according to semantic similarity, recovering semantic information missing from the clustered documents, and feature-word weights are linearly reweighted by semantic similarity. In disambiguation experiments on 25 ambiguous abbreviations, feature-word expansion raises the system F-score by about 4%, and semantic linear weighting adds a further 2%, reaching 89.40%.
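A small sketch of two of the ingredients: averaging cosine similarity over several embedding models, and expanding a feature-word set by that similarity with linear weighting. The tiny vectors and the weighting scheme are placeholder assumptions, not the paper's trained models.

```python
# Sketch: mean cosine similarity over several word-embedding models, plus
# similarity-driven feature-word expansion. The 3-d vectors are toy placeholders.
import numpy as np

models = [  # each "model" maps word -> vector
    {"bank": np.array([0.9, 0.1, 0.0]), "finance": np.array([0.8, 0.2, 0.1]),
     "river": np.array([0.1, 0.9, 0.2]), "loan": np.array([0.7, 0.3, 0.0])},
    {"bank": np.array([0.8, 0.2, 0.1]), "finance": np.array([0.9, 0.1, 0.0]),
     "river": np.array([0.0, 1.0, 0.1]), "loan": np.array([0.8, 0.2, 0.1])},
]

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def mean_similarity(w1, w2):
    """Average the cosine similarity of two words over all embedding models."""
    sims = [cos(m[w1], m[w2]) for m in models if w1 in m and w2 in m]
    return sum(sims) / len(sims) if sims else 0.0

def expand_features(features, vocabulary, top_k=2):
    """Add the words most similar (on average) to the current feature words."""
    scored = [(w, max(mean_similarity(w, f) for f in features))
              for w in vocabulary if w not in features]
    expansion = sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
    # linear weighting: original features keep weight 1.0, expansions their similarity
    return {**{f: 1.0 for f in features}, **{w: s for w, s in expansion}}

print(expand_features({"bank"}, ["finance", "river", "loan"], top_k=2))
```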

4.
When traditional text similarity measures are applied directly to short texts, the brevity of short texts leads to data sparsity and biased results. This paper represents short texts as complex networks and proposes a new short-text similarity measure. The method first preprocesses the short texts, builds a complex network model for each text, and computes complex-network feature values for the words; it then uses an external tool to compute the semantic similarity between words, and finally computes the similarity between short texts according to the defined short-text semantic similarity measure. Clustering experiments on benchmark datasets show that, in terms of F-measure, the proposed measure outperforms the traditional TF-IDF method and another method based on term semantic similarity.

5.
Cross-lingual sentence semantic similarity aims to measure the degree of semantic similarity between sentences in different languages. Recent neural models for this task mostly use convolutional neural networks to capture local semantic information and fail to capture semantic relations between distant words in a sentence. This paper proposes a neural architecture that combines a gated convolutional neural network with a self-attention mechanism to capture both local and global semantic relations in cross-lingual sentence pairs, yielding a comprehensive semantic representation of the text. Experiments on several SemEval-2017 datasets show that the proposed model captures semantic similarity between sentences from multiple perspectives and outperforms the purely neural baseline models.
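A minimal PyTorch sketch of an encoder that combines a gated convolution (GLU) with multi-head self-attention; the dimensions, pooling, and similarity scoring are illustrative and do not reproduce the paper's architecture.

```python
# Sketch: a sentence encoder combining a gated convolution (GLU, local semantics)
# with multi-head self-attention (global semantics). Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvSelfAttnEncoder(nn.Module):
    def __init__(self, embed_dim=64, kernel_size=3, num_heads=4):
        super().__init__()
        # conv outputs 2*embed_dim channels so GLU can split them into content + gate
        self.conv = nn.Conv1d(embed_dim, 2 * embed_dim, kernel_size,
                              padding=kernel_size // 2)
        # self-attention relates distant tokens within the sentence
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, seq_len, embed_dim)
        local = F.glu(self.conv(x.transpose(1, 2)), dim=1).transpose(1, 2)
        global_, _ = self.attn(local, local, local)
        return (local + global_).mean(dim=1)   # pooled sentence representation

enc = GatedConvSelfAttnEncoder()
s1, s2 = torch.randn(1, 10, 64), torch.randn(1, 12, 64)   # two embedded sentences
print(F.cosine_similarity(enc(s1), enc(s2)).item())        # similarity score
```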

6.
Entity linking is the task of linking ambiguous entity mentions in text to the corresponding entities in a knowledge base. This paper first analyzes entity linking systems and identifies their core problem: computing the semantic similarity between mention context and candidate entity. It then proposes a graph-model-based method for computing the similarity of Wikipedia concepts and applies it to computing the semantic similarity between mention contexts and candidate entities. On this basis, an entity linking system built on a learning-to-rank framework is designed. Experimental results show that, compared with traditional measures, the new similarity measure captures the semantic similarity between mention context and candidate entity more effectively, and the entity linking system, which incorporates multiple features, achieves state-of-the-art performance.

7.
Ontology-Based Measurement of Concept Semantic Similarity   (Cited by: 4; self-citations: 2; citations by others: 2)
For the problem of measuring concept semantic similarity, this paper proposes an algorithm that combines a graph-theoretic approach with an information-content approach. It computes the length of the path connecting two concepts in the concept graph, their local density, and the strength of the relations along the connecting path, and combines the path weights with information content to measure the semantic similarity between the concepts. Experimental results show that the algorithm achieves good measurement performance.
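A hedged sketch of the generic ingredients such a hybrid measure uses (path length, local density, information content) on a toy ontology graph with networkx; the combination formula, weights, and frequencies below are invented for illustration and are not the algorithm proposed in the paper.

```python
# Sketch: generic ingredients of a hybrid concept-similarity measure on an
# ontology graph: path length, local density (degree), and information content.
# The combination formula and the toy data are illustrative only.
import math
import networkx as nx

G = nx.Graph([("entity", "animal"), ("animal", "dog"), ("animal", "cat"),
              ("entity", "artifact"), ("artifact", "car")])
freq = {"entity": 100, "animal": 60, "dog": 20, "cat": 20, "artifact": 40, "car": 20}
total = sum(freq.values())

def information_content(c):
    return -math.log(freq[c] / total)

def concept_similarity(c1, c2, alpha=0.5, beta=0.3, gamma=0.2):
    path_len = nx.shortest_path_length(G, c1, c2)
    density = (G.degree[c1] + G.degree[c2]) / (2 * max(dict(G.degree).values()))
    ic = (information_content(c1) + information_content(c2)) / 2
    ic_norm = ic / math.log(total)            # scale IC roughly into [0, 1]
    return alpha / (1 + path_len) + beta * density + gamma * ic_norm

print(concept_similarity("dog", "cat"))   # siblings: shorter path, higher score
print(concept_similarity("dog", "car"))   # distant concepts: lower score
```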

8.
Discourse analysis is an important task in natural language processing. Analyzing nuclearity (primary-secondary) relations in discourse helps reveal discourse structure and semantics and supports downstream NLP applications. Building on work on micro-level nuclearity recognition, this paper focuses on macro-level discourse nuclearity and proposes a recognition model based on topic similarity computed with word2vec and LDA. The word2vec-based and LDA-based topic similarities measure semantic similarity along different dimensions and complement each other at the semantic level, strengthening the model's ability to recognize macro-level nuclearity relations. On the Macro Chinese Discourse Treebank (MCDTB), the model achieves an F1 score of 79.9% and an accuracy of 81.82%, improvements of 1.7% and 1.81% over the baseline system.

9.
Paraphrase identification is the task of deciding whether two sentences express the same meaning. Traditional paraphrase identification targets the general domain: a model understands the semantics of the two sentences and compares their semantic similarity to make the decision. In domain-specific paraphrase identification, however, the model must draw on domain knowledge to understand the two sentences accurately and to judge their differences and connections. This paper proposes a domain-knowledge-fusion approach for domain-specific paraphrase identification: it first retrieves domain knowledge for each sentence, then fuses the knowledge into the sentence semantics, and finally performs a more accurate semantic similarity judgment. Experiments on PARADE, a paraphrase identification dataset in the computer science domain, show that the method reaches an F1 score of 73.9, a 3.1-point improvement over the baseline.

10.
The core problem in off-topic essay detection is computing text similarity. Traditional methods are generally based on the vector space model: texts are represented as high-dimensional vectors and their similarity is then computed. Such methods consider only the terms that appear in the text (bag-of-words) and ignore the terms' semantic information. This paper proposes a new text similarity method based on word expansion, combining the bag-of-words model with distributed word representations: words that are semantically similar (in the distributed-representation vector space) to the terms appearing in a text are added to its representation, expanding the text's vocabulary, and similarity is then computed on the expanded texts. The method is applied to off-topic detection for English essays; an off-topic detection system is built and tested on a real dataset. Experimental results show that the system effectively identifies off-topic essays and clearly outperforms the baseline system.
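A minimal sketch of the word-expansion idea: add words that are close in a (toy) embedding space to the bag-of-words before computing cosine similarity. The embedding table and the expansion weight are placeholder assumptions.

```python
# Sketch: expand a bag-of-words with words that are close in embedding space,
# then compute cosine similarity between the expanded representations.
# The toy embedding table stands in for real distributed representations.
import numpy as np
from collections import Counter

emb = {"car": np.array([1.0, 0.1]), "automobile": np.array([0.95, 0.15]),
       "engine": np.array([0.8, 0.3]), "banana": np.array([0.0, 1.0])}

def nearest(word, k=1):
    """Words most similar to `word` in the embedding space (excluding itself)."""
    def cos(u, v): return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    cands = [(w, cos(emb[word], v)) for w, v in emb.items() if w != word]
    return [w for w, _ in sorted(cands, key=lambda x: x[1], reverse=True)[:k]]

def expanded_bow(tokens, k=1, expansion_weight=0.5):
    bow = Counter(tokens)
    for t in list(bow):
        if t in emb:
            for n in nearest(t, k):
                bow[n] += expansion_weight   # expanded words get a reduced weight
    return bow

def cosine(b1, b2):
    keys = set(b1) | set(b2)
    v1 = np.array([b1.get(k, 0.0) for k in keys])
    v2 = np.array([b2.get(k, 0.0) for k in keys])
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

essay, prompt = ["car", "engine"], ["automobile"]
print(cosine(expanded_bow(essay), expanded_bow(prompt)))  # > 0, unlike plain BoW overlap
```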

11.
Many researchers have focused on the fuzzy shortest path problem in networks with non-deterministic information because of its importance to various applications. The goal of this paper is to select the shortest path in a multi-constrained network using a multi-criteria decision method based on a vague similarity measure. In our approach, each arc length represents multiple metrics, and the multiple constraints are treated as multiple criteria defined over vague sets. We propose a similarity measure for vague sets in which positive and negative constraints are defined. Furthermore, procedures are developed to obtain the "best" and "worst" ideal paths. We evaluate the similarity degrees between all candidate paths and the two ideal paths with the proposed similarity measure. By comparing the relative degrees of the paths, it is shown that the path with the largest relative degree is the shortest path. Finally, we conduct two sets of numerical experiments: we use Matlab to verify the feasibility and correctness of the proposed algorithm, and we develop a routing decision simulation system (RDSS) to demonstrate that the proposed approach is reasonable and effective.

12.
于东, 刘春花, 田悦. Journal of Computer Applications (《计算机应用》), 2016, 36(2): 455-459
To extract the job title and career history attributes of specified persons from unstructured text, this paper proposes an attribute extraction method based on distant supervision and pattern matching. The method describes title and career features at two levels, string patterns and dependency patterns, and proceeds in two stages. First, distant-supervision knowledge and manually annotated knowledge are used to mine a high-coverage pattern library for discovering title and career attributes and extracting a candidate set. Second, the textual adjacency between attributes such as title and organization, together with the dependency relations between the target person and candidate attributes, is used to design filtering rules that screen the candidates and achieve high-precision extraction. Experimental results show that the method reaches an F-score of 55.37% on the CLP2014-PAE test set, significantly higher than the best result in the evaluation (F-score 34.38%) and a supervised sequence labeling method based on conditional random fields (CRF) (F-score 43.79%), indicating that the method can mine and extract title and career attributes from unstructured documents with high coverage.

13.
In keyword spotting from handwritten documents by text query, word similarity is usually computed by combining character similarities, which should approximate the logarithm of the character probabilities. In this paper, we propose to directly estimate the posterior probability (also called confidence) of candidate characters based on the N-best paths from the candidate segmentation-recognition lattice. After evaluating the candidate segmentation-recognition paths by combining multiple contexts, the scores of the N-best paths are transformed into posterior probabilities using a soft-max. The soft-max parameter (confidence parameter) is estimated from the character confusion network, which is constructed by aligning different paths with a string matching algorithm. The posterior probability of a candidate character is the sum of the probabilities of the paths that pass through it. We compare the proposed posterior probability estimation method with several reference methods, including a word confidence measure and a text line recognition method. Experimental results of keyword spotting on CASIA-OLHWDB, a large database of unconstrained online Chinese handwriting, demonstrate the effectiveness of the proposed method.
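A small sketch of the general idea: a soft-max over N-best path scores, with each candidate character's confidence obtained by summing the posteriors of the paths that contain it. The paths, scores, and confidence parameter are toy values, and the confusion-network estimation of the parameter is not shown.

```python
# Sketch: turn N-best path scores into posterior probabilities with a soft-max
# and score each candidate character by summing the probabilities of the paths
# that contain it. Paths, scores, and the confidence parameter are toy values.
import math

def path_posteriors(scores, alpha=1.0):
    """Soft-max over path scores; alpha is the confidence parameter."""
    m = max(scores)
    exps = [math.exp(alpha * (s - m)) for s in scores]   # shift for numerical stability
    z = sum(exps)
    return [e / z for e in exps]

def character_confidence(paths, scores, alpha=1.0):
    """Posterior of each candidate character = sum of posteriors of paths containing it."""
    post = path_posteriors(scores, alpha)
    conf = {}
    for p, pr in zip(paths, post):
        for ch in set(p):
            conf[ch] = conf.get(ch, 0.0) + pr
    return conf

# three candidate segmentation-recognition paths for the same handwritten line
paths = ["中文手写", "中义手写", "中文千写"]
scores = [12.3, 9.8, 10.5]                   # combined context scores (toy)
print(character_confidence(paths, scores))   # characters shared by all paths get ~1.0
```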

14.
Spatial data mining algorithms depend heavily on the efficient processing of neighborhood relations, since the neighbors of many objects have to be investigated in a single run of a typical algorithm. Providing general concepts for neighborhood relations, together with an efficient implementation of these concepts, therefore allows a tight integration of spatial data mining algorithms with a spatial database management system, which speeds up both the development and the execution of spatial data mining algorithms. In this paper, we define neighborhood graphs and paths and a small set of database primitives for their manipulation. We show that typical spatial data mining algorithms are well supported by the proposed basic operations. For finding significant spatial patterns, only certain classes of paths "leading away" from a starting object are relevant. We discuss filters that admit only such neighborhood paths, which significantly reduces the search space for spatial data mining algorithms. Furthermore, we introduce neighborhood indices to speed up the processing of our database primitives. We implemented the database primitives on top of a commercial spatial database management system. The effectiveness and efficiency of the proposed approach were evaluated using an analytical cost model and an extensive experimental study on a geographic database.
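A minimal sketch of a "leading away" filter: a neighborhood path is extended only when the next object is strictly farther from the starting object. The point coordinates and neighbor relation are toy assumptions rather than the paper's database primitives.

```python
# Sketch: extend neighborhood paths only when each step "leads away" from the
# start object, i.e. the distance to the start strictly increases.
# The coordinates and the neighbor relation below are toy assumptions.
import math

points = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1), "E": (0, 1)}
neighbors = {"A": ["B", "E"], "B": ["A", "C", "D"], "C": ["B"],
             "D": ["B", "E"], "E": ["A", "D"]}

def dist(p, q):
    (x1, y1), (x2, y2) = points[p], points[q]
    return math.hypot(x1 - x2, y1 - y2)

def leading_away_paths(start, max_len):
    """All neighborhood paths from `start` whose distance to `start` grows at every step."""
    paths, frontier = [], [[start]]
    while frontier:
        path = frontier.pop()
        if len(path) > 1:
            paths.append(path)
        if len(path) == max_len:
            continue
        for n in neighbors[path[-1]]:
            if n not in path and dist(start, n) > dist(start, path[-1]):
                frontier.append(path + [n])
    return paths

print(leading_away_paths("A", max_len=3))   # e.g. ['A','B','C'], never ['A','B','A']
```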

15.
Traditional feature extraction methods are mostly based on embedding representations and often ignore path semantics, while relation-path-based reasoning methods mostly consider a single path, leaving room for improvement. To further improve knowledge reasoning, this work encodes multiple paths generated by random walks with a custom convolutional neural network, merges the vector sequences using the hidden states of a bidirectional long short-term memory network, and integrates the semantic information of the multiple paths with differentiated weights via an attention mechanism. The probability score of a candidate relation for an entity pair is then computed to decide whether the triple holds. Link prediction results on the NELL995 and FB15k-237 datasets show that the approach is feasible and compares favorably with mainstream models on F1 and other metrics; further experiments confirm its feasibility on large-scale and sparse datasets.

16.
Domain ontologies are one of the key elements of Semantic Web technologies and are important constructs for multi-agent systems. The Semantic Web relies on domain ontologies that structure the underlying data, enabling comprehensive and portable machine understanding. Constructing domain ontologies takes considerable time and effort because they are typically built manually by domain experts and knowledge engineers. To address this problem, much research has aimed at semi-automatic ontology construction, but most of it has focused on relation extraction while still selecting ontology terms manually. In this paper, we propose a hybrid method for extracting relations from domain documents that combines a named-relation approach and an unnamed-relation approach. Our named-relation approach is based on Hearst's patterns and the Snowball system, into which we merge a generalized pattern scheme. In the unnamed-relation approach, we extract unnamed relations using association rules and a clustering method, and we additionally recommend candidate names for the unnamed relations. We evaluate the proposed method on the Ziff document set offered by TREC.

17.
For the candidate set construction problem in entity linking, this paper proposes a multi-strategy construction algorithm. Multiple strategies are combined to extract complete mentions from the context, reducing the number of candidate entities while increasing the recall of the correct entity, so as to build a high-quality candidate set. Experiments and analysis on the TAC2014 English corpus with the proposed strategies identify the optimal construction strategy and confirm that the method improves both the recall and the precision of the candidate set. They further verify that candidate set quality has a clear impact on the performance of the full entity linking system: compared with the baseline algorithm, candidate sets extracted with the optimal strategy improve the overall entity linking system by 3.7%.

18.
Applying the idea of fault diagnosis from first principles and working from the fault-test matrix, this paper proposes a method for determining all minimal complete test sets of a system, based on incompatibility reasoning between the behavior in which every fault of the system is detectable and the behavior in which none of the faults is detectable. The method has two steps: first, conflict-set candidates are identified from knowledge of the system structure and test vectors combined with the fault-test matrix; second, the minimal hitting sets of these candidates are determined and the minimal complete test sets are generated. The method efficiently computes minimal complete test sets, reduces the number of test vectors that must be applied, and improves the efficiency of fault diagnosis.
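A small sketch of the second step: computing the minimal hitting sets of the conflict-set candidates by brute force. The conflict sets are toy data rather than a real fault-test matrix.

```python
# Sketch: brute-force computation of minimal hitting sets, the second step of
# the method (conflict-set candidates -> minimal complete test sets).
# The conflict sets below are toy data.
from itertools import combinations

def minimal_hitting_sets(conflict_sets):
    """All inclusion-minimal sets of tests that intersect every conflict set."""
    universe = sorted(set().union(*conflict_sets))
    hitting = []
    for size in range(1, len(universe) + 1):
        for cand in combinations(universe, size):
            s = set(cand)
            if all(s & c for c in conflict_sets):
                # smaller hitting sets were enumerated first, so this keeps only minimal ones
                if not any(h <= s for h in hitting):
                    hitting.append(s)
    return hitting

# each conflict set lists the test vectors able to detect one fault (toy example)
conflict_sets = [{"t1", "t2"}, {"t2", "t3"}, {"t1", "t3"}]
for h in minimal_hitting_sets(conflict_sets):
    print(sorted(h))   # ['t1', 't2'], ['t1', 't3'], ['t2', 't3']
```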

19.
To address the key difficulty in stroke extraction from handwritten Chinese characters, namely the detection and analysis of ambiguous (fuzzy) regions, this paper proposes a new stroke extraction algorithm based on fuzzy-region detection. The algorithm first detects fuzzy regions using the fork candidate points obtained by a thinning algorithm and the contour information around them; it then models sub-strokes and fuzzy regions with a graph, builds a Bayesian classifier to analyze the continuity of sub-stroke pairs, and obtains sub-stroke sequences by path search; finally, it extracts the thinned strokes by B-spline interpolation. Comparative experiments show that the algorithm is effective for fuzzy-region detection and stroke extraction and avoids the shape distortion of thinning results inside fuzzy regions.

20.
Path testing is the strongest coverage criterion in white-box testing, and finding target paths is a key challenge in it. Genetic algorithms have been used successfully in many software testing activities such as generating test data, selecting test cases, and prioritizing test cases. In this paper, we introduce a new genetic algorithm for generating test paths, in which the length of the chromosome varies from one iteration to another according to the change in the length of the path. Based on the proposed algorithm, we present a new technique for automatically generating a set of basis test paths, which can be used as testing paths in any path testing method. The technique verifies the independence of the generated paths before including them in the basis set and also checks the feasibility of the generated paths. We introduce new definitions of the key genetic-algorithm concepts, such as chromosome representation, crossover, mutation, and fitness function, to make them compatible with path generation. In addition, we present a case study to show the efficiency of our technique, and we conducted a set of experiments to evaluate the effectiveness of the proposed path generation technique. The results show that the proposed technique substantially reduces path generation effort and that the proposed genetic algorithm is effective in test path generation.
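A short sketch of the independence check used when assembling a basis set of paths: each path is represented by its edge-incidence vector and admitted only if it increases the matrix rank. This illustrates the verification step only, not the paper's genetic algorithm; the control-flow graph is a toy example.

```python
# Sketch of the independence check for building a basis set of paths: represent
# each path by its edge-incidence vector and keep it only if it raises the rank
# of the matrix of accepted paths. Toy control-flow graph; not the paper's GA.
import numpy as np

edges = [("s", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
edge_index = {e: i for i, e in enumerate(edges)}

def path_vector(path):
    v = np.zeros(len(edges))
    for u, w in zip(path, path[1:]):
        v[edge_index[(u, w)]] += 1
    return v

def build_basis(candidate_paths):
    """Greedily keep candidate paths whose edge vectors are linearly independent."""
    basis, vectors = [], []
    for p in candidate_paths:
        trial = vectors + [path_vector(p)]
        if np.linalg.matrix_rank(np.array(trial)) > len(vectors):
            basis.append(p)
            vectors = trial
    return basis

candidates = [["s", "a", "b", "d", "e"],
              ["s", "a", "c", "d", "e"],
              ["s", "a", "b", "d", "e"]]      # duplicate path: rejected by the rank test
print(build_basis(candidates))
```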

