Similar Literature
 Found 20 similar documents; search took 195 ms
1.
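Few studies address entity recognition in ancient traditional Chinese medicine (TCM) texts, and most of them rely on supervised learning. However, these texts are poorly digitized, annotated corpora are scarce, the language is mostly classical Chinese, and the terminology keeps evolving, so existing methods cannot handle these problems effectively. This paper therefore builds a corpus of ancient TCM texts and, based on an analysis of the entity names in them, proposes an entity recognition method that combines semi-supervised learning with rules. Using a conditional random field (CRF) model as the basic framework, it introduces supervised features such as words, parts of speech, and dictionaries together with unsupervised semantic features obtained from word vectors, compares the recognition performance of different feature combinations to determine the best semi-supervised model, and compares that model with other models. A rule base built from the linguistic characteristics of the ancient texts is then used for rule-based post-processing. The final F-score in the experiments reaches 83.18%, demonstrating the effectiveness of the method.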
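The feature types named in this abstract (word, part of speech, dictionary membership, and a word-vector-derived semantic feature) can be illustrated with a per-token feature function of the kind usually fed to a CRF. This is only a sketch: the feature names, the use of a cluster ID as the word-vector feature, and the sample sentence, lexicon, and cluster table are all assumptions, not the paper's actual template.

```python
from typing import Dict, List, Set


def token_features(
    words: List[str],
    pos_tags: List[str],
    i: int,
    lexicon: Set[str],
    cluster_of: Dict[str, str],
) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature template)."""
    word = words[i]
    return {
        # supervised features: surface word, part of speech, dictionary lookup
        "word": word,
        "pos": pos_tags[i],
        "in_lexicon": word in lexicon,
        # unsupervised feature: ID of a word-vector cluster (assumed design)
        "cluster": cluster_of.get(word, "UNK"),
        # simple context window of one token on each side
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i < len(words) - 1 else "<EOS>",
    }


# Hypothetical sample data: a segmented classical Chinese phrase.
words = ["桂枝", "汤", "主", "之"]
pos_tags = ["n", "n", "v", "r"]
lexicon = {"桂枝"}                          # assumed herb dictionary
cluster_of = {"桂枝": "c12", "汤": "c3"}    # assumed word-vector clusters

features = [token_features(words, pos_tags, i, lexicon, cluster_of)
            for i in range(len(words))]
```

A sequence of such dicts, one per token, is the usual input shape for CRF toolkits; which toolkit the authors used is not stated in the abstract.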

2.
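Based on the CPB (Chinese Proposition Bank), this paper proposes an LSTM-Bi-LSTM method for automatic Chinese semantic role labeling, together with semantic density clustering for data preprocessing and a "fuzzy" mechanism applied during word-vector conversion. Semantic density clustering uses the notion of density to cluster predicates in a globally uniform way and replaces sparse predicates with common predicates from the cluster they belong to. Using the notion of semantic distance, the "fuzzy" mechanism is introduced into word-vector conversion; it moderately reduces the semantic specificity of the word vectors and increases their correlation with the predicate's word vector. A Bi-LSTM network learns feature representations automatically, and a CRF with the IOBES tagging scheme then casts the task as a word-sequence labeling problem. A part-of-speech learning method is also introduced: part-of-speech feature vectors learned by an LSTM network are fused with the "fuzzified" word vectors and fed to the model together as input. Training uses mini-batch gradient descent and Dropout regularization, which speeds up training, makes it easier to reach a global optimum, and prevents the parameters from overfitting. Multiple comparative experiments show that the F-score of the method's labeling results reaches up to 81.24%.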
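The IOBES scheme mentioned above marks each token as Inside, Outside, Begin, End, or Single with respect to a labeled span. A minimal sketch of the conversion from labeled spans to tags follows; the function name, the span format, and the sample sentence are illustrative assumptions.

```python
from typing import List, Tuple


def spans_to_iobes(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) spans into IOBES tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if end - start == 1:
            # a one-token span gets the Single tag
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical example: two arguments of the predicate "investigating".
tokens = ["Police", "are", "investigating", "the", "accident"]
tags = spans_to_iobes(len(tokens), [(0, 1, "ARG0"), (3, 5, "ARG1")])
```

Compared with plain BIO, the extra E and S tags make span boundaries explicit, which is one common reason to pair IOBES with a CRF output layer.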

3.
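A term is composed of one or more words combined according to certain semantic roles. Traditional statistics-based similarity methods treat a term as a single basic unit and ignore the semantic roles inside it; for terms with little contextual information, statistical methods cannot achieve satisfactory results. Methods based on semantic resources cover only a limited vocabulary, so similarity cannot be computed for terms that are absent from those resources. To address these problems, this paper proposes a semantic-role-based term similarity method for patents that compensates for the shortcomings of the traditional methods. The words inside each term are annotated with semantic roles, word similarity is computed with a shared nearest neighbor method, and term similarity is then computed from the word similarities according to the different semantic roles. Experiments show that the method achieves better results than traditional methods.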
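One common form of shared nearest neighbor (SNN) similarity scores two words by how many of their k nearest neighbors they have in common. The sketch below assumes cosine similarity over toy word vectors and normalizes the overlap by k; the abstract does not give the paper's exact definition, so all names, vectors, and the normalization are assumptions.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def k_nearest(word: str, vectors: Dict[str, List[float]], k: int) -> Set[str]:
    """The k words most similar to `word`, excluding the word itself."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return set(others[:k])


def snn_similarity(w1: str, w2: str, vectors: Dict[str, List[float]], k: int) -> float:
    """Fraction of the k nearest neighbors that the two words share."""
    shared = k_nearest(w1, vectors, k) & k_nearest(w2, vectors, k)
    return len(shared) / k


# Toy 2-D vectors (made up for illustration).
vectors = {
    "engine": [1.0, 0.1], "motor": [0.9, 0.2], "turbine": [0.8, 0.3],
    "valve": [0.1, 1.0], "pipe": [0.2, 0.9], "hose": [0.3, 0.8],
}
sim_close = snn_similarity("engine", "motor", vectors, k=2)
sim_far = snn_similarity("engine", "valve", vectors, k=2)
```

Because the score depends only on neighbor overlap, it can stay informative for words whose own context counts are sparse, which matches the motivation given in the abstract.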

4.
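Semantic knowledge bases are a foundational resource for natural language processing and are widely used in tasks such as semantic computation and semantic reasoning. Existing large-scale semantic knowledge bases are almost all general-purpose and lack domain-specific semantic knowledge. To fill this gap, this paper proposes a method, based on the semantic theory of HowNet, for assisting the construction of a semantic knowledge base of aviation terms. Following the characteristics of aviation terms, the method divides assisted construction into four key steps, and 2,000 term concept descriptions (DEFs) were built. Finally, the construction method is validated by comparing manually annotated inter-term similarities with inter-term similarities computed from the term DEFs.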

5.
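Dynamic role labeling inside terms is a key step in building an aviation-domain HowNet and directly affects its scale and quality. To address the labeling difficulty caused by the large number of dynamic role types, a KNN-based method for labeling dynamic roles inside terms is proposed. Samples are preselected by analyzing the DEF entries of the words inside a term, and the nearest-neighbor sample selection stage fuses DEF-based semantic similarity with word-vector-based contextual distribution similarity. The experimental res...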

6.
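This paper proposes an image semantic segmentation method based on a deep learning framework. A network model trained with relative-depth point-pair annotations predicts a depth image from a color image, and the predicted depth image is fed together with the original color image into a fully convolutional network that contains atrous (dilated) convolutions. Because color and depth images represent different attributes of an object, their feature maps are fused by concatenation rather than by the traditional addition, which preserves the differences between the two representations when providing feature-map input to the subsequent convolutional layers. Experiments on two datasets show that the method effectively improves semantic segmentation performance.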
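The difference between the two fusion operations can be seen on toy feature maps: addition merges the two sources into the same channels, while concatenation stacks them along the channel axis so later layers still see each source separately. The sketch below uses plain nested lists in (channels, height, width) layout; the function names and values are illustrative assumptions, not the paper's network code.

```python
from typing import List, Tuple

FeatureMaps = List[List[List[float]]]  # layout: channels x height x width


def shape(maps: FeatureMaps) -> Tuple[int, int, int]:
    """Return (channels, height, width) of a non-empty feature-map stack."""
    return (len(maps), len(maps[0]), len(maps[0][0]))


def fuse_add(a: FeatureMaps, b: FeatureMaps) -> FeatureMaps:
    """Element-wise addition: shapes must match, channel count is unchanged."""
    if shape(a) != shape(b):
        raise ValueError("addition requires identical shapes")
    return [[[x + y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]


def fuse_concat(a: FeatureMaps, b: FeatureMaps) -> FeatureMaps:
    """Channel-wise concatenation: only height and width must match."""
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError("concatenation requires equal height and width")
    return a + b


# One 2x2 channel from each branch (made-up values).
color = [[[1.0, 2.0], [3.0, 4.0]]]
depth = [[[10.0, 20.0], [30.0, 40.0]]]

added = fuse_add(color, depth)        # 1 channel, sources are mixed
stacked = fuse_concat(color, depth)   # 2 channels, sources stay distinct
```

After addition the original color and depth values can no longer be recovered from the result, whereas the concatenated stack keeps both, at the cost of a wider input to the next convolution.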

7.
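This paper explores a method for building a knowledge graph of 《伤寒论》 (Shanghan Lun, Treatise on Cold Damage) based on Neo4j. The ancient TCM text 《伤寒论》 serves as the data source for the knowledge graph. Guided by standards such as 《中医临床术语标准规范》 (a standard specification for TCM clinical terminology), TCM-related terms are extracted, preprocessed, and standardized through manual knowledge extraction, and the resulting knowledge graph is stored in the graph database Neo4j. A Neo4j-based 《伤寒论》... containing 639 TCM entities and 2,076 entity relations was built.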

8.
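Building knowledge resources plays an important role in language information processing, and building a knowledge base of TCM basic theory is foundational work for TCM literature processing and semantic computation. Based on an analysis of the characteristics of TCM basic theory terms and drawing on the construction ideas of HowNet, this paper proposes a KDML-based method for building a TCM basic theory knowledge base, including the sememe selection method and the relation acquisition method used during construction.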

9.
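Because of the semantic gap in traditional content-based image retrieval, it cannot meet users' needs in certain domains. Automatic semantic image annotation can effectively address this problem. This paper proposes a method that first segments an image into regions with Normalized Cuts and extracts low-level visual features from each region, and then uses the BP neural network algorithm to learn the correspondence between image regions and annotation words in order to annotate image semantics automatically. Experimental results demonstrate the effectiveness and accuracy of the method.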

10.
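Chinese Semantic Role Labeling Based on Semantic Chunk Analysis   Total citations: 1, self-citations: 1, citations by others: 0
In recent years, Chinese semantic role labeling has attracted attention, but most systems are traditional syntax-tree-based ones that identify and classify semantic roles on the nodes of a parse tree. This paper proposes a different strategy, which we call semantic role labeling based on semantic chunk analysis. In the new method, the pipeline is no longer the traditional "syntactic parsing → semantic role identification → semantic role classification" but a simplified "semantic chunk identification → semantic chunk classification". This turns Chinese semantic role labeling from a node classification problem into a sequence labeling problem, for which we use a conditional random field model and obtain good results. Because the syntactic parsing stage is skipped, semantic role labeling no longer depends on syntactic parsing and thus escapes the time and performance limits of Chinese parsers. Our experiments show that the new method achieves high accuracy and greatly reduces analysis time. A comparison also shows that results on automatic segmentation and part-of-speech tagging still lag considerably behind results on fully correct segmentation and part-of-speech tagging.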

11.
Decision trees (DTs) represent one of the most important and popular solutions to the problem of classification. They have been shown to have excellent performance in the field of data mining and machine learning. However, a limitation of DTs is that they are built using data instances assigned to crisp classes. In this paper, we generalize decision trees so that they can take into account weighted classes assigned to the training data instances. Moreover, we propose a novel method for discovering weights for the training instances. Our method is based on emerging patterns (EPs). EPs are those itemsets whose supports (probabilities) in one class are significantly higher than their supports (probabilities) in the other classes. Our experimental evaluation shows that the newly proposed method has good performance and excellent noise tolerance.
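The definition of an emerging pattern above can be made concrete with the growth rate, the ratio of an itemset's support in a target class to its support in the other classes. The sketch below computes both quantities on toy transactions; the data and function names are made up, and how the paper turns EPs into instance weights is not shown here.

```python
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(
    itemset: FrozenSet[str],
    target: List[FrozenSet[str]],
    others: List[FrozenSet[str]],
) -> float:
    """Support in the target class divided by support in the other classes."""
    s_target = support(itemset, target)
    s_others = support(itemset, others)
    if s_others == 0.0:
        # pattern never occurs outside the target class
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_others


# Toy data: each transaction is the set of items describing one instance.
target_class = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
other_classes = [frozenset("a"), frozenset("c"), frozenset("ab"), frozenset("cd")]

gr_ab = growth_rate(frozenset("ab"), target_class, other_classes)
gr_ac = growth_rate(frozenset("ac"), target_class, other_classes)
```

An itemset is usually called an EP when its growth rate exceeds a user-chosen threshold; the threshold value is a tuning choice and is not given in the abstract.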

12.
The well-known laboratory quadruple-tank process (QTP) has been introduced in the laboratories of many schools around the world, as it is ideally suited to illustrate concepts in multivariable control. In this paper the QTP is extended to include independent multivariable dead times (DTs), and their effects on the properties and control of the QTP are studied. DTs are very common in many processes and make the control of the QTP more interesting and challenging. The addition of DTs may introduce an infinite number, a finite number, or no non-minimum-phase (NMP) zeros. As shown in the paper, this depends on the particular combination of the multivariable DTs. The conditions for each case are stated, and the location and behavior of the zeros closest to the imaginary axis due to the DTs are specified. Other properties of the QTP with DTs, such as the output directions of the real NMP zeros, the decentralized integral controllability of the process, and time-domain bounds on closed-loop performance, are derived and discussed. Also, a novel laboratory QTP with DTs is described and used to demonstrate the main results.

13.
In recent years, the concept of the digital twin (DT) has attracted more and more attention from researchers and engineers, but there is still no consensus on what a right DT is. On the one hand, some common models are renamed as DTs. On the other hand, some DTs pursue sameness with physical objects to an extreme, which brings unnecessary complexity. In this paper, we try to answer two questions from the point of view of model engineering: how to define a right digital twin, and how to build a right digital twin. The concept and related technologies of model engineering are introduced. Some basic principles and a set of metrics for a right DT are given. An evolutionary concurrent modeling method for DT (ECoM4DT) is proposed that not only inherits the theory of classic M&S methods but also highlights the characteristics of DTs compared with traditional models, in order to systematically guide the DT modeling process.

14.
15.
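An Inductive Learning Method for Decision Tables with Continuous Condition Attribute Values   Total citations: 1, self-citations: 0, citations by others: 1
An inductive learning method is proposed for decision tables composed of continuous condition attribute values and discrete decision attribute values. The continuous condition attribute values in the decision table are treated as a matrix, and singular value decomposition of the matrix is used to determine the number of condition attributes of the decision table. Fuzzy C-means clustering is applied to the continuous condition attribute values with different numbers of clusters, yielding a discrete decision table for each number of clusters; the condition attributes of these tables are then reduced, giving different numbers of condition attributes. The final number of clusters is determined by comparing the number of condition attributes obtained from the singular value decomposition with the numbers obtained after reducing the discrete decision tables, also taking the number of decision attributes into account. On this basis, an inductive learning method for decision tables with continuous condition attribute values and discrete decision attribute values is given, and its effectiveness is verified.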
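The discretization step can be sketched with a small fuzzy C-means loop on one continuous attribute: memberships and cluster centers are updated alternately, and each value is then mapped to the cluster with the highest membership. This is a minimal one-dimensional sketch with a fixed iteration count and hand-picked initial centers; the fuzzifier m = 2, the data, and all names are assumptions rather than the paper's settings.

```python
from typing import List, Tuple


def memberships(xs: List[float], centers: List[float], m: float = 2.0) -> List[List[float]]:
    """Fuzzy C-means membership of every value in every cluster."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in xs:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # value sits exactly on a center: give it full membership there
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                         for d_i in dists])
    return rows


def update_centers(xs: List[float], u: List[List[float]], m: float = 2.0) -> List[float]:
    """Each center is the mean of the values weighted by membership ** m."""
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        centers.append(sum(w * x for w, x in zip(weights, xs)) / sum(weights))
    return centers


def fuzzy_c_means_1d(
    xs: List[float], init_centers: List[float], m: float = 2.0, iters: int = 50
) -> Tuple[List[float], List[List[float]]]:
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(init_centers)
    for _ in range(iters):
        centers = update_centers(xs, memberships(xs, centers, m), m)
    return centers, memberships(xs, centers, m)


def discretize(u: List[List[float]]) -> List[int]:
    """Replace each value by the index of its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]   # one continuous condition attribute
centers, u = fuzzy_c_means_1d(values, init_centers=[0.0, 6.0])
labels = discretize(u)
```

Running this for several cluster counts would give one discrete table per count, which is the set of candidates the abstract compares against the attribute count obtained from the singular value decomposition.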

16.
Machine learning offers the potential for effective and efficient classification of remotely sensed imagery. The strengths of machine learning include the capacity to handle data of high dimensionality and to map classes with very complex characteristics. Nevertheless, implementing a machine-learning classification is not straightforward, and the literature provides conflicting advice regarding many key issues. This article therefore provides an overview of machine learning from an applied perspective. We focus on the relatively mature methods of support vector machines, single decision trees (DTs), Random Forests, boosted DTs, artificial neural networks, and k-nearest neighbours (k-NN). Issues considered include the choice of algorithm, training data requirements, user-defined parameter selection and optimization, feature space impacts and reduction, and computational costs. We illustrate these issues by applying machine-learning classification to two publicly available remotely sensed data sets.

17.
Field Association (FA) Terms, words or phrases that serve to identify document fields, are effective in document classification, similar file retrieval and passage retrieval. However, there is no effective method to extract and select relevant FA Terms to build a comprehensive dictionary of FA Terms. This paper presents a new method to extract, select and rank FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules, corpora comparison and modified tf-idf weighting. Experimental evaluation on 21 fields using 306 MB of domain-specific corpora obtained from English Wikipedia dumps selected up to 2,517 FA Terms (single and compound) per field at precision of 74–97% and recall of 65–98%. This is better than the traditional methods. The FA Terms dictionary constructed using this method achieved an average accuracy of 97.6% in identifying the fields of 10,077 test documents collected from Wikipedia, the Reuters RCV1 corpus and the 20 Newsgroups data set.
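For orientation, the standard tf-idf weight that such methods start from can be sketched as follows. The abstract does not say how the paper modifies it, so this shows only the textbook form, with relative term frequency and idf = log(N / df), on made-up token lists.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Textbook tf-idf: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus: one tokenized document per field (hypothetical data).
docs = [
    ["pitcher", "inning", "game"],
    ["election", "vote", "game"],
    ["pitcher", "game", "game"],
]
scores = tf_idf(docs)
top_term_doc1 = max(scores[1], key=scores[1].get)
```

A term that occurs in every document, such as "game" here, gets weight zero, which is why tf-idf style weights are a natural starting point for picking terms that are specific to one field.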

18.
19.
20.
In this paper, we propose an improved two-level dynamic Bayesian network, the layered time series model (LTSM), which aims to overcome the limitations that hinder the application of existing dynamic Bayesian networks, namely the hidden Markov model (HMM) and the dynamic texture (DT) model, to gait recognition. In the first level, a gait silhouette or feature cycle is divided into several temporally adjacent clusters. Each cluster is modeled by a DT or logistic DT (LDT). In the second level, an HMM is built to describe the relationship among the DTs/LDTs. Besides LTSM, LDT is also an improved dynamic Bayesian network presented in this paper to describe binary image sequences; it introduces logistic principal component analysis (PCA) to learn its parameters. We demonstrate the validity of LTSM with experiments on both the CMU Mobo gait database and the CASIA gait database (dataset B), and that of LDT on the CMU Mobo gait database. Experimental results show the superiority of the improved dynamic Bayesian networks.
