Similar documents
20 similar documents found.
1.
Key concept extraction is a major step for ontology learning that aims to build an ontology by identifying relevant domain concepts and their semantic relationships from a text corpus. The success of ontology development using key concept extraction strongly relies on the degree of relevance of the key concepts identified. If the identified key concepts are not closely relevant to the domain, the constructed ontology will not be able to correctly and fully represent the domain knowledge. In this paper, we propose a novel method, named CFinder, for key concept extraction. Given a text corpus in the target domain, CFinder first extracts noun phrases as key concept candidates, using linguistic patterns over their Part-Of-Speech (POS) tags. To calculate the weights (or importance) of these candidates within the domain, CFinder combines statistical knowledge with domain-specific knowledge indicating their relative importance within the domain. The calculated weights are further enhanced by considering an inner structural pattern of the candidates. The effectiveness of CFinder is evaluated on a recently developed ontology for the domain of 'emergency management for mass gatherings' against state-of-the-art methods for key concept extraction, including Text2Onto, KP-Miner and Moki. The comparative evaluation results show that CFinder statistically significantly outperforms all three methods in terms of F-measure and average precision.
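A minimal sketch of the candidate-extraction step described above: collect maximal runs of adjective/noun tokens that end in a noun as noun-phrase candidates. The specific POS pattern (Penn Treebank `JJ`/`NN*` tags) is an illustrative assumption, not CFinder's exact pattern.

```python
# Hypothetical sketch: noun-phrase candidates from POS-tagged text.
# Assumes Penn Treebank tags; the actual CFinder patterns may differ.
def noun_phrase_candidates(tagged_tokens):
    """tagged_tokens: list of (word, POS) pairs."""
    candidates, phrase = [], []

    def flush(phrase):
        # keep only phrases containing a noun, trimmed to end in a noun
        if any(p.startswith("NN") for _, p in phrase):
            while not phrase[-1][1].startswith("NN"):
                phrase.pop()
            candidates.append(" ".join(w for w, _ in phrase))

    for word, pos in tagged_tokens:
        if pos.startswith("NN") or pos == "JJ":
            phrase.append((word, pos))
        else:
            if phrase:
                flush(phrase)
            phrase = []
    if phrase:
        flush(phrase)
    return candidates

tagged = [("emergency", "NN"), ("management", "NN"), ("requires", "VBZ"),
          ("rapid", "JJ"), ("risk", "NN"), ("assessment", "NN"), (".", ".")]
print(noun_phrase_candidates(tagged))  # ['emergency management', 'rapid risk assessment']
```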

2.
Accurate and fast approaches to automatic ECG data classification are vital for the clinical diagnosis of heart disease. To this end, we propose a novel multistage algorithm that combines procedures for dimensionality reduction, consensus clustering of randomized samples and fast supervised classification for processing large, high-dimensional ECG datasets. We carried out extensive experiments to study the effectiveness of the proposed multistage clustering and classification scheme using precision, recall and F-measure metrics. We evaluated the performance of numerous combinations of dimensionality-reduction methods, consensus functions and classification algorithms incorporated in our multistage scheme. The results of the experiments demonstrate that the highest precision, recall and F-measure are achieved by combining the rank correlation coefficient for dimensionality reduction, the HBGF consensus function and the SMO classifier with a polynomial kernel.

3.
In this paper, the task of finding an appropriate classifier ensemble for named entity recognition is posed as a multiobjective optimization (MOO) problem. Our underlying assumption is that instead of searching for the single best-fitting feature set for a particular classifier, an ensemble of several classifiers trained on different feature representations can be more fruitful, but it is crucial to determine the subset of classifiers most suitable for the ensemble. We use three heterogeneous classifiers, namely maximum entropy, conditional random field and support vector machine, to build a number of models depending upon the various representations of the available features. The proposed MOO-based ensemble technique is evaluated for three resource-constrained languages, namely Bengali, Hindi and Telugu. Evaluation yields recall, precision and F-measure values of 92.21, 92.72 and 92.46%, respectively, for Bengali; 97.07, 89.63 and 93.20%, respectively, for Hindi; and 80.79, 93.18 and 86.54%, respectively, for Telugu. We also evaluate our proposed technique on the CoNLL-2003 shared task English data sets, which yields recall, precision and F-measure values of 89.72, 89.84 and 89.78%, respectively. Experimental results show that the classifier ensemble identified by our proposed MOO-based approach outperforms all the individual classifiers, two different conventional baseline ensembles, and the classifier ensemble identified by a single-objective-based approach. We also formulate the problem of feature selection for any classifier under the MOO framework and show that our proposed classifier ensemble attains superior performance to it.
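Once the MOO search has selected a subset of classifiers, their outputs must still be combined. A minimal sketch of the combination step, assuming plain per-token majority voting (the paper also considers more elaborate weighted variants):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: list of per-classifier label sequences for the same tokens."""
    n_tokens = len(predictions[0])
    voted = []
    for t in range(n_tokens):
        votes = Counter(p[t] for p in predictions)
        voted.append(votes.most_common(1)[0][0])  # most frequent label wins
    return voted

# three hypothetical classifiers tagging the same three tokens
p1 = ["B-PER", "O", "B-LOC"]
p2 = ["B-PER", "O", "O"]
p3 = ["O", "O", "B-LOC"]
print(majority_vote([p1, p2, p3]))  # ['B-PER', 'O', 'B-LOC']
```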

4.
One of the goals of the knowledge puzzle project is to automatically generate a domain ontology from plain text documents and use this ontology as the domain model in computer-based education. This paper describes the generation procedure followed by TEXCOMON, the knowledge puzzle ontology learning tool, to extract concept maps from texts. It also explains how these concept maps are exported into a domain ontology. Data sources and techniques deployed by TEXCOMON for ontology learning from texts are briefly described herein. Then, the paper focuses on evaluating the generated domain ontology and advocates the use of a three-dimensional evaluation: structural, semantic, and comparative. Based on a set of metrics, structural evaluations consider ontologies as graphs. Semantic evaluations rely on human expert judgment, and finally, comparative evaluations are based on comparisons between the outputs of state-of-the-art tools and those of new tools such as TEXCOMON, using the very same set of documents in order to highlight the improvements of new techniques. Comparative evaluations performed in this study use the same corpus to contrast results from TEXCOMON with those of one of the most advanced tools for ontology generation from text. Results generated by such experiments show that TEXCOMON yields superior performance, especially regarding conceptual relation learning.

5.
马超 《计算机系统应用》2015,24(12):273-276
A domain ontology is an efficient and reasonable way of representing domain concepts and their relationships. A common problem in constructing a domain ontology is that, even when the concept set is complete, the relationships among concepts are so complex and varied that manually labeling them is prohibitively expensive. Applying an unsupervised relation-extraction algorithm to Web information rich in domain concepts addresses this problem. Traditional unsupervised algorithms, however, do not account for the "one instance, multiple concept pairs" problem, so the extracted concept relations are incomplete. This paper constructs an ontology from Web information in the traffic domain, introducing weights on instance concept-relation pairs into the traditional unsupervised K-means method to solve this problem; experiments confirm that the algorithm achieves good results.

6.
This paper proposes an ontology learning method that generates a graphical ontology structure called an ontology graph. The ontology graph defines the ontology and knowledge conceptualization model, and the ontology learning process defines a method of semiautomatic learning that generates ontology graphs from Chinese texts of different domains, the so-called domain ontology graph (DOG). We also define two further ontological operations, document ontology graph generation and ontology graph-based text classification, which can be carried out with the generated DOG. This research focuses on Chinese text data, and we conduct two experiments with Chinese texts as the experimental data: DOG generation and ontology graph-based text classification. The first experiment generates ten DOGs as ontology graph instances representing ten different domains of knowledge. The generated DOGs are then used in the second experiment for performance evaluation. The ontology graph-based approach achieves high text classification accuracy (an F-measure of 92.3%), compared with other text classification approaches (such as an F-measure of 86.8% for the tf-idf approach). The better performance in the comparative experiments reveals that the proposed ontology graph knowledge model, the ontology learning and generation process, and the ontological operations are feasible and effective.
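For reference, the tf-idf baseline the classification experiment compares against can be sketched in a few lines. This is the plain, unsmoothed variant (an assumption; the paper does not specify its exact weighting formula):

```python
import math

def tf_idf(docs):
    """docs: list of token lists; returns one {term: tf-idf weight} map per doc."""
    n = len(docs)
    df = {}  # document frequency of each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)      # term frequency in this doc
            idf = math.log(n / df[term])          # rarity across the corpus
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [["ontology", "graph"], ["ontology", "text"], ["text", "classification"]]
w = tf_idf(docs)
```

A term shared by more documents ("ontology", in two of three) gets a lower weight than a term unique to one ("graph").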

7.
The task of reviewing scientific publications and keeping up with the literature in a particular domain is extremely time-consuming. Extraction and exploration of methodological information, in particular, requires systematic understanding of the literature, but in many cases is performed within a limited context of publications that can be manually reviewed by an individual or group. Automated methodology identification could provide an opportunity for systematic retrieval of relevant documents and for exploring developments within a given discipline. In this paper we present a system for the identification of methodology mentions in scientific publications in the area of natural language processing, and in particular in automatic terminology recognition. The system comprises two major layers: the first layer is an automatic identification of methodological sentences; the second layer highlights methodological phrases (segments). Each mention is categorised in four semantic categories: Task, Method, Resource/Feature and Implementation. Extraction and classification of the segments is formalised as a sequence tagging problem and four separate phrase-based Conditional Random Fields are used to accomplish the task. The system has been evaluated on a manually annotated corpus comprising 45 full text articles. The results for the segment level annotation show an F-measure of 53% for identification of Task and Method mentions (with 70% precision), whereas the F-measures for Resource/Feature and Implementation identification were 61% (with 67% precision) and 75% (with 86% precision) respectively. At the document-level, an F-measure of 72% (with 81% precision) for Task mentions, 60% (with 81% precision) for Method mentions, 74% (with 78% precision) for the Resource/Feature and 79% (with 81% precision) for the Implementation categories have been achieved. 
We provide a detailed analysis of errors and explore the impact that particular groups of features have on the extraction of methodological segments.

8.
Acquiring non-taxonomic relations among domain concepts is an important task in ontology learning. This paper proposes an unsupervised method for automatically acquiring such relations. The method first obtains domain-specific concept pairs through association rules, then takes the high-frequency verbs occurring between the concept pairs as candidate non-taxonomic relation labels, determines the final labels using a VF*ICF metric, and finally assigns each label to its corresponding concept pair using a log-likelihood ratio measure. Experimental results show that the method effectively improves the precision and recall of non-taxonomic relation extraction.
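The log-likelihood ratio used in the final assignment step can be computed from a 2x2 co-occurrence table of a verb against a concept pair. A sketch assuming Dunning's G² statistic (the abstract does not name the exact variant), with illustrative counts:

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning-style G^2 over a 2x2 table:
    k11 = verb with pair, k12 = verb without pair,
    k21 = other verbs with pair, k22 = other verbs elsewhere."""
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    obs = [[k11, k12], [k21, k22]]
    g = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n  # expected count under independence
            if obs[i][j] > 0:
                g += obs[i][j] * math.log(obs[i][j] / e)
    return 2 * g

# A verb that co-occurs strongly with a concept pair scores far higher
# than one at chance level (counts are made up for illustration).
strong = llr(50, 10, 10, 930)
weak = llr(5, 55, 55, 885)
```

A table that exactly matches independence scores zero, so the measure directly ranks how surprising each verb/pair association is.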

9.
This paper develops techniques to extract conceptual graphs from a patent claim using syntactic information (POS tags and dependency trees) and semantic information (a background ontology). Because patent claims abound in technical domain terms and lengthy sentences, it is difficult to apply an NLP parser directly to their plain text. This paper combines finite state machines, Part-Of-Speech tags, conceptual graphs, a domain ontology and dependency trees to convert a patent claim into a formally defined conceptual graph. The finite state machine splits a lengthy patent claim sentence into a set of shortened sub-sentences so that the NLP parser can parse them one by one effectively. The Part-Of-Speech tags and dependency tree of a patent claim are used to build the conceptual graph based on the pre-established domain ontology. The results show that 99% of the sub-sentences split from 1700 patent claims can be efficiently parsed by the NLP parser. A conceptual graph has two types of nodes, concept nodes and relation nodes. Each concept or relation can be extracted directly from a patent claim, and each relation links a fixed number of concepts in the conceptual graph. For 100 patent claims, the average precision and recall of concept class mapping from the patent claim to the domain ontology are 96% and 89%, respectively, and the average precision and recall for Real relation class mapping are 97% and 98%, respectively. For the concept linking of a relation, the average precision is 79%. The conceptual graphs extracted from patents would facilitate automated comparison and summarization among patents for judging patent infringement.
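A toy sketch of the splitting idea: cut a claim at clause delimiters such as ";" and the keyword "wherein". The delimiter set and the example claim are illustrative assumptions; the paper's finite state machine has richer transition rules.

```python
# Hypothetical clause splitter for patent claims. Cuts at ';' and before
# 'wherein'; real claim grammars need a larger delimiter inventory.
def split_claim(claim):
    sub_sentences, current = [], []
    for token in claim.split():
        if token == ";":
            if current:
                sub_sentences.append(" ".join(current))
            current = []
        elif token == "wherein":
            if current:
                sub_sentences.append(" ".join(current))
            current = [token]  # 'wherein' starts the next clause
        else:
            current.append(token)
    if current:
        sub_sentences.append(" ".join(current))
    return sub_sentences

claim = ("A device comprising a sensor ; a processor coupled to the sensor "
         "wherein the processor filters noise")
parts = split_claim(claim)
print(parts)  # three clause-level sub-sentences
```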

10.
As a conceptual modeling tool, ontologies are applied across many areas of computing for information organization and knowledge management. Ontology extension is a method for enlarging an existing ontology by adding new concepts, and the relations between them, at appropriate positions. This paper proposes a method for extending an ontology from text based on inter-word semantic relatedness: co-occurrence analysis, word filtering and inter-word semantic relatedness are used to discover potential concepts in text as candidates for extension, and relation-recognition techniques such as extension rules and subsumption analysis attach the concepts to the existing ontology. Taking the education subdomain of e-government as an example, the method was used to extend an education domain ontology; the results show that the extended ontology is reasonable and has strong practical applicability.

11.
In this paper, we propose a simulated annealing (SA) based multiobjective optimization (MOO) approach to classifier ensembles. Several different versions of the objective functions are exploited. We hypothesize that the reliability of each classifier's predictions differs among the output classes; thus, in an ensemble system, it is necessary to find the appropriate vote weight for each output class in each classifier. Diverse classification methods, namely Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM), are used to build different models depending upon the various representations of the available features. One of the most important characteristics of our system is that the features are selected and developed largely without deep domain knowledge or language-dependent resources. The proposed technique is evaluated for Named Entity Recognition (NER) in three resource-poor Indian languages, namely Bengali, Hindi and Telugu. Evaluation yields recall, precision and F-measure values of 93.95%, 95.15% and 94.55%, respectively, for Bengali; 93.35%, 92.25% and 92.80%, respectively, for Hindi; and 84.02%, 96.56% and 89.85%, respectively, for Telugu. Experiments also suggest that the classifier ensemble identified by the proposed MOO-based approach, optimizing the F-measure of named entity (NE) boundary detection, outperforms all the individual models, two conventional baseline models and three other MOO-based ensembles.

12.
Validation of overlapping clustering: A random clustering perspective
As a widely used clustering validation measure, the F-measure has received increased attention in the field of information retrieval. In this paper, we reveal that the F-measure can lead to biased views of the results of overlapping clusters when it is used to validate data with different cluster numbers (the incremental effect) or different prior probabilities of relevant documents (the prior-probability effect). We propose a new "IMplication Intensity" (IMI) measure, which is based on the F-measure and developed from a random clustering perspective, and carefully investigate its properties. Finally, experimental results on real-world data sets show that IMI significantly alleviates the biased incremental and prior-probability effects inherent to the F-measure.
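The base measure that IMI builds on is the standard F-measure, the harmonic mean of precision and recall. A minimal version for one cluster scored against one reference class (IMI itself is not reproduced here):

```python
def f_measure(cluster, reference):
    """F-measure of a predicted cluster against a reference class (as sets)."""
    cluster, reference = set(cluster), set(reference)
    tp = len(cluster & reference)  # items correctly grouped
    if tp == 0:
        return 0.0
    precision = tp / len(cluster)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

print(f_measure({1, 2, 3, 4}, {2, 3, 4, 5}))  # 0.75
```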

13.
《Ergonomics》2012,55(7):838-858
Ontologies, as a possible element of organizational memory information systems, appear to support organizational learning. Ontology tools can be used to share knowledge among the members of an organization. However, current ontology-viewing user interfaces of ontology tools do not fully support organizational learning, because most of them lack proper history representation in their display. In this study, a conceptual model was developed that emphasized the role of ontology in the organizational learning cycle and explored the integration of history representation in the ontology display. Based on the experimental results from a split-plot design with 30 participants, two conclusions were derived: first, appropriately selected history representations in the ontology display help users to identify changes in the ontologies; and second, compatibility between types of ontology display and history representation is more important than ontology display and history representation in themselves.

14.
Hwang W  Salvendy G 《Ergonomics》2005,48(7):838-858
Ontologies, as a possible element of organizational memory information systems, appear to support organizational learning. Ontology tools can be used to share knowledge among the members of an organization. However, current ontology-viewing user interfaces of ontology tools do not fully support organizational learning, because most of them lack proper history representation in their display. In this study, a conceptual model was developed that emphasized the role of ontology in the organizational learning cycle and explored the integration of history representation in the ontology display. Based on the experimental results from a split-plot design with 30 participants, two conclusions were derived: first, appropriately selected history representations in the ontology display help users to identify changes in the ontologies; and second, compatibility between types of ontology display and history representation is more important than ontology display and history representation in themselves.

15.
The Ontological Adaptive Service-Sharing Integration System (OASIS) facilitates reverse engineering tool interoperability by sharing services among tools that represent software in a conceptually equivalent manner. OASIS uses a domain ontology to record the representational and service-related concepts each tool offers. Specialized adapters use a filtering process to map factbase instances to domain ontology concepts and apply shared services. This paper examines three issues related to the filtering process: representational correspondence, loss of precision and information dilution.

16.
Context-based extraction of domain ontology concepts and relations
Current ontology-learning research focuses on extracting concepts and relations. For concept extraction, combining domain consensus with domain relevance has achieved fairly good results, while relation extraction mainly relies on association-rule-based methods. Because these concept- and relation-learning methods consider only term frequency, their extraction results lack accuracy. To address this deficiency, semantic factors are introduced on top of the statistics: lexical context is used to compute the semantic similarity between concepts, which is then applied to concept and relation extraction. Experimental results show that combining lexical context with traditional statistics effectively improves the accuracy of concept and relation extraction.
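A minimal sketch of context-based semantic similarity: represent each term by the counts of words appearing around it, then compare the two count vectors with cosine similarity. The context vectors below are illustrative assumptions, not data from the paper.

```python
def cosine(u, v):
    """Cosine similarity of two sparse count vectors given as dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = sum(x * x for x in u.values()) ** 0.5
    nv = sum(x * x for x in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical context-word counts for two domain terms
ctx_ontology = {"concept": 4, "relation": 3, "domain": 2}
ctx_taxonomy = {"concept": 3, "relation": 2, "hierarchy": 2}
print(round(cosine(ctx_ontology, ctx_taxonomy), 3))
```

Terms that share more context words score closer to 1, which is the signal the statistics-only methods above lack.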

17.
Humans are exceptionally good at distinguishing salient objects from the background; researchers have yet to develop a model that matches human detection accuracy and speed. In this paper we attempt to improve detection accuracy without spending much additional computation time. The model exploits the fact that the maximal amount of information in an image is present at the corners and edges of an object. First, keypoints are extracted from the image using multi-scale Harris and multi-scale Gabor functions. The image is then roughly segmented into two regions, a salient region and a background region, by constructing a convex hull over these keypoints. Finally, the pixels of the two regions are treated as samples drawn from a multivariate kernel function whose parameters are estimated with the expectation maximization algorithm, yielding a saliency map. The performance of the proposed model is evaluated in terms of precision, recall, F-measure, area under curve and computation time on six publicly available image datasets. Experimental results demonstrate that the proposed model outperforms existing state-of-the-art methods in terms of recall, F-measure and area under curve on all six datasets, and in precision on four datasets. The proposed method also takes comparatively less computation time than many existing methods.
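The rough foreground/background split above hinges on a convex hull over the extracted keypoints. A self-contained sketch using Andrew's monotone-chain algorithm (pixels inside the hull would become the salient-region sample):

```python
# Andrew's monotone chain: convex hull of 2D points in CCW order.
def convex_hull(points):
    points = sorted(set(points))
    if len(points) <= 2:
        return points

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints

keypoints = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]  # (2, 1) is interior
print(convex_hull(keypoints))  # the four corners, counter-clockwise
```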

18.
Tacit guidance for collaborative multimedia learning
Collaborative multimedia learning is a scenario placing various demands on the learners that go beyond understanding complex issues and coordinating a learning discourse. On the one hand, individuals have to mentally interrelate multiple external representations in order to understand the learning material and the underlying concepts; on the other hand, during collaboration, learners have to use the differently coded information in order to exchange conceptual knowledge. In this paper, the development and experimental evaluation of a group awareness tool (collaborative integration tool) is presented that is intended to simultaneously support both individual and collaborative learning processes during dyadic collaborative multimedia learning. The tool was experimentally compared with an integration task that already proved to foster meaningful individual learning processes. The results suggest that providing group awareness can lead to better individual learning gains by reducing demanding processes and by tacitly guiding learner interactions.

19.
20.
《Applied Soft Computing》2008,8(2):839-848
To deal with adjacent input fuzzy sets that carry overlapping information, non-additive fuzzy rules are formulated by defining their consequent as the product of a weighted input and a fuzzy measure. The weighted input calls for a corresponding fuzzy measure, a new concept that facilitates the evolution of new fuzzy modeling. The fuzzy measures aggregate the information from the weighted inputs using the λ-measure, and the output of these rules takes the form of the Choquet fuzzy integral. The underlying non-additive fuzzy model is investigated for the identification of non-linear systems. The weighted input, the additive S-norm of the inputs and their membership functions, provides the strength of the rules, and the fuzzy densities required to compute the fuzzy measures, subject to the q-measure, are the unknown functions to be estimated. The q-measure is a powerful way of simplifying the computation of the λ-measure, which accounts for the interaction between the weighted inputs. Two applications, a real-life signature verification and forgery detection task and a benchmark chemical plant problem, illustrate the utility of the proposed approach. The results are compared with those existing in the literature.
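A sketch of the output computation: solve for the Sugeno λ from the fuzzy densities, build the λ-measure on nested subsets, and take the Choquet integral of the inputs. Note this solves for λ directly by bisection rather than via the paper's q-measure simplification, and the densities below are illustrative, not values from the paper.

```python
import math

def solve_lambda(densities):
    """Find the Sugeno parameter: 1 + L = prod(1 + L*g_i), L > -1, L != 0."""
    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0  # additive case: the measure is a plain probability
    f = lambda lam: math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)
    # the nontrivial root is positive when sum(g) < 1, in (-1, 0) when sum(g) > 1
    lo, hi = (1e-9, 1e6) if s < 1 else (-1.0 + 1e-9, -1e-9)
    for _ in range(200):  # bisection
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def choquet(values, densities):
    """Choquet integral of `values` w.r.t. the lambda-measure from `densities`."""
    lam = solve_lambda(densities)

    def measure(indices):
        # lambda-measure of a subset via mu(A U {i}) = mu(A) + g_i + lam*mu(A)*g_i
        m = 0.0
        for i in indices:
            m = m + densities[i] + lam * m * densities[i]
        return m

    order = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        total += (values[i] - prev) * measure(order[k:])  # measure of the tail set
        prev = values[i]
    return total

# with densities summing to 1, lambda = 0 and the integral is the weighted mean
print(choquet([1, 2, 3], [0.2, 0.3, 0.5]))
```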
