Similar Literature
20 similar documents retrieved.
1.
Topic detection has been widely applied in text mining and natural language processing in recent years, and structural modeling of topics is its foundation. To model multi-granularity topics in text streams, a topic structure model based on a semantic hierarchy tree is proposed. Exploiting the characteristics of domain ontologies, the model establishes a one-to-one mapping between topics and the ontology and, combined with probability theory, represents the concepts in the concept set as leaf nodes of a topic tree, where each node in a layer is a multinomial distribution over the nodes of the next layer, making the model better suited to describing the multi-granularity topic structure of text streams. To facilitate construction of the topic space structure, methods for computing topic similarity and event relatedness are proposed. Finally, an experiment is designed to build topic trees on a real news text stream. The results show that the model captures the rich multi-granularity spatial-semantic features of topics.
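A minimal sketch of such a topic tree, in which every internal node carries a multinomial distribution over its children; the tree shape, topic names and probabilities below are illustrative assumptions, not values from the paper.

import numpy as np

TOPIC_TREE = {
    "news":     (["politics", "sports"], [0.6, 0.4]),
    "politics": (["election", "policy"], [0.7, 0.3]),
    "sports":   (["football", "tennis"], [0.5, 0.5]),
}

def sample_topic_path(root, rng):
    """Walk from the root to a leaf, sampling a child at each level."""
    path, node = [root], root
    while node in TOPIC_TREE:
        children, probs = TOPIC_TREE[node]
        node = str(rng.choice(children, p=probs))  # multinomial draw over children
        path.append(node)
    return path

rng = np.random.default_rng(0)
for _ in range(3):
    print(sample_topic_path("news", rng))          # e.g. ['news', 'politics', 'policy']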

2.
With the wide use of ontologies in data integration, computing the similarity of ontology concepts has become a topic of growing interest. To address the complexity of existing similarity computations for domain ontology concepts, a tree-structure-based method for computing ontology concept similarity is proposed. The method reconstructs the ontology tree by adding and reorganizing virtual nodes, maps objects by comparing their attributes, and then computes the semantic similarity of the ontology concepts. Experimental results show that the method makes effective use of the semantic information of ontology concepts, yields reasonable results, and simplifies the computation.

3.
This paper introduces an ontology-based semantic integration method for the data warehouse domain. First, a domain ontology and local ontologies for the data sources are built. A global ontology for the data sources is then obtained through a mapping algorithm between the concept trees corresponding to the local ontologies, and is mapped to the domain ontology to obtain mapping relations. Finally, implicit semantic relations are derived through ontology reasoning, and the resulting semantic relations guide the extract, transform, and load (ETL) process, achieving semantic-level data integration in the data warehouse.

4.
The difficulty of data integration lies in resolving semantic heterogeneity between data sources. Exploiting the advantages of ontologies for semantic integration, this paper proposes a data integration framework based on ontology semantic mapping. Based on the definitions and structure of ontology concepts, an ontology semantic mapping algorithm is given that determines the semantic relations between concepts by comparing their attribute sets; when computing concept similarity, it takes into account the semantic information of concept names, concept attribute sets, and related concepts. Finally, ontology semantic mapping is realized through an attribute-set mapping algorithm and a concept mapping algorithm, addressing the semantic mapping problem at the core of data integration. A small sketch of this kind of attribute-set comparison follows.
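A minimal sketch of judging the relation between two ontology concepts from their attribute sets plus a simple name score, in the spirit of the method above; the concepts, weights and thresholds are illustrative assumptions.

from difflib import SequenceMatcher

def concept_similarity(name_a, attrs_a, name_b, attrs_b, w_name=0.4, w_attr=0.6):
    """Weighted mix of name string similarity and attribute-set Jaccard overlap."""
    name_score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    union = attrs_a | attrs_b
    attr_score = len(attrs_a & attrs_b) / len(union) if union else 0.0
    return w_name * name_score + w_attr * attr_score

def relation(sim, equal=0.8, related=0.4):
    return "equivalent" if sim >= equal else "related" if sim >= related else "unrelated"

a = ("Customer", {"name", "address", "phone", "email"})
b = ("Client",   {"name", "address", "email", "vip_level"})
sim = concept_similarity(a[0], a[1], b[0], b[1])
print(f"{sim:.2f}", relation(sim))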

5.
Ontology-based concept semantic similarity has been widely applied in many areas of information science in recent years, and its computation has attracted considerable attention. After analyzing the principles, strengths, and weaknesses of existing ontology-based measures, this paper proposes a concept semantic similarity algorithm that combines, with weights, the overlap of the concepts' shared paths and the depth of their lowest common ancestor. The algorithm is simple, flexible, and extensible, and can be applied to ontologies of different types. Experiments on subsets of the Gene Ontology and the Plant Ontology, compared against two existing algorithms, demonstrate the correctness and effectiveness of the proposed measure.
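A minimal sketch of a weighted similarity of this kind, assuming a toy ontology given as a child-to-parent dictionary; the node names, weight and combination formula are illustrative assumptions, not the paper's exact definitions.

TOY_ONTOLOGY = {            # child: parent (root has parent None)
    "entity": None,
    "organism": "entity",
    "plant": "organism",
    "animal": "organism",
    "tree": "plant",
    "shrub": "plant",
}

def path_to_root(node, parents):
    """Return the list of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path

def similarity(a, b, parents, alpha=0.6):
    """Weighted combination of shared-path overlap and LCA depth."""
    pa, pb = path_to_root(a, parents), path_to_root(b, parents)
    shared = set(pa) & set(pb)                      # nodes on both root paths
    overlap = len(shared) / len(set(pa) | set(pb))  # shared-path overlap ratio
    # depth of the lowest common ancestor, normalised by the deeper path
    lca_depth = max(len(path_to_root(n, parents)) for n in shared) - 1
    max_depth = max(len(pa), len(pb)) - 1
    depth_score = lca_depth / max_depth if max_depth else 1.0
    return alpha * overlap + (1 - alpha) * depth_score

print(similarity("tree", "shrub", TOY_ONTOLOGY))   # siblings -> higher score
print(similarity("tree", "animal", TOY_ONTOLOGY))  # more distant -> lower score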

6.
To address the difficulties of acquiring prior knowledge when fusing heterogeneous data in the mining domain, the poor real-time performance of IoT ontology bases, and the low efficiency of manually annotating instance data, an automatic semantic annotation method for the semantic Internet of Things in mines is proposed. A framework for semantic processing of sensor data is given. On one hand, the professional domain and scope of the ontology are determined, and the domain ontology is built by reusing the Stream Annotation Ontology (SAO) as the basis for driving semantic annotation. On the other hand, machine learning is used for feature extraction and analysis of the sensor data stream, mining relations between concepts from massive data; the knowledge obtained through data mining drives the updating and refinement of the ontology, enabling dynamic ontology update and extension, more precise semantic annotation, and better machine understanding. Taking a main-shaft fault of a mine hoisting system as an example, the semantic annotation process from ontology to instance is described: a fault ontology for the main drive of the mine hoisting system is built with the "seven-step" method, combining domain expert knowledge and ontology reuse; to improve the accuracy of instance data property descriptions, principal component analysis (PCA) and K-means clustering are used to reduce the dimensionality of and group the data set, extracting the relations between data properties and concepts; the Semantic Web Rule Language (SWRL) is then used to annotate the relations between specific antecedent conditions and consequent concepts, optimizing the domain ontology. Experimental results show that, during ontology instantiation, machine learning can automatically extract concepts from sensor data and thus achieve automatic semantic annotation of sensor data.
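A minimal sketch of the PCA plus K-means step described above, using scikit-learn on synthetic "sensor" readings; the feature count, component count and number of clusters are illustrative assumptions, not values from the paper.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
readings = rng.normal(size=(500, 12))        # 500 sensor samples, 12 raw features

pca = PCA(n_components=3)                    # reduce to 3 principal components
reduced = pca.fit_transform(readings)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)         # group samples into candidate concepts

# Each cluster can then be linked to an ontology concept (e.g. a fault state)
# and used to annotate incoming readings automatically.
print(pca.explained_variance_ratio_)
print(np.bincount(labels))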

7.
To address the automatic updating of the domain ontology for civil aviation emergency management, an LDA-based method for acquiring domain ontology concepts is proposed. Using text as the data source, candidate term sets are obtained with NLPIR adaptive word segmentation and filtering; an LDA topic model for the domain ontology is designed, and Gibbs sampling is used for LDA model training and topic inference, extracting terms related to the core concepts of the domain ontology. A method for building semantic-relation identification rules based on the LDA topic probability distributions is studied, and the process of identifying the semantic relations between concepts and their related terms is given. Experimental results show that the method effectively solves the problem of automatically updating large-scale domain ontology concepts and provides good data support for sharing and reasoning over cross-media information on civil aviation emergencies in a big data environment.
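A minimal sketch of the LDA step described above, using gensim (which trains LDA with variational inference rather than the Gibbs sampler used in the paper); the toy corpus and topic count are illustrative assumptions.

from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["flight", "delay", "weather", "storm"],
    ["runway", "incursion", "alert", "tower"],
    ["storm", "diversion", "delay", "fuel"],
    ["alert", "emergency", "runway", "closure"],
]

dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=2,
               passes=20, random_state=0)

# Top terms per topic are candidates for terms related to ontology concepts.
for topic_id in range(lda.num_topics):
    print(topic_id, lda.show_topic(topic_id, topn=5))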

8.
An ontology-based method for heterogeneous data integration and its implementation
This paper analyzes the difficulties of traditional heterogeneous data integration and presents an improved ontology-based method for integrating heterogeneous data. The method uses an ontology to describe the concepts of the information-source domain and resolves semantic heterogeneity in data integration by constructing semantic mapping relations.

9.
Ontology-assisted automatic schema matching
刘强, 赵迪, 钟华, 黄涛. 软件学报 (Journal of Software), 2009, 20(2): 234-245
Within the framework of a mapping-based data exchange system, an ontology-assisted schema matching method is proposed. It combines the WordNet lexical ontology with decision tree learning to match attribute names, builds a data-type ontology to compute the semantic distance between attribute data types, and relies on a domain ontology to discover one-to-many semantic matching relations; these three steps progressively improve matching quality. Experiments on real application data show that the method achieves high precision and recall.
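A minimal sketch of WordNet-based attribute-name matching in the spirit of the first step above, using NLTK's WordNet interface; the attribute names, threshold and choice of path similarity are illustrative assumptions.

from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def name_similarity(a, b):
    """Best path similarity between any noun senses of the two attribute names."""
    synsets_a = wn.synsets(a, pos=wn.NOUN)
    synsets_b = wn.synsets(b, pos=wn.NOUN)
    scores = [sa.path_similarity(sb) for sa in synsets_a for sb in synsets_b]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

pairs = [("salary", "wage"), ("author", "writer"), ("price", "color")]
for a, b in pairs:
    score = name_similarity(a, b)
    print(f"{a} ~ {b}: {score:.2f}", "match" if score > 0.3 else "no match")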

10.
Because the semantic Internet of Things does not yet have a relatively complete, shareable semantic system and therefore cannot support resource description and semantic interoperability across different domains, a semantic interoperability method based on multiple domain ontologies and Linked Open Data is proposed for the semantic IoT. The method can semi-automatically annotate newly deployed sensors, adding semantics to sensor data so that machines can understand and reason over it; it manages ontologies of the same domain with an ontology entity table in order to normalize and unify them; and it obtains semantics related to named entities from Linked Open Data to supplement the information. Finally, the workflow is illustrated with a concrete example and compared with other classical methods. The results show that the method achieves good interoperability across information from multiple domains.

11.
There has been an explosion in the types, availability and volume of data accessible in an information system, thanks to the World Wide Web (the Web) and related inter-networking technologies. In this environment, there is a critical need to replace or complement earlier database integration approaches and current browsing and keyword-based techniques with concept-based approaches. Ontologies are increasingly becoming accepted as an important part of any concept or semantics based solution, and there is increasing realization that any viable solution will need to support multiple ontologies that may be independently developed and managed. In particular, we consider the use of concepts from pre-existing real world domain ontologies for describing the content of the underlying data repositories. The most challenging issue in this approach is that of vocabulary sharing, which involves dealing with the use of different terms or concepts to describe similar information. In this paper, we describe the architecture, design and implementation of the OBSERVER system. Brokering across the domain ontologies is enabled by representing and utilizing interontology relationships such as (but not limited to) synonyms, hyponyms and hypernyms across terms in different ontologies. User queries are rewritten by using these relationships to obtain translations across ontologies. Well established metrics like precision and recall based on the extensions underlying the concepts are used to estimate the loss of information, if any.

12.
13.
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to provide interoperability—both between sites with different terminologies and with existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, and it maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.

14.
Integration of geographic information has increased in importance because of new possibilities arising from the interconnected world and the increasing availability of geographic information. Ontologies support the creation of conceptual models and help with information integration. In this paper, we propose a way to link the formal representation of semantics (i.e., ontologies) to conceptual schemas describing information stored in databases. The main result is a formal framework that explains a mapping between a spatial ontology and a geographic conceptual schema. The mapping of ontologies to conceptual schemas is made using three different levels of abstraction: formal, domain, and application levels. At the formal level, highly abstract concepts are used to express the schema and the ontologies. At the domain level, the schema is regarded as an instance of a generic data model. At the application level, we focus on the particular case of geographic applications. We also discuss the influence of ontologies in both the traditional and geographic systems development methodologies, with an emphasis on the conceptual design phase.

15.
The conceptualization of knowledge required for an efficient processing of textual data is usually represented as ontologies. Depending on the knowledge domain and tasks, different types of ontologies are constructed: formal ontologies, which involve axioms and detailed relations between concepts; taxonomies, which are hierarchically organized concepts; and informal ontologies, such as Internet encyclopedias created and maintained by user communities. Manual construction of ontologies is a time-consuming and costly process requiring the participation of experts; therefore, in recent years, many systems have appeared that automate this process to a greater or lesser degree. This paper provides an overview of methods for automatic construction and enrichment of ontologies, with the focus being placed on informal ontologies.

16.
Efficient retrieval of ontology fragments using an interval labeling scheme
Nowadays very large domain ontologies are being developed in life-science areas like Biomedicine, Agronomy, Astronomy, etc. Users and applications can benefit enormously from these ontologies in very different tasks, such as visualization, vocabulary homogenizing and data classification. However, due to their large size, they are often unmanageable for these applications. Instead, it is necessary to provide small and useful fragments of these ontologies so that the same tasks can be performed as if the whole ontology were being used. In this work we present a novel method for efficiently indexing and generating ontology fragments according to the user requirements. Moreover, the generated fragments preserve relevant inferences that can be made with the selected symbols in the original ontology. The method relies on an interval labeling scheme that efficiently manages the transitive relationships present in the ontologies. Additionally, we provide an interval algebra to compute some logical operations over the ontology concepts. We have evaluated the proposed method over several well-known biomedical ontologies. Results show very good performance and scalability, demonstrating the applicability of the proposed method in real scenarios.
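A minimal sketch of an interval labeling scheme for transitive is-a queries, in the spirit of the approach described above; the toy hierarchy and the traversal-based numbering are illustrative assumptions.

CHILDREN = {                      # parent -> children of a toy ontology
    "entity": ["anatomy", "process"],
    "anatomy": ["organ", "tissue"],
    "organ": ["heart"],
    "process": [],
    "tissue": [],
    "heart": [],
}

def label(root):
    """Assign each node an interval [start, end] via a DFS traversal."""
    intervals, counter = {}, [0]
    def visit(node):
        start = counter[0]; counter[0] += 1
        for child in CHILDREN[node]:
            visit(child)
        intervals[node] = (start, counter[0]); counter[0] += 1
    visit(root)
    return intervals

INTERVALS = label("entity")

def is_descendant(a, b):
    """a is a (transitive) descendant of b iff a's interval nests inside b's."""
    (s1, e1), (s2, e2) = INTERVALS[a], INTERVALS[b]
    return s2 < s1 and e1 < e2

print(is_descendant("heart", "anatomy"))  # True, without walking the hierarchy
print(is_descendant("heart", "process"))  # False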

17.
The paper argues that Guarino is right that ontologies are different from thesauri and similar objects, but not in the ways he believes: they are distinguished from essentially linguistic objects like thesauri and hierarchies of conceptual relations because they unpack, ultimately, in terms of sets of objects and individuals. However, this is a lonely status, without much application outside strict scientific and engineering disciplines, and of no direct relevance to natural language processing (NLP). More interesting structures of NLP relevance that encode conceptual knowledge cannot be subjected to the “cleaning up” techniques that Guarino advocates, because his conditions are too strict to be applicable, and because the terms used in such structures retain their language-like features of ambiguity and vagueness, in a way that cannot be eliminated by reference to sets of objects, as it can be in ontologies in the narrow sense. WordNet is a structure that remains useful to NLP, has within it features of both types (ontologies and conceptual hierarchies), and its function and usefulness will remain, properly, resistant to Guarino’s techniques, because those rest on a misunderstanding about concepts. The ultimate way out of such disputes can only come from automatic construction and evaluation procedures for conceptual and ontological structures from data, which is to say, corpora.

18.
Vast amounts of medical information reside within text documents, and the automatic retrieval of such information would certainly be beneficial for clinical activities. The need to overcome the bottleneck caused by the manual construction of ontologies has generated several studies and research efforts on semi-automatic methods for building ontologies. Most techniques for learning domain ontologies from free text have important limitations: they generally extract only concepts, so that only taxonomies are produced, although other types of semantic relations are also relevant in knowledge modelling. This paper presents a language-independent approach for extracting knowledge from medical natural language documents. The knowledge is represented by means of ontologies that can have multiple semantic relationships among concepts.

19.
Upper-level ontologies comprise general concepts and properties which need to be extended to include more diverse and specific domain vocabularies. We present the extension of NASA's Semantic Web for Earth and Environmental Terminology (SWEET) ontologies to include part of the hydrogeology domain. We describe a methodology that can be followed by other allied domain experts who intend to adopt the SWEET ontologies in their own discipline. We have maintained the modular design of the SWEET ontologies for maximum extensibility and reusability of our ontology in other fields, to ensure inter-disciplinary knowledge reuse, management, and discovery.

The extension of the SWEET ontologies involved identification of the general SWEET concepts (classes) to serve as the super-class of the domain concepts. This was followed by establishing the special inter-relationships between domain concepts (e.g., equivalence for vadose zone and unsaturated zone), and identifying the dependent concepts such as physical properties and units, and their relationship to external concepts. Ontology editing tools such as SWOOP and Protégé were used to analyze and visualize the structure of the existing OWL files. Domain concepts were introduced either as standalone new classes or as subclasses of existing SWEET ontologies. This involved changing the relationships (properties) and/or adding new relationships based on domain theories. In places, the entire structure of the existing concepts in the OWL files needed to be changed to represent the domain concept more meaningfully. Throughout this process, the orthogonal structure of SWEET ontologies was maintained and the consistency of the concepts was tested using the Racer reasoner. Individuals were added to the new concepts to test the modified ontologies. Our work shows that SWEET ontologies can successfully be extended and reused in any field without losing their modular or reference structure, or disrupting their URI links.
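A minimal sketch of extending an upper-level ontology with a domain concept, using rdflib; the namespace URIs and class names below are illustrative assumptions, not the actual SWEET identifiers.

from rdflib import Graph, Namespace, RDF, RDFS, OWL

SWEET = Namespace("http://example.org/sweet#")   # placeholder for a SWEET module
HYDRO = Namespace("http://example.org/hydrogeology#")

g = Graph()
g.bind("sweet", SWEET)
g.bind("hydro", HYDRO)

# Introduce a domain concept as a subclass of an existing upper-level concept.
g.add((HYDRO.VadoseZone, RDF.type, OWL.Class))
g.add((HYDRO.VadoseZone, RDFS.subClassOf, SWEET.LandRegion))

# Record a domain equivalence, e.g. vadose zone = unsaturated zone.
g.add((HYDRO.UnsaturatedZone, RDF.type, OWL.Class))
g.add((HYDRO.UnsaturatedZone, OWL.equivalentClass, HYDRO.VadoseZone))

print(g.serialize(format="turtle"))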

20.
Data currency is an important factor in data quality; reliable currency information is critical to the precision of data retrieval and the credibility of data analysis conclusions. Imprecise or outdated temporal information causes many problems for big data applications and largely limits the value that can be extracted from data. For data whose timestamps are missing or inaccurate, it is difficult to recover the exact timestamps, but their temporal order can be restored according to certain rules, which is sufficient for data cleaning and many other applications. Based on an analysis of application requirements for data currency, this paper first formalizes the notions related to currency rules over attributes; it then proposes graph-model-based algorithms for discovering currency rules and repairing the temporal order of data. The algorithms are implemented and evaluated on real data sets in terms of running efficiency and repair accuracy, the factors affecting repair accuracy are analyzed, and a fairly comprehensive evaluation is given. Experimental results show that the algorithms are efficient and achieve good repair quality.
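A minimal sketch of repairing temporal order from precedence rules, in the spirit of the graph-based approach described above: rules become edges of a directed graph and a topological sort yields a consistent order. The record states and rules are illustrative assumptions.

from graphlib import TopologicalSorter   # Python 3.9+

# Currency rules over the states of one entity: "registered" happens before
# "paid", "paid" before "shipped", and so on.
RULES = [
    ("registered", "paid"),
    ("paid", "shipped"),
    ("shipped", "delivered"),
    ("registered", "shipped"),
]

def repair_order(rules):
    """Return one temporal order of the states consistent with all rules."""
    ts = TopologicalSorter()
    for earlier, later in rules:
        ts.add(later, earlier)           # `later` depends on `earlier`
    return list(ts.static_order())

print(repair_order(RULES))   # e.g. ['registered', 'paid', 'shipped', 'delivered']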
