Similar Documents
A total of 20 similar documents were retrieved (search time: 15 ms).
1.
Tabular data refers to data organized in a table with rows and columns. This format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information from tabular data with respect to a semantic artefact, such as an ontology or a knowledge graph, is often referred to as Semantic Table Interpretation (STI) or Semantic Table Annotation. In this survey paper, we aim to provide a comprehensive and up-to-date review of the state of the art across the different tasks and methods that have been proposed to perform STI. First, we propose a new categorization that reflects the heterogeneity of table types one can encounter, revealing the different challenges that need to be addressed. Next, we define five major sub-tasks that STI deals with, even though the literature has mostly focused on three of them so far. We review and group the many approaches that have been proposed into three macro families, and we discuss their performance and limitations with respect to the various datasets and benchmarks proposed by the community. Finally, we detail the remaining scientific barriers to truly automatic interpretation of any type of table found in the wild on the Web.

2.
In the framework of Axiomatic Fuzzy Set (AFS) theory, we propose a new approach to data clustering. The objective of this clustering is to adhere to the principles of grouping that humans exercise when determining structure in data. Compared with other clustering approaches, the proposed approach offers more detailed insight into the structure of the clusters and the underlying decision-making process. This contributes to the enhanced interpretability of the results via the representation capabilities of AFS theory. The effectiveness of the proposed approach is demonstrated on real-world data, and the obtained results show that its clustering performance is comparable with other fuzzy rule-based clustering methods and with the benchmark fuzzy clustering methods FCM and K-means. Experimental studies show that the proposed fuzzy clustering method can discover the clusters in the data and help specify them in terms of comprehensive fuzzy rules.
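As a point of reference for the benchmark methods mentioned in this abstract, the following is a minimal sketch of the fuzzy c-means (FCM) update loop; it is not the AFS-based procedure from the paper, and the toy data, cluster count, and fuzzifier value are illustrative assumptions.

```python
# Minimal fuzzy c-means (FCM) sketch -- one of the benchmark methods cited
# above, not the AFS-based clustering proposed in the paper.
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """X: (n, d) data; c: number of clusters; m: fuzzifier (> 1)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]            # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (dist ** (2 / (m - 1)))
        U_new /= U_new.sum(axis=1, keepdims=True)                 # re-normalise memberships
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

if __name__ == "__main__":
    # Two illustrative blobs; the paper's experiments use real-world data.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    centers, U = fcm(X, c=2)
    print("centers:\n", centers)
    print("hard labels:", U.argmax(axis=1)[:10], "...")
```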

3.
In this paper, we present an ontology-based information extraction and retrieval system and its application in the soccer domain. In general, we deal with three issues in semantic search, namely usability, scalability and retrieval performance. We propose a keyword-based semantic retrieval approach. The performance of the system is improved considerably by using domain-specific information extraction, inferencing and rules. Scalability is achieved by adapting a semantic indexing approach and representing the whole world as small independent models. The system is implemented using state-of-the-art Semantic Web technologies, and its performance is evaluated against traditional systems as well as query expansion methods. Furthermore, a detailed evaluation is provided to observe the performance gain due to domain-specific information extraction and inferencing. Finally, we show how we use semantic indexing to resolve simple structural ambiguities.

4.
5.
Recent work on applying semantic technologies to learning has concentrated on providing novel means of accessing and making use of learning objects. However, this is unnecessarily limiting: semantic technologies will make it possible to develop a range of educational Semantic Web services, such as interpretation, structure visualization, support for argumentation, novel forms of content customization, novel mechanisms for aggregating learning material, citation services, and so on. In this paper, we outline an initial framework that extends the use of semantic technologies as a means of providing learning services that are owned and created by learning communities.

6.
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources capture the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredient for automatically publishing data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes); however, constructing a semantic model that explicitly describes the relationships between the attributes, in addition to their semantic types, is critical. We present a novel approach that exploits the knowledge in a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. We then compute the top k candidate semantic models and suggest to the user a ranked list of semantic models for the new source. The approach takes user corrections into account to learn more accurate semantic models for future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. These precise models make it possible to automatically integrate data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.
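To make the candidate-ranking idea concrete, here is a deliberately tiny sketch: candidate semantic models for a new source are scored by how well their relationships match previously modeled sources, and the k lowest-cost candidates are suggested. The ontology fragment, the cost function, and the candidates are assumptions for illustration only, not the paper's actual algorithm or data.

```python
# Toy sketch of ranking candidate semantic models for a new source by reusing
# knowledge from previously modeled sources. All names, weights, and candidate
# models are illustrative assumptions.
import heapq
from collections import Counter

# Relationships (concept, property, concept) seen in previously modeled sources.
known_models = [
    [("Person", "authorOf", "CreativeWork")],
    [("Person", "authorOf", "CreativeWork"), ("CreativeWork", "publishedBy", "Organization")],
    [("Event", "organizedBy", "Organization")],
]
link_counts = Counter(link for model in known_models for link in model)

def cost(model_links):
    # Links frequently observed before are cheap; unseen links cost more.
    return sum(1.0 / (1 + link_counts[link]) for link in model_links)

# Candidate interpretations of the new source's attributes.
candidates = {
    "m1": [("Person", "authorOf", "CreativeWork")],
    "m2": [("Event", "organizedBy", "Organization")],
    "m3": [("Person", "memberOf", "Organization")],      # never seen before
}

top_k = heapq.nsmallest(2, candidates.items(), key=lambda kv: cost(kv[1]))
for name, links in top_k:
    print(name, round(cost(links), 3), links)
```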

7.
The Semantic Web vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the Web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system that takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs). We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog presents an elegant solution in which different strategies are combined in a novel way. It makes use of the GATE NLP platform, string metric algorithms, WordNet and a novel ontology-based relation similarity service to make sense of user queries with respect to the target KB. Moreover, it also includes a learning component, which ensures that the performance of the system improves over time in response to the particular community jargon used by end users.
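The following is a minimal sketch of the general idea of matching a user phrase to ontology relation names with a string metric. The relation list and the use of plain sequence similarity are assumptions for illustration; the actual system combines several metrics with WordNet and an ontology-based similarity service.

```python
# Minimal sketch of string-metric matching of a user phrase against ontology
# relation names, in the spirit of the approach described above.
from difflib import SequenceMatcher

ontology_relations = ["worksIn", "isAuthorOf", "hasProject", "collaboratesWith"]

def best_match(phrase, relations):
    """Return the relation name most similar to the user's phrase."""
    def sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return max(relations, key=lambda r: sim(phrase, r))

print(best_match("works in", ontology_relations))      # -> worksIn
print(best_match("author of", ontology_relations))     # -> isAuthorOf
```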

8.
With the development of mobile technology, users' browsing habits have gradually shifted from pure information retrieval to active recommendation. Mapping users' interests to web content classes has become more and more difficult with the growing volume and variety of web pages. Some large news portals and social media companies hire more editors to label new concepts and words, and use computing servers with larger memory to handle massive document classification based on traditional supervised or semi-supervised machine learning methods. This paper provides an optimized classification algorithm for massive web page classification using semantic networks such as Wikipedia and WordNet. We use the Wikipedia data set and initialize a few category entity words as class words. A weight estimation algorithm based on the depth and breadth of the Wikipedia network is used to calculate the class weight of all Wikipedia entity words. A kinship-relation association based on the content similarity of entities is then introduced to mitigate the imbalance that arises when a category node inherits probability from multiple parents. The keywords in a web page are extracted from the title and the main text using N-grams together with Wikipedia entity words, and a Bayesian classifier is used to estimate the page class probability. Experimental results show that the proposed method achieves good scalability, robustness and reliability for massive web pages.
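As a concrete illustration of the final classification step only, here is a toy naive Bayes estimate of a page's class probability from extracted keywords. The class weights stand in for the Wikipedia-derived entity-word weights and are invented for illustration, not values from the paper.

```python
# Toy sketch of estimating a page's class from extracted keywords with a
# naive Bayes rule; weights below are illustrative assumptions.
import math

# P(word | class), e.g. derived from a depth/breadth-weighted Wikipedia graph.
word_class_weight = {
    "sports":   {"football": 0.08, "league": 0.05, "match": 0.06},
    "politics": {"election": 0.07, "senate": 0.05, "match": 0.01},
}
class_prior = {"sports": 0.5, "politics": 0.5}

def classify(keywords):
    """Return the best class and the log-posteriors for a page's keywords."""
    scores = {}
    for cls, prior in class_prior.items():
        logp = math.log(prior)
        for w in keywords:
            # Small smoothing constant for words with no weight in this class.
            logp += math.log(word_class_weight[cls].get(w, 1e-4))
        scores[cls] = logp
    return max(scores, key=scores.get), scores

label, scores = classify(["football", "match", "league"])
print(label, scores)
```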

9.
The increasing availability of High Spatial Resolution (HSR) satellite images is an opportunity to characterize and identify urban objects. The increase in precision has led to a need for new image analysis methods based on region-based (or object-based) approaches. In this field, an important challenge is the use of domain knowledge for the automatic identification of urban objects, and a major issue is the formalization and exploitation of this knowledge. In this paper, we present the steps of building a knowledge base of urban objects that allows the interpretation of HSR images in order to help urban planners automatically map the territory. The knowledge base is used to assign segmented regions (i.e. regions extracted from the images) to semantic objects (i.e. concepts of the knowledge base). A matching process between the regions and the concepts of the knowledge base is proposed, bridging the semantic gap between the image content and its interpretation. The method is validated on Quickbird images of the urban areas of Strasbourg and Marseille (France). The results highlight the capacity of the method to automatically identify urban objects using domain knowledge.
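A highly simplified sketch of region-to-concept matching is shown below: each segmented region's features are checked against attribute ranges stored for each concept. The concepts, features, and ranges are illustrative assumptions only, not the paper's knowledge base.

```python
# Simplified sketch of matching segmented regions to urban-object concepts by
# checking region features against attribute ranges in a toy knowledge base.
concepts = {
    "Building":   {"area_m2": (50, 5000),  "elongation": (0.0, 0.7), "ndvi": (-1.0, 0.2)},
    "Road":       {"area_m2": (100, 1e6),  "elongation": (0.8, 1.0), "ndvi": (-1.0, 0.2)},
    "Vegetation": {"area_m2": (1, 1e6),    "elongation": (0.0, 1.0), "ndvi": (0.3, 1.0)},
}

def match_region(features):
    """Score each concept by the fraction of attribute ranges the region satisfies."""
    scores = {}
    for name, ranges in concepts.items():
        ok = sum(lo <= features[attr] <= hi for attr, (lo, hi) in ranges.items())
        scores[name] = ok / len(ranges)
    best = max(scores, key=scores.get)
    return best, scores

region = {"area_m2": 320, "elongation": 0.4, "ndvi": 0.05}   # one segmented region
print(match_region(region))   # expected best match: Building
```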

10.
F. G. D. D.  Data & Knowledge Engineering, 2006, 58(3): 436-465
Designing applications that support user activity in accessing Web information sources is one of the most appealing challenges for researchers in the Artificial Intelligence area. We address this goal by proposing an agent-based approach relying on a new model, called the concept-graph, capable of representing user-behaviour-dependent relationships among concepts and, importantly, of dealing with the structural and semantic heterogeneity of Web sources. The latter feature is a distinctive aspect of our approach, and shows how recent results from the Cooperative Information Systems area can be profitably exploited in the recommender systems field. Finally, to test the validity of our approach, we have implemented the proposed framework in a Java prototype.

11.
An expert system is considered reliable if it generates reliable hypotheses. The quality of the hypotheses depends mainly on the effectiveness of the system's knowledge base. This paper discusses the problem of designing effective knowledge bases for rule-based systems with uncertainty. The knowledge is acquired from aggregate data stored in various repositories; the data can differ, to some extent, both in syntax and in semantics. The first part of the algorithm for rule generation and refinement operates by means of semantic data integration: it joins aggregate data from different repositories and generates strong production rules. The second part of the algorithm is based on a formal concept of the normal base form. To have the property of normality, a knowledge base has to be internally consistent and non-redundant. In the process of rule refinement, rules violating normality are eliminated. The effectiveness of the obtained knowledge base, which depends on the base's size and on the rules' reliabilities, is high. The considerations are illustrated with medical examples.
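To illustrate one refinement step of this kind, the sketch below drops redundant production rules so a rule base moves toward a non-redundant form. The redundancy criterion used here (another rule with a subset of the premises, the same conclusion, and certainty at least as high) is a simplified illustration, not the paper's exact definition of normality.

```python
# Toy sketch of removing redundant production rules from an uncertain rule base.
rules = [
    {"if": {"fever", "cough"},            "then": "flu",     "cf": 0.8},
    {"if": {"fever", "cough", "fatigue"}, "then": "flu",     "cf": 0.7},   # redundant
    {"if": {"rash"},                      "then": "allergy", "cf": 0.6},
]

def remove_redundant(rules):
    kept = []
    for r in rules:
        redundant = any(
            other is not r
            and other["then"] == r["then"]
            and other["if"] <= r["if"]           # weaker (subset) premises
            and other["cf"] >= r["cf"]           # at least as reliable
            for other in rules
        )
        if not redundant:
            kept.append(r)
    return kept

for r in remove_redundant(rules):
    print(r)
```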

12.
We present Wiser, a new semantic search engine for expert finding in academia. Our system is unsupervised and jointly combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph, via entity linking. Wiser indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author’s publications, whereas the weighted edges express the semantic relatedness among these entities, computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score, which models the pertinence of the corresponding entity to the author’s expertise and is computed by means of a random-walk calculation over that graph, and with a latent vector representation, which is learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author’s documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author’s expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established through a large-scale experimental test on a standard dataset for this task. We show that Wiser achieves better performance than all the other competitors, thus proving the effectiveness of modeling an author’s profile via our “semantic” graph of entities. Finally, we comment on the use of Wiser for indexing and profiling the whole research community within the University of Pisa, and its application to technology transfer in our University.
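As a minimal illustration of the random-walk scoring idea mentioned above, the sketch below runs a PageRank-style iteration over a small weighted relatedness graph to assign relevance scores to entities. The graph, its weights, and the damping factor are illustrative assumptions, not Wiser's actual profile data or exact computation.

```python
# Minimal sketch of scoring profile entities with a random walk over a small
# weighted relatedness graph (a PageRank-style iteration).
import numpy as np

entities = ["Machine learning", "Neural network", "Database", "Query optimization"]
# Symmetric relatedness weights between entities (e.g., Wikipedia-based).
W = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
])

def random_walk_scores(W, d=0.85, n_iter=100):
    """Iterate r = (1-d)/n + d * P^T r, with P the row-normalised weight matrix."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r = (1 - d) / n + d * (P.T @ r)
    return r / r.sum()

for name, score in zip(entities, random_walk_scores(W)):
    print(f"{name:20s} {score:.3f}")
```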

13.
The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, which is necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies the tasks of interoperation and integration among heterogeneous data sources, allowing data to be represented in (semi-)structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML has proven most effective for exchanging data, i.e., for syntactic interoperability, it is limited when it comes to handling semantics, i.e., semantic interoperability, since it only specifies the syntactic and structural properties of the data without any further semantic meaning. As a result, XML semantic-aware processing has become a motivating challenge in Web data management, requiring dedicated semantic analysis and disambiguation methods to assign well-defined meaning to XML elements and attributes. Most existing approaches: (i) ignore the problem of identifying ambiguous XML elements/nodes, (ii) only partially consider their structural relationships/context, (iii) use syntactic information in processing XML data regardless of the semantics involved, and (iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework, titled XSDF, designed to address each of the above limitations. It takes as input an XML document and produces as output a semantically augmented XML tree made of unambiguous semantic concepts extracted from a reference machine-readable semantic network. XSDF consists of four main modules for: (i) linguistic pre-processing of simple/compound XML node labels and values, (ii) selecting ambiguous XML nodes as targets for disambiguation, (iii) representing target nodes as special sphere neighborhood vectors including all XML structural relationships within a (user-chosen) range, and (iv) running context vectors through a hybrid disambiguation process combining two approaches, concept-based and context-based disambiguation, allowing the user to tune disambiguation parameters to her needs. Conducted experiments demonstrate the effectiveness and efficiency of our approach in comparison with alternative methods. We also discuss practical applications of our method, ranging over semantic-aware query rewriting, semantic document clustering and classification, Mobile and Web services search and discovery, as well as blog analysis and event detection in social networks and tweets.
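To give a feel for the context-based part of such a disambiguation step, the sketch below picks the sense whose gloss is closest (cosine similarity over bags of words) to an ambiguous node's surrounding context. The senses, glosses, and context words are made up for illustration; the framework draws them from a machine-readable semantic network and combines this with concept-based disambiguation.

```python
# Toy sketch of context-based word-sense disambiguation for an ambiguous
# XML node label, via cosine similarity between context and sense glosses.
import math
from collections import Counter

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    common = set(ca) & set(cb)
    num = sum(ca[w] * cb[w] for w in common)
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

senses = {
    "bank/finance": "financial institution that accepts deposits money loans".split(),
    "bank/river":   "sloping land beside a body of water river".split(),
}

# Context words gathered from the ambiguous node's neighbourhood in the XML tree.
context = "account customer deposits loan branch money".split()

best = max(senses, key=lambda s: cosine(context, senses[s]))
print(best, {s: round(cosine(context, senses[s]), 3) for s in senses})
```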

14.
An explosive growth in the volume, velocity, and variety of data available on the Internet has been witnessed recently. Data originating from multiple types of sources, including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, software logs and health data, have led to one of the most challenging research issues of the big data era. In this paper, Knowle, an online news management system built upon the semantic link network model, is introduced. Knowle is a news-event-centric data management system. Its core elements are news events on the Web, which are linked by their semantic relations. Knowle is a hierarchical data system with three layers: the bottom layer (concepts), the middle layer (resources), and the top layer (events). The basic building blocks of the Knowle system, namely news collection, resource representation, semantic relation mining, and semantic linking of news events, are presented. Knowle does not require data providers to follow semantic standards such as RDF or OWL; it is a semantics-rich, self-organized network that reflects various semantic relations among concepts, news, and events. Moreover, in the case study, Knowle is used for organizing and mining health news, which shows its potential to form the basis of a big data analytics based innovation framework in the health domain.
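A rough sketch of such a three-layer organisation (concepts, resources, events) as an in-memory structure with typed semantic links is shown below. The node names and link types are illustrative assumptions, not Knowle's actual schema.

```python
# Rough sketch of a three-layer semantic link network (concepts, resources,
# events) with typed links between nodes.
from collections import defaultdict

class SemanticLinkNetwork:
    def __init__(self):
        self.layers = {"concept": set(), "resource": set(), "event": set()}
        self.links = defaultdict(list)     # node -> [(relation, node), ...]

    def add_node(self, layer, name):
        self.layers[layer].add(name)

    def add_link(self, src, relation, dst):
        self.links[src].append((relation, dst))

    def neighbours(self, node):
        return self.links[node]

net = SemanticLinkNetwork()
net.add_node("concept", "influenza")
net.add_node("resource", "news:flu-outbreak-2015")
net.add_node("event", "2015 flu outbreak")
net.add_link("news:flu-outbreak-2015", "mentions", "influenza")
net.add_link("2015 flu outbreak", "reportedBy", "news:flu-outbreak-2015")
print(net.neighbours("2015 flu outbreak"))
```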

15.
XML plays an important role as the standard language for representing structured data on the traditional Web, and hence many Web-based knowledge management repositories store data and documents in XML. If the semantics of the data are formally represented in an ontology, it becomes possible to extract knowledge: ontology definitions and axioms are applied to XML data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central role in realizing the burgeoning vision of the Semantic Web, wherein data will be more sharable because their semantics will be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge from an exemplar XML repository of Shakespeare's plays. We then implement an architecture for this ontology using de facto languages of the Semantic Web, including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted that the early adopters of the Semantic Web will develop ontologies that leverage XML, provide intra-organizational value, such as knowledge extraction capabilities independent of the Semantic Web, and have the potential for inter-organizational data sharing over the Semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint for other ontology developers who believe that the growth of the Semantic Web will unfold in this manner.

16.
With the advancement of scientific and engineering research, a huge amount of academic literature has accumulated. Manually reviewing the existing literature is the main way to explore the embedded knowledge, and the process is quite time-consuming and labor-intensive. As the quantity of literature increases exponentially, it becomes more difficult to cover all aspects of the literature using the traditional manual review approach. To overcome this drawback, bibliometric analysis is used to analyze the current situation and trends of a specific research field. In bibliometric analysis, only a few key fields (e.g., authors, publishers, journals, and citations) are usually used as inputs; information beyond these fields is not extracted, although that neglected information (e.g., the abstract) might provide more detailed knowledge about the article. To tackle this problem, this study proposes an automatic literature knowledge graph and reasoning network modeling framework based on ontology and Natural Language Processing (NLP), to facilitate efficient knowledge exploration from literature abstracts. In this framework, a representation ontology is proposed to characterize the literature abstract data into four knowledge elements (background, objectives, solutions, and findings), and NLP technology is used to extract ontology instances from the abstracts automatically. Based on the representation ontology, a four-space integrated knowledge graph is built using NLP technology. A reasoning network is then generated according to the reasoning mechanism defined in the proposed ontology model. To validate the proposed framework, a case study is conducted on the literature in the field of construction management. The case study proves that the proposed ontology model can represent the knowledge embedded in literature abstracts and that the ontology elements can be automatically extracted by NLP models. The proposed framework can enhance bibliometric analysis by exploring more knowledge from the literature.
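As a deliberately simplified illustration of splitting an abstract into the four knowledge elements, the sketch below labels sentences with cue-phrase heuristics. The cue lists are assumptions for illustration; the framework itself relies on an ontology plus trained NLP models rather than keyword matching.

```python
# Simplified sketch of labelling abstract sentences as background, objectives,
# solutions, or findings using cue-phrase heuristics.
import re

cues = {
    "objectives": ["this study aims", "we aim", "the objective", "this paper proposes"],
    "solutions":  ["we propose", "is proposed", "method", "framework", "model"],
    "findings":   ["results show", "we find", "demonstrates", "proves that"],
}

def label_sentence(sentence):
    s = sentence.lower()
    for element, phrases in cues.items():
        if any(p in s for p in phrases):
            return element
    return "background"        # default bucket when no cue matches

abstract = ("Construction projects generate large document collections. "
            "This paper proposes an ontology-based framework for knowledge extraction. "
            "Results show that the extracted graph supports efficient literature review.")

for sent in re.split(r"(?<=[.!?])\s+", abstract.strip()):
    print(f"[{label_sentence(sent)}] {sent}")
```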

17.
During China's 13th Five-Year Plan period, operational launch vehicles and weapon models entered a high-density launch phase, while models under development entered a critical development phase. Compared with previous launch missions, the high-density launch period brings a heavy testing workload and significantly shortened test and launch cycles. During testing and launch, the interpretation of test and telemetry data is a key step, as it directly determines the assessment of the function and performance of each vehicle subsystem and the evaluation of overall performance. In the traditional mode, a large number of test engineers must be on site at the range during testing and launch to carry out data interpretation and analysis, which consumes considerable manpower. To solve these problems, a model-driven automatic interpretation method is proposed, and a domain-specific data interpretation modeling language, AIPScript (AutoIP Script), is designed. The approach implements criterion description, mathematical modeling and automatic interpretation for the attitude control, guidance, propulsion, overall electrical and measurement systems, and automatically generates interpretation reports. Practical applications show that the proposed method not only saves human resources but also greatly improves the efficiency and accuracy of data interpretation.
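To make the automatic-interpretation idea tangible, here is a generic sketch in which judgement criteria are written as declarative rules and applied to test data to produce a pass/fail report. The rule format, channel names, and limits are illustrative assumptions and do not reproduce the AIPScript language described in the paper.

```python
# Generic sketch of model-driven automatic data interpretation: declarative
# criteria are applied to recorded channels and a report is generated.
criteria = [
    {"channel": "battery_voltage",  "min": 26.0, "max": 30.0, "unit": "V"},
    {"channel": "chamber_pressure", "min": 5.5,  "max": 7.0,  "unit": "MPa"},
]

telemetry = {
    "battery_voltage":  [27.9, 28.1, 28.0, 27.8],
    "chamber_pressure": [6.1, 6.3, 7.4, 6.2],      # one out-of-range sample
}

def interpret(criteria, telemetry):
    """Check every sample of every channel against its criterion."""
    report = []
    for c in criteria:
        samples = telemetry[c["channel"]]
        bad = [v for v in samples if not (c["min"] <= v <= c["max"])]
        verdict = "PASS" if not bad else f"FAIL ({len(bad)} sample(s) out of range: {bad})"
        report.append(f"{c['channel']} [{c['min']}-{c['max']} {c['unit']}]: {verdict}")
    return "\n".join(report)

print(interpret(criteria, telemetry))
```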

18.
19.
20.
The labeling of the regions of a segmented image according to a semantic representation (ontology) is usually associated with the notion of understanding. The high combinatorial complexity of this problem can be reduced by local checking of constraints between the elements of the ontology. In the classical definition of the Finite Domain Constraint Satisfaction Problem, it is assumed that the matching between regions and labels is bijective. Unfortunately, in image interpretation the matching problem is often non-univocal. Indeed, images are often over-segmented: one object is made up of several regions. This non-univocal matching between data and a conceptual graph was not possible until a decisive step was accomplished by the introduction of arc consistency with bilevel constraints (FDCSPBC). However, this extension is only adequate for matchings corresponding to surjective functions. In medical image analysis, non-functional relations are often encountered, for example when an unexpected object such as a tumor appears. In this case, the data cannot be mapped to the conceptual graph with a classical approach. In this paper we propose an extension of FDCSPBC to solve the constraint satisfaction problem for non-functional relations.
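For readers unfamiliar with arc consistency, the sketch below shows a classic AC-3-style pruning of region label domains: labels with no compatible label in a neighbouring region are removed. It illustrates plain arc consistency only, not the bilevel or non-functional extensions discussed in the paper; the regions, labels, and compatibility relation are illustrative assumptions (arcs are given in one direction for brevity).

```python
# Classic AC-3-style arc consistency sketch for region labelling.
from collections import deque

domains = {
    "r1": {"sky", "water"},
    "r2": {"water", "road"},
    "r3": {"boat", "road"},
}
# Allowed (label_a, label_b) pairs for adjacent regions (a above b).
compatible = {("sky", "water"), ("water", "boat"), ("water", "road")}
arcs = [("r1", "r2"), ("r2", "r3")]          # r1 above r2, r2 above r3

def revise(xi, xj):
    """Remove values of xi with no compatible value in xj; return True if changed."""
    removed = {v for v in domains[xi]
               if not any((v, w) in compatible for w in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

def ac3():
    queue = deque(arcs)
    while queue:
        xi, xj = queue.popleft()
        if revise(xi, xj):
            # Re-check arcs whose target domain just shrank.
            queue.extend((a, b) for (a, b) in arcs if b == xi)
    return domains

print(ac3())   # expected: r1 -> {'sky'}, r2 -> {'water'}
```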
