Related Articles
1.
Mining large amounts of unstructured data to extract meaningful, accurate, and actionable information is at the core of a variety of research disciplines, including computer science, mathematical and statistical modelling, and knowledge engineering. In particular, the ability to model complex scenarios based on unstructured datasets is an important step towards an integrated and accurate knowledge extraction approach, and would provide significant insight into any decision-making process driven by Big Data analysis. However, multiple challenges must be fully addressed to achieve this, especially when large, unstructured data sets are considered. In this article we propose and analyse a novel method to extract and build fragments of Bayesian networks (BNs) from unstructured large data sources. The results of our analysis show the potential of our approach and highlight its accuracy and efficiency: compared with existing approaches, our method addresses specific challenges posed by the automated extraction of BNs from unstructured and highly dynamic data sources. The aim of this work is to advance the state of the art in the automated extraction of BNs from unstructured datasets, which provides a versatile and powerful modelling framework to facilitate knowledge discovery in complex decision scenarios.
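The abstract does not spell out the extraction algorithm; purely as an illustration of what a BN fragment lifted from text might look like, the following sketch estimates the conditional probability table of a hypothetical two-node fragment from document-level co-occurrence counts (the concept strings and the add-one smoothing are assumptions, not the paper's method).

```python
# Minimal illustrative sketch: a two-node BN fragment A -> B learned from
# whether hypothetical concepts co-occur in each document.
from collections import Counter

def bn_fragment(docs, a="server outage", b="disk failure"):
    """Estimate the CPT P(B | A) for the fragment a -> b from raw text."""
    counts = Counter()
    for doc in docs:
        counts[(a in doc.lower(), b in doc.lower())] += 1

    def p(b_val, a_val):  # P(B=b_val | A=a_val) with add-one smoothing
        num = counts[(a_val, b_val)] + 1
        den = counts[(a_val, True)] + counts[(a_val, False)] + 2
        return num / den

    return {"edge": (a, b),
            "cpt": {a_val: p(True, a_val) for a_val in (True, False)}}

docs = ["Disk failure caused a server outage overnight.",
        "Routine server outage for maintenance; no disk failure found.",
        "All systems nominal today."]
print(bn_fragment(docs))
```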

2.
3.
The Internet hosts vast amounts of unstructured information such as text and images. RDF, the resource description framework proposed by the W3C, is well suited to describing such unstructured information, and large RDF knowledge bases such as Freebase, Yago, and DBPedia have been built as a result. RDF knowledge bases contain rich semantic information and can be used to annotate named entities extracted from web pages, enriching them semantically. Mapping a named entity on a web page to its corresponding entity in a knowledge base is called entity annotation, which comprises two main parts: mapping between entities and disambiguating annotations. Exploiting the characteristics of massive RDF knowledge bases, we propose an effective entity annotation method that resolves the disambiguation problem through simple graph weighting and computation. The method has been implemented on a cloud platform, and experiments verify its accuracy and scalability.
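As a rough illustration of the graph-weighting idea (the toy knowledge graph and the scoring function below are assumptions, not the paper's exact method), a candidate entity can be scored by its relatedness to the candidate entities of the other mentions on the same page:

```python
# Hypothetical miniature KB: entity -> set of neighbouring entities.
KB = {
    "Paris_(France)": {"France", "Seine", "Eiffel_Tower"},
    "Paris_(Texas)":  {"Texas", "USA"},
    "Eiffel_Tower":   {"Paris_(France)", "France"},
    "France":         {"Paris_(France)", "Eiffel_Tower"},
}
candidates = {"Paris": ["Paris_(France)", "Paris_(Texas)"],
              "Eiffel Tower": ["Eiffel_Tower"]}

def relatedness(e1, e2):
    # Shared neighbours plus a bonus for a direct link.
    return len(KB.get(e1, set()) & KB.get(e2, set())) + (e2 in KB.get(e1, set()))

def disambiguate(candidates):
    result = {}
    for mention, cands in candidates.items():
        others = [c for m, cs in candidates.items() if m != mention for c in cs]
        result[mention] = max(cands,
                              key=lambda c: sum(relatedness(c, o) for o in others))
    return result

print(disambiguate(candidates))  # Paris -> Paris_(France)
```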

4.
Nowadays, semantic lexical resources such as ontologies are becoming increasingly important in many systems, particularly those providing access to unstructured textual data. Typically, such resources are built based on already existing repositories and by analyzing available texts. In practice, however, building new resources of this type, or enriching existing ones, cannot be accomplished without an appropriate tool. In this paper SAUText is presented: a new system which provides the infrastructure for carrying out research involving the usage of semantic resources and the analysis of unstructured textual data. The system uses a dedicated repository for storing various kinds of text data and takes advantage of parallelization to speed up the analysis. As an example of a knowledge discovery method available in the system, a new approach to synonym discovery is introduced.
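The abstract does not detail the synonym-discovery method, so the following is a generic distributional sketch rather than SAUText's algorithm: words whose context vectors are similar are proposed as synonym candidates.

```python
# Distributional synonym candidates: compare bag-of-context vectors by cosine.
from collections import defaultdict
from math import sqrt

def context_vectors(sentences, window=2):
    vecs = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        toks = s.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sents = ["the car sped down the road", "the automobile sped down the street",
         "a red car parked near the road", "a red automobile parked near the street"]
vecs = context_vectors(sents)
print(cosine(vecs["car"], vecs["automobile"]))  # high score -> synonym candidate
```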

5.
To make better decisions in business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to their (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this "open world scenario" because they do not consider semantic issues during integration: they support neither the processing of semantic data nor the creation of a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. It thus supports semantic data sources in addition to traditional ones, semantic integration, and the creation and publication of a semantic (multidimensional) DW in the form of a knowledge base. A comprehensive experimental evaluation on a concrete use case, comparing SETL to a solution built with traditional tools (which required much more hand-coding), shows that SETL provides better programmer productivity, knowledge base quality, and performance.
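To make the idea of a semantic ETL step concrete, here is a minimal sketch using rdflib with an invented example ontology (this is not SETL's actual API): tabular source rows are lifted into RDF triples that a semantic DW could store as a knowledge base.

```python
# Semantic "transform + load": lift rows into RDF triples with rdflib.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/dw#")  # hypothetical DW ontology
g = Graph()
g.bind("ex", EX)

rows = [{"sale_id": "s1", "product": "widget", "amount": 120},
        {"sale_id": "s2", "product": "gadget", "amount": 75}]

for row in rows:
    sale = EX[row["sale_id"]]
    g.add((sale, RDF.type, EX.Sale))
    g.add((sale, EX.product, Literal(row["product"])))
    g.add((sale, EX.amount, Literal(row["amount"], datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```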

6.
This paper presents an approach to query decomposition in a multidatabase environment. The unique aspect of this approach is that it is based on performing transformations over an object algebra that can serve as the basis for a global query language. We first present our multidatabase environment and semantic framework, where a global conceptual schema based on the Object Data Management Group standard encompasses the information from heterogeneous data sources that include relational databases, object-oriented databases, and flat file sources. The metadata about the global schema is enhanced with information about virtual classes as well as virtual relationships and inheritance hierarchies that exist between multiple sources. The AQUA object algebra is used as the formal foundation for manipulating the query expression over the multidatabase; AQUA is enhanced with distribution operators for dealing with data distribution issues. During query decomposition we perform an extensive analysis of traversals for path expressions that involve virtual relationships and hierarchies for access to several heterogeneous sources. The distribution operators, defined in algebraic terms, enhance the global algebra expression with semantic information about the structure, distribution, and localization of the data sources relevant to the solution of the query. By using an object algebra as the basis for query processing, we are able to define algebraic transformations and exploit rewriting techniques during the decomposition phase. Our use of an object algebra also provides a formal and uniform representation for an object-oriented approach to multidatabase query processing. As part of our query processing discussion, we include an overview of a global object identification approach for relating semantically equivalent objects from diverse data sources, illustrating how knowledge about global object identity is used in the decomposition and assembly processes.
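The flavour of such algebraic rewriting can be sketched with a toy expression tree (generic operator names, not AQUA's actual syntax): a selection over a union of distributed sources is pushed down so that each source can evaluate the predicate locally.

```python
# Toy algebraic rewrite: sigma_p(A union B) ==> sigma_p(A) union sigma_p(B)
from dataclasses import dataclass

@dataclass
class Source: name: str

@dataclass
class Union: left: object; right: object

@dataclass
class Select: pred: str; child: object

def push_selection(expr):
    """Push a selection below a union so each source filters locally."""
    if isinstance(expr, Select) and isinstance(expr.child, Union):
        u = expr.child
        return Union(push_selection(Select(expr.pred, u.left)),
                     push_selection(Select(expr.pred, u.right)))
    return expr

q = Select("salary > 50000", Union(Source("oracle_emp"), Source("objectdb_emp")))
print(push_selection(q))
```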

7.
In digital enterprises, structured data and semi-/unstructured content are normally stored in two different repositories, the former typically in relational databases and the latter in a content manager, frequently hosted by an external outsourcer. This storage of complementary information in two separate silos has led to the information being processed and mined separately, which is undesirable. Effective knowledge and information use requires seamless access and intelligent analysis of information in its totality, allowing enterprises to gain enhanced insights. In this paper, we develop techniques to correlate information across different sources and then carry out knowledge discovery over these complementary sources in a conjoint manner. The techniques developed in our research are then used to address significant issues in four application areas, namely Business, Logistics, Bioinformatics and Electric Power Systems, although the potential applications with significant impact are far more extensive.
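The paper develops its own correlation techniques; as a deliberately naive stand-in only, the sketch below links structured records to unstructured documents whenever a record's key entity is mentioned in the text.

```python
# Naive cross-silo correlation: match DB records to documents by entity mention.
records = [{"id": 1, "customer": "Acme Corp"}, {"id": 2, "customer": "Globex"}]
documents = ["Acme Corp reported shipping delays in Q3.",
             "No incidents reported for other accounts."]

links = [(r["id"], d) for r in records for d in documents if r["customer"] in d]
print(links)  # [(1, 'Acme Corp reported shipping delays in Q3.')]
```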

8.
In this paper we illustrate an integrated approach combining Business Intelligence, Big Data and the Internet of Things (IoT), applied to information resources including structured and unstructured contents, Geo-Spatial and Social Network data, Multimedia (MM), multiple domain vocabularies, classifiers and ontologies. This is implemented in an information system which exploits Associative in-memory technologies in the context of Cloud Computing, as well as Semantic technologies for merging and analyzing information coming from heterogeneous sources. The primary aim is supporting Cultural Heritage Asset crowdsourcing, promotion, publication, management and usage. We describe and discuss, in particular, the application of this system to the analysis of the behavior and interests of visitors across different types of populations and visits: on-site/ad-hoc (exhibitions, museums, cultural events) and territorial (historical downtowns, archaeological or other touristic areas, and routes including cultural resources). In this way it becomes possible to provide a common ICT infrastructure and a set of advanced services for all types of subjects interested in the Cultural Heritage domain. The results of the experimentation encourage a Business Intelligence approach suitable for nonprofit, research and business-oriented organizations alike.

9.
Large content networks like the World Wide Web contain huge amounts of information with the potential of being integrated, because their components fit within common concepts and/or are connected through hidden, implicit relationships. One attempt at such an integration is the program called the "Web of Data," an evolution of the Semantic Web: it targets semi-structured information sources such as Wikipedia and turns them into fully structured ones in the form of Web-based databases like DBpedia, then integrates them with other public databases such as Geonames. On the other hand, the vast majority of the information residing on the Web is still totally unstructured, which is the starting point for our approach, aimed at integrating unstructured information sources. For this purpose, we exploit techniques from Probabilistic Topic Modeling in order to cluster Web pages into concepts (topics), which are then related through higher-level concept networks; we also make implicit semantic relationships emerge between single Web pages. The approach has been tested through a number of case studies described here. While the applicative focus of the research reported here is knowledge integration in the specific and relevant case of the WWW, the wider aim is to provide an integration framework generally applicable to all complex content networks where information propagates from multiple sources.
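A minimal sketch of the topic-modeling step the approach relies on, using scikit-learn's LDA on a toy corpus (the corpus and parameters are assumptions): pages are clustered by their dominant topic.

```python
# Cluster "web pages" into topics with Latent Dirichlet Allocation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

pages = ["solar panels and wind turbines cut energy costs",
         "wind energy output and turbine efficiency",
         "the league final drew a record football crowd",
         "football transfer rumours dominate sports news"]

X = CountVectorizer(stop_words="english").fit_transform(pages)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)        # per-page topic distributions

clusters = doc_topics.argmax(axis=1)     # dominant topic per page
for page, topic in zip(pages, clusters):
    print(topic, page[:40])
```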

10.
Most organisations using Open Data currently focus on data processing and analysis. However, although Open Data may be available online, these data are generally of poor quality, discouraging others from contributing to and reusing them. This paper describes an approach to publishing statistical data from public repositories using Semantic Web standards published by the W3C, such as RDF and SPARQL, in order to facilitate the analysis of multidimensional models. We have defined a framework based on the entire lifecycle of data publication, including a novel Linked Open Data assessment step and the use of external repositories as a knowledge base for data enrichment. As a result, users are able to interact with data generated according to the RDF Data Cube vocabulary, which makes it possible for general users to avoid the complexity of SPARQL when analysing data. The approach was applied to the Barcelona Open Data platform as a use case and revealed benefits such as support for the decision-making process.
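As a small illustration of publishing one statistical observation with the W3C RDF Data Cube vocabulary via rdflib (the dataset, dimension, and measure URIs are invented examples, not the Barcelona Open Data schema):

```python
# One qb:Observation expressed with rdflib and the RDF Data Cube vocabulary.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/stats#")  # hypothetical dataset namespace

g = Graph()
g.bind("qb", QB)
obs = EX["obs/2023-district5"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX.populationDataset))
g.add((obs, EX.refYear, Literal("2023", datatype=XSD.gYear)))   # dimension
g.add((obs, EX.district, Literal("District 5")))                # dimension
g.add((obs, EX.population, Literal(81250, datatype=XSD.integer)))  # measure
print(g.serialize(format="turtle"))
```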

11.
Supporting geographically-aware web document foraging and sensemaking
This paper reports on the development and application of strategies and tools for geographic information seeking and knowledge building that leverage unstructured text resources found on the web. Geographic knowledge building from unstructured web sources starts with web document foraging, during which the quantity, scope and diversity of web-based information place an enormous cognitive burden on an analyst's or researcher's ability to judge information relevancy. Determining information relevancy is ultimately a process of sensemaking. In this paper, we present our research on visually supporting web document foraging and sensemaking; in particular, we present the Sense-of-Place (SensePlace) analytic environment. The scientific goal of SensePlace is to visually and computationally support analyst sensemaking with text artifacts that have potential place, time, and thematic relevance to an analytical problem, through identification and visual highlighting of named entities (people, places, times, and organizations) in documents, automated inference to determine document relevance using stored knowledge, and a visual interface with coupled geographic map, timeline, and concept graph displays that are used to contextualize the contents of potentially relevant documents. We present the results of a case study using SensePlace to uncover potential drivers of measles and other epidemics in Niger, including population migration, geopolitical factors, and other infectious disease dynamics. Our analysis demonstrates how the approach can support analysis of complex situations along (a) multi-scale geographic dimensions (i.e., vaccine coverage areas), (b) temporal dimensions (i.e., seasonal population movement and migrations), and (c) diverse thematic dimensions (effects of political upheaval, food security, transient movement, etc.).
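The named-entity identification step (people, places, times, and organizations) can be approximated with an off-the-shelf tagger such as spaCy; SensePlace's own pipeline is more elaborate, so treat this purely as an illustration.

```python
# Off-the-shelf NER over an analyst-style sentence.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("In 2004 WHO teams traced measles outbreaks from Niamey, Niger, "
          "to seasonal migration routes.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. 2004/DATE, WHO/ORG, Niamey/GPE, Niger/GPE
```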

12.
From unstructured data to actionable intelligence
Rao, R. IT Professional, 2003, 5(6): 29-35.
There's content everywhere, but not the information you need. Content analysis can organize a pile of text into a richly accessible repository. This article explains two key technologies for generating metadata about content - automatic categorization and information extraction. These technologies, and the applications that metadata makes possible, can transform an organization's reservoir of unstructured content into a well-organized repository of knowledge. With metadata available, a company's search system can move beyond simple dialogs to richer means of access that work in more situations. Information visualization, for example, uses metadata and our innate visual abilities to improve access. Besides better access, metadata enables intelligent switching in the content flows of various organizational processes - for example, making it possible to automatically route the right information to the right person. A third class of metadata applications involves mining text to extract features for analysis using the statistical approaches typically applied to structured data. For example, if you turn the text fields in a survey into data, you can then analyze the text along with other data fields. All these metadata-powered applications can improve your company's use of its information resources.
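Automatic categorization, one of the two metadata technologies the article discusses, in its simplest supervised form (the training data and labels are toy assumptions; the article describes the technology generally rather than this particular classifier):

```python
# Tiny supervised categorizer: TF-IDF features + Naive Bayes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = ["invoice overdue payment account", "payment received thank you",
              "server crashed error log restart", "patch the server kernel bug"]
train_labels = ["finance", "finance", "it", "it"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)
print(clf.predict(["the server shows a disk error"]))  # -> ['it']
```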

13.
A drug knowledge base with a complete classification scheme and comprehensive drug information can support clinical decision making and rational clinical drug use. Drawing on multiple domestic medical resources as references and data sources, this paper establishes a knowledge description scheme and a classification scheme for drugs, classifies drugs in a standardized way, produces detailed knowledge descriptions, and thereby constructs a multi-source Chinese Medicine Knowledge Base (CMKB). The CMKB classification comprises 27 first-level and 119 second-level categories, and 14,141 drugs are described along multiple dimensions such as indications, dosage, and administration. Disease entities are extracted from the unstructured descriptions using BiLSTM-CRF and T-BiLSTM-CRF models, yielding structured information extraction for drug attributes and establishing knowledge links between drug entities and the automatically extracted disease entities. The constructed CMKB can be connected to Chinese medical knowledge graphs to extend drug information, and can provide a knowledge foundation for intelligent diagnosis and medical question answering.
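The disease-entity extraction relies on BiLSTM-CRF and T-BiLSTM-CRF models; the PyTorch sketch below shows only the BiLSTM encoder-classifier core, with the CRF layer and the paper's training details omitted and all sizes chosen arbitrarily.

```python
# BiLSTM token tagger (emission scores only; a CRF layer would sit on top).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=64, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)   # 2x for both directions

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))      # (batch, seq_len, 2*hidden)
        return self.out(h)                         # per-token tag scores

model = BiLSTMTagger(vocab_size=5000, n_tags=3)    # e.g. O / B-Disease / I-Disease
scores = model(torch.randint(0, 5000, (1, 12)))    # one sentence of 12 tokens
print(scores.shape)                                # torch.Size([1, 12, 3])
```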

14.
Many organizations use business policies to govern their business processes, often resulting in huge amounts of policy documents. As new regulations such as Sarbanes-Oxley arise, these business policies must be modified to ensure their correctness and consistency. Given the large amounts of business policies, manually analyzing policy documents to discover process information is very time-consuming and imposes an excessive workload. To provide a solution to this information overload problem, we propose a novel approach named Policy-based Process Mining (PBPM) for automatically extracting process information from policy documents. Several text mining algorithms are applied to business policy texts in order to discover process-related policies and extract such process components as tasks, data items, and resources. Experiments conducted to validate the extracted components yield very promising results. To the best of our knowledge, PBPM is the first approach that applies text mining to discovering business process components from unstructured policy documents. The initial research results presented in this paper will require further research effort to make PBPM a practical solution.
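PBPM combines several text-mining algorithms; as a deliberately naive stand-in, here is a pattern-based extractor that pulls (resource, task, data item) triples out of policy-style sentences (the pattern and sentences are assumptions, not the paper's algorithms).

```python
# Naive pattern-based extraction of process components from policy text.
import re

policy = ("The claims officer must review the damage report. "
          "The finance clerk must approve the reimbursement form.")

pattern = re.compile(r"The (.+?) must (\w+) the (.+?)\.")
for resource, task, data_item in pattern.findall(policy):
    print({"resource": resource, "task": task, "data item": data_item})
```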

15.
Data streams are long, relatively unstructured sequences of characters that contain information such as electronic mail or a tape backup of various documents and reports created in an office. A conceptual framework is presented, using relational algebra and relational databases, within which data streams may be queried. As information is extracted from the data streams, it is put into a relational database that may be queried in the usual manner. The database schema evolves as the user's knowledge of the content of the data stream changes. Operators are defined in terms of relational algebra that can be used to extract data from a specially defined relation that contains all or part of the data stream. This approach to querying data streams permits the integration of unstructured data with structured data. The operators defined extend the functionality of relational algebra in much the same way that the join does relative to the basic operators select, project, union, difference, and Cartesian product.
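The core idea can be sketched as an extract operator that materializes part of a character stream into a relation, after which ordinary relational queries apply; the regex-based operator below is a generic stand-in, not the paper's exact algebra.

```python
# Extract tuples from a character stream, then query them relationally.
import re
import sqlite3

stream = ("From: alice@example.com Subject: budget Q3 ... "
          "From: bob@example.com Subject: meeting notes ... ")

def extract(stream, pattern):
    """Turn stream fragments matching `pattern` into tuples of a relation."""
    return re.findall(pattern, stream)

rows = extract(stream, r"From: (\S+) Subject: (.+?) \.\.\.")
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mail (sender TEXT, subject TEXT)")
db.executemany("INSERT INTO mail VALUES (?, ?)", rows)
print(db.execute("SELECT sender FROM mail WHERE subject LIKE 'budget%'").fetchall())
```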

16.
One of the main concerns of wireless sensor networks (WSNs) is to deliver useful information from data sources to users at minimum power consumption, given the constraint that sensor nodes must operate on limited power sources for extended periods. In particular, achieving power efficiency and multihop communication in WSN applications is a major issue. This paper continues the investigation of the recently proposed Minimum-power Multiresolution Data Dissemination (MMDD) problem for WSNs (whose existing solution is used here as a benchmark). We propose an ant-inspired solution to this problem; to the best of our knowledge, no attempts have been made so far in this direction. We have evaluated the performance of our proposed solution through a variety of experiments and found it promising in terms of total energy consumption in data dissemination.
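The paper's algorithm is not reproduced here; the following generic ant-colony sketch (the topology, costs, and parameters are all assumptions) conveys the underlying idea: ants probabilistically build source-to-sink paths over a link-cost graph and deposit pheromone in inverse proportion to path energy.

```python
# Ant-colony search for a low-energy dissemination path in a tiny topology.
import random

COST = {("s", "a"): 2, ("s", "b"): 4, ("a", "b"): 1, ("a", "t"): 5, ("b", "t"): 1}
NEIGH = {"s": ["a", "b"], "a": ["s", "b", "t"], "b": ["s", "a", "t"], "t": []}

def key(u, v):                                   # canonical undirected edge key
    return (u, v) if (u, v) in COST else (v, u)

def cost(u, v):
    return COST[key(u, v)]

tau = {e: 1.0 for e in COST}                     # pheromone per link

def walk(src="s", dst="t"):
    """One ant builds a loop-free path, biased by pheromone / link cost."""
    path, node = [src], src
    while node != dst:
        options = [n for n in NEIGH[node] if n not in path]
        if not options:
            return None                          # dead end, abandon this ant
        weights = [tau[key(node, n)] / cost(node, n) for n in options]
        node = random.choices(options, weights)[0]
        path.append(node)
    return path

random.seed(1)
best = None
for _ in range(200):                             # ant iterations
    p = walk()
    if p is None:
        continue
    energy = sum(cost(u, v) for u, v in zip(p, p[1:]))
    for u, v in zip(p, p[1:]):                   # evaporate + reinforce used links
        tau[key(u, v)] = 0.9 * tau[key(u, v)] + 1.0 / energy
    if best is None or energy < best[0]:
        best = (energy, p)
print(best)                                      # e.g. (4, ['s', 'a', 'b', 't'])
```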

17.
Context: Data warehouses are systems which integrate heterogeneous sources to support the decision-making process. Data from the Web is becoming increasingly important as a source for these systems, which has motivated the extensive use of XML to facilitate data and metadata interchange among heterogeneous data sources from the Web and the data warehouse. However, the business information that data warehouses manage is highly sensitive and must therefore be carefully protected. Security is thus a key issue in the design of data warehouses, regardless of the implementation technology; notably, the idiosyncrasies of unstructured and semi-structured data call for security rules specifically tailored to these systems so that their particularities are captured correctly. Unfortunately, although security issues have been considered in the development of traditional data warehouses, current research lacks approaches for considering security when the target platform is based on XML technology. Objective: We focus on defining transformations to obtain a secure XML Schema from the conceptual multidimensional model of a data warehouse. Method: We first defined the rationale behind the transformation rules and developed them in natural language, and then established them clearly and formally using the QVT language. Finally, to validate our proposal, we carried out a case study. Results: We propose an approach for the model-driven development of secure XML data warehouses, defining a set of QVT transformation rules. Conclusion: The main benefit of our proposal is that security requirements can be modelled together with the conceptual model of the data warehouse during the early stages of a project, automatically obtaining the corresponding implementation for XML.
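The paper defines its rules in QVT; purely as a language-neutral illustration, the tiny Python stand-in below maps one conceptual attribute carrying a security level onto an XML Schema element that records that level as an annotation (all element and attribute names here are invented, not the paper's rule output).

```python
# Toy model-to-text transformation: conceptual attribute -> annotated XSD element.
from xml.sax.saxutils import escape

def to_secure_xsd(attr):
    return (f'<xs:element name="{escape(attr["name"])}" type="xs:{attr["type"]}">\n'
            f'  <xs:annotation><xs:appinfo>\n'
            f'    securityLevel="{escape(attr["security"])}"\n'
            f'  </xs:appinfo></xs:annotation>\n'
            f'</xs:element>')

print(to_secure_xsd({"name": "salary", "type": "decimal", "security": "Confidential"}))
```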

18.
Learning is an important way for intelligent agents to acquire problem-solving capability. Most current work assumes that agents learn from complete and structured data, and thus copes poorly with the incomplete and unstructured information that arises in many situations. This paper proposes a problem-solving method based on information retrieval techniques, which effectively helps an intelligent agent find useful knowledge in incomplete, unstructured text to solve the problems it encounters. Leave-one-out cross-validation on 141 trouble-handling cases shows that it raises the agent's case-handling capability to a score of 3.65 out of 5, indicating that integrating information retrieval with agent technology can effectively improve the latter's problem-solving ability.
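A minimal retrieval sketch in the spirit of the approach (the incident texts are toy assumptions and the paper's IR machinery is richer): given a new trouble report, fetch the most similar past case and reuse its recorded fix.

```python
# Retrieve the closest past incident by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cases = ["router reboot loop after firmware update -> roll back firmware",
         "disk full on mail server -> rotate and archive logs",
         "users cannot authenticate after password policy change -> sync policy"]

vec = TfidfVectorizer()
case_matrix = vec.fit_transform(cases)
query = vec.transform(["mail server disk is full again"])
best = cosine_similarity(query, case_matrix).argmax()
print(cases[best])   # the closest past incident and its fix
```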

19.
In many real-world applications of multi-agent systems, agent reasoning suffers from bounded rationality caused by both limited resources and limited knowledge. When agent sensing to overcome its knowledge limitations also requires resource use, the agent's knowledge refinement is affected by its inability to always sense when, and as accurately as, needed, further leading to poor decision making. In this paper, we consider what happens when sensing actions require the use of stateful resources, which we define as resources whose state-dependent behavior changes over time based on usage. Current literature addressing agent sensing with limited resources primarily investigates stateless resources, such as avoiding the use of too much time or energy during sensing. However, sensing itself can change the state of a resource, and thus its behavior, which affects both the information gathered and the resulting knowledge refinement. This produces a phenomenon where the sensing action can and will distort its own outcome (and potentially future outcomes), termed the Observer Effect (OE) after the similar phenomenon in the physical sciences. Under this effect, when deliberating about when and how to perform sensing that requires stateful resources, an agent faces a strategic tradeoff between satisfying the need for (1) knowledge refinement to support its reasoning, and (2) avoiding knowledge corruption due to distorted sensing outcomes. To address this tradeoff, we model sensing action selection as a partially observable Markov decision process where an agent optimizes knowledge refinement while considering the (possibly hidden) state of the resources used during sensing. In this model, the agent uses reinforcement learning to learn a controller for action selection, as well as how to predict expected knowledge refinement based on resource use during sensing. Our approach differs from other bounded rationality and sensing research in that we consider how to make decisions about sensing with stateful resources that produce side effects such as the OE, as opposed to simply using stateless resources with no such side effect. We evaluate our approach in a fully and partially observable agent mining simulation. The results demonstrate that considering resource state and the OE during sensing action selection through our approach (1) yielded better knowledge refinement, (2) appropriately balanced current and future refinement to avoid knowledge corruption, and (3) exploited the relationship (i.e., high, positive correlation) between sensing and task performance to boost task performance through improved sensing. Furthermore, our methodology achieved good knowledge refinement even when the OE is not present, indicating that it can improve sensing performance in a wide variety of environments. Finally, our results provide insights into the types and configurations of learning algorithms useful for learning within our methodology.
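A toy sketch of the tradeoff the paper formalizes (the actual model is a POMDP with learned controllers; this tabular Q-learning simplification and all numbers are assumptions): sensing refines knowledge but wears the resource, and a worn resource returns distorted readings, so the learned policy eventually skips sensing.

```python
# Tabular Q-learning over a stateful sensing resource with an Observer Effect.
import random

random.seed(0)
ACTIONS = ("sense", "skip")
Q = {(w, a): 0.0 for w in range(4) for a in ACTIONS}   # wear levels 0..3

def step(wear, action):
    """Sensing yields refinement reward but increases wear; a worn resource
    distorts readings (the OE), which corrupts knowledge (negative reward)."""
    if action == "skip":
        return wear, 0.0
    accuracy = 1.0 - 0.3 * wear
    reward = accuracy if random.random() < accuracy else -0.5
    return min(wear + 1, 3), reward

alpha, gamma, eps = 0.1, 0.9, 0.1
for _ in range(2000):                                  # Q-learning episodes
    wear = 0
    for _ in range(5):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(wear, x)])
        nxt, r = step(wear, a)
        target = r + gamma * max(Q[(nxt, b)] for b in ACTIONS)
        Q[(wear, a)] += alpha * (target - Q[(wear, a)])
        wear = nxt

# Learned policy: sense while the resource is fresh, skip once it is worn.
print({w: max(ACTIONS, key=lambda a: Q[(w, a)]) for w in range(4)})
```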

20.
