Similar Literature
1.
To develop a Multi-Sensor System (MSS) one has to consider three important issues: (1) modeling the uncertainty that exists in the sensory measurements, (2) modeling the cooperation behavior among the sensors, and (3) developing fusion strategies that recognize both the uncertainty model and the cooperation behavior. In this article we propose a probabilistic approach for modeling the uncertainty and cooperation in sensory teams. We show how the Information Variation measure can be used to capture both the quality of sensory data and the interdependence relationships that might exist between the different sensors. This allows the sensor fusion procedures to avoid the assumption that the observations made by the different sensors are totally independent, an assumption that limits the applicability of such procedures in many practical situations. We also show how DeGroot's Consensus model can be combined with the Information Variation model to fuse the uncertain sensory data. The proposed approach amounts to an approximation of the Bayesian paradigm when the team comprises more than two sensors; it is motivated by the computational burden and the difficulty of constructing the exact Bayesian solution. © 1996 John Wiley & Sons, Inc.
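As a hedged illustration of the fusion step, the sketch below implements the classical DeGroot consensus iteration: each sensor repeatedly replaces its estimate with a weighted average of the team's estimates until the team agrees. In the paper the weights would be informed by the Information Variation measure; the weight matrix here is a made-up, row-stochastic placeholder.

```python
# A sketch of DeGroot consensus: iterate x <- W @ x until estimates agree.
# W must be row-stochastic; these trust weights are illustrative only,
# since the paper derives sensor interdependence from Information Variation.
import numpy as np

def degroot_consensus(estimates, W, tol=1e-9, max_iter=1000):
    x = np.asarray(estimates, dtype=float)
    for _ in range(max_iter):
        x_next = W @ x
        if np.max(np.abs(x_next - x)) < tol:
            return x_next
        x = x_next
    return x

# Three sensors observing the same quantity; sensor 3 is trusted least.
W = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
print(degroot_consensus([10.2, 9.8, 12.5], W))  # all entries converge to one value
```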

2.
There are numerous applications where there is a need to rapidly infer a story about a given subject from a given set of potentially heterogeneous data sources. In this paper, we formally define a story to be a set of facts about a given subject that satisfies a “story length” constraint. An optimal story is a story that maximizes the value of an objective function measuring the goodness of a story. We present algorithms to extract stories from text and other data sources. We also develop an algorithm to compute an optimal story, as well as three heuristic algorithms to rapidly compute a suboptimal story. We run experiments to show that stories can be constructed efficiently and that the stories produced by these heuristic algorithms are of high quality. We have built a prototype STORY system based on our model; we briefly describe the prototype as well as one application in this paper.
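As a hedged sketch of what one of the heuristics might look like (the paper's objective function and heuristic algorithms are not spelled out in this abstract), the following greedy routine picks the highest-scoring facts until the story-length budget is spent; the `score` values are hypothetical per-fact goodness measures.

```python
# Greedy heuristic for the optimal-story problem: keep the highest-scoring
# facts about the subject until the story-length constraint is met.
def greedy_story(facts, max_len):
    """facts: iterable of (fact_text, score); returns at most max_len facts."""
    ranked = sorted(facts, key=lambda f: f[1], reverse=True)
    return [text for text, _ in ranked[:max_len]]

facts = [("born in 1952", 0.9), ("owns a cat", 0.1), ("elected in 1998", 0.8)]
print(greedy_story(facts, 2))  # ['born in 1952', 'elected in 1998']
```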

3.
Presented is a model that integrates three data types (numbers, intervals, and linguistic assessments). Data of these three types come from a variety of sensors. One objective of sensor-fusion models is to provide a common framework for data integration, processing, and interpretation, and that is what our model does. We use a small set of artificial data to illustrate how problems as diverse as feature analysis, clustering, cluster validity, and prototype classifier design can all be formulated and attacked with standard methods once the data are converted to the generalized coordinates of our model. The effects of reparameterization on computational outputs are discussed. Numerical examples illustrate that the proposed model affords a natural way to approach problems that involve mixed data types.

4.
Two models are discussed that integrate heterogeneous fuzzy data of three types: real numbers, real intervals, and real fuzzy sets. The architecture comprises three modules: 1) an encoder that converts the mixed data into a uniform internal representation; 2) a numerical processing core that uses the internal representation to solve a specified task; and 3) a decoder that transforms the internal representation back to an interpretable output format. The core used in this study is fuzzy clustering, but there are many other operations that are facilitated by the models. Two schemes for encoding the data and decoding it after clustering are presented. One method uses possibility and necessity measures for encoding and several variants of a center of gravity defuzzification method for decoding. The second approach uses piecewise linear splines to encode the data and decode the clustering results. Both procedures are illustrated using two small sets of heterogeneous fuzzy data.
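A minimal sketch of the encoder module's role, assuming a trapezoidal internal representation (a, b, c, d) whose core is [b, c] and support [a, d]; the paper's actual possibility/necessity and spline encoders are richer than this.

```python
# Map a crisp number, an interval, and a triangular fuzzy number onto one
# uniform internal representation (a trapezoid), so a single numerical core
# can process all three data types. The trapezoid convention is an assumption.
def encode(datum):
    kind, val = datum
    if kind == "number":        # crisp x -> degenerate trapezoid
        x = val
        return (x, x, x, x)
    if kind == "interval":      # [lo, hi] -> rectangular membership
        lo, hi = val
        return (lo, lo, hi, hi)
    if kind == "triangular":    # (left, peak, right) fuzzy number
        left, peak, right = val
        return (left, peak, peak, right)
    raise ValueError(f"unknown data type: {kind}")

mixed = [("number", 3.0), ("interval", (2.0, 4.0)), ("triangular", (1.0, 3.0, 5.0))]
print([encode(d) for d in mixed])
```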

5.
In this paper we present the design, implementation and evaluation of SOBA, a system for ontology-based information extraction from heterogeneous data resources, including plain text, tables and image captions. SOBA is capable of processing structured information, text and image captions to extract information and integrate it into a coherent knowledge base. To establish coherence, SOBA interlinks the information extracted from different sources and detects duplicate information. The knowledge base produced by SOBA can then be used to query for information contained in the different sources in an integrated and seamless manner. Overall, this allows for advanced retrieval functionality by which questions can be answered precisely. A further distinguishing feature of the SOBA system is that it straightforwardly integrates deep and shallow natural language processing to increase robustness and accuracy. We discuss the implementation and application of the SOBA system within the SmartWeb multimodal dialog system. In addition, we present a thorough evaluation of the different components of the system. However, an end-to-end evaluation of the whole SmartWeb system is out of the scope of this paper and has been presented elsewhere by the SmartWeb consortium.

6.
Taking data uncertainty into account during clustering, an improved K-means algorithm, the FK algorithm, is proposed. The core idea of the FK algorithm is to minimize the expected total sum of squared errors E(SSE); in particular, each data object xi is described by a probability density function (pdf) f(xi) over its uncertainty region. The FK algorithm is applied to analyze moving objects with uncertain motion patterns, and experiments show that accounting for data uncertainty yields more accurate clustering results.
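To make the idea concrete, here is a hedged sketch of uncertain k-means in the FK spirit: each object xi is represented by Monte Carlo samples drawn from its pdf f(xi), and assignment minimizes the expected squared distance to each centroid. This is an illustrative approximation, not the paper's exact E(SSE) minimization.

```python
# Uncertain k-means sketch: objects are sample sets from their pdfs, and the
# expected squared distance to each centroid drives the assignment step.
import numpy as np

def uncertain_kmeans(samples, k, iters=50, seed=0):
    """samples: array (n_objects, n_samples, dim) drawn from each f(x_i)."""
    rng = np.random.default_rng(seed)
    means = samples.mean(axis=1)                      # expected positions
    centroids = means[rng.choice(len(means), k, replace=False)]
    for _ in range(iters):
        # expected squared distance of object i to centroid j
        d = ((samples[:, :, None, :] - centroids) ** 2).sum(-1).mean(1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = samples[labels == j].reshape(-1, samples.shape[-1]).mean(0)
    return labels, centroids

# Two well-separated groups of uncertain objects, 30 pdf samples each.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(6, 1, (10, 2))])
samples = pts[:, None, :] + rng.normal(0, 0.3, (20, 30, 2))
labels, cents = uncertain_kmeans(samples, k=2)
print(labels)
```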

7.
Global viewing of heterogeneous data sources
The problem of defining global views of heterogeneous data sources to support querying and cooperation activities is becoming increasingly important due to the availability of multiple data sources within complex organizations and in global information systems. Global views are defined to provide a unified representation of the information in the different sources by analyzing conceptual schemas associated with them and resolving possible semantic heterogeneity. We propose an affinity-based unification method for global view construction. In the method: (1) the concept of affinity is introduced to assess the level of semantic relationship between elements in different schemas by taking into account semantic heterogeneity; (2) schema elements are classified by affinity levels using clustering procedures so that their different representations can be analyzed for unification; (3) global views are constructed starting from selected clusters by unifying the representations of their elements. Experiences of applying the proposed unification method and the associated tool environment ARTEMIS to databases of the Italian Public Administration information systems are described.

8.
XML documents are becoming popular for business process integration. To achieve interoperability between applications, XML documents must also conform to various commonly used document type definitions (DTDs). However, most business data are not maintained as XML documents. They are stored in various native formats, such as database tables or LDAP directories. Hence, a middleware is needed to dynamically generate XML documents conforming to predefined DTDs from various data sources. As industrial consortia and large corporations have created various DTDs, it is both challenging and time-consuming to design the necessary middleware to conform to so many different DTDs. This problem is particularly acute for a small- or medium-sized enterprise because it lacks the IT skills to quickly develop such a middleware. In this paper, we present XLE, an XML Lightweight Extractor, as a practical approach to dynamically extracting DTD-conforming XML documents from heterogeneous data sources. XLE is based on a framework called DTD source annotation (DTDSA). It treats a DTD as the control structure of a program, and the annotations become the program statements, such as functions and assignments. DTD-conforming XML documents are generated by parsing annotated DTDs. Essentially, DTD annotations describe declaratively the mappings between target XML documents and the source data. The XLE engine implements a few basic annotations, providing a practical solution for many small- and medium-sized enterprises. However, XLE is designed to be versatile: it allows sophisticated users to plug in their own implementations to access new types of data or to achieve better performance. Heterogeneous data sources can be simply specified in the annotations. A GUI tool is provided to highlight the places where annotations are needed.

9.
Scaling access to heterogeneous data sources with DISCO
Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementers must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures for unavailable data sources: queries simply return failure and no further information when data sources are unavailable for query processing. The Distributed Information Search COmponent (Disco) addresses these problems. Data modeling techniques manage the connections to data sources, and sources can be added transparently to users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so they are efficiently evaluated by sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article, we describe: 1) the distributed mediator architecture of Disco; 2) the data model and its modeling of data source connections; 3) the interface to underlying data sources and the query rewriting process; and 4) query processing semantics. We describe several advantages of our system.

10.
Searching XML data with a structured XML query can improve the precision of results compared with a keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer the structured query exactly. As such, query relaxation is necessary. Previous work on XML query relaxation suffers from unnecessarily computing a large number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach which relaxes a query against different data sources differently, based on their conformed schemas. In this paper, we present a set of techniques that supports this approach, including schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for adaptive relaxation of a query and top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and efficiency of our approach.

11.
Facility location decisions are usually determined by cost- and coverage-related factors, although empirical studies show that factors such as infrastructure, labor conditions and competition also play an important role in practice. The objective of this paper is to develop a multi-objective facility location model accounting for a wide range of factors affecting decision-making. The proposed model selects potential facilities from a set of pre-defined alternative locations according to the number of customers, the number of competitors and real-estate cost criteria. However, this requires a large amount of both spatial and non-spatial input data, which can be acquired from distributed data sources over the Internet. Therefore, a computational approach for processing input data and representing modeling results is elaborated; it is capable of accessing and processing data from heterogeneous spatial and non-spatial data sources. Application of the elaborated data-gathering approach and facility location model is demonstrated using the example of a fast-food restaurant location problem.
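As a hedged illustration of the selection criteria (not the paper's actual multi-objective formulation), the sketch below scores candidate locations by a weighted sum of min-max-normalized criteria: more customers raise the score, while more competitors and higher real-estate cost lower it. The weights and figures are invented.

```python
# Weighted-sum scoring of candidate locations over normalized criteria.
def score(candidates, weights=(0.5, 0.3, 0.2)):
    w_cust, w_comp, w_cost = weights

    def norm(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    cust = norm([c["customers"] for c in candidates])
    comp = norm([c["competitors"] for c in candidates])
    cost = norm([c["cost"] for c in candidates])
    return [w_cust * a - w_comp * b - w_cost * c
            for a, b, c in zip(cust, comp, cost)]

candidates = [
    {"name": "downtown", "customers": 9000, "competitors": 12, "cost": 500},
    {"name": "mall",     "customers": 7000, "competitors": 5,  "cost": 320},
    {"name": "suburb",   "customers": 4000, "competitors": 3,  "cost": 180},
]
best = max(zip(score(candidates), (c["name"] for c in candidates)))
print(best)  # the mid-range "mall" site wins under these weights
```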

12.
To address the scattered distribution and the poor accuracy and timeliness of the massive amount of expert information on the Internet, an expert search system for heterogeneous information sources is designed and implemented. The system uses domain-oriented topic crawling, Chinese word segmentation, person-name disambiguation and academic-network modeling techniques to automatically collect and effectively organize publicly available research and personal information about experts on the Internet. A method of modeling the academic network as a directed weighted graph is designed, and the academic relationships among experts are visualized with a force-directed layout algorithm. The expert search system consists of a data collection module, a data analysis module and a retrieval module; it has indexed the research and personal information of 398,432 experts from 1,543 universities and research institutes, facilitating academic exchange among Chinese scholars.
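A hedged sketch of the directed-weighted-graph modeling step: here an edge u → v accumulates weight whenever u precedes v in a paper's byline. This ordering convention and the unit weights are assumptions for illustration; the abstract does not give the system's exact weighting scheme.

```python
# Build a directed weighted academic graph from paper bylines.
from collections import defaultdict
from itertools import combinations

def build_academic_graph(papers):
    """papers: list of author-name lists, in byline order."""
    graph = defaultdict(float)                 # (u, v) -> edge weight
    for authors in papers:
        for u, v in combinations(authors, 2):  # u precedes v in the byline
            graph[(u, v)] += 1.0
    return graph

papers = [["Li Wei", "Zhang Min"], ["Li Wei", "Zhang Min", "Wang Fang"]]
print(dict(build_academic_graph(papers)))
```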

13.
In many of the problems that can be found nowadays, information is scattered across different heterogeneous data sources. Most natural language interfaces focus on a very specific part of the problem (e.g., an interface to a relational database, or an interface to an ontology). However, from the point of view of users, it does not matter where the information is stored; they just want to get the knowledge in an integrated, transparent, efficient, effective and pleasant way. To solve this problem, this article proposes a generic multi-agent conversational architecture that follows the divide-and-conquer philosophy and considers two different types of agents: expert agents specialized in accessing different knowledge sources, and decision agents that coordinate them to provide a coherent final answer to the user. This architecture has been used to design and implement SmartSeller, a specific system which includes a Virtual Assistant to answer general questions and a Bookseller to query a book database. A thorough comparison with other relevant systems demonstrates that our proposal improves on several key features presented throughout the paper.
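A minimal sketch of the two-tier agent idea under stated assumptions: each expert agent answers only questions within its own knowledge source, and the decision agent keeps the most confident reply. The class names, the lookup-table "knowledge" and the confidence scores are all illustrative, not SmartSeller's implementation.

```python
# Expert agents answer from their own source; a decision agent coordinates
# them and returns the single most confident answer.
class ExpertAgent:
    def __init__(self, name, knowledge):
        self.name = name
        self.knowledge = knowledge           # question -> (answer, confidence)

    def answer(self, question):
        return self.knowledge.get(question)  # None if out of scope

class DecisionAgent:
    def __init__(self, experts):
        self.experts = experts

    def ask(self, question):
        replies = [e.answer(question) for e in self.experts]
        replies = [r for r in replies if r is not None]
        if not replies:
            return "Sorry, I don't know."
        return max(replies, key=lambda r: r[1])[0]   # highest confidence wins

faq = ExpertAgent("virtual_assistant", {"opening hours?": ("9am-6pm", 0.9)})
books = ExpertAgent("bookseller", {"price of Dune?": ("$9.99", 0.8)})
print(DecisionAgent([faq, books]).ask("price of Dune?"))
```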

14.
Deep learning has shown great strength in many fields and has allowed people to live more conveniently and intelligently. However, deep learning requires a considerable amount of uniform training data, which introduces difficulties in many application scenarios. On the one hand, in real-time systems, training data are constantly generated, but users cannot immediately obtain this vast amount of training data. On the other hand, training data from heterogeneous sources have different data formats, so existing deep learning frameworks are not able to train all of the data together. In this paper, we propose the iFusion framework, which achieves efficient intelligence fusion for deep learning over real-time and heterogeneous data. For real-time data, we train only newly arrived data to obtain a new discrimination model and fuse it with the previously trained models to obtain the discrimination result. For heterogeneous data, different types of data are trained separately; then, we fuse the resulting discrimination models so that heterogeneous data formats need not be considered. We use a method based on Dempster-Shafer theory (DST) to fuse the discrimination models. We apply iFusion to deep learning on medical image data, and the experimental results show the effectiveness of the proposed method.
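The model-fusion step relies on Dempster's rule of combination, a standard DST operation; the sketch below implements that rule over mass functions keyed by frozensets of class labels. How iFusion derives the masses from its discrimination models is not reproduced here, and the example masses are made up.

```python
# Dempster's rule of combination: intersect focal elements, accumulate the
# product masses, and renormalize by the non-conflicting mass.
from itertools import product

def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    return {a: w / (1.0 - conflict) for a, w in combined.items()}

# Two models scoring classes {"cat", "dog"}; each keeps some mass on the
# whole frame to express ignorance.
m1 = {frozenset({"cat"}): 0.6, frozenset({"cat", "dog"}): 0.4}
m2 = {frozenset({"dog"}): 0.3, frozenset({"cat", "dog"}): 0.7}
print(dempster_combine(m1, m2))
```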

15.
Uncertain data are common due to the increasing use of sensors, radio frequency identification (RFID), GPS and similar devices for data collection. The causes of uncertainty include limitations of measurements, inclusion of noise, inconsistent supply voltage and delay or loss of data in transfer. In order to manage, query or mine such data, data uncertainty needs to be considered. Hence, this paper studies the problem of top-k distance-based outlier detection from uncertain data objects. In this work, an uncertain object is modelled by the probability density function of a Gaussian distribution. The naive approach to distance-based outlier detection makes use of a nested loop, which is very costly due to the expensive distance function between two uncertain objects. Therefore, a populated-cells list (PC-list) approach to outlier detection is proposed. Using the PC-list, the proposed top-k outlier detection algorithm needs to consider only a fraction of the dataset objects and hence quickly identifies candidate objects for the top-k outliers. Two approximate top-k outlier detection algorithms are presented to further increase the efficiency of the top-k outlier detection algorithm. An extensive empirical study on synthetic and real datasets is also presented to prove the accuracy, efficiency and scalability of the proposed algorithms.
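For orientation, here is a hedged sketch of the naive nested-loop baseline that the PC-list accelerates: each uncertain object is a Gaussian (mean, sigma), pairwise expected distances are Monte Carlo estimates, and objects are ranked by the distance to their r-th nearest neighbor. The PC-list pruning itself is not reproduced.

```python
# Naive O(n^2) top-k distance-based outlier detection over Gaussian objects.
import numpy as np

def expected_distance(obj_a, obj_b, n_samples=500, rng=None):
    rng = rng or np.random.default_rng(0)
    (mu_a, sd_a), (mu_b, sd_b) = obj_a, obj_b
    xa = rng.normal(mu_a, sd_a, size=(n_samples, len(mu_a)))
    xb = rng.normal(mu_b, sd_b, size=(n_samples, len(mu_b)))
    return float(np.linalg.norm(xa - xb, axis=1).mean())

def topk_outliers(objects, k, r=3):
    scores = []
    for i, a in enumerate(objects):                    # nested loop over pairs
        dists = sorted(expected_distance(a, b)
                       for j, b in enumerate(objects) if j != i)
        scores.append((dists[r - 1], i))               # dist to r-th neighbour
    return [i for _, i in sorted(scores, reverse=True)[:k]]

# Eight clustered objects plus one far-away object; the latter is the outlier.
objs = [(np.zeros(2), np.full(2, 0.2))] * 8 + [(np.array([5.0, 5.0]), np.full(2, 0.2))]
print(topk_outliers(objs, k=1))  # [8]
```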

16.
RDF knowledge graphs (KG) are powerful data structures to represent factual statements created from heterogeneous data sources. KG creation is laborious and demands data management techniques to be executed efficiently. This paper tackles the problem of the automatic generation of declaratively specified KG creation processes; it proposes techniques for planning and transforming heterogeneous data into RDF triples following mapping assertions specified in the RDF Mapping Language (RML). Given a set of mapping assertions, the planner provides an optimized execution plan by partitioning and scheduling the execution of the assertions. First, the planner assesses an optimized number of partitions considering the number of data sources, the type of mapping assertions, and the associations between different assertions. After producing a list of partitions and the assertions that belong to each partition, the planner determines their execution order. A greedy algorithm is implemented to generate the partitions' bushy-tree execution plan. Bushy-tree plans are translated into operating system commands that guide the execution of the partitions of the mapping assertions in the order indicated by the bushy tree. The proposed optimization approach is evaluated over state-of-the-art RML-compliant engines and existing benchmarks of data sources and RML triples maps. Our experimental results suggest that the performance of the studied engines can be considerably improved, particularly in complex settings with numerous triples maps and large data sources. As a result, engines that time out in complex cases are enabled to produce at least a portion of the KG when applying the planner.
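As a hedged sketch of the planning idea (the paper's cost model and RML-specific rules are not reproduced), the code below partitions mapping assertions by source and then greedily pairs the smallest partitions into a bushy execution tree, a smallest-first heuristic chosen here purely for illustration.

```python
# Partition mapping assertions by data source, then greedily combine the
# smallest partitions into a nested (bushy) execution plan.
import heapq
from collections import defaultdict

def partition_by_source(assertions):
    parts = defaultdict(list)
    for source, assertion in assertions:
        parts[source].append(assertion)
    return list(parts.values())

def bushy_plan(partitions):
    """Pair partitions smallest-first; returns a nested tuple plan."""
    heap = [(len(p), i, p) for i, p in enumerate(partitions)]
    heapq.heapify(heap)
    counter = len(heap)                      # tie-breaker for the heap
    while len(heap) > 1:
        na, _, a = heapq.heappop(heap)
        nb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (na + nb, counter, (a, b)))
        counter += 1
    return heap[0][2]

assertions = [("csv:orders", "tm1"), ("csv:orders", "tm2"),
              ("db:users", "tm3"), ("json:events", "tm4")]
print(bushy_plan(partition_by_source(assertions)))
```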

17.
In a number of real-life applications, scientists do not have access to temporal data, since the budget for data acquisition is always limited. Here we address the problem of causal inference between groups of heterogeneous non-temporal observations obtained from multiple sources. We consider a family of probabilistic algorithms for causal inference based on the assumption that, in the case where X causes Y, P(X) and P(Y|X) are statistically independent. For a number of real-world applications, deep learning methods have been reported to achieve the most accurate empirical performance, which motivates us to use deep Boltzmann machines to approximate the marginal and conditional probabilities of heterogeneous observations as accurately as possible. We introduce a novel algorithm to infer causal relationships between blocks of variables. The proposed method was tested on a benchmark of multivariate cause-effect pairs. Our experiments show that our method achieves state-of-the-art empirical accuracy and sometimes outperforms the state-of-the-art methods. An important part of our contribution is an application of the proposed algorithm to an original medical data set, where we explore relations between alimentary patterns, human gut microbiome composition, and health status.
