Similar literature
1.
Resolving domain incompatibility among independently developed databases often involves uncertain information. DeMichiel (1989) showed that uncertain information can be generated by the mapping of conflicting attributes to a common domain, based on some domain knowledge. We show that uncertain information can also arise when the database integration process requires information that is not directly represented in the component databases but can be obtained through some summary of data. We therefore propose an extended relational model based on Dempster-Shafer theory of evidence to incorporate such uncertain knowledge about the source databases. The extended relation uses evidence sets to represent uncertainty in information, which allow probabilities to be attached to subsets of possible domain values. We also develop a full set of extended relational operations over the extended relations. In particular, an extended union operation has been formalized to combine two extended relations using Dempster's rule of combination. The closure and boundedness properties of our proposed extended operations are formulated. We also illustrate the use of extended operations by some query examples.
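The abstract above names Dempster's rule of combination as the basis of its extended union. As a minimal sketch of the textbook rule only (not the paper's extended union operator), the following Python combines two mass functions over subsets of a domain. The function name `combine` and the cuisine example data are hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of domain values to a mass in
    [0, 1]; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contradict each other; track the conflicting mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence; the rule is undefined")
    # Renormalize so that the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical evidence about one attribute value from two source databases.
m1 = {frozenset({"thai"}): 0.6, frozenset({"thai", "chinese"}): 0.4}
m2 = {frozenset({"chinese"}): 0.5, frozenset({"thai", "chinese"}): 0.5}
result = combine(m1, m2)
```

In this example the pair ({thai}, {chinese}) has an empty intersection, so its product 0.3 counts as conflict and the remaining masses are divided by 0.7.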

2.
This paper deals with relational databases which are extended in the sense that fuzzily known values are allowed for attributes. Precise as well as partial (imprecise, uncertain) knowledge concerning the value of the attributes is represented by means of [0,1]-valued possibility distributions in Zadeh's sense. Thus, we have to manipulate ordinary relations on Cartesian products of sets of fuzzy subsets rather than fuzzy relations. Besides, vague queries whose contents are also represented by possibility distributions can be taken into account. The basic operations of relational algebra, union, intersection, Cartesian product, projection, and selection are extended in order to deal with partial information and vague queries. Approximate equalities and inequalities modeled by fuzzy relations can also be taken into account in the selection operation. Then, the main features of a query language based on the extended relational algebra are presented. An illustrative example is provided. This approach, which enables a very general treatment of relational databases with fuzzy attribute values, makes extensive use of dual possibility and necessity measures.
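The dual possibility and necessity measures mentioned in the abstract above have standard definitions for a crisp event A: the possibility of A is the largest value the distribution takes on A, and the necessity of A is one minus the possibility of its complement. A minimal Python sketch of this crisp-set case follows. The paper itself also handles vague queries, which this sketch does not cover, and the "about 30" age distribution is a hypothetical example.

```python
def possibility(pi, event):
    """Possibility of a crisp event: the largest degree pi assigns inside it."""
    return max((pi.get(x, 0.0) for x in event), default=0.0)


def necessity(pi, event, universe):
    """Necessity of a crisp event: 1 minus the possibility of its complement."""
    complement = set(universe) - set(event)
    return 1.0 - possibility(pi, complement)


# Hypothetical fuzzily known attribute value "age is about 30".
universe = range(20, 41)
pi = {28: 0.3, 29: 0.7, 30: 1.0, 31: 0.7, 32: 0.3}

# Crisp query condition: age between 29 and 31.
query = {29, 30, 31}
poss = possibility(pi, query)
nec = necessity(pi, query, universe)
```

Here the tuple possibly satisfies the condition to degree 1.0 and necessarily satisfies it to degree 0.7, because the most plausible value outside the query range has degree 0.3.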

3.
Adding time dimension to relational model and extending relational algebra (total citations: 1; self-citations: 0; citations by others: 1)
A methodology for adding the time dimension to the relational model is proposed and relational algebra is extended for this purpose. We propose time-stamping attributes instead of adding time to tuples. Each attribute value is stored along with a time interval over which it is valid. Non-first normal form relations are used. A relation can have atomic, set-valued, triplet-valued, or set triplet-valued attributes. The last two types of attributes preserve the time (history). Furthermore, new algebraic operations are defined to extract information from historical relations. These operations convert one attribute type to another and do selection over the time dimension. Algebraic rules and identities for the new operations are also included.

4.
The fuzzy relational database model originated by the authors permits fuzzy domain values from a discrete, finite universe. The model is extended here by demonstrating that fuzzy numbers may be employed as domain values without loss of consistency with respect to representation or the relational algebra. Where equivalence is required in an ordinary relational database, similarity is employed in a fuzzy relational database. For discrete, finite universes, similarity between atomic elements is described via a fuzzy similarity relation with max-min transitivity. Two or more fuzzy numbers are defined to be α-similar if their union forms a continuous α-level set over the real line. This convention effects the partitioning of fuzzy number domains that is necessary to assure the well-definedness of the fuzzy relational algebra.

5.
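Named attribute extraction based on an extended domain model (total citations: 1; self-citations: 0; citations by others: 1)
Web information extraction is an important topic in Internet mining. To automate the extraction process, recent work exploits domain-specific features and models the extraction process uniformly with machine learning methods. However, the reliance on domain features makes such methods hard to generalize to other domains. This paper therefore analyzes the information extraction problem and isolates a subtask that can be fully automated, namely named attribute extraction. Statistics on datasets from multiple domains show that this subtask covers more than 60% of the attributes to be extracted, so it plays an important role in information extraction as a whole. A named attribute extraction method based on an extended domain model is also presented. Experimental results show that its precision is close to or above 80% and its recall is above 90%.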

6.
We have built on the U.C.S.D. P-system (running on an IBM Personal Computer) a relational algebra processor, MRDS/FS, which is extremely powerful and which supports a functional syntax for the programmer-user. The relational algebra is provided in the extended operators μ-join, σ-join, project and select. The domain algebra is fully implemented for the first time, giving operations on attributes: arithmetic, logic, comparison and four different categories of aggregation of these. A strictly functional syntax is provided, permitting user-defined functions using the relational and domain algebras as primitive operations. An interactive editor permits the creation, copying and changing of both relations and user-defined functions.

7.
We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and in XML data exchange. This indicates that to make the use of relational systems possible, restrictions have to be imposed on XML schemas and mappings, as well as on XML shredding schemes. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the in-lining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.

8.
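Starting from the relationship between null-value semantics and update operations, this paper proposes a new extended relational model for organizing the information in a relational database that contains null values under update operations. The basic relational algebra operations under this model are also defined. This lays a foundation for data updates in relational databases in the presence of null values.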

9.
This paper demonstrates the relational structure of belief networks by establishing an extended relational data model which can be applied to both belief networks and relational applications. It is demonstrated that a Markov network can be represented as a generalized acyclic join dependency (GAJD) which is equivalent to a set of conflict-free generalized multivalued dependencies (GMVDs). A Markov network can also be characterized by an entropy function, which greatly facilitates the manipulation of GMVDs. These results are extensions of results established in relational theory. It is shown that there exists a complete set of inference rules for the GMVDs. This result is important from a probabilistic perspective. All the above results explicitly demonstrate that there is a unified model for relational database and probabilistic reasoning systems. This is not only important from a theoretical point of view in that one model has been developed for a number of domains, but also from a practical point of view in that one system can be implemented for both domains. This implemented system can take advantage of the performance enhancing techniques developed in both fields. Thereby, this paper serves as a theoretical foundation for harmonizing these two important information domains.

10.
Two kinds of fuzziness in attribute values of the fuzzy relational databases can be distinguished: one is that attribute values are possibility distributions and the other is that there are resemblance relations in attribute domains. The fuzzy relational databases containing these two kinds of fuzziness simultaneously are called extended possibility‐based fuzzy relational databases. In this article, we focus on such fuzzy relational databases and investigate three update operations for the fuzzy relational databases, which are Insertion, Deletion, and Modification, respectively. We develop the strategies and implementation algorithms of these operations. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 237–258, 2007.

11.
Abstract: In building a knowledge-based system, it is sometimes possible to save time by applying some machine learning process to a set of historical cases. In some problem domains, however, such cases may not be available. In addition, the classes, attributes and attribute values that comprise the partial domain model in terms of which cases are expressed may also not be available explicitly. In these circumstances, the repertory grid technique offers a single process for both building a partial domain model and generating a training set of examples. Alternatively, examples can be elicited directly. This paper explores the relationship between knowledge acquisition from examples and the repertory grid technique, and discusses the shared need for machine learning. Fragments of business-strategy knowledge are used to illustrate the discussion.

12.
Aggregation of imprecise and uncertain information in databases (total citations: 4; self-citations: 0; citations by others: 4)
Information stored in a database is often subject to uncertainty and imprecision. Probability theory provides a well-known and well understood way of representing uncertainty and may thus be used to provide a mechanism for storing uncertain information in a database. We consider the problem of aggregation using an imprecise probability data model that allows us to represent imprecision by partial probabilities and uncertainty using probability distributions. Most work to date has concentrated on providing functionality for extending the relational algebra with a view to executing traditional queries on uncertain or imprecise data. However, for imprecise and uncertain data, we often require aggregation operators that provide information on patterns in the data. Thus, while traditional query processing is tuple-driven, processing of uncertain data is often attribute-driven where we use aggregation operators to discover attribute properties. The aggregation operator that we define uses the Kullback-Leibler information divergence between the aggregated probability distribution and the individual tuple values to provide a probability distribution for the domain values of an attribute or group of attributes. The provision of such aggregation operators is a central requirement in furnishing a database with the capability to perform the operations necessary for knowledge discovery in databases.
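The abstract above builds its aggregation operator on the Kullback-Leibler divergence between an aggregated distribution and the individual tuple values. As a rough sketch under a simplifying assumption (every tuple carries a fully specified distribution, with no partial probabilities), the Python below computes the divergence and uses the known fact that the arithmetic mean of the tuple distributions minimizes the summed divergence from the tuples. This is not the paper's operator for imprecise data, and the function names and the sample tuples are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D(p || q) for discrete distributions given as dicts value -> probability.

    Assumes q[x] > 0 wherever p[x] > 0.
    """
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0.0)


def mean_distribution(tuples):
    """Arithmetic mean of the tuple distributions, value by value."""
    values = {x for p in tuples for x in p}
    n = len(tuples)
    return {x: sum(p.get(x, 0.0) for p in tuples) / n for x in values}


# Hypothetical attribute with uncertain values stored in three tuples.
tuples = [
    {"low": 0.8, "high": 0.2},
    {"low": 0.5, "high": 0.5},
    {"low": 0.2, "high": 0.8},
]
aggregate = mean_distribution(tuples)
total = sum(kl_divergence(p, aggregate) for p in tuples)
```

Any other candidate distribution over the same values gives a summed divergence at least as large as `total`, which is why the mean serves as the aggregate in this simplified setting.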

13.
Data Mining in Large Databases Using Domain Generalization Graphs (total citations: 5; self-citations: 0; citations by others: 5)
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.

14.
Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) are rarely incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed do facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures. Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available at .

15.
A variation of the index selection problem for an extended relational model when all encoding of information is memory resident is discussed. The data model is the relational model extended in two ways that are common with semantic data models. One consequence of memory residence is that the search space of possible indexes is enlarged to the extent that previous methods requiring some consideration of each possibility are no longer possible. An instance of the index selection problem that includes a set of partial match queries in addition to the input schema is given. It is assumed that the set is determined by an initial phase of query optimization when applied to a fixed set of more general forms of queries that characterize the way in which information is accessed for an application. An initial choice of indexes is made, only considering their suitability for answering the partial match queries.

16.

In this paper, we propose a domain learning process built on a machine learning-based process that, starting from plan traces with (partially known) intermediate states, returns a planning domain with numeric predicates, and expressive logical/arithmetic relations between domain predicates written in the planning domain definition language (PDDL). The novelty of our approach is that it can discover relations with little information about the ontology of the target domain to be learned. This is achieved by applying a selection of preprocessing, regression, and classification techniques to infer information from the input plan traces. These techniques are used to prepare the planning data, discover relational/numeric expressions, or extract the preconditions and effects of the domain’s actions. Our solution was evaluated using several metrics from the literature, taking as experimental data plan traces obtained from several domains from the International Planning Competition. The experiments demonstrate that our proposal, even with high levels of incompleteness, correctly learns a wide variety of domains discovering relational/arithmetic expressions, showing F-Score values above 0.85 and obtaining valid domains in most of the experiments.


17.
18.
Two kinds of fuzziness in attribute values of the fuzzy relational databases can be distinguished: One is that attribute values are possibility distributions, and the other is that there are resemblance relations in attribute domains. The fuzzy relational databases containing these two kinds of fuzziness simultaneously are called extended possibility‐based fuzzy relational databases. In this paper, we focus on such fuzzy relational databases. We classify two kinds of fuzzy data redundancies and define their removal. On this basis, we define fuzzy relational operations in relational algebra, which, being similar to the conventional relational databases, are complete and sound. In particular, we investigate fuzzy querying strategies and give the form of fuzzy querying with SQL. © 2002 Wiley Periodicals, Inc.

19.
Aggregate keyword search on large relational databases (total citations: 2; self-citations: 1; citations by others: 1)
Keyword search has been recently extended to relational databases to retrieve information from text-rich attributes. However, all the existing methods focus on finding individual tuples matching a set of query keywords from one table or the join of multiple tables. In this paper, we motivate a novel problem of aggregate keyword search: finding minimal group-bys covering a set of query keywords well, which is useful in many applications. We develop two interesting approaches to tackle the problem. We further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation using both real data sets and synthetic data sets is reported to verify the effectiveness of aggregate keyword search and the efficiency of our methods.

20.
A multidimensional file is one whose data are characterized by several attributes, each specified in a given domain. A partial match query on a multidimensional file extracts all data whose attributes match the values of one or more attributes specified in the query. The disk allocation problem of a multidimensional file F on a database system with multiple disks accessible in parallel is the problem of distributing F among the disks such that the data qualifying for each partial match query are distributed as evenly as possible among the disks of the system. We propose an optimal solution to this problem for multidimensional files with pairwise prime domains based on a large and flexible class of maximum distance separable codes, namely, the redundant residue codes. We also introduce a new family of residue codes, called the redundant nonpairwise prime residue codes, to deal with files whose attribute domains are nonpairwise prime.
