1.

An evidential reasoning approach to attribute value conflict resolution in database integration

Ee-Peng Lim Srivastava J. Shekhar S. 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(5):707-723

Resolving domain incompatibility among independently developed databases often involves uncertain information. DeMichiel (1989) showed that uncertain information can be generated by the mapping of conflicting attributes to a common domain, based on some domain knowledge. We show that uncertain information can also arise when the database integration process requires information that is not directly represented in the component databases but can be obtained through some summary of data. We therefore propose an extended relational model based on the Dempster-Shafer theory of evidence to incorporate such uncertain knowledge about the source databases. The extended relation uses evidence sets to represent uncertainty in information, which allow probabilities to be attached to subsets of possible domain values. We also develop a full set of extended relational operations over the extended relations. In particular, an extended union operation has been formalized to combine two extended relations using Dempster's rule of combination. The closure and boundedness properties of our proposed extended operations are formulated. We also illustrate the use of extended operations by some query examples.
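The combination step named in the abstract above can be illustrated with a minimal Python sketch of the textbook form of Dempster's rule. The attribute values, the masses, and the function name `combine` are hypothetical; this is not the paper's extended union operation, only the rule it builds on.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of domain values to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two subsets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint subsets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical evidence about one attribute value from two source databases.
source_1 = {frozenset({"thai"}): 0.6, frozenset({"thai", "chinese"}): 0.4}
source_2 = {frozenset({"chinese"}): 0.5, frozenset({"thai", "chinese"}): 0.5}
merged = combine(source_1, source_2)
```

Here the conflicting mass is 0.3, so the surviving masses are divided by 0.7 and `{"thai"}` ends up with mass 3/7.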

2.

Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries

Henri Prade Claudette Testemale 《Information Sciences》1984,34(2):115-143

This paper deals with relational databases which are extended in the sense that fuzzily known values are allowed for attributes. Precise as well as partial (imprecise, uncertain) knowledge concerning the value of the attributes is represented by means of [0,1]-valued possibility distributions in Zadeh's sense. Thus, we have to manipulate ordinary relations on Cartesian products of sets of fuzzy subsets rather than fuzzy relations. Besides, vague queries whose contents are also represented by possibility distributions can be taken into account. The basic operations of relational algebra (union, intersection, Cartesian product, projection, and selection) are extended in order to deal with partial information and vague queries. Approximate equalities and inequalities modeled by fuzzy relations can also be taken into account in the selection operation. Then, the main features of a query language based on the extended relational algebra are presented. An illustrative example is provided. This approach, which enables a very general treatment of relational databases with fuzzy attribute values, makes extensive use of dual possibility and necessity measures.
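The dual possibility and necessity measures mentioned in the abstract above can be sketched as follows. This is a simplified illustration that assumes a finite domain and a crisp (non-fuzzy) query condition; the attribute, its values, and the function names are hypothetical, and the paper itself also handles fuzzy query conditions.

```python
def possibility(pi, event):
    """Possibility of an event: the largest possibility degree among its values."""
    return max((pi.get(x, 0.0) for x in event), default=0.0)


def necessity(pi, event):
    """Necessity of an event: 1 minus the possibility of its complement."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(pi, complement)


# Hypothetical possibility distribution for an imprecisely known attribute value.
age_group = {"young": 1.0, "middle": 0.6, "old": 0.1}
condition = {"young", "middle"}

poss = possibility(age_group, condition)
nec = necessity(age_group, condition)
```

With these numbers the condition is fully possible (1.0) and almost certain (necessity 0.9), because the only value outside it has possibility 0.1.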

3.

Adding time dimension to relational model and extending relational algebra (cited by: 1; self-citations: 0; citations by others: 1)

Abdullah Uz Tansel 《Information Systems》1986,11(4):343-355

A methodology for adding the time dimension to the relational model is proposed and relational algebra is extended for this purpose. We propose time-stamping attributes instead of adding time to tuples. Each attribute value is stored along with a time interval over which it is valid. Non-first normal form relations are used. A relation can have atomic, set-valued, triplet-valued, or set triplet-valued attributes. The last two types of attributes preserve the time (history). Furthermore, new algebraic operations are defined to extract information from historical relations. These operations convert one attribute type to another and do selection over the time dimension. Algebraic rules and identities for the new operations are also included.
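Attribute-level time-stamping, as described in the abstract above, can be pictured with a small Python sketch. The `Triplet` layout (start, end, value), the half-open interval convention, and the helper `value_at` are assumptions made for illustration; the abstract does not spell out the exact representation.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One time-stamped attribute value, valid over [start, end)."""
    start: int
    end: int  # exclusive upper bound (an assumption of this sketch)
    value: object


def value_at(history, t):
    """Return the attribute value valid at time t, or None if none is recorded."""
    for triplet in history:
        if triplet.start <= t < triplet.end:
            return triplet.value
    return None


# Hypothetical set triplet-valued attribute: the salary history of one employee.
salary = [Triplet(1980, 1984, 20000), Triplet(1984, 1990, 25000)]

in_1985 = value_at(salary, 1985)
in_1995 = value_at(salary, 1995)
```

Keeping the history inside the attribute, rather than duplicating whole tuples per time point, is what makes the relation non-first normal form.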

4.

Extending the fuzzy database with fuzzy numbers

Billy P. Buckles Frederick E. Petry 《Information Sciences》1984,34(2):145-155

The fuzzy relational database model originated by the authors permits fuzzy domain values from a discrete, finite universe. The model is extended here by demonstrating that fuzzy numbers may be employed as domain values without loss of consistency with respect to representation or the relational algebra. Where equivalence is required in an ordinary relational database, similarity is employed in a fuzzy relational database. For discrete, finite universes, similarity between atomic elements is described via a fuzzy similarity relation with max-min transitivity. Two or more fuzzy numbers are defined to be α-similar if their union forms a continuous α-level set over the real line. This convention effects the partitioning of fuzzy number domains that is necessary to assure the well-definedness of the fuzzy relational algebra.
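Max-min transitivity, the property required of the similarity relation in the abstract above, means that s(x, z) is at least min(s(x, y), s(y, z)) for every intermediate element y. A minimal Python check is sketched below; the domain values and similarity degrees are hypothetical.

```python
def is_max_min_transitive(s, elements):
    """Check s[x][z] >= min(s[x][y], s[y][z]) for all x, y, z in elements."""
    return all(
        s[x][z] >= min(s[x][y], s[y][z])
        for x in elements
        for y in elements
        for z in elements
    )


# Hypothetical similarity relation over a small discrete domain.
colors = ["red", "crimson", "blue"]
similarity = {
    "red": {"red": 1.0, "crimson": 0.8, "blue": 0.5},
    "crimson": {"red": 0.8, "crimson": 1.0, "blue": 0.5},
    "blue": {"red": 0.5, "crimson": 0.5, "blue": 1.0},
}
transitive = is_max_min_transitive(similarity, colors)

# Lowering one pair breaks the property: red~crimson and crimson~blue
# would then imply more similarity than red~blue actually has.
broken = {x: dict(row) for x, row in similarity.items()}
broken["red"]["blue"] = broken["blue"]["red"] = 0.2
still_transitive = is_max_min_transitive(broken, colors)
```

A relation that passes this check partitions the domain into equivalence classes at any chosen similarity threshold, which is what lets similarity stand in for equality.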

5.

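Named attribute extraction based on an extended domain model (cited by: 1; self-citations: 0; citations by others: 1)

Wang Yu, Tan Songbo, Liao Xiangwen, Zeng Yiling 《计算机研究与发展》 (Journal of Computer Research and Development) 2010,47(9):1567-1573

Web information extraction is an important topic in Internet mining. To automate the extraction process, recent work exploits domain-specific features and models the extraction process in a unified way using machine learning methods. However, the dependence on domain features makes such methods hard to generalize to other domains. This paper therefore analyzes the information extraction problem and isolates a subtask that can be fully automated, namely named attribute extraction. Statistics on data sets from multiple domains show that this subtask covers more than 60% of the attributes to be extracted, so it plays an important role in information extraction as a whole. A named attribute extraction method based on an extended domain model is also presented; experimental results show that its precision is close to or above 80% and its recall is above 90%.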

6.

A functional processor for the relational algebra on a microcomputer

T. H. Merrett Ted Van Rossum 《Software》1986,16(11):987-1002

We have built on the U.C.S.D. P-system (running on an IBM Personal Computer) a relational algebra processor, MRDS/FS, which is extremely powerful and which supports a functional syntax for the programmer-user. The relational algebra is provided in the extended operators μ-join, σ-join, project and select. The domain algebra is fully implemented for the first time, giving operations on attributes: arithmetic, logic, comparison and four different categories of aggregation of these. A strictly functional syntax is provided, permitting user-defined functions using the relational and domain algebras as primitive operations. An interactive editor permits the creation, copying and changing of both relations and user-defined functions.

7.

Tractable XML data exchange via relations

Rada CHIRKOVA Leonid LIBKIN Juan L. REUTTER 《Frontiers of Computer Science》2012,6(3):243-263

We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and in XML data exchange. This indicates that to make the use of relational systems possible, restrictions have to be imposed on XML schemas and mappings, as well as on XML shredding schemes. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the in-lining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.

8.

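Updating relational databases in a null-value environment, Part I: the extended relational model and basic operations

Hao Zhongxiao, Ma Zongmin 《计算机学报》 (Chinese Journal of Computers) 1994,(7)

Starting from the relationship between null-value semantics and update operations, this paper proposes a new extended relational model for organizing the information in relational databases that contain null values under update operations. The basic relational algebra operations under this model are also defined. This lays a foundation for implementing data updates in relational databases in a null-value environment.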

9.

The Relational Structure of Belief Networks

S.K.M. Wong 《Journal of Intelligent Information Systems》2001,16(2):117-148

This paper demonstrates the relational structure of belief networks by establishing an extended relational data model which can be applied to both belief networks and relational applications. It is demonstrated that a Markov network can be represented as a generalized acyclic join dependency (GAJD) which is equivalent to a set of conflict-free generalized multivalued dependencies (GMVDs). A Markov network can also be characterized by an entropy function, which greatly facilitates the manipulation of GMVDs. These results are extensions of results established in relational theory. It is shown that there exists a complete set of inference rules for the GMVDs. This result is important from a probabilistic perspective. All the above results explicitly demonstrate that there is a unified model for relational database and probabilistic reasoning systems. This is not only important from a theoretical point of view in that one model has been developed for a number of domains, but also from a practical point of view in that one system can be implemented for both domains. This implemented system can take advantage of the performance enhancing techniques developed in both fields. Thereby, this paper serves as a theoretical foundation for harmonizing these two important information domains.

10.

Updating extended possibility‐based fuzzy relational databases

Z.M. Ma Li Yan 《International Journal of Intelligent Systems》2007,22(3):237-258

Two kinds of fuzziness in attribute values of the fuzzy relational databases can be distinguished: one is that attribute values are possibility distributions and the other is that there are resemblance relations in attribute domains. The fuzzy relational databases containing these two kinds of fuzziness simultaneously are called extended possibility-based fuzzy relational databases. In this article, we focus on such fuzzy relational databases and investigate three update operations for the fuzzy relational databases, which are Insertion, Deletion, and Modification, respectively. We develop the strategies and implementation algorithms of these operations. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 237–258, 2007.

11.

Learning without case records: a mapping of the repertory grid technique onto knowledge acquisition from examples

Clive Nicholson 《Expert Systems》1992,9(2):79-87

In building a knowledge-based system, it is sometimes possible to save time by applying some machine learning process to a set of historical cases. In some problem domains, however, such cases may not be available. In addition, the classes, attributes and attribute values that comprise the partial domain model in terms of which cases are expressed may also not be available explicitly. In these circumstances, the repertory grid technique offers a single process for both building a partial domain model and generating a training set of examples. Alternatively, examples can be elicited directly. This paper explores the relationship between knowledge acquisition from examples and the repertory grid technique, and discusses the shared need for machine learning. Fragments of business-strategy knowledge are used to illustrate the discussion.

12.

Aggregation of imprecise and uncertain information in databases (cited by: 4; self-citations: 0; citations by others: 4)

McClean S. Scotney B. Shapcott M. 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(6):902-912

Information stored in a database is often subject to uncertainty and imprecision. Probability theory provides a well-known and well-understood way of representing uncertainty and may thus be used to provide a mechanism for storing uncertain information in a database. We consider the problem of aggregation using an imprecise probability data model that allows us to represent imprecision by partial probabilities and uncertainty using probability distributions. Most work to date has concentrated on providing functionality for extending the relational algebra with a view to executing traditional queries on uncertain or imprecise data. However, for imprecise and uncertain data, we often require aggregation operators that provide information on patterns in the data. Thus, while traditional query processing is tuple-driven, processing of uncertain data is often attribute-driven where we use aggregation operators to discover attribute properties. The aggregation operator that we define uses the Kullback-Leibler information divergence between the aggregated probability distribution and the individual tuple values to provide a probability distribution for the domain values of an attribute or group of attributes. The provision of such aggregation operators is a central requirement in furnishing a database with the capability to perform the operations necessary for knowledge discovery in databases.
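The Kullback-Leibler divergence that the abstract above relies on can be computed for discrete distributions as in the following Python sketch. Only the divergence itself is shown; the aggregation operator built on it is not specified in the abstract and is not reproduced here. The distributions, the argument order, and the function name are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Both arguments map domain values to probabilities.
    """
    total = 0.0
    for value, p_v in p.items():
        if p_v == 0.0:
            continue  # terms with zero probability in p contribute nothing
        q_v = q.get(value, 0.0)
        if q_v == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_v * math.log(p_v / q_v)
    return total


# Hypothetical distributions over the domain values of one attribute.
aggregated = {"low": 0.5, "high": 0.5}
tuple_value = {"low": 0.9, "high": 0.1}

same = kl_divergence(aggregated, aggregated)
different = kl_divergence(aggregated, tuple_value)
```

The divergence is zero only when the two distributions coincide, which is why it can serve as a measure of how well an aggregate represents the individual tuple values.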

13.

Data Mining in Large Databases Using Domain Generalization Graphs (cited by: 5; self-citations: 0; citations by others: 5)

Robert J. Hilderman Howard J. Hamilton Nick Cercone 《Journal of Intelligent Information Systems》1999,13(3):195-234

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
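A single step of attribute-oriented generalization, as described in the abstract above, can be sketched in Python as follows. The table, the one-level concept hierarchy, and the function name `generalize` are hypothetical; the sketch does not implement domain generalization graphs or the Multi-Attribute Generalization algorithm.

```python
from collections import Counter


def generalize(rows, attribute, hierarchy):
    """Replace each value of one attribute by its parent concept and count duplicates."""
    summary = Counter()
    for row in rows:
        general = dict(row)
        # Values without an entry in the hierarchy are left unchanged.
        general[attribute] = hierarchy.get(row[attribute], row[attribute])
        summary[tuple(sorted(general.items()))] += 1
    return summary


# Hypothetical sales table and concept hierarchy (city -> province).
sales = [
    {"city": "Regina", "product": "tea"},
    {"city": "Saskatoon", "product": "tea"},
    {"city": "Calgary", "product": "tea"},
]
city_to_province = {
    "Regina": "Saskatchewan",
    "Saskatoon": "Saskatchewan",
    "Calgary": "Alberta",
}
summary = generalize(sales, "city", city_to_province)
```

Repeating this step with higher levels of the hierarchy yields progressively smaller summary tables, each of which is one node in the generalization state space.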

14.

Distribution-based aggregation for relational learning with identifier attributes

Claudia Perlich Foster Provost 《Machine Learning》2006,62(1-2):65-105

Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) rarely are incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed do facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures. Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available.

15.

Selection of indexes to memory-resident entities for semantic datamodels

Weddell G.E. 《Knowledge and Data Engineering, IEEE Transactions on》1989,1(2):274-284

A variation of the index selection problem for an extended relational model when all encoding of information is memory resident is discussed. The data model is the relational model extended in two ways that are common with semantic data models. One consequence of memory residence is that the search space of possible indexes is enlarged to the extent that previous methods requiring some consideration of each possibility are no longer possible. An instance of the index selection problem that includes a set of partial match queries in addition to the input schema is given. It is assumed that the set is determined by an initial phase of query optimization when applied to a fixed set of more general forms of queries that characterize the way in which information is accessed for an application. An initial choice of indexes is made, only considering their suitability for answering the partial match queries.

16.

Discovering relational and numerical expressions from plan traces for learning action models

Segura-Muros José Á. Pérez Raúl Fernández-Olivares Juan 《Applied Intelligence》2021,51(11):7973-7989

In this paper, we propose a domain learning process built on a machine learning-based process that, starting from plan traces with (partially known) intermediate states, returns a planning domain with numeric predicates and expressive logical/arithmetic relations between domain predicates, written in the planning domain definition language (PDDL). The novelty of our approach is that it can discover relations with little information about the ontology of the target domain to be learned. This is achieved by applying a selection of preprocessing, regression, and classification techniques to infer information from the input plan traces. These techniques are used to prepare the planning data, discover relational/numeric expressions, or extract the preconditions and effects of the domain’s actions. Our solution was evaluated using several metrics from the literature, taking as experimental data plan traces obtained from several domains from the International Planning Competition. The experiments demonstrate that our proposal, even with high levels of incompleteness, correctly learns a wide variety of domains, discovering relational/arithmetic expressions, showing F-Score values above 0.85 and obtaining valid domains in most of the experiments.

17.

Some practical aspects of fuzzy database techniques: An example

R. Vandenberghe A. Van Schooten R. De Caluwe E. E. Kerre 《Information Systems》1989,14(6):465-472

18.

Handling fuzzy information in extended possibility‐based fuzzy relational databases

Z. M. Ma F. Mili 《International Journal of Intelligent Systems》2002,17(10):925-942

Two kinds of fuzziness in attribute values of the fuzzy relational databases can be distinguished: One is that attribute values are possibility distributions, and the other is that there are resemblance relations in attribute domains. The fuzzy relational databases containing these two kinds of fuzziness simultaneously are called extended possibility-based fuzzy relational databases. In this paper, we focus on such fuzzy relational databases. We classify two kinds of fuzzy data redundancies and define their removal. On this basis, we define fuzzy relational operations in relational algebra, which, being similar to the conventional relational databases, are complete and sound. In particular, we investigate fuzzy querying strategies and give the form of fuzzy querying with SQL. © 2002 Wiley Periodicals, Inc.

19.

Aggregate keyword search on large relational databases (cited by: 2; self-citations: 1; citations by others: 1)

Bin Zhou Jian Pei 《Knowledge and Information Systems》2012,30(2):283-318

Keyword search has been recently extended to relational databases to retrieve information from text-rich attributes. However, all the existing methods focus on finding individual tuples matching a set of query keywords from one table or the join of multiple tables. In this paper, we motivate a novel problem of aggregate keyword search: finding minimal group-bys covering a set of query keywords well, which is useful in many applications. We develop two interesting approaches to tackle the problem. We further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation using both real data sets and synthetic data sets is reported to verify the effectiveness of aggregate keyword search and the efficiency of our methods.

20.

Load balanced and optimal disk allocation strategy for partial match queries on multidimensional files

Das S.K. Pinotti C.M. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1211-1219

A multidimensional file is one whose data are characterized by several attributes, each specified in a given domain. A partial match query on a multidimensional file extracts all data whose attributes match the values of one or more attributes specified in the query. The disk allocation problem of a multidimensional file F on a database system with multiple disks accessible in parallel is the problem of distributing F among the disks such that the data qualifying for each partial match query are distributed as evenly as possible among the disks of the system. We propose an optimal solution to this problem for multidimensional files with pairwise prime domains based on a large and flexible class of maximum distance separable codes, namely, the redundant residue codes. We also introduce a new family of residue codes, called the redundant nonpairwise prime residue codes, to deal with files whose attribute domains are nonpairwise prime.