Similar literature
 20 similar documents found
1.
Decision tables are widely used in many knowledge-based and decision support systems. They allow relatively complex logical relationships to be represented in an easily understood form and processed efficiently. This paper describes second-order decision tables (decision tables that contain rows whose components have sets of atomic values) and their role in knowledge engineering to: (1) support efficient management and enhance comprehensibility of tabular knowledge acquired by knowledge engineers, and (2) automatically generate knowledge from a tabular set of examples. We show how second-order decision tables can be used to restructure acquired tabular knowledge into a condensed but logically equivalent second-order table. We then present the results of experiments with such restructuring. Next, we describe SORCER, a learning system that induces second-order decision tables from a given database. We compare SORCER with IDTM, a system that induces standard decision tables, and a state-of-the-art decision tree learner, C4.5. Results show that in spite of its simple induction methods, on the average over the data sets studied, SORCER has the lowest error rate.

2.
The problem of downscaling the effects of global scale climate variability into predictions of local hydrology has important implications for water resource management. Our research aims to identify predictive relationships that can be used to integrate solar and ocean-atmospheric conditions into forecasts of regional water flows. In recent work we have developed an induction technique called second-order table compression, in which learning can be viewed as a process that transforms a table consisting of training data into a second-order table (which has sets of atomic values as entries) with fewer rows by merging rows in consistency-preserving ways. Here, we apply the second-order table compression technique to generate predictive models of future water inflows of Lake Okeechobee, a primary source of water supply for south Florida. We also describe SORCER, a second-order table compression learning system, and compare its performance with three well-established data mining techniques: neural networks, decision tree learning and association rule mining. SORCER gives more accurate results, on the average, than the other methods, with average accuracy between 49% and 56% in the prediction of inflows discretized into four ranges. We discuss the implications of these results and the practical issues in assessing the results from data mining models to guide decision-making.
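
The idea of merging rows into set-valued entries can be illustrated with a minimal sketch. It is not SORCER's actual algorithm: it assumes only the simplest consistency-preserving merge, in which two rows with the same class that differ in exactly one column are replaced by one row holding the union of the two value sets in that column. The function names `to_second_order`, `try_merge` and `compress` are hypothetical.

```python
from itertools import combinations


def to_second_order(rows):
    """Lift first-order rows ((atomic values...), label) to set-valued rows."""
    return [(tuple(frozenset([v]) for v in attrs), label) for attrs, label in rows]


def try_merge(a, b):
    """Merge two rows if they share a label and differ in exactly one column.

    The merged row covers exactly the cases covered by a or b, so the
    table stays logically equivalent. Returns None if no merge applies.
    """
    (attrs_a, label_a), (attrs_b, label_b) = a, b
    if label_a != label_b:
        return None
    diff = [i for i, (x, y) in enumerate(zip(attrs_a, attrs_b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    merged = attrs_a[:i] + (attrs_a[i] | attrs_b[i],) + attrs_a[i + 1:]
    return (merged, label_a)


def compress(rows):
    """Greedily merge rows until no pair can be merged (assumed strategy)."""
    table = to_second_order(rows)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(table, 2):
            merged = try_merge(a, b)
            if merged is not None:
                # Replace the two source rows with the single merged row.
                table = [r for r in table if r is not a and r is not b] + [merged]
                changed = True
                break
    return table
```

For example, the rows `(("sunny", "hot"), "no")` and `(("sunny", "mild"), "no")` collapse into a single row whose second entry is the set `{"hot", "mild"}`. The greedy order is an arbitrary choice in this sketch, and different orders can yield different compressed tables.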

3.
Relational databases are the most popular repository for structured data, and are thus one of the richest sources of knowledge in the world. In a relational database, multiple relations are linked together via entity-relationship links. Multirelational classification is the procedure of building a classifier based on information stored in multiple relations and making predictions with it. Existing approaches to inductive logic programming (recently also known as relational mining) have proven effective, with high accuracy, in multirelational classification. Unfortunately, most of them suffer from scalability problems with regard to the number of relations in databases. In this paper, we propose a new approach, called CrossMine, which includes a set of novel and powerful methods for multirelational classification, including 1) tuple ID propagation, an efficient and flexible method for virtually joining relations, which enables convenient search among different relations, 2) new definitions for predicates and decision-tree nodes, which involve aggregated information to provide essential statistics for classification, and 3) a selective sampling method for improving scalability with regard to the number of tuples. Based on these techniques, we propose two scalable and accurate methods for multirelational classification: CrossMine-Rule, a rule-based method, and CrossMine-Tree, a decision-tree-based method. Our comprehensive experiments on both real and synthetic data sets demonstrate the high scalability and accuracy of the CrossMine approach.

4.
The theory of functional dependencies is based on relations, i.e. sets of tuples. Over relations, the class of functional dependencies subsumes the class of keys. Commercial database systems permit the storage of bags of tuples where duplicate tuples can occur. Over bags, keys and functional dependencies interact differently from how they interact over relations. We establish finite ground axiomatizations of keys and functional dependencies over bags, and show a strong correspondence to goal and definite clauses in classical propositional logic. We define a syntactic Boyce-Codd-Heath Normal Form condition, and show that the condition characterizes schemata that will never have any redundant data value occurrences in their instances. The results close the gap between the existing set-based theory of data dependencies and database practice where bags are permitted.
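
A small sketch may make the difference between sets and bags concrete. It assumes the usual bag semantics in which a key forbids two distinct tuple occurrences from agreeing on the key attributes; the helper names `satisfies_fd` and `satisfies_key` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def satisfies_fd(bag, lhs, rhs):
    """FD lhs -> rhs: tuples that agree on lhs must also agree on rhs."""
    seen = {}
    for t in bag:
        left = tuple(t[a] for a in lhs)
        right = tuple(t[a] for a in rhs)
        # setdefault stores the first right-hand side seen for this left value.
        if seen.setdefault(left, right) != right:
            return False
    return True


def satisfies_key(bag, attrs):
    """Key: no two distinct tuple occurrences agree on attrs."""
    counts = Counter(tuple(t[a] for a in attrs) for t in bag)
    return all(c == 1 for c in counts.values())


# A bag with a duplicate tuple: the FD id -> (id, name) holds,
# yet id is not a key, which cannot happen over a set of tuples.
bag = [{"id": 1, "name": "a"}, {"id": 1, "name": "a"}]
fd_holds = satisfies_fd(bag, ["id"], ["id", "name"])
key_holds = satisfies_key(bag, ["id"])
```

Over a relation (a set), an attribute set that functionally determines all attributes is a key. The duplicate in the bag above breaks that implication.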

5.
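In the real world, some objects are more general than others, and the similarity between two objects may be asymmetric. A similarity relation between two objects may be neither symmetric nor transitive; we call such a relation a weak similarity relation. This paper proposes asymmetric redundant tuples to handle weak similarity relations in fuzzy relational databases. The concept of an asymmetric redundant tuple generalizes the concept of redundancy in fuzzy relational databases; it is used to remove redundant information and to represent information more precisely.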

6.
Users of a relational database must explicitly navigate between relations in order to establish a connection among a set of attributes spanning several relation schemes. While a universal scheme interface to a relational database provides users with automatic navigation, it usually imposes on the database a unique role assumption. This assumption requires every attribute name to represent a unique role in the database, so that connections among sets of attributes are unambiguous.

7.
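Data are modeled with the relational database model, and similar data sets are defined for data over the same relation schema. For a single data set, the database model is normalized by decomposing relations, which removes the data redundancy within a relation. For compression across multiple data sets, an incremental lossless compression algorithm based on 0-1 state marker sequences is proposed; the compressed data can be decompressed quickly and completely. Experimental results show that the algorithm achieves efficient lossless compression of, and fast queries over, similar data sets.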

8.
This paper describes a database model based on the original rough sets theory. Its rough relations permit the representation of a rough set of tuples not definable in terms of the elementary classes, except through use of lower and upper approximations. The rough relational database model also incorporates indiscernibility in the representation and in all the operators of the rough relational algebra. This indiscernibility is based strictly on equivalence classes which must be defined for every attribute domain. There are several obvious applications for which the rough relational database model can more accurately model an enterprise than does the standard relational model. These include systems involving ambiguous, imprecise, or uncertain data. Retrieval over mismatched domains caused by the merging of one or more applications can be facilitated by the use of indiscernibility, and naive system users can achieve greater recall with the rough relational database. In addition, applications inherently “rough” could be more easily implemented and maintained in the rough relational database.

9.
Multirelational classification aims to discover patterns across multiple interlinked tables (relations) in a relational database. In many large organizations, such a database often spans numerous departments and/or subdivisions, which are involved in different aspects of the enterprise such as customer profiling, fraud detection, inventory management, financial management, and so on. When considering classification, different phases of the knowledge discovery process are affected by economic utility. For instance, in the data preprocessing process, one must consider the cost associated with acquiring, cleaning, and transforming large volumes of data. When training and testing the data mining models, one has to consider the impact of the data size on the running time of the learning algorithm. In order to address these utility-based issues, the paper presents an approach to create a pruned database for multirelational classification, while minimizing predictive performance loss on the final model. Our method identifies a set of strongly uncorrelated subgraphs from the original database schema, to use for training, and discards all others. The experiments performed show that our strategy is able to, without sacrificing predictive accuracy, significantly reduce the size of the databases, in terms of the number of relations, tuples, and attributes. The approach prunes the sizes of databases by as much as 94 %. Such reduction also results in decreasing computational cost of the learning process. The method improves the multirelational learning algorithms’ execution time by as much as 80 %. In particular, our results demonstrate that one may build an accurate model with only a small subset of the provided database.

10.
Individual facts in the real world are represented by tuples in database relations (instances), while universal (time-independent) facts are treated as semantic constraints regarding database relation schemata. One important role of these semantic constraints is their use as integrity constraints that must not be violated by update operations. Among these, static constraints are represented by assertions, which are extended relational calculi in which every tuple variable is bound over a relation, and become true in a consistent database. When an update has been made on a consistent database, it is necessary to ascertain whether the updated database is still consistent. This can be done by evaluating all the assertions in the updated database, but doing so is very time-consuming. If a given assertion belongs to one of certain classes, it is possible to devise an efficient validation procedure which determines, before the update is actually applied, whether the update violates the given assertion. In many cases a simplified form can be found whose value determines whether the given update is proper. The existence of such an efficient procedure and simplified form depends on the class to which the given assertion belongs, and also on what type of update is to be made on which relation. This paper presents a method of finding such a procedure and simplified form using several simple syntactical transformation rules regarding extended relational calculi. This method is based on several basic properties in propositional logic and many-sorted predicate logic.

11.
Nowadays, with more and more data publicly available on the Internet, it is increasingly important to ensure the integrity of these data. The traditional solution is to use a digital signature scheme. However, a digital signature can only detect whether the entire data set has been modified; it cannot localize and characterize the modifications. In this paper, a novel fragile watermarking scheme is proposed to detect malicious modifications of database relations. In the proposed scheme, all tuples in a database relation are first securely divided into groups; watermarks are embedded and verified group by group independently. The embedded watermarks can not only detect but also localize, and even characterize, the modifications made to the database. In the worst case, the modifications can be narrowed down to tuples in a group. Rigorous analysis shows that the modifications can be detected and localized with high probability, which is also demonstrated by our experimental results.
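
The group-by-group verification idea can be sketched as follows. This is an illustration under loud assumptions, not the paper's scheme: tuples are assigned to groups with a keyed hash of their primary key, and a separately stored per-group HMAC tag stands in for the embedded watermark (the paper embeds watermarks in the data itself). The names `group_of`, `group_tags` and `modified_groups` are hypothetical.

```python
import hashlib
import hmac


def group_of(secret, primary_key, n_groups):
    """Assign a tuple to a group with a keyed hash of its primary key."""
    digest = hmac.new(secret, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n_groups


def group_tags(secret, rows, n_groups):
    """rows maps primary key -> tuple of values. Returns one tag per group."""
    groups = {g: [] for g in range(n_groups)}
    for pk in sorted(rows):  # fixed order so the tag is reproducible
        groups[group_of(secret, pk, n_groups)].append((pk, rows[pk]))
    return {
        g: hmac.new(secret, repr(members).encode(), hashlib.sha256).hexdigest()
        for g, members in groups.items()
    }


def modified_groups(secret, rows, n_groups, stored_tags):
    """Return the groups whose current tag differs from the stored tag."""
    current = group_tags(secret, rows, n_groups)
    return sorted(g for g in current if current[g] != stored_tags[g])
```

Because each group is checked independently, a change to one tuple is narrowed down to that tuple's group rather than merely flagging the whole relation, which is the localization property the abstract contrasts with a single digital signature.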

12.
《Neurocomputing》1999,24(1-3):37-54
This paper presents some highlights in the application of neural networks, fuzzy logic, genetic algorithms, and rough sets to automated knowledge acquisition. These techniques are capable of dealing with inexact and imprecise problem domains and have been demonstrated to be useful in the solution of classification problems. It addresses the issue of the application of appropriate evaluation criteria such as rule base accuracy and comprehensibility for new knowledge acquisition techniques. An empirical study is then described in which three approaches to knowledge acquisition are investigated and compared. The first approach combines neural networks and fuzzy logic, the second combines genetic algorithms and fuzzy logic, and the third is a rough sets approach. In this study neural network and genetic algorithm fuzzy rule induction systems have been developed and applied to three classification problems. Rule induction software based on rough sets theory was also used to generate and test rule bases for the same data. A comparison of these approaches with the C4.5 inductive algorithm was also carried out. Our research to date indicates that, based on the evaluation criteria used, the genetic/fuzzy approach compares more than favourably with the neuro/fuzzy and rough set approaches. On the data sets used the genetic algorithm system displays a higher accuracy of classification and rule base comprehensibility than the C4.5 inductive algorithm.

13.
Mengchi Liu 《Software》2001,31(5):409-443
The Relationlog system is a novel persistent deductive database system for advanced data and knowledge-based applications. It directly supports the storage and inference of data with complex structures, especially data supported in nested relational and complex-object models. The Relationlog system supports the Relationlog query language, which is a typed extension of Datalog with tuples and sets and stands in the same relationship to the nested relational and complex-object models as Datalog stands to the relational model. It also supports an SQL-like data definition language and a declarative data manipulation language. This article introduces the Relationlog language, discusses the system architecture, the design decisions incorporated within its implementation, and our experience in developing the system. Copyright © 2001 John Wiley & Sons, Ltd.

14.
In a relational database system, a number of concurrent users execute queries for Read and Update on shared data tuples. Domains are used to ensure data integrity. At data creation time, the user can specify, for each column of a tuple set, a set of rules that data must satisfy before they are stored in the database. This set of rules is called a Domain. The User Sharing and Authority features of the system are discussed in detail, showing the flexibility for the end user to grant and retract sharing to other users and applications, as well as the several levels of control of updatability of shared sets of tuples and Relational Views on sets of tuples. In addition, further facilities for access control on top of the sharing functions are explained. These are the Trap facility and the Passwords-on-tuple-sets facility, which are available to end users or applications.

15.
The development and investigation of efficient methods of parallel processing of very large databases using the columnar data representation designed for computer clusters is discussed. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column indexes fragmented based on the domain-interval principle is introduced. The column indexes are auxiliary structures that are constantly stored in the distributed main memory of a computer cluster. To match the elements of a column index to the tuples of the original relation, surrogate keys are used. Resource-hungry relational operations are performed on the corresponding column indexes rather than on the original relations of the database. As a result, a precomputation table is obtained. Using this table, the DBMS reconstructs the resulting relation. For basic relational operations on column indexes, methods for their parallel decomposition that do not require massive data exchanges between the processor nodes are proposed. This approach improves the performance of OLAP-class queries by a factor of hundreds.

16.
From data to global generalized knowledge
Attribute-oriented induction (AOI) is a useful data mining method that extracts generalized knowledge from relational data and the user's background knowledge. The method uses two thresholds, the relation threshold and the attribute threshold, to guide the generalization process, and outputs generalized knowledge: a set of generalized tuples which describes the major characteristics of the target relation. Although AOI has been widely used in various applications, a potential weakness of this method is that it only provides a snapshot of the generalized knowledge, not a global picture. When the thresholds are different, we obtain different sets of generalized tuples, which also describe the major characteristics of the target relation. If a user wants to ascertain a global picture of induction, he or she must try different thresholds repeatedly, which is time-consuming and tedious. In this study, we propose a global AOI (GAOI) method, which employs the multiple-level mining technique with multiple minimum supports to generate all interesting generalized knowledge at one time. Experimental results on a real-life dataset show that the proposed method is effective in finding global generalized knowledge.
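
The basic AOI generalization step described above can be sketched in a few lines. This is a simplified illustration, not the GAOI method: it assumes each column has a concept hierarchy given as a value-to-parent dictionary, applies only the attribute threshold (the relation threshold is omitted), and merges identical generalized tuples with a count. The function name `generalize` and its parameters are hypothetical.

```python
from collections import Counter


def generalize(rows, hierarchies, attr_threshold):
    """rows: list of tuples; hierarchies: one dict (value -> parent) per column."""
    rows = list(rows)
    for col, parents in enumerate(hierarchies):
        # Climb the hierarchy while the column has too many distinct values.
        while len({r[col] for r in rows}) > attr_threshold:
            lifted = [
                r[:col] + (parents.get(r[col], r[col]),) + r[col + 1:]
                for r in rows
            ]
            if lifted == rows:  # no value has a parent left; stop climbing
                break
            rows = lifted
    # Identical generalized tuples are merged and counted.
    return Counter(rows)


city_to_country = {"Paris": "France", "Lyon": "France", "Berlin": "Germany"}
summary = generalize(
    [("Paris", "BSc"), ("Lyon", "BSc"), ("Berlin", "MSc")],
    [city_to_country, {}],
    attr_threshold=2,
)
```

Running the sketch with a different `attr_threshold` produces a different set of generalized tuples, which is the "snapshot" limitation the abstract attributes to plain AOI.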

17.
Most current applications of inductive learning in databases take place in the context of a single extensional relation. The authors place inductive learning in the context of a set of relations defined either extensionally or intensionally in the framework of deductive databases. LINUS, an inductive logic programming system that induces virtual relations from example positive and negative tuples and already defined relations in a deductive database, is presented. Based on the idea of transforming the problem of learning relations to attribute-value form, several attribute-value learning systems are incorporated. As the latter handle noisy data successfully, LINUS is able to learn relations from real-life noisy databases. The use of LINUS for learning virtual relations is illustrated, and a study of its performance on noisy data is presented.

18.
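A discussion of real-time data compression based on relational databases
Based on a relational database, this paper discusses techniques for the compressed storage of, and access to, real-time data in relational tables. The proposed compression method is implemented in the Oracle database's internal language. The results of running the program show good compression for large volumes of real-time data, with the data volume dropping sharply after compression. When the frequency of the data time intervals is not high and the number of collection points is small, the proposed method can serve as an alternative to a real-time database.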

19.
《Information Systems》1999,24(7):535-554
We extend the relational data model to incorporate linear orderings into data domains, which we call the ordered relational model. The conventional Functional Dependencies (FDs) are examined in the context of ordered relational databases by using the notion of System Ordering Independence (SOI), which refers to the desirable scenario that the ordering of tuples in a relation is independent of the implementation of the underlying DBMS. We also extend Armstrong's axiom system for FDs to object relations, which are a subclass of ordered relations that allow us to view tuples as objects. We formally define Ordered Functional Dependencies (OFDs) for the extended model by means of two possible extensions of domains, pointwise-orderings and lexicographical orderings. We first present a sound and complete axiom system for OFDs in the case of pointwise-orderings and then establish a sound and complete set of chase rules for OFDs in the case of lexicographical orderings. Our main result shows that the implication problems for both cases of OFDs are decidable, and that it is linear time for the case of pointwise-orderings.

20.
The nested relational model allows relations that are not in first normal form. This paper gives an extension of Datalog rules for nested relations. In our approach, nested Datalog is a natural extension of Datalog introduced for the relational data model. A nested Datalog program has a hierarchical structure of rules and subprograms to manipulate relation values of nested relations. We introduce a new category of predicate symbols, the variable predicate symbols, to refer to tuples of subrelations. The notions of soundness, safety and consistency are defined to avoid undesirable nested Datalog programs. The evaluation of nested Datalog is given in terms of the nested relational algebra. Finally, we relate the expressive power of nonrecursive nested Datalog to the power of nested relational algebra and safe nested tuple relational calculus.
