Similar literature (20 results)
1.
Inductive databases integrate database querying with database mining. In this article, we present an inductive database system that does not rely on a new data mining query language, but on plain SQL. We propose an intuitive and elegant framework based on virtual mining views, which are relational tables that virtually contain the complete output of data mining algorithms executed over a given data table. We show that several types of patterns and models that are implicitly present in the data, such as itemsets, association rules, and decision trees, can be represented and queried with SQL using a unifying framework. As a proof of concept, we illustrate a complete data mining scenario with SQL queries over the mining views, which is executed in our system.

2.
Analyzing graphs is a fundamental problem in big data analytics, for which DBMS technology does not seem competitive. On the other hand, SQL recursive queries are a fundamental mechanism to analyze graphs in a DBMS, and their processing and optimization are significantly harder than for traditional SPJ queries. Columnar DBMSs are a new, faster class of database system, with significantly different storage and query processing mechanisms compared to row DBMSs, which are still the dominant technology. With that motivation in mind, we study the optimization of recursive queries on a columnar DBMS, focusing on two fundamental and complementary graph problems: transitive closure and adjacency matrix multiplication. From a query processing perspective we consider the three fundamental relational operators: selection, projection and join (SPJ), where projection subsumes SQL group-by aggregation. We present comprehensive experiments comparing recursive query processing on columnar, row and array DBMSs to analyze large graphs with different shape and density. We study the relative impact of query optimizations and we compare the raw speed of DBMSs in evaluating recursive queries on graphs. Results confirm that classical query optimizations keep working well in a columnar DBMS, but their relative impact is different. Most importantly, a columnar DBMS with tuned query optimization is uniformly faster than row and array systems at analyzing large graphs, regardless of their shape, density and connectivity. On the other hand, there is no clear winner between the row and array DBMSs.
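
To make the transitive closure problem named in this abstract concrete, here is a minimal sketch of a recursive SQL query run from Python. It assumes SQLite (a row store, not one of the systems the paper evaluates), and the table name `edge`, the CTE name `reach`, and the function name are illustrative choices, not taken from the paper.

```python
import sqlite3


def transitive_closure(edges):
    """Return all (src, dst) pairs such that dst is reachable from src."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(src, dst) AS (
            -- Base step: every edge is a path of length 1.
            SELECT src, dst FROM edge
            UNION
            -- Recursive step: extend a known path by one edge (a join).
            SELECT r.src, e.dst
            FROM reach AS r JOIN edge AS e ON r.dst = e.src
        )
        SELECT src, dst FROM reach ORDER BY src, dst
        """
    ).fetchall()
    conn.close()
    return rows


print(transitive_closure([(1, 2), (2, 3)]))
```

`UNION` rather than `UNION ALL` removes duplicate pairs, which is what lets the recursion terminate on cyclic graphs.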

3.
The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases, i.e. the execution of SPARQL queries on RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment. Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping and repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translation of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database.

4.
Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases receiving thousands of updates each second. Such views have to be kept fresh at millisecond latencies. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data. In this article, we present the DBToaster system, which keeps materialized views of standard SQL queries continuously fresh as data changes very rapidly. This is achieved by a combination of aggressive compilation techniques and DBToaster’s original recursive finite differencing technique which materializes a query and a set of its higher-order deltas as views. These views support each other’s incremental maintenance, leading to a reduced overall view maintenance cost. DBToaster supports tens of thousands of complete view refreshes per second for a wide range of queries.
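
The idea of views that support each other's incremental maintenance can be illustrated with a toy example. The sketch below is hand-written Python, not DBToaster's generated code, and it handles inserts only. It maintains the assumed query `SELECT SUM(r.a * s.b) FROM R r JOIN S s ON r.k = s.k`, and the class and attribute names are hypothetical.

```python
from collections import defaultdict


class JoinSumView:
    """Keeps SUM(r.a * s.b) over R JOIN S ON r.k = s.k fresh under inserts."""

    def __init__(self):
        self.total = 0                 # the materialized query result
        self.sum_a = defaultdict(int)  # auxiliary view: SUM(a) per key of R
        self.sum_b = defaultdict(int)  # auxiliary view: SUM(b) per key of S

    def insert_r(self, k, a):
        # The change to the result depends only on the auxiliary view over S.
        self.total += a * self.sum_b[k]
        self.sum_a[k] += a

    def insert_s(self, k, b):
        # Symmetric case: use the auxiliary view over R.
        self.total += self.sum_a[k] * b
        self.sum_b[k] += b


view = JoinSumView()
view.insert_r(1, 2)
view.insert_s(1, 3)
view.insert_r(1, 4)
print(view.total)
```

Each insert costs a constant number of dictionary operations; neither base table is rescanned, because the auxiliary sums play the role of the materialized delta views.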

5.
6.
The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: Naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than Naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare Naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.

7.
Discovery of frequent DATALOG patterns   Cited by: 19 (self-citations: 0, citations by others: 19)
Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present WARMR, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem. The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and in particular novel settings not supported by special purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special purpose algorithms. Second, the unified representation gives insight into the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation a number of dimensions appear that relink diverged settings. We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.
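
The "simplest form" mentioned in this abstract, frequent itemset discovery, can be sketched as a level-wise search. The code below is a plain propositional illustration written for this listing. It is not WARMR, which works on DATALOG queries, and the function name and the absolute support threshold are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # level 1 candidates
    result = {}
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-sets into (k+1)-sets.
        size = len(next(iter(frequent))) + 1 if frequent else 0
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == size}
        # Prune candidates that have an infrequent k-subset.
        current = [c for c in candidates
                   if all(frozenset(s) in frequent
                          for s in combinations(c, size - 1))]
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(frequent_itemsets(baskets, min_support=2))
```

The pruning step relies on the fact that every subset of a frequent itemset is itself frequent, so a candidate with an infrequent subset never needs to be counted.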

8.
In this work, we present a semantic query optimization approach to improve the efficiency of the evaluation of a subset of SQL:1999 recursive queries. Using datalog notation, we can state our main contribution as an algorithm that builds a program P′ equivalent to a given program P, when both are applied over a database d satisfying a set of functional dependencies. The input program P is a linear recursive datalog program. The new program P′ has fewer distinct variables and, sometimes, fewer atoms in its rules, so it is cheaper to evaluate. Using CORAL and IBM DB2, P′ is empirically shown to be more efficient than the original program. This work is partially supported by Xunta de Galicia grant PGIDIT05SIN10502PR and Ministerio de Educación y Ciencia (PGE y FEDER) grants TIC2003-06593 and TIN2006-15071-C03-03.

9.
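An urban rail transit network data center aggregates data from multiple lines, and a single table can hold billions of records; query response times in the current system are too long and efficiency is low. We propose optimizing the system architecture with a database cluster and middleware to break through the storage and processing bottleneck of a single database, using multi-node parallel processing to speed up queries. Methods such as horizontally partitioning the data by line preserve the locality of JOIN operations and accommodate the addition of new lines; table partitioning, indexes, materialized views, and SQL statement tuning optimize queries on each single node. For the system architecture of transparent access to cluster data, a dedicated database access middleware is designed to solve key problems such as query parsing, routing, and result merging. Experiments on Guangzhou urban rail line data show that the proposed method reduces response times for all query types by at least 90%.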

10.
The intelligent Fril/SQL interrogator is an object-oriented and knowledge-based support query system, implemented as a set of logic objects linked to one another. These logic objects integrate SQL queries, the support logic programming language Fril, and Fril queries by processing them in sequence in the slots of each logic object. This approach therefore takes advantage of both an object-oriented system and a logic programming-based system. Fuzzy logic data mining and a machine learning tool kit built into the intelligent interrogator can automatically provide a knowledge base or rules to assist a human in analyzing huge data sets or creating intelligent controllers. Alternatively, users can write or edit the knowledge base or rules according to their requirements, so that the intelligent interrogator is also a support logic programming environment where users can write and run various Fril programs through these logic objects. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 279–302, 2007.

11.
A structured approach for cooperative query answering   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper proposes the use of a type abstraction hierarchy as a framework for deriving cooperative query answers. The type abstraction hierarchy integrates the abstraction view with the subsumption (is-a) and composition (part-of) views of a type hierarchy. Such a framework provides multilevel object representation, which is an important aspect of cooperative query answering. The concept of a pattern that specifies one or more conditions on an object is also proposed. Patterns have smaller granularity than types, and thus provide more specific semantic information. Cooperative query answering consists of query relaxation, generalization, specialization, and association on patterns. Query relaxation can be explicitly specified by the user or implicitly performed by the system. The implicit and explicit relaxations can also be combined and performed interactively by both the system and the user. CSQL, an extension of SQL for cooperative query answering, is also proposed. Preliminary experimental results reveal that the proposed type abstraction hierarchy provides an organized structure representing concepts at different knowledge levels in various domains, and provides a systematic and efficient method for cooperative query answering.

12.
Many recursive query processing applications are still poorly supported, partly because implementations of general recursive capabilities are inefficient and hard for users to understand, and partly because the approaches do not integrate well with existing query languages. An extension of the database language SQL is proposed for the processing of recursive structures. The new constructs are integrated in the view definition mechanism of SQL. Therefore, users with knowledge of SQL can take advantage of the increased functionality without learning a new language. The construct is based on a generalization of transitive closure and is formally defined. Because of the importance of extreme value selections, special constructs are introduced for the selection of tuples with minimal or maximal values in some attributes. Applying these selections on recursively defined views constitutes nonlinear recursion. By the introduction of special constructs for these selections, dealing with general nonlinear recursion can be avoided.

13.
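This paper reviews current intrusion detection and data mining techniques, analyzes the problems of the Snort network intrusion detection system, and focuses on two data mining algorithms: the Apriori association algorithm and the K-means clustering algorithm. Building on the Snort intrusion detection system, it adds a normal behavior mining module, an anomaly detection module, and a new rule generation module to construct a network intrusion detection system model based on data mining. The new model can effectively detect new intrusion behaviors and also improves the detection efficiency of the system.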

14.
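Application of data mining in SQL Server 2005   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper first introduces the concept and process of data mining, then describes the data mining features of SQL Server 2005, and finally presents the complete workflow for implementing a data mining project in SQL Server 2005.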

15.
In order to extend the expressive power of deductive databases, a formula that can have existential quantifiers in prenex normal form in a restricted way is defined as an extended rule. With the extended rule, we can easily define a virtual view that requires a division operation of relational algebra to evaluate. The paper addresses recursive query evaluation where at least one formula in a recursive rule set is an extended rule. We investigate transformable recursions as well as four cases of non-transformable recursions of transitive-closure-like and linear type. The work reveals that the occurrence of an existentially quantified variable in the extended recursive body predicate might dramatically limit the level of recursive search. In particular, the number of iterations to answer extended queries can be determined independently of database contents.

16.
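An effective method for privacy preserving association rule mining   Cited by: 23 (self-citations: 3, citations by others: 23)
Privacy preservation is a very important research problem in data mining; its goal is to obtain accurate models and analysis results without precise access to the real original data. To improve both the degree of protection for private data and the accuracy of mining results, an effective privacy preserving association rule mining method is proposed. First, the two basic privacy preservation strategies of data perturbation and query restriction are combined into a new randomization method for data, randomized response with partial hiding (RRPH), which transforms and hides the original data. Then, on this basis, a simple and efficient frequent itemset generation algorithm is given for data processed by RRPH, thereby realizing privacy preserving association rule mining. Theoretical analysis and experimental results both show that the RRPH-based method offers good privacy, accuracy, efficiency, and applicability.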
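
The abstract does not spell out the RRPH procedure, so the sketch below shows only the classical randomized response idea that such methods build on, as an assumed simplification: each 0/1 value is kept with probability p and flipped otherwise, and the true support is estimated from the perturbed data. The function names and the parameter `p` are illustrative, and the partial hiding step is not modeled.

```python
import random


def perturb(bits, p, rng):
    """Keep each 0/1 value with probability p, flip it otherwise."""
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_support(observed_fraction, p):
    """Invert the perturbation: observed = pi * p + (1 - pi) * (1 - p)."""
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information about the data")
    return (observed_fraction - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
true_bits = [1] * 700 + [0] * 300           # true support of the item is 0.7
noisy = perturb(true_bits, p=0.8, rng=rng)  # what the miner actually sees
print(estimate_support(sum(noisy) / len(noisy), p=0.8))
```

The estimate is unbiased but noisy. Values of p closer to 0.5 give stronger privacy and a higher estimation variance, which is the privacy and accuracy trade-off the abstract refers to.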

17.
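A Web log mining solution based on SQL Server 2005 is proposed. SSIS is used to import log data from text files into the database. SQL statements and stored procedures in SQL Server Management Studio perform log preprocessing, and SSAS then carries out the data mining task. An application example of an association rule mining algorithm on Web logs demonstrates the effectiveness of the solution.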

18.
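The discovery of association rules is an important aspect of data mining, and the discovery of quantitative association rules differs from that of traditional Boolean association rules. This paper introduces the methods, steps, and open problems of quantitative association rule mining, analyzes several representative quantitative association rule mining algorithms, proposes the IQAM algorithm, and discusses the outlook for quantitative association rule mining.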

19.
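Pseudo-relevance feedback query expansion based on matrix-weighted association rule mining   Cited by: 13 (self-citations: 0, citations by others: 13)
Huang Mingxuan, Yan Xiaowei, Zhang Shichao. Journal of Software (软件学报), 2009, 20(7): 1854-1865
A matrix-weighted association rule mining algorithm for query expansion is proposed, together with related theorems and their proofs. The algorithm uses four pruning strategies, which greatly improves mining efficiency. Experimental results show that its mining time is 87.84% lower than the original average time. To address the shortcomings of existing query expansion, matrix-weighted association rule mining is applied to query expansion, and a new query expansion model and a more reasonable method for computing expansion term weights are proposed. On this basis, a pseudo-relevance feedback query expansion algorithm based on matrix-weighted association rule mining is proposed. The algorithm automatically mines matrix-weighted association rules related to the original query from the top n initially retrieved documents, builds a rule base, and extracts from it expansion terms related to the original query to perform query expansion. Experimental results show that the algorithm improves retrieval performance. Compared with existing query expansion algorithms, its average precision is clearly higher at the same recall levels.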

20.
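Online maintenance in a multi-view environment in data warehouses   Cited by: 3 (self-citations: 0, citations by others: 3)
Online view maintenance in a data warehouse means keeping the materialized views in the warehouse consistent in real time with the databases at the information sources, without affecting front-end users' normal use of the warehouse. To solve the consistency problem between online view maintenance and drill-down queries in a multi-view environment, this paper introduces a "base database" (基库) model into the data warehouse architecture and proposes a corresponding view maintenance algorithm, 3VPA.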
