Found 20 similar documents; search took 15 ms.
1.
Discovering trend reversals between two data cubes gives users novel and interesting knowledge when the real-world context fluctuates: What is new? Which trends appear or emerge? Which tendencies are fading or disappearing? With the concept of the Emerging Cube, we capture such trend reversals by enforcing an emergence constraint. We review the classical borders for the Emerging Cube and introduce a new one that optimizes both storage space and computation time and provides a simple characterization of the size of Emerging Cubes, as well as classification and cube navigation tools. We formally state the connection between the classical and proposed borders by using cube transversals. Knowing the size of Emerging Cubes without computing them is of great interest, in particular for tuning the underlying emergence constraint. We address this issue by studying an upper bound and characterizing the exact size of Emerging Cubes. We propose two strategies for quickly estimating their size: one based on analytical estimation, without database access, and one based on probabilistic counting, using the proposed borders as the input of the near-optimal HyperLogLog algorithm. Because the estimation algorithm is efficient, several iterations can be performed to calibrate the emergence constraint. Moreover, we propose reduced and lossless representations of the Emerging Cube by using the concept of cube closure. Finally, we perform experiments on different data distributions to measure, on the one hand, the size of the introduced condensed and concise representations and, on the other hand, the performance (accuracy and computation time) of the proposed estimation method.
2.
The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data warehouses. A materialized data cube needs to be updated when the source relations are changed. Incremental maintenance of a data cube computes and propagates only its changes, rather than recomputing the entire data cube from scratch. For n dimension attributes, the data cube consists of 2^n group-bys, each of which is called a cuboid. To incrementally maintain a data cube with 2^n cuboids, the conventional methods compute 2^n delta cuboids, each of which represents the change of a cuboid. In this paper, we propose an efficient incremental maintenance method that can maintain a data cube using only a subset of the 2^n delta cuboids. We formulate an optimization problem to find the optimal subset of the 2^n delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only a reduced set of delta cuboids. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over the conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.
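The counting argument in the abstract above (n dimension attributes give 2^n cuboids, and conventional maintenance computes one delta cuboid per cuboid) can be illustrated with a minimal Python sketch. The dimension names, the `sales` measure, and the helper functions are hypothetical, and only insert-only changes with a SUM aggregate are handled; this shows the conventional baseline, not the paper's reduced-subset method.

```python
from collections import defaultdict
from itertools import combinations


def cuboids(dimensions):
    """Yield every subset of the dimension attributes (one per cuboid)."""
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            yield group_by


def aggregate(rows, group_by, measure):
    """Compute SUM(measure) grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return dict(totals)


def apply_delta(cuboid, delta):
    """Merge a delta cuboid into a materialized cuboid (insert-only, SUM)."""
    merged = dict(cuboid)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical data: three dimensions and one measure.
dims = ("product", "region", "year")
base = [
    {"product": "a", "region": "east", "year": 2020, "sales": 10},
    {"product": "b", "region": "east", "year": 2020, "sales": 5},
]
inserted = [{"product": "a", "region": "west", "year": 2021, "sales": 7}]

all_cuboids = list(cuboids(dims))  # 2^3 = 8 group-bys
cube = {g: aggregate(base, g, "sales") for g in all_cuboids}

# Conventional maintenance: one delta cuboid per cuboid, merged in place
# of a full recomputation.
maintained = {
    g: apply_delta(cube[g], aggregate(inserted, g, "sales")) for g in all_cuboids
}
```

Because SUM is distributive, merging each delta cuboid gives the same result as recomputing the cube over `base + inserted`; the saving the paper targets comes from not having to compute all 2^n deltas.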
3.
4.
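Based on an analysis of the concepts and architecture of data warehouses and OLAP, this paper describes a design for an OLAP application system and presents its concrete implementation. Queries against a data warehouse are generally ad hoc queries that must run complex operations within strict response times, scanning millions to hundreds of millions of records while performing potentially complex search, join, and aggregation operations. Query throughput and response time are the key measures of data warehouse performance. Cube computation is the foundation of ad hoc OLAP querying, and improving query speed requires OLAP precomputation. The paper systematically compares several cube computation algorithms and applies them in a concrete system.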
5.
Goran Hrovat Gregor Stiglic Peter Kokol Milan Ojsteršek 《Computer methods and programs in biomedicine》2014
With the increased acceptance of electronic health records, there is growing interest in applying data mining approaches within this field. This study introduces a novel approach for exploring and comparing temporal trends within different in-patient subgroups, based on association rule mining using the Apriori algorithm and linear model-based recursive partitioning. The Nationwide Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality was used to evaluate the proposed approach. This study presents a novel approach in which visual analytics on big data is used for trend discovery in the form of a regression tree with scatter plots in the leaves of the tree. The trend lines are used for directly comparing linear trends within a specified time frame. Our results demonstrate the existence of opposite trends in relation to age- and sex-based subgroups that would be impossible to discover using traditional trend-tracking techniques. Such an approach can be employed in decision support applications for policy makers when organizing campaigns, or by hospital management for observing trends that cannot be directly discovered using traditional analytical techniques.
6.
Zhongzhi Youping Qing Lida Xu Shaohui Liu Liangxi Ziyan Jiayou Li Huijing Lei Zhao 《Decision Support Systems》2007,42(4):2016
Since the early 1970s, decision support systems (DSS) have evolved significantly. In this paper, the design and implementation of MSMiner, a development platform for DSS, is introduced. The system is built on a data warehouse and integrated with a number of data mining algorithms. It is well suited for on-line analytical processing (OLAP). The characteristics of MSMiner include support for multiple data sources and data mining strategies, additional organizational flexibility with regard to data and mining strategies, and strong extensibility of data mining tasks.
7.
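OLAP (online analytical processing) is a data analysis technique closely tied to data warehousing. This paper details the application of OLAP in the telecommunications domain and the relationship between OLAP and data warehouses. Taking total mobile phone service revenue as the analysis subject, it determines the analysis method, defines the dimensions, constructs and analyzes the cube and the star schema, and finally analyzes the results.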
8.
The Model-Driven Architecture (MDA) is an approach that aligns modeling and automation for software development. Applying such an approach to data warehouse (DW) projects can save a great deal of time and cost. Furthermore, most OnLine Analytical Processing (OLAP) platforms behave like black boxes: they provide wizards that let business intelligence developers create and manipulate OLAP objects, but they do not support the sustainability of those objects or their migration from one platform to another. That is why many works in the literature have proposed using the MDA approach in DW projects. However, most of them have focused on generating the DW relational model from the conceptual one and have overlooked the OLAP model and the cube implementation. To deal with this problem, we propose in this paper an MDA solution that automates the process of obtaining an OLAP cube and its implementation through a set of metamodels and automatic transformations among them. The proposal generates the OLAP and DW relational models (PSMs) from the conceptual one, also using a PDM model that describes the target business intelligence platform. The source code to create the cube is then obtained from both PSM models. To this end, we define a set of transformation rules implemented using the Atlas transformation language. Finally, a case study is provided to validate our approach.
9.
Imad Rahal Dongmei Ren Weihua Wu Anne Denton Christopher Besemann William Perrizo 《Knowledge and Information Systems》2006,10(1):57-91
Graphs are increasingly becoming a vital source of information within which a great deal of semantics is embedded. As the size of available graphs increases, our ability to arrive at the embedded semantics grows into a much more complicated task. One form of important hidden semantics is that which is embedded in the edges of directed graphs. Citation graphs serve as a good example in this context. This paper attempts to understand temporal aspects in publication trends through citation graphs, by identifying patterns in the subject matters of scientific publications using an efficient, vertical association rule mining model. Such patterns can (a) indicate subject-matter evolutionary history, (b) highlight subject-matter future extensions, and (c) give insights on the potential effects of current research on future research. We highlight our major differences with previous work in the areas of graph mining, citation mining, and Web-structure mining, propose an efficient vertical data representation model, introduce a new subjective interestingness measure for evaluating patterns with a special focus on those patterns that signify strong associations between properties of cited papers and citing papers, and present an efficient algorithm for the purpose of discovering rules of interest followed by a detailed experimental analysis.
Imad Rahal is a newly appointed assistant professor in the Department of Computer Science at the College of Saint Benedict ∣ Saint John's University, Collegeville, MN, and a Ph.D. candidate at North Dakota State University, Fargo, ND. In August 2003, he earned his master's degree in computer science from North Dakota State University. Prior to that, he graduated summa cum laude from the Lebanese American University, Beirut, Lebanon, in February 2001 with a bachelor's degree in computer science. Currently, he is completing the final requirements for his Ph.D. degree in computer science on an NSF ND-EPSCoR doctoral dissertation assistantship with August of 2005 as a projected completion date. He is very active in research, proposal writing, and publications; his research interests are largely in the broad areas of data mining, machine learning, databases, artificial intelligence, and bioinformatics.
Dongmei Ren is working for the Database Technology Institute for z/OS, IBM Silicon Valley Lab, San Jose, CA, as a staff software engineer. She holds a Ph.D. degree from North Dakota State University, Fargo, ND, and master's and bachelor's degrees from TianJin University, TianJin, China. She has been a software engineer at DaTang Telecommunications, Beijing, China. Her areas of expertise are outlier analysis, data mining and knowledge discovery, database systems, machine learning, intelligent systems, wireless networks and bioinformatics. She has been awarded the Siemens Scholarship research enhancement for excellent performance in study and research. She is a member of ACM, IEEE.
Weihua Wu is a network monitoring & managed services analyst at Hewlett-Packard Co. in Canada. He holds a master's degree from North Dakota State University and a bachelor's degree from Nanjing University, both in computer science. His research areas of interest include data mining, knowledge discovery, data warehousing, information technology, network security, and bioinformatics. He has participated in various projects supported by NSF, DARPA, NASA, USDA, and GSA grants.
Anne Denton is an assistant professor in computer science at North Dakota State University. Her research interests are in data mining, knowledge discovery in scientific data, and bioinformatics. Specific interests include data mining of diverse data, in which objects are characterized by a variety of properties such as numerical and categorical attributes, graphs, sequences, time-dependent attributes, and others. She received her Ph.D. in physics from the University of Mainz, Germany, and her M.S. in computer science from North Dakota State University, Fargo, ND.
Christopher Besemann received his M.Sc. in computer science from North Dakota State University in Fargo, ND, 2005. Currently, he works in data mining research topics including association mining and relational data mining with recent work in model integration as a research assistant. He is accepted under a fellowship program for Ph.D. study at North Dakota State University.
William Perrizo is a professor of computer science at North Dakota State University. He holds a Ph.D. degree from the University of Minnesota, a master's degree from the University of Wisconsin and a bachelor's degree from St. John's University. He has been a research scientist at the IBM Advanced Business Systems Division and the U.S. Air Force Electronic Systems Division. His areas of expertise are data mining, knowledge discovery, database systems, distributed database systems, high speed computer and communications networks, precision agriculture and bioinformatics. He is a member of ISCA, ACM, IEEE, IAAA, and AAAS.
10.
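Liu Guangrong 《电脑编程技巧与维护》 (Computer Programming Skills & Maintenance) 2011,(4):32-35
Using as an example a multidimensional analysis of real-time revenue and real-time call traffic over dimensions such as time, product, channel, and tariff, this paper discusses the concepts, key technical points, and development and implementation steps of data warehousing and online analytical processing (OLAP), and explores the application of these technologies to telecom business analysis.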
11.
OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.
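The idea that a materialized view is usable when its granularity relates suitably to the query's granularity can be sketched in a simplified form. The sketch below covers only aggregation granularity with a SUM measure: a view is treated as usable when it is at least as fine as the query in every dimension, and the query is then answered by rolling the view up. The hierarchy, the `PARENT` mapping, and all function names are hypothetical; selection granularities and selection regions from the abstract are not modeled.

```python
from collections import defaultdict

# Hypothetical dimension hierarchies, ordered from finest to coarsest level.
HIERARCHY = {
    "time": ["month", "year"],
    "place": ["city", "country"],
}
# Hypothetical child -> parent mapping along those hierarchies.
PARENT = {
    "2020-01": "2020", "2020-02": "2020",
    "Paris": "France", "Lyon": "France",
}


def is_usable(view_levels, query_levels):
    """True if the view is at least as fine as the query in every dimension."""
    return all(
        HIERARCHY[dim].index(view_levels[dim])
        <= HIERARCHY[dim].index(query_levels[dim])
        for dim in HIERARCHY
    )


def lift(value, dim, from_level, to_level):
    """Map a member at from_level to its ancestor at to_level."""
    steps = HIERARCHY[dim].index(to_level) - HIERARCHY[dim].index(from_level)
    for _ in range(steps):
        value = PARENT[value]
    return value


def rewrite_with_view(view_rows, view_levels, query_levels):
    """Answer a SUM query at query_levels by rolling up a finer view."""
    if not is_usable(view_levels, query_levels):
        raise ValueError("view is coarser than the query in some dimension")
    result = defaultdict(int)
    for (t, p), total in view_rows.items():
        key = (
            lift(t, "time", view_levels["time"], query_levels["time"]),
            lift(p, "place", view_levels["place"], query_levels["place"]),
        )
        result[key] += total
    return dict(result)


view_levels = {"time": "month", "place": "city"}
view_rows = {("2020-01", "Paris"): 4, ("2020-02", "Paris"): 6, ("2020-01", "Lyon"): 3}
by_year_country = rewrite_with_view(
    view_rows, view_levels, {"time": "year", "place": "country"}
)
```

Rolling up works here because SUM is distributive; a measure such as a median could not be derived from the view this way.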
12.
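Research on the Application of Data-Warehouse-Based OLAP in DSS. Total citations: 14, self-citations: 1, citations by others: 14
Online analytical processing (OLAP) built on a data warehouse is a new decision analysis method in decision support systems (DSS). Starting from the practical requirements of DSS, this paper analyzes the characteristics and structural model of the data warehouse, studies the characteristics and architecture of OLAP and the process by which it carries out decision analysis, then gives an application example and discusses research directions for improving DSS architecture.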
13.
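As the business intelligence market gradually expands, evaluating the quality in use of online analytical processing (OLAP) systems has become a hot research topic in database applications. Evaluating OLAP system performance, as an effectiveness characteristic, requires a performance benchmark. Building on the APB-1 benchmark released by the OLAP Council, this paper first designs a cube model for multidimensional databases together with corresponding multidimensional expression (MDX) query templates; in designing the cube model, it corrects shortcomings of the ROLAP star schema in the APB-1 benchmark. Next, with identical test data and test parameters, it compares the query results of the designed MOLAP model with those of the ROLAP model, demonstrating the correctness of the MOLAP model and the MDX query templates. It then presents the OLAP performance testing procedure and describes a tool framework, with its main modules, that supports both ROLAP and MOLAP performance testing. Finally, the framework is used to run concurrent ROLAP and MOLAP queries on a commercial database management system, verifying its effectiveness. The proposed method and its implementation provide multidimensional data models, business models, and tool support for testing and evaluating the performance of future OLAP products.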
14.
15.
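An Analysis Manager (AM) add-in for the Microsoft Management Console (MMC), written with add-in interface objects, can call Decision Support Objects (DSO) to manage the various objects in OLAP Server, greatly improving the security, accuracy, flexibility, and speed of OLAP data structure management. The paper describes the specific methods for managing OLAP data with DSO in an Analysis Manager add-in.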
16.
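This paper presents a design method for a spatial data warehouse based on distributed database technology, network communication technology, and geographic information system (GIS) technology. Taking the heritage site protection zones of coastal Fujian Province as an example, a distributed spatial data warehouse was built. On this basis, multi-granularity data deployment is implemented with regional administrative level, type classification, and general protection-zone information as dimensions. Using this data warehouse as an example, data cubes are created dynamically in response to different spatial online analytical processing (OLAP) requests, the OLAP service is carried out, and the spatial OLAP results are returned.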
17.
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rules mining. It contributes to the ongoing research on multidimensional online association rules mining by proposing a general architecture that utilizes a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to provide users with the flexibility to view data from different perspectives, as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described in this paper involves introducing a fuzzy data cube as a remedy to the problem of handling quantitative values of dimensional attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed fuzzy data cube. Then, we investigate combining the concepts of weight and multiple-level to mine fuzzy weighted multi-cross-level association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single dimension, multidimensional and hybrid (integrates the other two methods) fuzzy weighted association rules mining. Each of the three methods utilizes a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this is the first effort in this direction. We compared the proposed approach to an existing approach that does not utilize fuzziness. Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States census in year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach.
OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis. In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).
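To illustrate what handling quantitative attribute values with fuzzy sets can look like, here is a minimal Python sketch of a fuzzy support computation. Everything in it is an assumption made for illustration: the triangular membership functions, the fuzzy set boundaries for `age` and `income`, the use of `min` to combine membership degrees, and the sample records. It does not reproduce the paper's weighted, multi-cross-level methods.

```python
def triangular(x, left, peak, right):
    """Triangular membership degree of x, assuming left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for two quantitative attributes.
FUZZY_SETS = {
    ("age", "young"): (0, 20, 40),
    ("age", "middle"): (30, 45, 60),
    ("income", "low"): (0, 20, 40),
    ("income", "high"): (30, 60, 90),
}


def membership(record, attribute, label):
    """Degree to which the record's attribute value belongs to a fuzzy set."""
    return triangular(record[attribute], *FUZZY_SETS[(attribute, label)])


def fuzzy_support(records, itemset):
    """Average over records of the minimum membership across the itemset."""
    total = 0.0
    for record in records:
        total += min(membership(record, attr, label) for attr, label in itemset)
    return total / len(records)


records = [
    {"age": 30, "income": 60},  # young: 0.5, high: 1.0
    {"age": 20, "income": 45},  # young: 1.0, high: 0.5
    {"age": 50, "income": 20},  # young: 0.0, high: 0.0
]
support = fuzzy_support(records, [("age", "young"), ("income", "high")])
```

With these records each of the first two contributes 0.5 and the third contributes 0, so the fuzzy support of {age is young, income is high} is 1/3. A record can belong partially to several fuzzy sets, which avoids the sharp boundaries of crisp interval partitioning.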
18.
19.
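Model Design and Application of an OLAP Analysis System Based on MS Analysis Services. Total citations: 1, self-citations: 0, citations by others: 1
The paper first discusses OLAP and MS Analysis Services technology, then proposes a model design for an OLAP analysis system based on MS Analysis Services; the model is general, practical, and open. Finally, it presents an example OLAP analysis system built with this model and describes the system development steps. Practice shows that an OLAP analysis system built with the proposed model delivers good analysis and decision-making functionality.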