Similar Documents
20 similar documents found.
1.
Although many real-world phenomena are vague, with uncertain locations or vague shapes, existing spatial data warehouse models do not support spatial vagueness and thus cannot properly represent these phenomena. In this paper, we propose the VSCube conceptual model to represent and manipulate shape vagueness in spatial data warehouses, allowing the analysis of business scores related to vague spatial data and thereby improving the decision-making process. The VSCube conceptual model is based on the cube metaphor and supports geometric shapes with their corresponding membership values, providing more expressiveness to represent vague spatial data. We also define vague spatial aggregation functions (e.g. vague spatial union) and vague spatial predicates to enable vague SOLAP queries (e.g. intersection range queries). Finally, we introduce the concept of vague SOLAP and its operations (e.g. drill-down and roll-up). We demonstrate the applicability of our model by describing an application concerning pest control in agriculture and by discussing the reuse of existing models in the VSCube conceptual model.
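The flavor of a vague spatial aggregation function can be sketched in a few lines. The following is a minimal illustration, not the paper's VSCube definition: vague regions are modeled as mappings from grid cells to membership values in [0, 1], and the vague spatial union takes the per-cell maximum membership, in standard fuzzy-set style. All cell coordinates and membership values below are invented.

```python
# Sketch of a vague spatial union over membership-valued grid cells.
# Regions are dicts {cell: membership}; the union keeps, per cell, the
# highest membership the cell has in either operand (fuzzy-set union).

def vague_spatial_union(region_a, region_b):
    """Combine two vague regions cell-by-cell via maximum membership."""
    cells = set(region_a) | set(region_b)
    return {c: max(region_a.get(c, 0.0), region_b.get(c, 0.0)) for c in cells}

infested = {(0, 0): 1.0, (0, 1): 0.6}   # pest hot spot with a vague border
sprayed = {(0, 1): 0.4, (1, 1): 0.9}    # treated area, also vague

combined = vague_spatial_union(infested, sprayed)
print(combined[(0, 1)])   # 0.6 — the larger of the two memberships
```

A vague spatial intersection would analogously take the per-cell minimum, which is why such operations compose naturally inside SOLAP aggregations.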

2.
Cloud computing systems handle large volumes of data by using almost unlimited computational resources, while spatial data warehouses (SDWs) are multidimensional databases that store huge volumes of both spatial and conventional data. Cloud computing environments have been considered adequate to host voluminous databases, process analytical workloads and deliver database as a service, while spatial online analytical processing (spatial OLAP) queries issued over SDWs are intrinsically analytical. However, hosting an SDW in the cloud and processing spatial OLAP queries over such a database pose novel obstacles. In this article, we introduce novel concepts such as the cloud SDW and spatial OLAP as a service, and afterwards detail the design of novel schemas for the cloud SDW and for spatial OLAP query processing over a cloud SDW. Furthermore, we evaluate the performance of spatial OLAP query processing in cloud SDWs using our own query processor aided by a cloud spatial index. Moreover, we describe the cloud spatial bitmap index, designed to improve the performance of spatial OLAP query processing in cloud SDWs, and assess it through an experimental evaluation. Results derived from our experiments revealed that this index was capable of reducing the query response time by 58.20% to 98.89%.
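The paper's cloud spatial bitmap index is not reproduced here, but the bitmap machinery such indexes build on can be sketched: each predicate, spatial or conventional, is resolved to one bit per fact row, and combining predicates becomes a cheap bitwise AND. The row data below are invented.

```python
# Sketch of bitmap-index predicate combination for spatial OLAP filtering.
# A spatial predicate (e.g. "geometry intersects the query window") is
# pre-resolved into a bitmap of qualifying fact rows, then ANDed with a
# conventional-attribute bitmap.

def make_bitmap(row_flags):
    """Pack a list of booleans (one per fact row) into an int bitmap."""
    bits = 0
    for i, flag in enumerate(row_flags):
        if flag:
            bits |= 1 << i
    return bits

# Rows whose geometry intersects the query window (from a spatial index):
intersects_window = make_bitmap([True, False, True, True])
# Rows with year = 2024 (from a conventional bitmap index):
year_2024 = make_bitmap([True, True, False, True])

qualifying = intersects_window & year_2024
rows = [i for i in range(4) if qualifying >> i & 1]
print(rows)   # [0, 3]
```

The appeal in a cloud setting is that bitmaps are compact to ship between nodes and the AND is trivially parallelizable per row range.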

3.
On-line analytical processing (OLAP) refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision-support purposes. Data warehouses tend to be extremely large; it is quite possible for a data warehouse to be hundreds of gigabytes to terabytes in size (Chaudhuri and Dayal, 1997). Queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. Given this, we are interested in developing strategies for improving query processing in data warehouses by exploring the applicability of parallel processing techniques. In particular, we exploit the natural partitionability of a star schema and render it even more efficient by applying DataIndexes, a storage structure that serves as both an index and data and lends itself naturally to vertical partitioning of the data. DataIndexes are derived from the various special-purpose access mechanisms currently supported in commercial OLAP products. Specifically, we propose a declustering strategy that incorporates both task and data partitioning and present the Parallel Star Join (PSJ) Algorithm, which provides a means to perform a star join in parallel using efficient operations involving only rowsets and projection columns. We compare the performance of the PSJ Algorithm with two parallel query processing strategies. The first is a parallel join strategy utilizing the Bitmap Join Index (BJI), arguably the state-of-the-art OLAP join structure in use today. For the second strategy we choose a well-known parallel join algorithm, namely the pipelined hash algorithm. To assist in the performance comparison, we first develop a cost model of the disk access and transmission costs for all three approaches.
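The rowset idea behind a star join can be illustrated in miniature (this is a toy sketch, not the PSJ Algorithm itself): each dimension predicate is first resolved into a set of qualifying fact-row ids, and the star join then reduces to intersecting those rowsets, which can run independently on each horizontal partition of the fact table. Table contents here are invented.

```python
# Toy rowset-based star join: per-dimension predicates become rowsets of
# fact-row ids; the join is their intersection. On a partitioned fact
# table, each partition's intersection can be computed in parallel.

fact_sales = list(range(8))                    # fact-row ids 0..7
rows_matching_store = {0, 2, 3, 5, 7}          # e.g. store.region = 'West'
rows_matching_product = {1, 2, 3, 4, 7}        # e.g. product.category = 'Toys'

def star_join(rowsets):
    """Intersect per-dimension rowsets over the fact table."""
    result = set(fact_sales)
    for rs in rowsets:
        result &= rs
    return sorted(result)

print(star_join([rows_matching_store, rows_matching_product]))   # [2, 3, 7]
```

Only after the surviving row ids are known are the needed measure columns fetched, which is what makes vertically partitioned storage such as DataIndexes a natural fit.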

4.
Views over databases have regained attention in the context of data warehouses, which are seen as materialized views. In this setting, efficient view maintenance is an important issue, for which the notion of self-maintainability has been identified as desirable. In this paper, we extend the concept of self-maintainability to (query and update) independence within a formal framework, where independence with respect to arbitrary given sets of queries and updates over the sources can be guaranteed. To this end we establish an intuitively appealing connection between warehouse independence and view complements. Moreover, we study special kinds of complements, namely monotonic complements, and show how to compute minimal ones in the presence of keys and foreign keys in the underlying databases. Taking advantage of these complements, an algorithmic approach is proposed for the specification of independent warehouses with respect to given sets of queries and updates. Received: 21 November 2000 / Accepted: 1 May 2001 / Published online: 6 September 2001

5.
A new structure for organizing a set of multidimensional points called the nested interpolation-based grid file (NIBGF) is introduced. The structure represents a synthesis and an improvement over interpolation-based grid files (IBGF), BANG files, and K-D-B-trees. It decomposes the data search space into uniquely identifiable regions which may either be disjoint, as in interpolation-based grid files, or enclose each other, as in BANG files. In addition to possessing the symmetry of access and clustering properties characteristic of grid file structures, the performance of NIBGF is comparable to a B-tree's as far as the index is concerned, even in the worst-case scenario, and to the BANG file's as far as the data regions are concerned. These properties make the new structure suitable for efficient implementation of relational database operations. Research supported by NSF IRI-9010365.

6.
OLAP queries involve extensive aggregation over large amounts of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using the various kinds of materialized views that already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for the usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query that can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.
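One ingredient of such usability conditions can be sketched concretely (a minimal sketch, not the paper's normal forms): along every dimension, a materialized view can answer a query only if the view's aggregation level is at least as fine as the query's, so the query's groups can be derived by further aggregation. The hierarchies below are assumed for illustration.

```python
# Sketch of a granularity-based usability test for materialized views.
# Each dimension hierarchy is listed fine -> coarse; a view at "month"
# can answer a "year" query (roll up), but not a "day" query.

HIERARCHIES = {
    "time": ["day", "month", "year"],        # fine -> coarse
    "location": ["city", "state", "country"],
}

def at_least_as_fine(dim, view_level, query_level):
    levels = HIERARCHIES[dim]
    return levels.index(view_level) <= levels.index(query_level)

def view_usable(view_gran, query_gran):
    return all(at_least_as_fine(d, view_gran[d], query_gran[d]) for d in query_gran)

view = {"time": "month", "location": "city"}
print(view_usable(view, {"time": "year", "location": "state"}))   # True
print(view_usable(view, {"time": "day", "location": "state"}))    # False
```

A full rewriting algorithm must additionally check selection regions, which is why the abstract treats selection and aggregation granularities together.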

7.
Due to the explosive increase in XML documents, it is imperative to manage XML data in an XML data warehouse. XML warehousing imposes challenges that are not found in relational data warehouses. In this paper, we first present a framework to build an XML data warehouse schema. To ensure scalability as data volume increases, we propose a number of partitioning techniques for multi-version XML data warehouses, including a document-based partitioning model, a schema-based partitioning model, and a cascaded (mixed) partitioning model. Finally, we formulate cost models to evaluate various types of queries for an XML data warehouse.
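Two of the partitioning schemes named above can be illustrated with a toy router (the paper's cost models are not reproduced; document ids, element names, and the partition count are all invented): document-based partitioning spreads whole documents across partitions by a key, while schema-based partitioning routes fragments by their element type.

```python
# Toy routing functions for two XML warehouse partitioning schemes.

def document_partition(doc, n_parts):
    """Document-based: route a whole document by its numeric id."""
    return doc["id"] % n_parts

def schema_partition(doc):
    """Schema-based: route by the document's root element type."""
    return doc["root"]

docs = [
    {"id": 7, "root": "invoice", "version": 1},
    {"id": 8, "root": "customer", "version": 3},
    {"id": 11, "root": "invoice", "version": 2},
]
print([document_partition(d, n_parts=4) for d in docs])   # [3, 0, 3]
print(sorted({schema_partition(d) for d in docs}))        # ['customer', 'invoice']
```

A cascaded (mixed) model composes the two, e.g. first by schema, then by document id within each schema partition.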

8.
Efficient aggregation algorithms for compressed data warehouses
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing the cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on data sets compressed by mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, output sizes, and main memory availability. The algorithms are described, and their I/O and CPU cost functions are presented. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms perform better on sparse data than previous aggregation algorithms.
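The paper's mapping-complete compression is not reproduced here, but the general principle of aggregating directly over a compressed representation is easy to show with run-length encoding: a SUM can be computed from (value, run length) pairs without ever materializing the runs. The column contents are invented.

```python
# Sketch: summing a run-length-encoded column without decompressing it.
# Each pair (value, length) stands for `length` consecutive copies of
# `value`; the sum is a weighted sum over the pairs.

def rle_sum(runs):
    """SUM over an RLE-compressed column."""
    return sum(value * length for value, length in runs)

# The column 3, 3, 3, 3, 0, 0, 7 encoded as runs:
runs = [(3, 4), (0, 2), (7, 1)]
print(rle_sum(runs))   # 19
```

The same shape works for COUNT and, with a little more bookkeeping, MIN/MAX; the saving grows with how compressible (e.g. how sparse) the data is, matching the abstract's observation about sparse data.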

9.
With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study methods for spatial data warehousing and OLAP of spatial data. In this paper, we study methods for spatial OLAP by integrating nonspatial OLAP methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for the computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies proposed, including approximation and selective materialization of the spatial objects resulting from spatial OLAP operations. The focus of our study is a method for spatial cube construction called object-based selective materialization, which differs from cuboid-based selective materialization (proposed in previous studies of nonspatial data cube construction). Rather than using a cuboid as an atomic structure during selective materialization, we explore granularity on a much finer level: that of a single cell of a cuboid. Several algorithms are proposed for object-based selective materialization of spatial data cubes, and a performance study has demonstrated the effectiveness of these techniques.
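The underlying cost trade-off of selective materialization can be sketched, though the paper's actual benefit metric and algorithms differ: precompute a merged spatial object only when its expected access frequency justifies its storage cost, greedily, under a space budget. All names, sizes, and access counts below are invented.

```python
# Sketch of a greedy selective-materialization choice: rank candidate
# merged spatial objects by accesses per unit of storage and take them
# until the space budget runs out; everything else is computed on the fly.

def choose_objects_to_materialize(candidates, budget):
    ranked = sorted(candidates, key=lambda c: c["accesses"] / c["size"], reverse=True)
    chosen, used = [], 0
    for c in ranked:
        if used + c["size"] <= budget:
            chosen.append(c["name"])
            used += c["size"]
    return chosen

candidates = [
    {"name": "merged_region_A", "size": 4, "accesses": 40},
    {"name": "merged_region_B", "size": 10, "accesses": 50},
    {"name": "merged_region_C", "size": 2, "accesses": 30},
]
print(choose_objects_to_materialize(candidates, budget=6))
# ['merged_region_C', 'merged_region_A']
```

Working at the level of individual merged objects (single cells) rather than whole cuboids is precisely what distinguishes the object-based approach described in the abstract.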

10.
A number of proposals for integrating geographical (Geographical Information Systems, GIS) and multidimensional (data warehouse, DW, and online analytical processing, OLAP) processing are found in the database literature. However, most current approaches do not take into account the use of a GDW (geographical data warehouse) metamodel or query language that makes the simultaneous specification of multidimensional and spatial operators available. To address this, this paper discusses the UML class diagram of a GDW metamodel and proposes its formal specification. We then present a formal metamodel for a geographical data cube and propose the Geographical Multidimensional Query Language (GeoMDQL) as well. GeoMDQL is based on well-known standards, such as the MultiDimensional eXpressions (MDX) language and the OGC simple features specification for SQL, and has been specifically defined for spatial OLAP environments based on a GDW. We also present the GeoMDQL syntax and discuss the taxonomy of GeoMDQL query types. Additionally, aspects related to the implementation of the GeoMDQL architecture are described, along with a case study involving the Brazilian public healthcare system to illustrate the proposed query language.

11.
Decision support systems help the decision-making process with the use of OLAP (On-Line Analytical Processing) and data warehouses. These systems allow the analysis of corporate data. As OLAP and data warehousing evolve, more and more complex data is being used. XML (Extensible Markup Language) is a flexible text format allowing the interchange and representation of complex data. Finding an appropriate model for an XML data warehouse tends to become complicated as more and more solutions appear. Hence, in this survey paper we present an overview of the different proposals that use XML within data warehousing technology. These proposals range from using XML data sources for regular warehouses to full XML warehousing solutions. Some studies focus merely on document storage facilities, while others present adaptations of XML technology for OLAP. Even though there is a growing body of research on the subject, many issues still remain unsolved.

12.
Data warehouse systems typically designate downtime for view maintenance, ranging from tens of minutes to hours depending on the system size. We develop a multiagent system that achieves immediate incremental view maintenance (IIVM) for continuous updating of data warehouse views. We describe an IIVM system that processes updates as transactions are executed at the underlying data sources, eliminating view maintenance downtime for the data warehouse, a crucial requirement for internet applications. The use of a multiagent framework provides considerable process speed improvement when compared with other IIVM systems. Since agents act as delegates for the data sources and warehouse views, it is easy to reorganize the components of the system. Through the use of cooperative agents, the data consistency of IIVM can be easily maintained. The test results from this research show that the proposed system increases the availability of the data warehouse while preserving a stringent requirement on data consistency.
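The incremental idea at the heart of IIVM can be shown in a compact form (relation contents are invented, and a real system must also handle deletes and concurrency): for a join view V = R ⋈ S, an insertion delta ΔR only requires joining ΔR with S and appending the result, rather than recomputing V from scratch.

```python
# Sketch of incremental maintenance for a two-relation join view.
# R holds (key, join_attr); S holds (join_attr, payload).

def join(r, s):
    return [(a, b, d) for (a, b) in r for (c, d) in s if b == c]

R = [(1, "x"), (2, "y")]
S = [("x", 10), ("y", 20)]
V = join(R, S)                 # initial materialization of the view

delta_R = [(3, "x")]           # a source-side insertion arrives
V += join(delta_R, S)          # incremental step: join only the delta

print(V)   # [(1, 'x', 10), (2, 'y', 20), (3, 'x', 10)]
```

Because only the delta is processed, the view stays continuously up to date with no maintenance window, which is the availability property the abstract emphasizes.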

13.
To make better decisions in business analytics, organizations increasingly use external structured, semi-structured, and unstructured data in addition to the (mostly structured) internal data. Current Extract-Transform-Load (ETL) tools are not suitable for this "open world scenario" because they do not consider semantic issues in the integration processing. Current ETL tools support neither processing semantic data nor creating a semantic Data Warehouse (DW), a repository of semantically integrated data. This paper describes our programmable Semantic ETL (SETL) framework. SETL builds on Semantic Web (SW) standards and tools and supports developers by offering a number of powerful modules, classes, and methods for (dimensional and semantic) DW constructs and tasks. Thus, it supports semantic data sources in addition to traditional data sources, semantic integration, and creating or publishing a semantic (multidimensional) DW in terms of a knowledge base. A comprehensive experimental evaluation comparing SETL to a solution made with traditional tools (requiring much more hand-coding) on a concrete use case shows that SETL provides better programmer productivity, knowledge base quality, and performance.
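SETL's actual classes and methods are not shown in the abstract, so the snippet below is only a schematic of the kind of output a semantic ETL step produces: dimension members and their roll-up relationships as RDF-style triples in a knowledge base. The URIs, member names, and the `rollsUpTo` predicate are all invented for illustration.

```python
# Schematic of a semantic DW load step: emit (subject, predicate, object)
# triples describing dimension members and their hierarchy links.

EX = "http://example.org/dw#"

def member_triples(level, name, parent=None):
    """Triples typing a member at a dimension level, plus its roll-up link."""
    s = EX + name
    triples = [(s, "rdf:type", EX + level)]
    if parent:
        triples.append((s, EX + "rollsUpTo", EX + parent))
    return triples

kb = member_triples("City", "Aalborg", parent="Denmark") \
   + member_triples("Country", "Denmark")
print(len(kb))   # 3
```

Publishing the warehouse as such a triple set is what lets it interoperate with other Semantic Web sources instead of living in a closed relational schema.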

14.
This paper proposes a novel probabilistic data model and algebra that improve the modeling and querying of uncertain data in spatial OLAP (SOLAP) to support location-based services. Data warehouses that support location-based services need to combine complex hierarchies, such as road networks or transportation infrastructures, with static and dynamic content, e.g., speed limits and vehicle positions, respectively. Both the hierarchies and the content are often uncertain in real-world applications. Our model supports the use of probability distributions within both facts and dimensions. We give an algebra that correctly aggregates uncertain data over uncertain hierarchies. This paper also describes an implementation of the model and algebra, gives a complexity analysis of the algebra, and reports on an empirical, experimental evaluation of the implementation. The work is motivated by a real-world case study, based on our collaboration with a leading Danish vendor of location-based services.
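The paper's algebra is not reproduced here, but the basic intuition of aggregating distribution-valued measures can be sketched: when a measure is a discrete probability distribution rather than a point value, aggregation can propagate expected values, and since expectation is linear, sums of uncertain measures have the sum of the expectations. The distributions below are invented.

```python
# Sketch of expected-value aggregation over distribution-valued measures.

def expected(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(v * p for v, p in dist.items())

# Travel time (minutes) on two consecutive road segments, each uncertain:
segment_a = {10: 0.7, 15: 0.3}
segment_b = {5: 0.5, 8: 0.5}

# Linearity of expectation: the expected route time is the sum of the
# per-segment expectations (independence is not required for this).
route_expected = expected(segment_a) + expected(segment_b)
print(route_expected)   # 18.0
```

A full probabilistic SOLAP algebra must go further, e.g. propagating uncertainty in which parent a member rolls up to, which is the "uncertain hierarchies" part of the abstract.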

15.
16.
Active data warehouses: complementing OLAP with analysis rules
Conventional data warehouses are passive. All tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms to analyse data multidimensionally and to make decisions. The resulting architecture is called an active data warehouse.
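The event/condition/action shape of an analysis rule can be sketched as follows; the cube access is faked with a plain dict, and the threshold and action text are invented rather than taken from the paper.

```python
# Minimal sketch of an ECA-style analysis rule: on an event (e.g. a
# warehouse load), evaluate a multidimensional condition; if it holds,
# perform the automated decision action.

class AnalysisRule:
    def __init__(self, condition, action):
        self.condition = condition   # analyses the cube data
        self.action = action         # the automated decision

    def on_event(self, cube):
        if self.condition(cube):
            return self.action(cube)
        return None                  # condition not met: no decision

rule = AnalysisRule(
    condition=lambda cube: cube["sales"]["Q3"] < 0.8 * cube["sales"]["Q2"],
    action=lambda cube: "notify purchasing: cut Q4 order volume",
)

cube = {"sales": {"Q2": 1000, "Q3": 700}}
print(rule.on_event(cube))   # notify purchasing: cut Q4 order volume
```

The value over a plain ECA trigger is that the condition ranges over multidimensional aggregates rather than single tuples, which is the extension the abstract describes.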

17.
Histogram feature representation is important in many classification applications for characterizing the statistical distribution of different pattern attributes, such as the color and edge orientation distributions in images. While the construction of these feature representations is simple, this very simplicity may compromise the classification accuracy in those cases where the original histogram does not provide adequate discriminative information for making a reliable classification. In view of this, we propose an optimization approach based on evolutionary computation (Back 1996; Fogel 1998) to identify a suitable transformation on the histogram feature representation, such that the resulting classification performance based on these features is maximally improved while the original simplicity of the representation is retained. To facilitate this optimization process, we propose a hierarchical classifier structure to demarcate the set of categories in such a way that the pair of category subsets with the highest level of dissimilarity is identified at each stage for partition. In this way, the evolutionary search process for the required transformation can be considerably simplified due to the reduced level of complexity in classifying two widely separated category subsets. The proposed approach is applied to two problems in multimedia data classification, namely the categorization of 3D computer graphics models and image classification in the JPEG compressed domain. Experimental results indicate that the evolutionary optimization approach, facilitated by the hierarchical classification process, is capable of significantly improving the classification performance for both applications based on the transformed histogram representations.
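A very loose sketch of what "transforming a histogram representation" can mean (not the paper's transform family): raise the bin values to a power gamma and renormalize, which reshapes the distribution's emphasis; an evolutionary search would then tune gamma, and richer parameters, to maximize class separation. The histogram and gamma below are invented.

```python
# Sketch of a parametric histogram transformation: power the bins and
# renormalize so the result is still a distribution. gamma < 1 flattens
# the histogram; gamma > 1 sharpens its dominant bins.

def transform_histogram(hist, gamma):
    powered = [h ** gamma for h in hist]
    total = sum(powered)
    return [p / total for p in powered]

hist = [0.70, 0.20, 0.10]
print(transform_histogram(hist, gamma=0.5))
```

An evolutionary optimizer would treat gamma (or a vector of per-bin parameters) as the genome and a classifier's accuracy on the transformed features as the fitness.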

18.
The multidimensional (MD) modeling, which is the foundation of data warehouses (DWs), MD databases, and On-Line Analytical Processing (OLAP) applications, is based on several properties that differ from those of traditional database modeling. In the past few years, there have been several proposals, each providing its own formal and graphical notation, for representing the main MD properties at the conceptual level. Unfortunately, however, none of them has been accepted as a standard for conceptual MD modeling.

In this paper, we present an extension of the Unified Modeling Language (UML) using a UML profile. This profile is defined by a set of stereotypes, constraints and tagged values to elegantly represent the main MD properties at the conceptual level. We make use of the Object Constraint Language (OCL) to specify the constraints attached to the defined stereotypes, thereby avoiding an arbitrary use of these stereotypes. We have based our proposal on UML for two main reasons: (i) UML is a well-known standard modeling language familiar to most database designers, so designers can avoid learning a new notation, and (ii) UML can easily be extended and tailored to a specific domain with concrete peculiarities, such as multidimensional modeling for data warehouses. Moreover, our proposal is Model Driven Architecture (MDA) compliant, and we use the Query/View/Transformation (QVT) approach for automatic generation of the implementation on a target platform. Throughout the paper, we describe how to easily accomplish the MD modeling of DWs at the conceptual level. Finally, we show how to use our extension in Rational Rose for MD modeling.


19.
Information Systems, 1999, 24(3): 229-253
Most database researchers have studied data warehouses (DWs) in their role as buffers of materialized views, mediating between update-intensive OLTP systems and query-intensive decision support. This neglects the organizational role of data warehousing as a means of centralized information flow control. As a consequence, a large number of quality aspects relevant to data warehousing cannot be expressed with current DW meta models. This paper makes two contributions towards solving these problems. Firstly, we enrich the meta data about DW architectures with explicit enterprise models. Secondly, since many very different mathematical techniques for measuring or optimizing particular aspects of DW quality are being developed, we adapt the Goal-Question-Metric approach from software quality management to a meta data management environment in order to link these specialized techniques to a generic conceptual framework of DW quality. The approach has been implemented in full on top of the ConceptBase repository system and has undergone some validation through its application in support of specific quality-oriented methods, tools, and application projects in data warehousing.

20.
We focus exclusively on the issue of requirements engineering for data warehouses (DWs). Our position is that the information content of a DW is found in the larger context of the goals of an organization. We refer to this context as the organizational perspective. Goals identify the set of relevant decisions, which in turn help in determining the information needed to support them. The organizational perspective is converted into the technical perspective, which deals with the set of decisions to be supported and the information required. The latter defines the data warehouse contents. To elicit the technical perspective, we use the notion of an informational scenario: a typical interaction between a DW system and the decision maker, consisting of a sequence of pairs of the form <information request, response>. We formulate an information request as a statement in an adapted form of SQL called Specification SQL. The proposals here are implemented in the form of an Informational Scenario Engine that processes informational scenarios and determines the data warehouse's information contents.
