Similar Documents
20 similar documents found
1.
NoSQL systems have gained popularity for many reasons, including the flexibility they provide in organizing data, as they relax the rigidity imposed by the relational model and by the other structured models. This flexibility, and the heterogeneity that has emerged in the area, has led to limited use of traditional modeling techniques, in contrast to what has happened with databases for decades. In this paper, we argue that traditional notions related to data modeling can be useful in this context as well. Specifically, we propose NoAM (NoSQL Abstract Model), a novel abstract data model for NoSQL databases, which exploits the commonalities of various NoSQL systems. We also propose a database design methodology for NoSQL systems based on NoAM, with initial activities that are independent of the specific target system. NoAM is used to specify a system-independent representation of the application data; this intermediate representation can then be implemented in target NoSQL databases, taking into account their specific features. Overall, the methodology aims at supporting scalability, performance, and consistency, as needed by next-generation web applications.
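As a rough illustration of the kind of system-independent intermediate representation such an abstract model enables (a minimal sketch, not NoAM's formal definitions), the following Python fragment organizes application data into collections of blocks made of entries and shows one possible mapping to a key-value layout; the collection name, keys and mapping rule are assumptions.

```python
# Illustrative sketch of an abstract, system-independent representation and one
# possible key-value implementation. Names and the mapping are simplified
# assumptions, not the paper's formal constructs.

def to_key_value(collection, block_key, block):
    """Map one block (aggregate) of a collection to key-value pairs:
    each entry becomes a pair whose key combines the collection name,
    the block key and the entry key."""
    pairs = {}
    for entry_key, value in block.items():
        pairs[f"{collection}/{block_key}/{entry_key}"] = value
    return pairs

# A "Player" collection with one block (aggregate) per player,
# each block holding a set of entries.
players = {
    "mary": {"firstName": "Mary", "lastName": "Wilson", "games": [2345, 2611]},
    "rick": {"firstName": "Rick", "lastName": "Doe", "games": [2611]},
}

kv_store = {}
for block_key, block in players.items():
    kv_store.update(to_key_value("Player", block_key, block))

for k, v in sorted(kv_store.items()):
    print(k, "->", v)
```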

2.
Emerging applications such as social networks and microblogging pose new challenges to data management technology, including efficient storage of massive data, highly concurrent access, high scalability, and high availability. Traditional relational database technology cannot meet the requirements of these new applications, so the research, development, and application of NoSQL data management technology have received increasing attention. This paper surveys the current state and trends of NoSQL data management technology from the perspectives of NoSQL data models, data storage, query processing, and hybrid SQL/NoSQL database solutions, and introduces several typical NoSQL products.

3.
NoSQL databases are known for high scalability, high availability, and high fault tolerance, and are therefore used in many applications. The data partitioning strategy and the fragment allocation strategy directly affect a NoSQL database system's performance. Large, global databases are partitioned horizontally, vertically, or by a combination of both. In the general approach, the system scatters related fragments as widely as possible to increase the degree of parallelism of operations. However, in some applications the operations are not very complicated, an operation may access more than one fragment, and the fragments accessed by one operation may interact with each other. In such cases, general allocation strategies increase the system's communication cost when operations execute across sites. To improve these applications' performance and enable NoSQL database systems to work efficiently, fragments have to be allocated in a way that reduces the communication cost, i.e., that minimizes the total volume of data transmitted during the execution of operations over sites. We propose a strategy for clustering fragments based on a hypergraph, which places fragments that are accessed together by most operations in the same cluster. The method uses a weighted hypergraph to represent the fragment access pattern of operations, and a hypergraph partitioning algorithm is used to cluster the fragments. This reduces the number of sites an operation has to span and, therefore, the communication cost across sites. Experimental results confirm that the proposed technique effectively contributes to solving the fragment re-allocation problem in a specific application environment of a NoSQL database system.
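The paper's strategy relies on a weighted hypergraph and a hypergraph partitioning algorithm; the sketch below only illustrates the underlying idea with a greedy grouping of fragments that are frequently co-accessed. The operations, weights, size limit and the grouping heuristic are illustrative assumptions, not the algorithm used in the paper.

```python
from collections import defaultdict
from itertools import combinations

# Each operation is a hyperedge: the set of fragments it accesses plus a
# weight (e.g. its frequency). A real implementation would hand this weighted
# hypergraph to a hypergraph partitioner; the greedy union-find grouping below
# is only a simplified stand-in, and all values are illustrative assumptions.
operations = [({"F1", "F2"}, 10), ({"F2", "F3"}, 8),
              ({"F4", "F5"}, 7), ({"F1", "F3"}, 2)]

fragments = {f for frags, _ in operations for f in frags}
parent = {f: f for f in fragments}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

# Accumulate pairwise co-access weights from the hyperedges.
co_access = defaultdict(int)
for frags, weight in operations:
    for a, b in combinations(sorted(frags), 2):
        co_access[(a, b)] += weight

# Greedily merge the most frequently co-accessed fragments, capping cluster
# size so a cluster can still be placed on one site.
MAX_CLUSTER_SIZE = 3
for (a, b), _ in sorted(co_access.items(), key=lambda kv: -kv[1]):
    ra, rb = find(a), find(b)
    if ra != rb:
        size = sum(1 for f in fragments if find(f) in (ra, rb))
        if size <= MAX_CLUSTER_SIZE:
            parent[ra] = rb

clusters = defaultdict(list)
for f in fragments:
    clusters[find(f)].append(f)
print([sorted(c) for c in clusters.values()])
```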

4.
NoSQL systems are increasingly adopted for Web applications requiring scalability that relational database systems cannot meet. Although NoSQL systems were not designed to support joins, as they are applied to a wide variety of applications, the need to support joins has emerged. Furthermore, joins performed in NoSQL systems are generally similarity joins, which find similar pairs of records, rather than exact-match joins. Since Web applications often use the MapReduce framework, we develop a solution that performs similarity joins in NoSQL systems using the MapReduce framework.
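A minimal sketch of a token-based similarity join written as map and reduce functions in plain Python (no Hadoop cluster involved); the records, the grouping by shared tokens and the Jaccard threshold are illustrative assumptions rather than the paper's specific algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Map phase: emit (token, record) for every token of a record, so records
# sharing at least one token meet in the same reduce group.
def map_phase(records):
    for rid, text in records.items():
        for token in set(text.lower().split()):
            yield token, (rid, text)

# Reduce phase: within each token group, compute Jaccard similarity for the
# candidate pairs and keep those at or above the threshold.
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def reduce_phase(groups, threshold=0.5):
    results = set()
    for _, recs in groups.items():
        for (id1, t1), (id2, t2) in combinations(sorted(recs), 2):
            if jaccard(t1, t2) >= threshold:
                results.add((id1, id2))
    return results

records = {1: "nosql similarity join", 2: "scalable similarity join",
           3: "graph databases"}
groups = defaultdict(set)
for key, value in map_phase(records):
    groups[key].add(value)
print(sorted(reduce_phase(groups)))   # -> [(1, 2)]
```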

5.
Current information technologies generate large amounts of data for management or further analysis, storing it in NoSQL databases, which provide horizontal scaling and high performance and support many read/write operations per second. NoSQL column-oriented databases, such as Cassandra and HBase, are usually modelled following a query-driven approach, resulting in denormalized databases where the same data can be repeated in several tables. Maintaining data integrity therefore relies on client applications to ensure that, for every data change that occurs, the affected tables are appropriately updated. We devise a method called MDICA that, given a data insertion at the conceptual level, determines the actions required to maintain database integrity in column-oriented databases. The method is implemented for Cassandra database applications. MDICA is based on the definition of (1) rules to determine the tables that will be impacted by the insertion, (2) procedures to generate the statements that ensure data integrity and (3) messages to warn the user about errors or potential problems. The method helps developers in two ways: it generates the statements needed to maintain data integrity, and it produces messages that help avoid problems such as loss of information, redundant data or information gaps in tables.
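A minimal sketch of the general idea of propagating one conceptual insertion to every denormalized table that stores the affected data, assuming made-up table layouts; the rule format and the generated CQL strings are illustrative and are not MDICA's actual rules.

```python
# Each Cassandra table that denormalizes the "purchase" entity, with its
# primary-key columns and the remaining columns it stores. The layouts here
# are illustrative assumptions.
TABLES = {
    "purchases_by_user": {"key": ["user_id", "purchase_id"], "cols": ["amount", "store"]},
    "purchases_by_store": {"key": ["store", "purchase_id"], "cols": ["user_id", "amount"]},
}

def statements_for_insert(entity_values):
    """Given one conceptual insertion, generate an INSERT per impacted table
    and warn when a table cannot be filled consistently."""
    statements, warnings = [], []
    for table, spec in TABLES.items():
        needed = spec["key"] + spec["cols"]
        missing = [c for c in needed if c not in entity_values]
        if any(c in spec["key"] for c in missing):
            warnings.append(f"{table}: missing key column(s) {missing}; "
                            "row cannot be inserted, data would be lost")
            continue
        cols = [c for c in needed if c in entity_values]
        vals = ", ".join(repr(entity_values[c]) for c in cols)
        statements.append(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({vals});")
        if missing:
            warnings.append(f"{table}: columns {missing} left unset (information gap)")
    return statements, warnings

stmts, warns = statements_for_insert(
    {"user_id": "u1", "purchase_id": 42, "store": "s9", "amount": 19.9})
print("\n".join(stmts + warns))
```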

6.
Changqing Li, Jianhua Gu. Software, 2019, 49(3): 401-422
As applications with big data in cloud computing environments grow, many existing systems expect to expand their services to support the dramatic increase in data. Modern software development for services computing and cloud computing systems is no longer based on a single database but on multiple existing databases, and this convergence needs a new software architecture design. This paper proposes an integration approach to support a hybrid database architecture, including MySQL, MongoDB, and Redis, making it possible for users to query data simultaneously from both relational SQL systems and NoSQL systems in a single SQL query. Two mechanisms are provided, one for constructing Redis indexes and one for semantic transformation between SQL and the MongoDB API, to add SQL features to these NoSQL databases. With the proposed approach, hybrid database systems can operate in a flexible manner, i.e., either the relational database or the NoSQL store can be accessed, depending on the size of the data. The approach can effectively reduce development complexity and improve development efficiency for software systems built on multiple databases. This work extends previous research on the topic and addresses a gap that has been largely overlooked in this field, contributing to the further development of NoSQL technology.
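A minimal sketch of the routing idea: one logical request is turned into a Redis key lookup, a MongoDB filter document, or a SQL statement, depending on its shape and expected size. The routing rule, key scheme and thresholds are assumptions, not the mechanisms of the paper.

```python
# Sketch of routing one logical query to different backends. Only the
# target-specific query forms are built (no live connections); the routing
# rule, key scheme and thresholds are illustrative assumptions.

def route(query):
    """query: {"table": ..., "filter": {col: value}, "expected_rows": int}"""
    table, flt = query["table"], query["filter"]
    if len(flt) == 1 and query["expected_rows"] == 1:
        # Point lookup: serve it from a Redis index key such as
        # "<table>:<column>:<value>" (an assumed key scheme).
        (col, val), = flt.items()
        return ("redis", f"{table}:{col}:{val}")
    if query["expected_rows"] > 100_000:
        # Large scans go to the document store: translate the filter into a
        # MongoDB find() document.
        return ("mongodb", {"collection": table, "find": dict(flt)})
    # Everything else stays on the relational side as plain SQL.
    where = " AND ".join(f"{c} = {v!r}" for c, v in flt.items())
    return ("mysql", f"SELECT * FROM {table} WHERE {where};")

for q in [{"table": "users", "filter": {"id": 7}, "expected_rows": 1},
          {"table": "events", "filter": {"day": "2019-03-01"}, "expected_rows": 500_000},
          {"table": "orders", "filter": {"status": "open"}, "expected_rows": 200}]:
    print(route(q))
```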

7.
Starting with the birth of Web 2.0, the quantity of data managed by large-scale web services has grown exponentially, posing new challenges and infrastructure requirements. This has led to new programming paradigms and architectural choices, such as map-reduce and NoSQL databases, which constitute two of the main peculiarities of the specialized massively distributed systems referred to as Big Data architectures. The underlying computer infrastructures usually face complexity requirements resulting from the need for efficiency and speed in computing over huge, evolving data sets. This is achieved by taking advantage of the features of new technologies, such as the automatic scaling and replica provisioning of Cloud environments. Although performance is a key issue for the considered applications, few performance evaluation results are currently available in this field. In this work we investigate how a Big Data application designer can evaluate the performance of applications that exploit the Apache Hive query language for NoSQL databases, built over an Apache Hadoop map-reduce infrastructure. The paper presents a dedicated modeling language and an application, showing first how the modeling process can be eased and second how the semantic gap between the modeling logic and the domain can be reduced by means of vertical multiformalism modeling.
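A crude back-of-the-envelope estimate of the map/reduce work behind a Hive query, in the spirit of performance evaluation; this is only an analytic sketch with assumed parameters, not the modeling language proposed in the paper.

```python
# Rough analytic estimate of the map/reduce phases behind a Hive query.
# All parameters (split size, slots, per-task times) are assumptions chosen
# only to make the arithmetic concrete.

def mapreduce_time(input_gb, split_mb=128, map_slots=40, sec_per_map=30,
                   reducers=20, reduce_slots=20, sec_per_reduce=45):
    map_tasks = (input_gb * 1024) // split_mb + 1
    map_waves = -(-map_tasks // map_slots)       # ceiling division
    reduce_waves = -(-reducers // reduce_slots)
    return map_waves * sec_per_map + reduce_waves * sec_per_reduce

# Estimated latency (seconds) of a scan-heavy Hive query over 1 TB of data.
print(mapreduce_time(input_gb=1024))
```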

8.
Integration of data stored in heterogeneous database systems is a very challenging task that can hide several difficulties. As NoSQL databases grow in popularity, the integration of different NoSQL systems and the interoperability of NoSQL systems with SQL databases become increasingly important issues. In this paper, we propose a novel data integration methodology to query data individually from different relational and NoSQL database systems. The suggested solution does not support joins and aggregates across data sources; it only collects data from the different, separate database management systems according to the filtering options and migrates them. The proposed method is based on a metamodel approach and covers the structural, semantic and syntactic heterogeneities of the source systems. To demonstrate the applicability of the proposed methodology, we developed a web-based application, which convincingly confirms the usefulness of the novel method.
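A minimal sketch of the collect-and-migrate idea: one metamodel-level filter is translated per source and the matching records are fetched from each source separately, with no cross-source joins or aggregates. The source descriptions and translation functions are assumptions.

```python
# Sketch of metamodel-level filtering: one abstract filter is translated for
# each source so the matching records can be collected separately and then
# migrated. Source names, translation functions and data are assumptions.

def to_sql(entity, flt):
    where = " AND ".join(f"{a} = {v!r}" for a, v in flt.items())
    return f"SELECT * FROM {entity} WHERE {where};"

def to_mongo(entity, flt):
    return {"collection": entity, "find": dict(flt)}

SOURCES = {
    "mysql_sales": {"kind": "relational", "translate": to_sql},
    "mongo_crm":   {"kind": "document",   "translate": to_mongo},
}

def collect(entity, flt):
    """Return, per source, the source-specific query that would fetch the
    filtered records so they can be migrated into one target store."""
    return {name: src["translate"](entity, flt) for name, src in SOURCES.items()}

print(collect("customer", {"country": "HU"}))
```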

9.
Due to the gradual expansion of the data volumes used in social networks and cloud computing, the term “Big data” has appeared, together with the challenge of storing immense datasets. Many tools and algorithms have appeared to handle the challenges of storing big data. NoSQL databases, such as Cassandra and MongoDB, are designed with a novel data management system that can handle and process huge volumes of data. Partitioning data in NoSQL databases is considered one of the critical challenges in database design. In this paper, a MapReduce Rendezvous Hashing-Based Virtual Hierarchies (MR-RHVH) framework is proposed for scalable partitioning of the Cassandra NoSQL database. The MapReduce framework is used to implement MR-RHVH on Cassandra to enhance its performance in highly distributed environments. MR-RHVH distributes the nodes to rendezvous regions based on a proposed Adopted Virtual Hierarchies strategy, with each region responsible for a set of nodes. In addition, a proposed bloom filter evaluator is used to ensure the accurate allocation of keys to nodes in each region. A number of experiments were performed to evaluate the performance of the MR-RHVH framework, using YCSB for database benchmarking. The results show a high scalability rate and lower time consumption for the MR-RHVH framework compared with several recent systems.
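For reference, plain rendezvous (highest-random-weight) hashing assigns a key to the candidate with the highest hash(key, candidate); the sketch below applies it at two levels, region then node, loosely echoing the virtual-hierarchy idea while omitting the MapReduce and Bloom-filter components. The topology and hash choice are assumptions.

```python
import hashlib

# Rendezvous (highest-random-weight) hashing applied twice: first to pick a
# region, then a node inside it. The region/node layout and the SHA-1 based
# weight are illustrative assumptions, not the paper's exact construction.
REGIONS = {"r1": ["n1", "n2", "n3"], "r2": ["n4", "n5"], "r3": ["n6", "n7", "n8"]}

def weight(key, candidate):
    digest = hashlib.sha1(f"{key}|{candidate}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def rendezvous(key, candidates):
    # The candidate with the highest hash(key, candidate) wins.
    return max(candidates, key=lambda c: weight(key, c))

def place(key):
    region = rendezvous(key, list(REGIONS))
    return region, rendezvous(key, REGIONS[region])

for k in ["user:17", "user:18", "order:99"]:
    print(k, "->", place(k))
```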

10.
Wide-column NoSQL databases are an important class of NoSQL (Not only SQL) databases that scale horizontally and feature high access performance on sparse tables. With current trends towards big Data Warehouses (DWs), it is attractive to run existing business intelligence/data warehousing applications on higher volumes of data in wide-column NoSQL databases for low latency, by mapping multidimensional models to wide-column NoSQL models or by using additional SQL add-ons. For example, applications like retail management can run over integrated data sets stored in big DWs or in the cloud to capture current item-selling trends. Many of these systems also employ Snapshot Isolation (SI) as a concurrency control mechanism to achieve high throughput for read-heavy workloads. SI works well in a DW environment, as analytical queries can work on (consistent) snapshots and are not impacted by concurrent update jobs performed by the online incremental Extract-Transform-Load (ETL) flows that refresh fact/dimension tables. However, the snapshot made available in the DW is often stale: at the moment an analytical query is issued, the source updates (e.g. in a remote retail store) may not yet have been extracted and processed by the ETL process, due to high input data volume or slow processing speed. This staleness may cause incorrect results for time-critical decision support queries. To address this problem, the snapshots accessed by analytical queries first need to be maintained by the corresponding ETL flows to reflect source updates according to given freshness needs. Snapshot maintenance in this work means maintaining the distributed data partitions required by a query. Since most NoSQL databases are not ACID compliant and do not provide full-fledged distributed transaction support, a snapshot may be derived inconsistently when its data partitions are updated by different ETL maintenance jobs. This paper describes an extended version of the HBelt system [1], which tightly integrates the wide-column NoSQL database HBase with a clustered, pipelined ETL engine. Our objective is to efficiently refresh HBase tables with remote source updates while guaranteeing a consistent snapshot across distributed partitions for each scan request in analytical queries. A consistency model is defined and implemented to address this distributed snapshot maintenance. To achieve this, ETL jobs and analytical queries are scheduled in a distributed processing environment. In addition, a partitioned, incremental ETL pipeline is introduced to increase the performance of ETL (update) jobs. We validate the efficiency gains in terms of data pipelining and data partitioning using the TPC-DS benchmark, which simulates a modern decision support system for a retail product supplier. Experimental results show that high query throughput can be achieved in HBelt when distributed, refreshed snapshots are demanded.
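A minimal sketch of the consistency goal only (not HBelt's actual scheduling protocol): each partition records the highest ETL batch it has fully applied, and a scan reads at the minimum of those watermarks, so all partitions expose the same maintenance state. The data layout is an assumption.

```python
# Sketch of the consistency goal: a scan must see every partition at the same
# maintenance state. Each partition tracks the highest ETL batch it has fully
# applied; a scan reads at the minimum of those watermarks. This only
# illustrates the idea of a globally consistent snapshot; the versioned layout
# is an assumption, not HBelt's implementation.

# versioned rows: partition -> {row_key: [(batch_id, value), ...]}
partitions = {
    "p1": {"a": [(1, 10), (3, 12)], "b": [(2, 7)]},
    "p2": {"c": [(1, 5), (2, 6)]},
}
applied_batch = {"p1": 3, "p2": 2}   # highest fully applied ETL batch per partition

def consistent_scan():
    snapshot = min(applied_batch.values())   # global watermark
    result = {}
    for part, rows in partitions.items():
        for key, versions in rows.items():
            visible = [v for b, v in versions if b <= snapshot]
            if visible:
                result[key] = visible[-1]
    return snapshot, result

print(consistent_scan())   # batch 3 on p1 stays hidden until p2 catches up
```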

11.
The growing popularity of massively accessed Web applications that store and analyze large amounts of data, with Facebook, Twitter and Google Search being prominent examples, has posed new requirements that greatly challenge traditional RDBMSs. In response to this reality, a new way of creating and manipulating data stores, known as NoSQL databases, has arisen. This paper reviews implementations of NoSQL databases in order to provide an understanding of current tools and their uses. First, NoSQL databases are compared with traditional RDBMSs and important concepts are explained. Only databases that persist data and distribute them across different computing nodes are within the scope of this review. NoSQL databases are then divided into different types: Key-Value, Wide-Column, Document-oriented and Graph-oriented. In each case, a comparison of the available databases is carried out based on their most important features.
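For illustration, the same record sketched in the four families compared by the review; the layouts are simplified and do not reproduce any particular product's storage format.

```python
# The same "user" record sketched in the four store families compared in the
# review; the layouts are simplified illustrations, not any product's format.

# Key-Value: one opaque value per key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# Wide-Column: row key -> column family -> sparse columns.
wide_column = {"42": {"profile": {"name": "Ada", "city": "London"},
                      "stats": {"logins": 17}}}

# Document-oriented: a self-describing, possibly nested document.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"},
            "followers": [7, 9]}

# Graph-oriented: nodes plus explicit, traversable relationships.
graph = {"nodes": {42: {"label": "User", "name": "Ada"},
                   7: {"label": "User", "name": "Bob"}},
         "edges": [(7, "FOLLOWS", 42)]}

print(key_value, wide_column, document, graph, sep="\n")
```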

12.
In this paper, we consider the performance evaluation of a system that shares K servers (or resources) among N heterogeneous classes of workloads, where server allocation and deallocation for class i is dictated by a class-specific threshold-based policy with hysteresis control. In particular, the server activation time for class i is noninstantaneous. There are many systems and applications where a multiclass threshold-based queueing system can be of great use. One important utility of threshold-based approaches is in situations where applications may incur server usage costs; in these cases, one needs to consider not only the performance aspects but also the resulting cost/performance ratio. The motivation for using hysteresis control is to reduce the unnecessary cost of server setup (or activation) and server removal (or deactivation) whenever there are momentary fluctuations in workload. Moreover, servers in such systems and applications are often needed by multiple classes of workloads and, hence, it is desirable to find good approaches to sharing server resources among the different classes of workloads, preferably without statically partitioning the server pool among these classes. An important and distinguishing characteristic of our work is that we consider the modeling and analysis of a multiclass system with noninstantaneous server activation. The main contributions of this work are: 1) developing an efficient approximation method for solving such models; 2) verifying the convergence of our iterative method; and 3) evaluating the resulting accuracy of the technique for computing performance measures of interest, which can subsequently be used in making system design choices.
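A toy discrete-time simulation of one class under a hysteresis threshold policy with noninstantaneous activation, included only to make the control behaviour concrete; the arrival process, thresholds and delay are assumptions, and the paper's contribution is an analytic approximation method, not a simulator.

```python
import random

# Toy discrete-time simulation of one workload class under a hysteresis
# threshold policy with noninstantaneous server activation: a server is
# requested when the backlog exceeds UP, released when it falls below DOWN,
# and becomes usable only ACTIVATION_DELAY steps after being requested.
# All parameters and the arrival process are illustrative assumptions.
UP, DOWN, ACTIVATION_DELAY, SERVICE_RATE, MAX_SERVERS = 8, 2, 3, 2, 5

random.seed(1)
queue, active, warming = 0, 1, []        # warming = remaining activation times
for t in range(25):
    queue += random.randint(0, 4)                     # arrivals this step
    warming = [d - 1 for d in warming]
    active += sum(1 for d in warming if d == 0)       # activations that completed
    warming = [d for d in warming if d > 0]
    queue = max(0, queue - active * SERVICE_RATE)     # service
    if queue > UP and active + len(warming) < MAX_SERVERS:
        warming.append(ACTIVATION_DELAY)              # request another server
    elif queue < DOWN and active > 1:
        active -= 1                                   # release a server
    print(f"t={t:2d} queue={queue:2d} active={active} warming={len(warming)}")
```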

13.
In the big data era, information data grows explosively, and neither traditional relational databases nor emerging NoSQL databases can face these challenges comprehensively and efficiently on their own. We therefore propose a middleware-based heterogeneous database access method, MingleDB, which combines the advantages of NoSQL and traditional relational databases. MingleDB transparently merges the main execution logic of NoSQL and traditional databases, and automatically selects an appropriate processing path according to the read/write characteristics of the current user request, thereby avoiding the weaknesses of either side. It also supports a lightweight transaction processing framework, applied on demand, to guarantee the eventual consistency and integrity of data across the heterogeneous databases. We compared the read/write performance of MingleDB with MongoDB and MySQL, and the experiments confirm the correctness and soundness of the MingleDB approach. MingleDB was also deployed in a real social network system for practical validation, and the results further demonstrate its practicality and portability.
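A minimal sketch of the middleware idea described above, with in-memory placeholders standing in for MySQL and MongoDB: requests are routed by their read/write characteristics and a task log propagates writes for eventual consistency. The routing rule and the log are assumptions, not MingleDB's actual logic.

```python
# Sketch of the middleware idea: choose a processing path per request based on
# its read/write characteristics, and record a propagation task so the other
# store eventually converges. The in-memory "stores", routing rule and task
# log are illustrative placeholders, not MingleDB's implementation.
sql_store, doc_store, pending_sync = {}, {}, []

def write(key, value):
    # Writes go to the relational side first (integrity), then are queued for
    # asynchronous propagation to the document side.
    sql_store[key] = value
    pending_sync.append(("copy_to_doc", key))

def read(key, heavy=False):
    # Read-heavy, flexible-schema requests are served from the document store
    # when it already holds the key; otherwise fall back to the SQL side.
    if heavy and key in doc_store:
        return doc_store[key]
    return sql_store.get(key)

def run_sync():
    # Lightweight background step that drains the task log, giving the two
    # stores eventual consistency.
    while pending_sync:
        _, key = pending_sync.pop(0)
        doc_store[key] = sql_store[key]

write("user:1", {"name": "Li"})
print(read("user:1", heavy=True))   # served from SQL until the sync has run
run_sync()
print(read("user:1", heavy=True))   # now served from the document store
```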

14.
In distributed, heterogeneous and network-connected collaborative environments, where resources are provided to diverse, unknown users for their applications, it is necessary to define access control for the resources. Access control for such systems is the ability to grant or deny a particular user access to resources. Traditional access control solutions are inherently inadequate for collaborative systems, because they are effective only when the system knows in advance which users are going to access the resources and what their access rights are, so that they can be predefined by developers or security administrators; in collaborative systems, neither the number of users nor their usage of resources is static. Targeting collaborative systems, this paper defines a fine-grained, flexible, persistent trust-based model for protecting the access and usage of digital resources using a radial basis function neural network (RBFNN). The RBFNN classifies the users requesting resources as trustworthy or non-trustworthy based on their attributes. An RBFNN is used for classification because of its ability to generalise well even to unseen data and because of the non-iterative method employed in its training. A proof-of-concept implementation, backed by an extensive set of tests on real data collected from one such collaborative system, the Enabling Grids for E-Science grid, demonstrated that the design is sound for collaborative systems where access to resources is provided to a large number of unknown users with varying sets of requirements.
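A minimal NumPy sketch of such an RBF network: Gaussian hidden units centred on training samples and an output layer fitted in closed form (the non-iterative training the abstract refers to). The attribute vectors, labels and hyperparameters are made-up placeholders, not the grid data used in the paper.

```python
import numpy as np

# Minimal RBF network: Gaussian hidden units centred on a subset of the
# training samples and an output layer fitted by least squares (non-iterative
# training). User attributes, labels and hyperparameters are placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))                       # user attribute vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)    # 1 = trustworthy, 0 = not

def rbf_features(samples, centres, gamma=1.0):
    d2 = ((samples[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

centres = X[rng.choice(len(X), size=10, replace=False)]
H = rbf_features(X, centres)
w, *_ = np.linalg.lstsq(np.c_[H, np.ones(len(H))], y, rcond=None)

def classify(users):
    Hq = rbf_features(users, centres)
    scores = np.c_[Hq, np.ones(len(Hq))] @ w
    return np.where(scores >= 0.5, "trustworthy", "non-trustworthy")

print(classify(np.array([[1.2, 0.3, 0.0], [-1.5, -0.2, 0.4]])))
```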

15.
Wireless sensor networks are used in a large array of applications to capture, collect, and analyze physical environmental data. Many existing sensor systems instruct sensor nodes to report their measurements to central repositories outside the network, which is expensive in energy cost. Recent technological advances in flash memory have given rise to the development of storage-centric sensor networks, where sensor nodes are equipped with high-capacity flash memory storage so that sensor data can be stored and managed inside the network to reduce expensive communication. This novel architecture calls for new data management techniques to fully exploit distributed in-network data storage. This paper describes some of our research on distributed query processing in such flash-based sensor networks. Of particular interest are the issues that arise in the design of storage management and indexing structures that combine the sensor system workload with the read/write/erase characteristics of flash memory.
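A minimal sketch of flash-friendly in-network storage: readings are appended to fixed-size pages written sequentially (never updated in place), and a small in-memory index maps time ranges to pages so queries touch only the relevant pages. The page size and index layout are assumptions, not the structures studied in the paper.

```python
# Sketch of flash-friendly in-network storage: readings are buffered into
# fixed-size pages that are written sequentially (no in-place updates), while
# an in-memory index maps each page to the time range it covers so range
# queries read only the relevant pages. Sizes and layout are assumptions.
PAGE_SIZE = 4

flash = []            # written pages (append-only)
index = {}            # page id -> (min_time, max_time)
buffer = []

def append_reading(timestamp, value):
    global buffer
    buffer.append((timestamp, value))
    if len(buffer) == PAGE_SIZE:                 # page full: one sequential write
        page_id = len(flash)
        flash.append(list(buffer))
        index[page_id] = (buffer[0][0], buffer[-1][0])
        buffer = []

def query(t_from, t_to):
    hits = []
    for page_id, (lo, hi) in index.items():      # index avoids scanning all pages
        if lo <= t_to and hi >= t_from:
            hits.extend(v for t, v in flash[page_id] if t_from <= t <= t_to)
    return hits

for t in range(20):
    append_reading(t, 20 + t)                    # simulated sensor readings
print(query(5, 9))
```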

16.
Wireless sensor networks are used in a large array of applications to capture, collect, and analyze physical environmental data. Many existing sensor systems instruct sensor nodes to report their measurements to central repositories outside the network, which is expensive in energy cost. Recent technological advances in flash memory have given rise to the development of storage-centric sensor networks, where sensor nodes are equipped with high-capacity flash memory storage so that sensor data can be stored and managed inside the network to reduce expensive communication. This novel architecture calls for new data management techniques to fully exploit distributed in-network data storage. This paper describes some of our research on distributed query processing in such flash-based sensor networks. Of particular interest are the issues that arise in the design of storage management and indexing structures that combine the sensor system workload with the read/write/erase characteristics of flash memory.

17.
Active database systems extend the functionality of traditional database systems with powerful mechanisms for the support of triggers (or active rules). Triggers provide a uniform and convenient base that can be used to realize internal DBMS functions, such as support for integrity constraints, views, access authorization, statistics gathering, monitoring and notifications, and to make external applications more efficient. Representative examples of external applications that can build on the properties of active DBMSs are data-intensive expert systems and workflow management systems. Today, the majority of industrial relational DBMSs already support triggers, while XML DBMSs, which are comparatively new, lack such functionality. The expansion of the XML DBMS application field and its use in constructing complex application systems stimulates new research work aimed at extending the functionality of XML DBMSs with trigger support. In this paper, the authors define a special type of trigger for XML DBMSs, namely XML triggers that respond to data retrieval, and propose methods for their implementation. The paper also discusses examples of applications where XML query triggers prove useful and reviews existing research work in this area.
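A toy sketch of the notion of triggers that respond to data retrieval: actions registered on path patterns fire whenever a query touches a matching path, e.g. for auditing or notification. The path matching and the API are invented for illustration and are not the trigger language defined in the paper.

```python
import fnmatch

# Toy illustration of retrieval triggers: actions are registered on path
# patterns and fire whenever a query reads a matching path, before the data is
# returned. The flat path model and this API are invented for illustration.
document = {"/catalog/book/1/price": "29.90", "/catalog/book/1/title": "XML DBs"}
triggers = []   # (pattern, action)

def create_retrieval_trigger(pattern, action):
    triggers.append((pattern, action))

def query(path):
    for pattern, action in triggers:
        if fnmatch.fnmatch(path, pattern):
            action(path)                      # fires on retrieval, not on update
    return document.get(path)

create_retrieval_trigger("/catalog/book/*/price",
                         lambda p: print(f"AUDIT: price read at {p}"))
print(query("/catalog/book/1/price"))
```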

18.
A Survey of NoSQL Systems Supporting Big Data Management
申德荣, 于戈, 王习特, 聂铁铮, 寇月. Journal of Software, 2013, 24(8): 1786-1803
To meet the new requirements of big data management, many NoSQL database systems targeting specific applications have appeared. This paper surveys research related to NoSQL databases based on the key-value data model. It first introduces the characteristics of big data and the key technical issues faced by systems supporting big data management; it then reviews frontier research and research challenges, including system architecture, data models, access methods, indexing techniques, transaction properties, system elasticity, dynamic load balancing, replica strategies, data consistency strategies, flash-based multi-level caching mechanisms, MapReduce-based data processing strategies, and next-generation data management systems; finally, it offers an outlook on future research.

19.
Distributed NoSQL systems aim to provide high availability for large-scale data, but lack built-in support for applications with complex queries, and traditional solutions based on single-term inverted lists do not perform well. This paper therefore studies the shortcoming of document databases that, when handling dynamic document sets, do not support multiple keys as a primary index, and proposes an improved combined (composite) index method. By storing inverted lists for combinations of conditions, a query-driven mechanism can adaptively store the more popular condition combinations learned from recent query records. The method reduces overall bandwidth consumption at the cost of only a small amount of additional storage, and clearly improves the capacity and response time of NoSQL systems.
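A minimal sketch of the combined-index idea: single-field inverted lists answer any filter by intersection, while a query-driven step materializes a combined list for condition combinations that keep recurring. The popularity threshold and data are illustrative assumptions.

```python
from collections import defaultdict

# Sketch of the combined-index idea: single-field inverted lists answer any
# filter by intersection, and a query-driven step materializes a combined
# inverted list for condition combinations that recur, so popular multi-key
# queries are answered with one lookup. Threshold and data are assumptions.
docs = {1: {"city": "NY", "lang": "en"}, 2: {"city": "NY", "lang": "fr"},
        3: {"city": "LA", "lang": "en"}}

single = defaultdict(set)          # (field, value) -> doc ids
for doc_id, fields in docs.items():
    for fv in fields.items():
        single[fv].add(doc_id)

combined = {}                      # frozenset of conditions -> doc ids
popularity = defaultdict(int)
HOT_AFTER = 2                      # materialize after this many occurrences

def query(conditions):
    key = frozenset(conditions.items())
    popularity[key] += 1
    if key in combined:
        return combined[key]                       # one lookup, no intersection
    result = set.intersection(*(single[c] for c in key))
    if popularity[key] >= HOT_AFTER:
        combined[key] = result                     # cache the popular combination
    return result

print(query({"city": "NY", "lang": "en"}))   # answered by intersecting two lists
print(query({"city": "NY", "lang": "en"}))   # computed once more, then materialized
```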

20.
In many scientific applications, arrays containing data are indirectly indexed through indirection arrays. Such scientific applications are called irregular programs and are a distinct class of applications that require special techniques for parallelization. This paper presents a library called CHAOS, which helps users implement irregular programs on distributed-memory message-passing machines, such as the Paragon, Delta, CM-5 and SP-1. The CHAOS library provides efficient runtime primitives for distributing data and computation over processors; it supports efficient index translation mechanisms and provides users high-level mechanisms for optimizing communication. CHAOS subsumes the previous PARTI library and supports a larger class of applications. In particular, it provides efficient support for parallelization of adaptive irregular programs where indirection arrays are modified during the course of computation. To demonstrate the efficacy of CHAOS, two challenging real-life adaptive applications were parallelized using CHAOS primitives: a molecular dynamics code, CHARMM, and a particle-in-cell code, DSMC. Besides providing runtime support to users, CHAOS can also be used by compilers to automatically parallelize irregular applications. This paper demonstrates how CHAOS can be effectively used in such a framework. By embedding CHAOS primitives in the Syracuse Fortran 90D/HPF compiler, kernels taken from the CHARMM and DSMC codes have been automatically parallelized.
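A minimal sketch of the inspector/executor pattern that such runtime libraries implement: the inspector examines the indirection array to build a communication schedule of remote elements, and the executor gathers them once before running the irregular loop. The block distribution and the two-"process" setup are simulated in plain Python; this is not the CHAOS API itself.

```python
# Sketch of the inspector/executor idea behind runtime libraries like CHAOS:
# the inspector scans the indirection array to build a communication schedule
# (which remote elements this process must fetch), and the executor gathers
# them once and then performs the irregular accesses locally. The block
# distribution and two-"process" setup are simulated; this is not the CHAOS API.
data = list(range(100, 112))           # global data array
BLOCK = 6                              # block distribution: P0 owns 0-5, P1 owns 6-11
owned = {0: data[:BLOCK], 1: data[BLOCK:]}

def inspector(my_rank, indirection):
    """Build the schedule: global indices that live on other processes."""
    return sorted({g for g in indirection if g // BLOCK != my_rank})

def executor(my_rank, indirection, schedule):
    # "Communication": gather the scheduled remote elements once.
    ghost = {g: owned[g // BLOCK][g % BLOCK] for g in schedule}
    def fetch(g):
        return ghost[g] if g in ghost else owned[my_rank][g % BLOCK]
    # Irregular access: collect x[ia[i]] for the local loop body.
    return [fetch(g) for g in indirection]

ia = [0, 7, 3, 11, 2]                  # indirection array used by process 0
sched = inspector(0, ia)
print("schedule:", sched)              # remote indices 7 and 11
print("gathered values:", executor(0, ia, sched))
```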
