Similar literature
1.
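Two representative NoSQL databases, Bigtable and Dynamo, are introduced. First, the scope of application of Bigtable and Dynamo and the reasons they emerged are described; both can process web data efficiently and provide the corresponding services. Next, the architecture and features of the Bigtable and Dynamo systems are presented, along with the design approaches unique to each. Finally, the two databases are compared with traditional relational databases and the differences between them are described. The comparison shows that NoSQL databases are efficient and available when handling web application data and hold an advantage over traditional relational databases.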

2.
NoSQL systems have gained popularity for many reasons, including the flexibility they provide in organizing data, as they relax the rigidity of the relational model and of other structured models. This flexibility, and the heterogeneity that has emerged in the area, have led to little use of traditional modeling techniques, in contrast to what has happened with databases for decades. In this paper, we argue that traditional notions related to data modeling can be useful in this context as well. Specifically, we propose NoAM (NoSQL Abstract Model), a novel abstract data model for NoSQL databases, which exploits the commonalities of various NoSQL systems. We also propose a database design methodology for NoSQL systems based on NoAM, with initial activities that are independent of the specific target system. NoAM is used to specify a system-independent representation of the application data; this intermediate representation can then be implemented in target NoSQL databases, taking into account their specific features. Overall, the methodology aims at supporting scalability, performance, and consistency, as needed by next-generation web applications.

3.
Starting with the birth of Web 2.0, the quantity of data managed by large-scale web services has grown exponentially, posing new challenges and infrastructure requirements. This has led to new programming paradigms and architectural choices, such as map-reduce and NoSQL databases, which constitute two of the main peculiarities of the specialized massively distributed systems referred to as Big Data architectures. The underlying computer infrastructures usually face complexity requirements, resulting from the need for efficiency and speed in computing over huge evolving data sets. This is achieved by taking advantage of the features of new technologies, such as the automatic scaling and replica provisioning of Cloud environments. Although performance is a key issue for the considered applications, few performance evaluation results are currently available in this field. In this work we focus on investigating how a Big Data application designer can evaluate the performance of applications exploiting the Apache Hive query language for NoSQL databases, built over an Apache Hadoop map-reduce infrastructure. This paper presents a dedicated modeling language and an application, showing first how it is possible to ease the modeling process and second how the semantic gap between modeling logic and the domain can be reduced, by means of vertical multiformalism modeling.

4.
As big data applications in cloud computing grow in popularity, many existing systems are expected to expand their services to support the explosive increase in data. We propose a data adapter system to support a hybrid database architecture that includes a relational database (RDB) and a NoSQL database. It can serve queries from applications and handle database transformation at the same time. The data adapter system provides three query modes: blocking transformation mode (BT mode), blocking dump mode (BD mode), and direct access mode (DA mode). We provide a data synchronization mechanism and describe the design and implementation in detail. This paper focuses on velocity, through the three proposed modes, and partly on variety, with data stored in the RDB, the NoSQL database, and temporary files. With the proposed data adapter system, we can provide a seamless mechanism for using an RDB and a NoSQL database at the same time.

5.
The amount of data being produced is increasing constantly, as the number and variety of connected devices grow and advances in data storage and mining support this evolution. However, storing and handling large quantities of data challenges current Relational Database Management Systems. Big Data and its related products came to help in this matter, and NoSQL databases arose with the purpose of offering better solutions and features for handling massive amounts of data with higher performance, sometimes in near real time. The present study describes the NoSQL database landscape and background, and presents a detailed study covering the characteristics, a feature comparison, and a performance evaluation of three NoSQL databases widely used in the market today: Couchbase, MongoDB, and RethinkDB. Tests were performed in two different scenarios: single thread and multiple threads. The results reveal that Couchbase performed better on most operations, except for retrieving multiple documents and inserting documents with multiple threads, operations on which MongoDB scored better.

6.
NoSQL systems are increasingly adopted for Web applications requiring scalability that relational database systems cannot meet. Although NoSQL systems were not designed to support joins, the need to support joins has emerged as these systems are applied to a wide variety of applications. Furthermore, joins performed in NoSQL systems are generally similarity joins, which find similar pairs of records, rather than exact-match joins. Since Web applications often use the MapReduce framework, we develop a solution to perform similarity joins in NoSQL systems using the MapReduce framework.
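To make the idea of a MapReduce-style similarity join concrete, the following is a minimal single-process sketch. It is not the algorithm from the abstract, which is not described there; it assumes a common token-sharing approach in which the map phase emits one (token, record id) pair per distinct token, the reduce phase pairs up records that share a token, and a final check keeps pairs whose Jaccard similarity meets a threshold. All function names, the tokenization, and the threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_phase(records):
    """Map: emit one (token, record_id) pair per distinct token in a record."""
    for rid, text in records.items():
        for token in set(text.lower().split()):
            yield token, rid


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: records that share a token become candidate pairs."""
    candidates = set()
    for rids in groups.values():
        for a, b in combinations(sorted(rids), 2):
            candidates.add((a, b))
    return candidates


def similarity_join(records, threshold=0.5):
    """Self-join: return record-id pairs whose Jaccard similarity >= threshold."""
    tokens = {rid: set(text.lower().split()) for rid, text in records.items()}
    candidates = reduce_phase(shuffle(map_phase(records)))
    return sorted(
        (a, b) for a, b in candidates
        if jaccard(tokens[a], tokens[b]) >= threshold
    )
```

Records that share no token never become candidates, so the verification step only runs on pairs that have some chance of being similar. A real deployment would distribute the map and reduce phases across nodes rather than run them in one process.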

7.
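Big data is characterized by large scale, great depth, great width, short processing time, commodity hardware, and open-source software. When traditional relational databases operate on big data, system performance degrades severely, so big data management technology has become a current research hotspot. Research progress in China and abroad is analyzed and reviewed from four perspectives: parallel databases, the MapReduce model for big data processing, a comparison of NoSQL with database technology, and the combination of MapReduce with database technology. Finally, future directions for big data research are discussed.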

8.
A knowledge-based system, called the Knowledge Extraction System (KES), is presented which performs the process of reverse engineering of relational databases. KES generates an extended entity-relationship (EER) model from a relational database. Within its extraction procedure, domain semantics are obtained by analyzing the data schema and data instances of an existing database, by using heuristics, or asking the user. Relations and attributes are classified into several categories and then converted into the corresponding modelling structures of the EER model. KES demonstrates how knowledge-based system technology can be applied to ease the work of database reverse engineering. It also illustrates that the reverse engineering process can be implemented at a high level of automation. To do so, KES is integrated with the target database management system so that data can be analyzed directly through dynamic SQL queries.

9.
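With the rapid development of cloud computing, the Internet of Things, the mobile Internet, and related technologies, massive data is growing quickly in these new fields, and big data, as a disruptive technology, opens up broad possibilities for processing massive data. Because traditional relational databases are no longer suitable, distributed NoSQL databases emerged. To address the practical problems facing the big data field, a new distributed big data management system (DBDMS) based on Hadoop and NoSQL is designed and implemented; it provides real-time collection, retrieval, and permanent storage of big data. Experiments show that DBDMS can significantly improve big data processing capability and is suited to fields such as massive log backup and retrieval and massive network packet capture and analysis.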

10.
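Simulation of Conflict Resolution in Distributed Big Data Integration in Relational Databases   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the long resolution time and low success rate of current conflict resolution methods, a conflict resolution method for distributed big data integration in relational databases based on a BP neural network is proposed. A similarity measure is used to extract the semantic conflict features of data attributes during distributed big data integration in a relational database, covering four kinds of data: character-type, numeric-type, Boolean-type, and interval-type attribute values. Based on the similarity computation, conflict features are extracted for data of each attribute value type and fed into a trained BP neural network model, which determines whether a conflict exists in the integration process and resolves any conflict found. Comparative simulation tests show that the proposed method can resolve conflicts in distributed big data integration in relational databases, with low time cost and a high success rate.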

11.
One of the chief difficulties which needs to be overcome during the early design stages of a system is that of establishing a satisfactory design for that system. From the time it was first conceived it was apparent that the Relational Data Base Management System is like a compiler insofar as it takes a succession of user requests for information formulated in an applied predicate calculus and translates each one into a series of calls which access an underlying data base and transform data from that data base into the form the user wishes to see. This paper compares the architecture of the Relational Data Base Management System with that of a compiler, and then demonstrates the use of the architecture when processing a language based on an applied predicate calculus. Finally, the paper describes a number of extensions to that architecture which are required to solve the particular problems raised by the data base system.

12.
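With the arrival of the big data era, application data volumes have surged and personalized recommendation technology has become increasingly important. Applied directly to a big data environment, traditional recommendation techniques suffer from low recommendation accuracy, long recommendation latency, and high network overhead, causing recommendation performance to drop sharply. To address these problems, a user co-occurrence matrix multiplier recommendation strategy is proposed: the user similarity matrix is multiplied by the item rating matrix to obtain a matrix of predicted user ratings for items, from which a set of candidate recommended items is generated for each user. On this basis, the traditional collaborative filtering algorithm is extended for parallel execution according to the characteristics of the distributed processing architecture, yielding a user-based distributed collaborative filtering algorithm. Finally, a MapReduce pattern with redefined sequence composition chains multiple subtasks together so that they execute automatically in order. Experimental results show that the algorithm achieves good recommendation accuracy and efficiency in a distributed computing environment.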
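The core step of this abstract, multiplying the user similarity matrix by the item rating matrix to obtain predicted ratings, can be sketched in a few lines. This is only an illustration under stated assumptions: cosine similarity between users' rating vectors is assumed (the abstract does not name the similarity measure), a rating of 0 is assumed to mean "unrated", and the function names are hypothetical. The parallel MapReduce decomposition described in the abstract is not reproduced here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two rating vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_scores(ratings):
    """ratings[u][i] is user u's rating of item i; 0 is assumed to mean unrated.

    Returns the matrix product (user similarity) x (item ratings).
    """
    n_users = len(ratings)
    n_items = len(ratings[0])
    # User-by-user similarity matrix.
    sim = [[cosine(ratings[u], ratings[v]) for v in range(n_users)]
           for u in range(n_users)]
    # predicted[u][i] = sum over users v of sim[u][v] * ratings[v][i]
    return [[sum(sim[u][v] * ratings[v][i] for v in range(n_users))
             for i in range(n_items)]
            for u in range(n_users)]


def candidates(ratings, user, k=2):
    """Top-k items the user has not rated, ordered by predicted score."""
    predicted = predict_scores(ratings)[user]
    unrated = [i for i, r in enumerate(ratings[user]) if r == 0]
    return sorted(unrated, key=lambda i: predicted[i], reverse=True)[:k]
```

For example, with `ratings = [[5, 0, 0], [5, 4, 0], [0, 0, 3]]`, user 0 is similar to user 1 and dissimilar to user 2, so item 1 (rated by user 1) ranks ahead of item 2 in `candidates(ratings, 0)`.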

13.
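Research on an ORDB-Based Asynchronous Update Model for Distributed Spatial Data   Total citations: 5 (self-citations: 0, citations by others: 5)
As database technology is increasingly combined with GIS applications, managing spatial data with an object-relational database has shown strong vitality. Update processing of spatial data is important for spatial data sharing and spatial data interoperability. Building on object-relational database management of spatial data, an asynchronous update model for distributed spatial data is designed, which is well suited to processing update information for mobile distributed spatial data.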

14.
We propose a new algorithm, called Stripe-join, for performing a join given a join index. Stripe-join is inspired by an algorithm called ‘Jive-join’ developed by Li and Ross. Stripe-join makes a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a set of temporary files that contain tuple identifiers but no input tuples. Stripe-join performs this efficiently even when the input relations are much larger than main memory, as long as the number of blocks in main memory is of the order of the square root of the number of blocks in the participating relations. Stripe-join is particularly efficient for self-joins. To our knowledge, Stripe-join is the first algorithm that, given a join index and a relation significantly larger than main memory, can perform a self-join with just a single pass over the input relation and without storing input tuples in intermediate files. Almost all the I/O is sequential, thus minimizing the impact of seek and rotational latency. The algorithm is resistant to data skew. It can also join multiple relations while still making only a single pass over each input relation. Using a detailed cost model, Stripe-join is analyzed and compared with competing algorithms. For large input relations, Stripe-join performs significantly better than Valduriez's algorithm and hash join algorithms. We demonstrate circumstances under which Stripe-join performs significantly better than Jive-join. Unlike Jive-join, Stripe-join makes no assumptions about the order of the join index.

15.
In this paper we discuss the merging of two different computation paradigms: the fixpoint computation for deductive databases and the pattern-matching computation for graph-based languages. We show how these paradigms can be combined on the example of the declarative, graph-based, database query language G-Log. A naive algorithm to compute G-Log programs turns out to be very inefficient. However, we also present a backtracking fixpoint algorithm for Generative G-Log, a syntactical sublanguage of G-Log that, like G-Log, is non-deterministic complete. This algorithm is considerably more efficient, and reduces to the standard fixpoint computation for a sublanguage of Generative G-Log that is a graphical equivalent of Datalog. The paper also studies some interesting properties like satisfiability and triviality, that are undecidable for full G-Log and turn out to be decidable for sufficiently general classes of Generative G-Log programs.

16.
Recent progress in peer-to-peer (P2P) search algorithms has presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P databases. PHIRST works by effectively leveraging the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system's storage requirements. During query lookup, agents use unstructured search to compensate for the lack of fully published terms. Additionally, they explicitly weigh the costs involved in structured and unstructured approaches, allowing for a significant reduction in query costs. Finally, we address how node failures can be handled effectively by storing multiple copies of selected data. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near-perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.

17.
The paper describes a new pictorial database oriented to image analysis, implemented inside the MIDAS data analysis system. Pictorial databases need expressive data structures in order to represent a wide class of information from the numerical to the visual. The model of the database is relational; however, a full normalization is not achievable, owing to the complexity of the visual information. The paper reports the general design and notes on the software implementation. Preliminary experiments show the performance of the pictorial database.

18.
The distributed database is an exciting concept since it combines the functional advantages of an integrated database with the economic advantages of a distributed implementation. However, a potential implementor may well ask how much expensive special-purpose software must be produced locally or otherwise added to realise a distributed database or, in other words, how much support for the distributed database concept is currently available from manufacturers or software vendors. This paper outlines the requirements of distributed database systems and attempts to survey the present level of support.

19.
Dynamic aspects of information systems are taken into account in many conceptual models. However, the dynamic concepts of these models have rarely been fully implemented in database management systems (DBMSs). Rubis is an extended relational DBMS which supports an extended relational schema (including event and operation concepts) and automatic control of the dynamic aspects of applications, i.e. event recognition, operation triggering and time handling. The first part of the paper contains a short presentation of the basic concepts and the specification language used for the extended schema. The second part focuses on the internal mechanisms: the temporal processor, which manages the temporal aspects of specifications and recognizes temporal events; and the event processor, which manages event treatment and synchronization. These two mechanisms permit automatic execution of the extended schema and so provide rapid prototyping capabilities. This last part will be covered in the December issue of this journal.

20.