Similar Documents
 19 similar documents found (search time: 200 ms)
1.
A multi-dimensional data index mechanism supporting complex queries in cloud computing environments   Cited: 1 (self-citations: 0, other citations: 1)
To address the problem that data indexes in distributed storage systems under cloud computing environments do not support complex queries, a multi-dimensional data index mechanism called M-Index is proposed. It uses the pyramid technique to map the multi-dimensional metadata of data into a one-dimensional index and, on that basis, introduces for the first time the concept of a prefix binary tree (PBT); the prefix of the one-dimensional index and the valid PBT node is extracted as the primary key of the data in the storage system. Data are published, according to the primary key and a consistent hashing mechanism, onto an overlay network composed of storage nodes. A data query algorithm based on M-Index is designed, which converts complex query requests into one-dimensional query keys and effectively supports complex query patterns such as multi-dimensional queries and range queries. Theoretical analysis and experiments show that M-Index achieves good query efficiency and load balancing under complex query patterns.
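As a rough illustration of the mapping step referenced in this abstract, the sketch below shows the classic pyramid-technique reduction of a point in [0,1]^d to a one-dimensional value, followed by placement of that key on a consistent-hash ring. The PBT prefix extraction and node layout are specific to M-Index and are not reproduced here; the hash function and the 8-node ring are illustrative assumptions.

```python
import bisect
import hashlib

def pyramid_value(point):
    """Classic pyramid-technique mapping of a point in [0,1]^d to one dimension:
    the pyramid index plus the height within that pyramid form the 1-D key."""
    d = len(point)
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))  # dominant dimension
    i = j_max if point[j_max] < 0.5 else j_max + d            # which of the 2d pyramids
    height = abs(0.5 - point[j_max])                          # distance from the center
    return i + height                                         # value in [i, i + 0.5]

def ring_position(key, ring):
    """Place a 1-D key on a consistent-hash ring (sorted list of node hashes)."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return bisect.bisect(ring, h) % len(ring)  # clockwise successor owns the key

# toy usage: map a 3-D metadata vector to a key and pick a storage-node slot
ring = sorted(int(hashlib.md5(f"node-{n}".encode()).hexdigest(), 16) for n in range(8))
key = pyramid_value([0.2, 0.9, 0.55])
print(key, ring_position(key, ring))
```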

2.
Sharded bitmap index: a secondary index mechanism for cloud data management   Cited: 3 (self-citations: 0, other citations: 3)
The rapid development of cloud computing technology has made the storage and management of massive data possible. However, because the storage model has fundamentally changed, the mature indexing techniques of traditional relational database management systems can neither be applied directly to massive data processing nor simply migrated to the cloud environment. By analyzing and comparing the two radically different basic logical structures of secondary indexes in the cloud, namely the centralized scheme and the distributed scheme, and by absorbing the strengths of both while avoiding their weaknesses, a sharded bitmap index mechanism with good scalability is proposed to provide efficient support for retrieval tasks over massive data in the cloud. By fully exploiting the parallel computing resources of the cloud environment, the response time of a single query is improved; at the same time, local nodes use the global information they hold to avoid unnecessary retrieval overhead, so that query throughput is guaranteed when a large number of requests arrive concurrently. Experimental results on real data show that the query performance of the sharded bitmap index is far superior to other methods.
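To make the sharding idea concrete, the sketch below partitions the row space, keeps one bitmap per distinct value per shard, and answers an equality query shard by shard; in a cloud deployment each shard could be evaluated on a different node. The shard size and in-process loop are simplifications, not the paper's actual distributed design.

```python
from collections import defaultdict

SHARD_SIZE = 4  # rows per shard (illustrative)

def build_sharded_bitmaps(column):
    """One bitmap (a Python int used as a bit vector) per distinct value per shard."""
    shards = []
    for start in range(0, len(column), SHARD_SIZE):
        bitmaps = defaultdict(int)
        for offset, value in enumerate(column[start:start + SHARD_SIZE]):
            bitmaps[value] |= 1 << offset
        shards.append(bitmaps)
    return shards

def query_equals(shards, value):
    """Evaluate `column == value`; each shard's bitmap is scanned independently."""
    hits = []
    for shard_id, bitmaps in enumerate(shards):
        bits = bitmaps.get(value, 0)
        offset = 0
        while bits:
            if bits & 1:
                hits.append(shard_id * SHARD_SIZE + offset)  # global row id
            bits >>= 1
            offset += 1
    return hits

rows = ["cn", "us", "cn", "de", "us", "cn", "de", "cn", "us"]
shards = build_sharded_bitmaps(rows)
print(query_equals(shards, "cn"))   # -> [0, 2, 5, 7]
```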

3.
A survey of query processing techniques in cloud data management systems   Cited: 8 (self-citations: 0, other citations: 8)
As a brand-new Internet application model, cloud computing has attracted wide attention in both industry and academia. Users can conveniently access cloud services through terminal devices and obtain storage, computing, software, and hardware resources on demand. The development of cloud computing brings a series of challenging problems, among which cloud data management is foremost. Based on the characteristics of cloud data, this paper proposes a framework for cloud data management systems and, on that basis, surveys and analyzes research on query techniques in such systems from the aspects of index management, query processing, query optimization, and online aggregation, and points out the challenges facing the field and directions for future research.

4.
To address multi-dimensional range queries issued by users in large-scale cloud peer-to-peer networks, an index architecture based on an m-ary balanced tree is introduced into the cloud peer-to-peer environment, and hierarchical tree structures that support multi-dimensional data indexing in centralized environments, such as the R-tree and QR-tree, are implemented on top of it. The multi-dimensional range query algorithm allows a query to start from any position in the tree, avoiding the system performance bottleneck caused by the root node. Analysis and experiments show that for a network of N nodes the multi-dimensional range query cost is O(log_m N) (m > 2, where m is the fan-out); the query cost is therefore independent of the dimensionality d and does not degrade as d increases. Finally, a cost model based on the fan-out m is established and the optimal value of m is computed.
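The O(log_m N) bound above only counts tree levels; the abstract does not give the actual cost model, so the sketch below merely tabulates a plausible trade-off, hops ≈ log_m N against per-hop work that grows with the fan-out, under a hypothetical cost form. The cost function and network size are assumptions for illustration only.

```python
import math

N = 100_000  # number of nodes in the overlay (illustrative)

def hops(m, n=N):
    """Routing hops in an m-ary balanced tree: one per level, independent of dimensionality d."""
    return math.log(n) / math.log(m)

def total_cost(m, n=N, per_child_cost=1.0):
    """Hypothetical cost model: at each hop a node inspects all m of its children."""
    return hops(m, n) * per_child_cost * m

for m in (3, 4, 8, 16, 32, 64):
    print(f"m={m:3d}  hops={hops(m):5.2f}  cost={total_cost(m):7.1f}")

best_m = min(range(3, 129), key=total_cost)
print("fan-out minimizing this toy cost:", best_m)
```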

5.
Key technologies of distributed storage in cloud computing environments   Cited: 11 (self-citations: 0, other citations: 11)
As the next-generation computing paradigm, cloud computing plays an important role in both scientific and commercial computing and has attracted wide attention from academia and industry. Distributed storage in cloud environments mainly studies how data are organized and managed in data centers. As the core infrastructure of cloud computing, a data center typically consists of more than a million nodes, and the data stored on it often reach PB or even EB scale, so data failures become the norm, which greatly limits the application and adoption of cloud computing and increases its cost. Improving scalability and fault tolerance while reducing cost has therefore become a key research topic for distributed storage in the cloud. Focusing on how to improve storage scalability and fault tolerance and how to reduce storage energy consumption, this paper surveys the key technologies of distributed storage from the perspectives of data center network design and data storage organization. First, it introduces and compares the advantages and disadvantages of typical data center network structures; second, it introduces and compares the two commonly used fault tolerance techniques in distributed storage, namely replication-based fault tolerance and erasure-code-based fault tolerance; third, it introduces typical energy-saving techniques for distributed storage and analyzes their pros and cons; finally, it points out the main challenges of current techniques and directions for further research.
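To make the contrast between the two fault-tolerance families concrete, the sketch below compares storage overhead and the number of node losses tolerated for n-way replication versus a (k, m) erasure code such as Reed-Solomon; the specific parameters are illustrative, not taken from the surveyed systems.

```python
def replication(copies):
    """n-way replication: tolerates copies-1 losses at copies x storage."""
    return {"overhead_x": copies, "losses_tolerated": copies - 1}

def erasure_code(k, m):
    """(k, m) erasure code: k data + m parity blocks; any k of the k+m blocks suffice to rebuild."""
    return {"overhead_x": (k + m) / k, "losses_tolerated": m}

print("3-way replication:", replication(3))      # 3.0x storage, tolerates 2 losses
print("RS(6, 3) coding:  ", erasure_code(6, 3))  # 1.5x storage, tolerates 3 losses
```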

6.
Cloud computing and cloud data management technologies   Cited: 7 (self-citations: 0, other citations: 7)
With the development of various new technologies, enterprises' critical information grows at a geometric rate, and more data need to be preserved for longer periods. With the advance of cloud computing, it has become a brand-new Internet application model, and efficient management of massive data in the cloud and accurate, fast querying of cloud data are increasingly important problems. A new research field of cloud-oriented data management is gradually taking shape, and on the basis of cloud computing technology the concept of cloud data management has been proposed. This paper analyzes the basic principles of mainstream Internet cloud data management systems such as GFS, BigTable, and Dynamo, examines future cloud data management architectures, and finally points out the main research directions in the field of cloud data management.

7.
In recent years, with the rapid development of computer technology, the field has entered the era of big data. With the emergence of big data, traditional relational databases can no longer meet high-capacity storage requirements, and low-cost cloud databases with good parallelism and scalability have emerged, adopting a key-value data model and a distributed computing environment. However, low query efficiency and poor real-time performance for massive data are common problems in key-value databases. To address the problem of low query efficiency, this paper applies a multi-dimensional data model and indexing techniques to key-value databases: fact data are stored in multi-dimensional form, and an index is built over the multi-dimensional model to speed up queries. The paper systematically describes the construction of the multi-dimensional data model and the implementation of the indexing technique, and finally gives a brief comparison of strengths and weaknesses against mainstream key-value databases.
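One common way to store fact data "in multi-dimensional form" in a key-value store is to encode the dimension values into a fixed-width, order-preserving composite key so that prefix scans answer dimensional lookups; the sketch below shows that idea. The field order, widths, and the in-memory stand-in for the store are assumptions, not the paper's design.

```python
from bisect import bisect_left, bisect_right

def make_key(region, product, day):
    """Fixed-width, order-preserving composite key: the dimension order decides
    which prefix queries are cheap (here region, then product, then day)."""
    return f"{region:<8}|{product:<12}|{day:08d}"

store = {}  # stand-in for a sorted key-value table
for region, product, day, amount in [
        ("east", "laptop", 20240101, 5), ("east", "laptop", 20240102, 7),
        ("east", "phone", 20240101, 2), ("west", "laptop", 20240101, 9)]:
    store[make_key(region, product, day)] = amount

def prefix_scan(table, prefix):
    """Range scan over the sorted key space, as a key-value store would do internally."""
    keys = sorted(table)
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + "\xff")
    return {k: table[k] for k in keys[lo:hi]}

# all laptop sales in the east region, regardless of day
# (the first 21 characters are the region|product part of the key)
print(prefix_scan(store, make_key("east", "laptop", 0)[:21]))
```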

8.
This paper discusses the design and implementation of an electronic dictionary based on the Android platform and cloud computing. The local client is developed with the Android SDK and the Eclipse IDE, while the cloud server side is built using load balancing, network storage, virtualization, node management, and related technologies. The system stores vocabulary data with a two-level index, which greatly improves the speed of word lookup.
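A two-level index for word lookup can be as simple as a small first-level table from the initial letter to a block of sorted entries, plus a binary search inside the block; the sketch below illustrates that generic structure under assumed data, not the described system's actual storage layout.

```python
import bisect
from collections import defaultdict

def build_two_level_index(entries):
    """Level 1: first letter -> sorted block of (word, definition).
    Level 2: binary search within the block."""
    blocks = defaultdict(list)
    for word, definition in entries:
        blocks[word[0]].append((word, definition))
    return {letter: sorted(block) for letter, block in blocks.items()}

def lookup(index, word):
    block = index.get(word[0], [])
    keys = [w for w, _ in block]
    i = bisect.bisect_left(keys, word)
    if i < len(block) and keys[i] == word:
        return block[i][1]
    return None

index = build_two_level_index([
    ("cloud", "remote computing resources"),
    ("cache", "fast intermediate storage"),
    ("index", "structure that speeds up lookup"),
])
print(lookup(index, "cache"))   # -> "fast intermediate storage"
```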

9.
To study fast location of multi-attribute cloud resources in cloud peer-to-peer networks, and drawing on the strengths of such networks, a multi-attribute cloud resource lookup algorithm based on a cloud peer-to-peer network is proposed. On top of a hierarchical cloud peer-to-peer network, multi-dimensional indexes are built using both the type and the attribute values of cloud resources. First, related data are gathered into the same resource cluster according to the type index; then the domain of each attribute value is partitioned into multiple segments, and the corresponding resources are stored in them. Mechanisms such as resource cluster merging and range-neighbor maintenance are also established to make the algorithm more efficient and scalable. Simulation experiments show that the algorithm achieves fast location of multi-attribute cloud resources and does not incur large query latency as the number of network nodes and type dimensions increases, exhibiting good scalability.
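The "partition the attribute's value domain into segments and store resources in them" step can be illustrated with a simple mapping from an attribute value to a segment, so that a range query only needs to visit the overlapping segments; the segment count and attribute bounds below are assumptions, not values from the paper.

```python
def segment_of(value, lo, hi, num_segments):
    """Map an attribute value in [lo, hi) to one of num_segments contiguous segments."""
    pos = (value - lo) / (hi - lo)
    return min(int(pos * num_segments), num_segments - 1)

# illustrative setup: a "cpu_cores" attribute over [1, 65), split into 8 segments
NUM_SEGMENTS = 8
registry = {s: [] for s in range(NUM_SEGMENTS)}   # segment id -> resources stored there

for resource, cores in [("vm-a", 2), ("vm-b", 16), ("vm-c", 48), ("vm-d", 8)]:
    registry[segment_of(cores, 1, 65, NUM_SEGMENTS)].append(resource)

def range_query(lo_val, hi_val):
    """A range query only visits the segments overlapping [lo_val, hi_val]."""
    first = segment_of(lo_val, 1, 65, NUM_SEGMENTS)
    last = segment_of(hi_val, 1, 65, NUM_SEGMENTS)
    return [r for s in range(first, last + 1) for r in registry[s]]

print(range_query(4, 20))   # candidate resources whose segments overlap 4..20 cores
```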

10.
Because the management of a smart elderly-care system involves diverse, heterogeneous data, and to remedy the deficiencies of current smart elderly-care systems in data storage, scalability, and supported storage types, this work applies cloud data and cloud computing to such systems: the Hadoop cloud platform is used for multi-source heterogeneous data fusion, realizing cloud querying and cloud storage of elderly-care data. The work first studies the diversity of data in smart elderly-care systems, analyzes the characteristics of the heterogeneous data, and divides them into three categories, spatial data, medical images, and attribute data; it then designs spatial data storage, spatial data retrieval, medical image retrieval, and attribute data indexing accordingly. Multi-source heterogeneous data fusion techniques are used to implement the writing and querying of the relevant data.

11.
Big data stream computing: key technologies and system examples   Cited: 5 (self-citations: 0, other citations: 5)
Big data computation mainly takes two forms: batch computation and stream computation. Research and discussion on big data batch computing systems are relatively mature, whereas how to build big data stream computing systems with low latency, high throughput, and continuous reliable operation is an urgent open problem with relatively few research results and little practical experience. This paper summarizes the characteristics exhibited by streaming big data in typical application domains, such as timeliness, volatility, burstiness, disorder, and unboundedness; presents the key technical features an ideal big data stream computing system should possess in terms of system architecture, data transmission, application interfaces, and high availability; discusses and compares typical examples of existing big data stream computing systems; and finally describes the technical challenges such systems face in scalability, fault tolerance, state consistency, load balancing, and data throughput.

12.
何龙, 陈晋川, 杜小勇. 软件学报 (Journal of Software), 2017, 28(3): 502-513
SOH (SQL over HDFS) systems usually store data in the distributed file system HDFS and use Map/Reduce or a distributed query engine to process queries. Thanks to the fault tolerance and scalability of HDFS and Map/Reduce, SOH systems cope well with rapidly growing data volumes and handle analytical query processing. However, when processing selective or interactive queries, such systems expose performance deficiencies. This paper proposes a general indexing technique that can be applied in SOH systems to improve query processing efficiency. It analyzes how an SOH system accesses HDFS files and identifies the key factors affecting data loading time; proposes a two-layer index mechanism at the split level and within splits; and designs and implements both clustered and non-clustered indexes. Finally, extensive experiments are conducted on standard datasets and compared with existing HDFS-based indexing techniques. The results show that the proposed indexing technique effectively improves query processing efficiency.
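The split-level half of a two-layer index can be illustrated by keeping per-split min/max statistics for a column and skipping any split whose range cannot contain the predicate before doing finer-grained work inside the surviving splits. The sketch below shows only that split-pruning idea, not the paper's full clustered/non-clustered design.

```python
def build_split_index(splits, column):
    """For each split (a list of records), remember min/max of the indexed column."""
    return [(min(rec[column] for rec in s), max(rec[column] for rec in s)) for s in splits]

def pruned_scan(splits, split_index, column, value):
    """Open only the splits whose [min, max] range can contain `value`."""
    hits = []
    for split, (lo, hi) in zip(splits, split_index):
        if lo <= value <= hi:                                        # split-level filter
            hits += [rec for rec in split if rec[column] == value]   # intra-split work
    return hits

splits = [
    [{"id": 1, "price": 10}, {"id": 2, "price": 40}],   # split 0: price in [10, 40]
    [{"id": 3, "price": 55}, {"id": 4, "price": 90}],   # split 1: price in [55, 90]
]
idx = build_split_index(splits, "price")
print(pruned_scan(splits, idx, "price", 90))   # only split 1 is read
```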

13.
The growing scale and complexity of component interactions in cloud computing systems pose great challenges for operators trying to understand the characteristics of system performance. Profiling has long been proved to be an effective approach to performance analysis; however, existing approaches confront new challenges that emerge in cloud computing systems. First, the efficiency of profiling becomes a critical concern; second, service-oriented profiling should be considered to support separation-of-concerns performance analysis. To address these issues, in this paper we present P-Tracer, an online performance profiling tool specifically tailored for cloud computing systems. First, P-Tracer constructs a specific search engine that proactively processes performance logs and generates a particular index for fast queries; second, for each service, P-Tracer retrieves a statistical insight into performance characteristics from multiple dimensions and provides operators with a suite of web-based interfaces to query the critical information. We evaluate P-Tracer in terms of tracing overhead, data preprocessing scalability, and query efficiency. Three real-world case studies from the Alibaba cloud computing platform demonstrate that P-Tracer helps operators understand software behaviors and localize the primary causes of performance anomalies effectively and efficiently.

14.
Cloud computing has recently emerged as a new paradigm for providing computing services through large-scale data centers where customers may run their applications in a virtualized environment. The advantages of the cloud in terms of flexibility and economy encourage many enterprises to migrate from local data centers to cloud platforms, thus contributing to the success of such infrastructures. However, as the size and complexity of cloud infrastructures grow, scalability issues arise in monitoring and management processes. These issues are exacerbated because available solutions typically consider each virtual machine (VM) as a black box with independent characteristics, monitored at a fine-grained level for management purposes, thus generating huge amounts of data to handle. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster similar VMs starting from their resource usage information, assuming no knowledge of the software executed on them. This innovative methodology combines the Bhattacharyya distance and ensemble techniques to provide a stable evaluation of similarity between the probability distributions of multiple VM resource usage metrics, considering both system- and network-related data. We evaluate the methodology through a set of experiments on data coming from an enterprise data center. We show that our proposal achieves high and stable performance in automatic VM clustering, with a significant reduction in the amount of data collected, which helps lighten the monitoring requirements of a cloud data center.
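The Bhattacharyya distance mentioned above compares two probability distributions; a minimal sketch for discrete histograms of VM resource usage is given below. The binning and sample data are illustrative, and the paper's ensemble step is omitted.

```python
import math

def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions over the same bins."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    if bc == 0:
        return float("inf")   # disjoint supports
    return -math.log(bc)

def to_histogram(samples, bins=10, lo=0.0, hi=100.0):
    """Turn raw utilization samples (e.g. CPU %) into a normalized histogram."""
    counts = [0] * bins
    for s in samples:
        counts[min(int((s - lo) / (hi - lo) * bins), bins - 1)] += 1
    return [c / len(samples) for c in counts]

vm_a = to_histogram([12, 15, 14, 40, 42, 13, 11, 38])
vm_b = to_histogram([13, 16, 12, 41, 44, 15, 10, 39])
vm_c = to_histogram([85, 90, 88, 92, 87, 91, 86, 89])
print(bhattacharyya_distance(vm_a, vm_b))  # small: similar usage patterns, same cluster
print(bhattacharyya_distance(vm_a, vm_c))  # large (inf here: disjoint supports)
```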

15.
Containers, enabling lightweight environment and performance isolation, fast and flexible deployment, and fine-grained resource sharing, have gained popularity for application management and deployment alongside hardware virtualization. They are being widely used by organizations to deploy their increasingly diverse workloads derived from modern-day applications such as web services, big data, and the Internet of Things in either proprietary clusters or private and public cloud data centers. This has led to the emergence of container orchestration platforms, which are designed to manage the deployment of containerized applications in large-scale clusters. These systems are capable of running hundreds of thousands of jobs across thousands of machines. To do so efficiently, they must address several important challenges including scalability, fault tolerance and availability, efficient resource utilization, and request throughput maximization, among others. This paper studies these management systems and proposes a taxonomy that identifies different mechanisms that can be used to meet the aforementioned challenges. The proposed classification is then applied to various state-of-the-art systems, leading to the identification of open research challenges and gaps in the literature intended as future directions for researchers.

16.
Cyberattacks are difficult to prevent because the targeted companies and organizations often rely on new and fundamentally insecure cloud-based technologies, such as the Internet of Things. With increasing industry adoption and migration of traditional computing services to the cloud, one of the main challenges in cybersecurity is to provide mechanisms to secure these technologies. This work proposes a Data Security Framework for cloud computing services (CCS) that evaluates and improves CCS data security from a software engineering perspective, assessing the levels of security within the cloud computing paradigm using engineering methods and techniques applied to CCS. The framework is developed by means of a methodology based on a heuristic theory that incorporates knowledge generated by existing works as well as the experience of their implementation. The paper presents the design details of the framework, which consists of three stages: identification of data security requirements, management of data security risks, and evaluation of data security performance in CCS.

17.
With the arrival of the big data era, traditional single machines, with their limited resources, slow execution, and poor support for distributed processing, can no longer meet the big data processing demands of today's healthcare systems; a mobile medical call system based on spatio-temporal data can solve these problems well. Studying k-nearest-neighbor (kNN) query algorithms in mobile cloud computing environments is currently a hot topic; a scalable, distributed spatial data index has a large impact on kNN query efficiency, and existing query algorithms are either unsuitable for parallelization or lead to content redundancy. This work combines the MapReduce distributed processing technique with spatial kNN querying to design a mobile medical call algorithm that can quickly retrieve the locations of doctors satisfying a user's query. It proposes and builds a new distributed spatial index, the inverted Voronoi diagram index, which combines an inverted index with a Voronoi diagram index, and proposes an efficient MapReduce-based algorithm that uses the Voronoi diagram to process kNN queries and effectively improves query efficiency in a distributed environment. Extensive experiments on real and synthetic datasets show that the proposed method has good efficiency and scalability.
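One way to combine a Voronoi partition with MapReduce-style kNN processing is to assign every object to the cell of its nearest pivot in the map phase and then answer a query by searching the query point's cell (probing neighboring cells when more candidates are needed). The sketch below shows that partition-and-search idea in plain Python; it is not the paper's inverted Voronoi index itself, and the pivots and doctor locations are made up for illustration.

```python
import heapq
import math
from collections import defaultdict

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_pivot(point, pivots):
    """Voronoi cell id = index of the closest pivot."""
    return min(range(len(pivots)), key=lambda i: dist(point, pivots[i]))

def map_phase(doctors, pivots):
    """'Map': emit (cell id, doctor) so each cell can be handled by one reducer."""
    cells = defaultdict(list)
    for doc_id, loc in doctors.items():
        cells[nearest_pivot(loc, pivots)].append((doc_id, loc))
    return cells

def knn_in_cell(cells, pivots, query, k):
    """'Reduce': search the query's own Voronoi cell; a full implementation would also
    probe neighboring cells when the cell holds fewer than k candidates."""
    cell = cells[nearest_pivot(query, pivots)]
    return heapq.nsmallest(k, cell, key=lambda d: dist(query, d[1]))

pivots = [(0, 0), (10, 0), (0, 10), (10, 10)]
doctors = {"dr1": (1, 2), "dr2": (2, 1), "dr3": (9, 9), "dr4": (8, 1), "dr5": (1, 9)}
cells = map_phase(doctors, pivots)
print(knn_in_cell(cells, pivots, query=(1.5, 1.5), k=2))   # -> dr1 and dr2
```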

18.
Failures are the norm rather than the exception in cloud computing environments, and fault tolerance is one of the major obstacles to a new era of highly serviceable cloud computing, since it plays a key role in ensuring cloud serviceability. Fault-tolerant service is an essential part of Service Level Objectives (SLOs) in clouds. To achieve a high level of cloud serviceability and to meet demanding SLOs, a foolproof fault tolerance strategy is needed. In this paper, the definitions of fault, error, and failure in a cloud are given, and the principles for high fault tolerance objectives are systematically analyzed with reference to fault tolerance theories suitable for large-scale distributed computing environments. Based on the principles and semantics of cloud fault tolerance, a dynamic adaptive fault tolerance strategy, DAFT, is put forward. It includes: (i) analyzing the mathematical relationship between different failure rates and two different fault tolerance strategies, namely checkpointing and data replication; (ii) building a dynamic adaptive checkpointing fault tolerance model and a dynamic adaptive replication fault tolerance model, and combining the two to maximize serviceability and meet the SLOs; and (iii) evaluating the dynamic adaptive fault tolerance strategy under various conditions in large-scale cloud data centers while considering different system-centric parameters, such as fault tolerance degree, fault tolerance overhead, and response time. Theoretical as well as experimental results conclusively demonstrate that the dynamic adaptive fault tolerance strategy DAFT has high potential, as it provides efficient fault tolerance enhancements, significant cloud serviceability improvement, and strong SLO satisfaction. It efficiently and effectively achieves a trade-off among fault tolerance objectives in cloud computing environments.
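The core of a dynamic adaptive strategy such as the one described is a switch between checkpointing and replication driven by the observed failure rate. The sketch below encodes that decision with hypothetical cost formulas (checkpoint cost grows with the failure rate, replication cost is roughly flat); these formulas are illustrative assumptions, not DAFT's actual model.

```python
def checkpoint_cost(failure_rate, ckpt_overhead=5.0, rework=30.0):
    """Hypothetical: a fixed checkpointing overhead plus expected re-execution work,
    which grows with the failure rate."""
    return ckpt_overhead + failure_rate * rework

def replication_cost(copies=2, per_copy=12.0):
    """Hypothetical: replication cost is dominated by running extra copies."""
    return (copies - 1) * per_copy

def choose_strategy(failure_rate):
    """Adaptive switch: checkpointing while failures are rare, replication when they are not."""
    return ("checkpointing" if checkpoint_cost(failure_rate) < replication_cost()
            else "replication")

for rate in (0.05, 0.2, 0.5):
    print(f"failure rate {rate:.2f} -> {choose_strategy(rate)}")
```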

19.
With computing systems having undergone a fundamental transformation from the single-processor devices of the turn of the century to today's ubiquitous networked devices and warehouse-scale computing in the cloud, parallelism has become ubiquitous at many levels. At the micro level, parallelism is exploited from the underlying circuits to pipelining and instruction-level parallelism on multi-core or many-core chips within a single machine. At the macro level, parallelism is promoted across multiple machines on a rack, many racks in a data center, and the globally shared infrastructure of the Internet. With the push of big data, we are entering a new era of parallel computing driven by novel and ground-breaking research innovation in elastic parallelism and scalability. In this paper, we will give an overview of computing infrastructure for big data processing, focusing on the architectural, storage, and networking challenges of supporting big data applications. We will briefly discuss emerging computing infrastructures and technologies that are promising for improving data parallelism and task parallelism, and for encouraging vertical and horizontal computation parallelism.
