Similar literature
 20 similar documents found (search time: 859 ms)
1.
Replacing existing software/hardware components with their equivalent cloud services is an important decision faced by IT managers in today's enterprises. Several factors make software migration to the cloud challenging: the variety of possible migration targets and cloud services, each with many configurations and cost models; the disparate and changing strategic objectives of the enterprise management that trigger the migration process; and the complex structure of legacy applications. Existing approaches model the migration process as an optimization problem and find the optimal deployment of software components on cloud services without presenting a practical migration plan. In contrast, this paper proposes a plan-oriented migration approach with which the enterprise management can follow the migration steps of a valid plan. All valid plans are modeled using a labeled transition system, and a recommender engine directs the management through the possible migration paths using predefined fitness functions. It was observed that the proposed plan-oriented method is highly effective in satisfying the enterprise's strategic objectives, particularly in dynamic and changing conditions where a flexible migration plan is essential. Evaluations have been performed using two quality indicators: total cost of ownership and scalability index.

2.
A multidatabase system (MDBS) integrates information from multiple autonomous local databases. Performing global query optimization to achieve efficient query processing in such a system is challenging due to the local autonomy of the data sources. Dynamic factors in the environment make the problem even more difficult. In this paper, we present two techniques, contention space partitioning and cost error controlling, to perform global query optimization in a dynamic MDBS. Both techniques generate an execution plan with multiple versions for a query in a dynamic MDBS, utilizing the multistate cost models built for the dynamic environment via our previous multistate query sampling method. The first technique partitions the contention space of a dynamic multidatabase environment into a given number of subspaces and chooses a good query execution plan version for each subspace, while the second technique selects a set of execution plan versions by using a given error tolerance to control query execution costs. Experiments demonstrate that the proposed techniques are quite promising for performing global query optimization in a dynamic MDBS. Compared with related work on dynamic query optimization, our approach has the advantage of avoiding the high overhead of modifying or re-generating an execution plan for a query based on dynamic runtime information. Research was supported by the US National Science Foundation under Grant # IIS-9811980 and The University of Michigan.

3.
This paper studies continuous pattern detection over large evolving graphs, which plays an important role in monitoring-related applications. The problem is challenging due to the large size and dynamic updates of graphs, the massive search space of pattern detection, and inconsistent query results on dynamic graphs. This paper first introduces a snapshot isolation requirement, which ensures that the query results come from a consistent graph snapshot instead of a mixture of partial evolving graphs. Second, we propose an SSD (single sink directed acyclic graph) plan that is friendly to vertex-centric distributed graph processing frameworks. The SSD plan can guide the message transformation and transfer among graph vertices, and determine the satisfaction of the pattern on graph vertices for the sink vertex. Third, we devise strategies for the major steps in the SSD evaluation, including the location of valid messages to achieve snapshot isolation, the AO-List to determine the satisfaction of a transition rule over a dynamic graph, and a message-on-change policy to reduce outgoing messages. The experiments on billion-edge graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method.

4.
As RDF data continue to gain popularity, we witness fast growth in both the number of RDF repositories and the size of RDF datasets. Many known RDF datasets contain billions of RDF triples (subject, predicate and object). One of the grand challenges in managing these huge RDF datasets is how to execute RDF queries efficiently. In this paper, we address the query processing problems posed by the billion triple challenge. We first identify some causes of the problems in existing query optimization schemes, such as large intermediate results and initial query cost estimation errors. Then, we present our block-oriented dynamic query plan generation approach powered by pipelined execution. Our approach consists of two phases. In the first phase, a near-optimal execution plan for a query is chosen by identifying the processing blocks of the query. We group the join patterns sharing a join variable into building blocks of the query plan, since executing them first provides opportunities to reduce the size of the intermediate results generated. In the second phase, we further optimize the initial pipelining for a given query plan. We employ optimization techniques, such as sideways information passing and semi-join, to further reduce the size of intermediate results, improve the query processing cost estimation and speed up query execution. Experimental results on several RDF datasets of over a billion triples demonstrate that our approach outperforms existing RDF query engines that rely on dynamic programming based static query processing strategies.

5.
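Cardinality estimation is a key step in cost-based query optimization and has been studied for nearly 40 years. Traditional methods, such as histogram-based methods, are reasonably accurate when certain assumptions hold, for example that attributes are mutually independent and that joined tables satisfy the containment principle. In real operating environments, however, these assumptions often fail, which can cause severe cardinality estimation errors and, in turn, query delays. In recent years, growing data volumes and new hardware have made it possible to use machine learning to improve the quality of cardinality estimation. Because cost-based query optimization selects the best query execution plan mainly according to the estimated costs of the sub-plans in a query, some recent work builds local learning models for key sub-plan templates and has made good progress. However, these local models are mainly intended for scenarios in which the query distribution (over the query space) and the data distribution (over the database contents) do not change, whereas in real environments both change continuously, which limits the effectiveness of these estimation techniques. In this paper, we propose an adaptive cardinality estimation method for sub-plan templates that uses incremental locally weighted learning in environments where the query and data distributions keep changing. Specifically, we first extract semantic and statistical features of a sub-plan so that they represent the characteristics of the current queries and data, and then use an incremental locally weighted learning model that adapts to changes in the query and data distributions to estimate cardinality. Finally, comparative experiments verify the effectiveness of the proposed method.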

6.
We observe two deficiencies of current query processing and scheduling techniques for sensor networks: (1) a query execution plan does not adapt to the hardware characteristics of sensing devices; and (2) the data communication schedule of each node is not adapted to the query runtime workload. Both waste time and energy in query processing in sensor networks. To address this problem, we propose an adaptive holistic scheduler, AHS, to run on each node in a wireless sensor network. AHS schedules both the query evaluation and the wireless communication operations, and is able to adapt the schedule to the runtime dynamics of these operations on each node. We have implemented AHS and tested it on real motes as well as in simulation. Our results show that AHS improves the performance of query processing in various dynamic settings.

7.
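Integrity constraints are an important condition for guaranteeing the certainty of data in relational databases. In practice, however, a large amount of data is uncertain and violates integrity constraints yet still has practical value. Drawing on probabilistic database theory, a new query strategy for inconsistent databases is proposed. It repairs inconsistent data using constraint methods such as union, intersection, difference, selection, projection, and join; the four-tuple probability computation method and the probabilistic query rewriting technique compensate for the shortcomings of querying inconsistent databases and reduce the likelihood of data conflicts.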

8.
It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high quality services. Along this direction, IR-styled m-keyword query processing over a relational database in an RDBMS framework has been well studied. It finds all hidden interconnected tuple structures, for example connected trees that contain keywords and are interconnected by sequences of primary/foreign key relationships among tuples. A new challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem is related to the number of costly joins to be processed over time when tuples are inserted and/or deleted. This cost is mainly affected by three parameters, namely, the number of keywords, the maximum size of interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed for answering an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high speed relational data stream. We show that we can achieve high efficiency by significantly reducing the number of intermediate results when processing joins over a relational data stream. The proposed new techniques allow us to achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic data and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.

9.
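Fast range queries in large-scale high-dimensional vector spaces   Total citations: 1, self-citations: 0, citations by others: 1
The pyramid technique is currently one of the effective methods for range queries in high-dimensional spaces. As the data volume grows, however, the retrieval process introduces too many false hits, which leads to unnecessary high-dimensional distance computations. This paper therefore proposes an improved pyramid technique. It introduces concepts such as vector ordering and active dimensions and applies a segment-wise processing idea: segments that contain no candidate points are pruned, and the false hits in the remaining segments are filtered by dimension-by-dimension distance accumulation. In this way all false hits are eliminated quickly, the number of distance computations is kept as low as possible, and fast range queries in large-scale high-dimensional vector spaces are achieved. Experiments on simulated and real data verify the correctness and effectiveness of the OPT method.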
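
The dimension-by-dimension distance accumulation mentioned in this abstract can be illustrated with a small sketch. It is not the paper's OPT method: it assumes a Euclidean radius query, and the names `within_radius` and `range_query` are hypothetical. The only idea shown is that the partial sum of squared differences can be abandoned as soon as it exceeds the squared radius.

```python
from typing import List, Sequence


def within_radius(query: Sequence[float], point: Sequence[float], radius: float) -> bool:
    """Check a Euclidean radius condition, accumulating one dimension at a time."""
    limit = radius * radius
    acc = 0.0
    for q, p in zip(query, point):
        acc += (q - p) ** 2
        if acc > limit:
            # The partial sum already exceeds the bound, so the remaining
            # dimensions cannot bring the point back inside the radius.
            return False
    return True


def range_query(query: Sequence[float], points: List[Sequence[float]], radius: float):
    """Return the points that survive the early-abandon filter (linear scan, no index)."""
    return [p for p in points if within_radius(query, p, radius)]
```

The saving grows with dimensionality, because a false hit is usually rejected after only a few dimensions rather than after a full distance computation.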

10.
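Parallel migration strategy for data stream query plans   Total citations: 1, self-citations: 0, citations by others: 1
Query plans over data streams need continual adaptive optimization. To address this, a parallel migration strategy for query plans is proposed. The strategy ensures that no tuples are lost and no redundant tuples are produced during output, and it preserves the correct output order of tuples. Experimental results show that the strategy lets a query plan transition smoothly, avoids blank periods with no tuple output during migration, and, when system resources are tight and the stream rate is high, maintains a smaller number of intermediate tuples and a higher output rate.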

11.
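Wu Renbiao, Liu Chao, Qu Jingyi. 《计算机应用》 (Journal of Computer Applications), 2018, 38(5): 1339-1345
To address the current state of flight delay platforms in China, which are hard to port, scale poorly, and cannot cope with the large data volumes brought by the rapid growth of civil aviation, a big-data-oriented flight delay platform that is cross-platform, highly applicable, and highly scalable is designed. The platform uses the big data tool LeafLet as its visualization layer, displays flight trajectories on a map interface in real time, and loads the trajectory data into an HBase database. The row key of the flight data table is redesigned and optimized with the message digest algorithm MD5 to solve the "hotspot" problem caused by monotonically increasing flight times. To overcome the shortcomings of multi-level queries with HBase filters, an associative query algorithm based on SolrCloud is proposed; SolrCloud stores row keys and indexed fields in separate layers, which provides a fast secondary index for HBase. Finally, a Hive-based data warehouse for massive flight information is built on the historical flight data and flight plan data in HBase. Experimental results show that the scalability of the flight delay big data platform and the flight information data warehouse meet civil aviation's need for centralized, unified data storage, that multi-condition queries respond more than a hundred times faster than on a cluster without a secondary index, and that this advantage becomes more pronounced as the volume of flight data grows.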
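
The abstract does not give the actual row key layout, so the following is only a sketch of the general idea of using an MD5 digest to avoid write hotspots. The function name `salted_row_key`, the field order, and the four-character prefix are all assumptions made for illustration. Prefixing the key with part of a digest spreads rows that would otherwise sort by increasing time across the key space.

```python
import hashlib


def salted_row_key(flight_id: str, timestamp: str, prefix_len: int = 4) -> str:
    """Build a row key whose leading characters come from an MD5 digest.

    Hypothetical layout: <digest prefix>_<flight id>_<timestamp>.
    """
    digest = hashlib.md5(flight_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{flight_id}_{timestamp}"


# Rows of one flight share a prefix and stay adjacent, while different
# flights are scattered by their digest prefixes.
key = salted_row_key("CA1234", "20180501120000")
```

Hashing only the flight identifier, as here, keeps the records of one flight contiguous for range scans; hashing the timestamp as well would scatter them further at the cost of those scans.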

12.
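Unlike traditional relational databases, data stream management systems mainly process concurrent continuous queries. Because queries may be added or removed at any time, such systems focus on optimizing concurrent continuous queries in a way that suits query insertion and deletion, rather than on optimizing single queries. A window join optimization algorithm for data streams is proposed for environments where queries are added and removed frequently. For a newly registered query, the probe sequence over the data streams is produced by an algorithm similar to minimum spanning tree construction; it is then merged with existing sequences as far as possible without changing the probe order of other queries, which reduces repeated computation. Registering or deleting a query does not affect other query plans and requires no cumbersome query plan migration. Theoretical analysis and experiments show that the algorithm is simple, that its optimization performance is within an acceptable range, and that it is particularly suitable for systems with a high query update frequency.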

13.
This paper looks at the processing of skyline queries on peer-to-peer (P2P) networks. We propose Skyframe, a framework for efficient skyline query processing in P2P systems, which addresses the challenges of quick response time, low network communication cost and query load balancing among peers. Skyframe consists of two querying methods: one is optimized for network communication while the other focuses on query response time. These methods are different in the way in which the query search space is defined. In particular, the first method uses a high dominating point that has a large dominating region to prune the search space to achieve a low cost in network communication. On the other hand, the second method relaxes the search space in order to allow parallel query processing to speed up query response. Skyframe achieves query load balancing by both query load conscious data space splitting/merging during the join/departure of nodes and dynamic load migration. We further show how to apply Skyframe to both the P2P systems supporting multi-dimensional indexing and the P2P systems supporting single-dimensional indexing. Finally, we have conducted extensive experiments on both real and synthetic data sets over two existing P2P systems: CAN (Ratnasamy in A scalable content-addressable network. In: Proceedings of SIGCOMM Conference, pp. 161–172, 2001) and BATON (Jagadish et al. in A balanced tree structure for peer-to-peer networks. In: Proceedings of VLDB Conference, pp. 661–672, 2005) to evaluate the effectiveness and scalability of Skyframe.
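
For readers unfamiliar with skyline queries, the dominance relation this abstract relies on can be sketched as follows. This is a naive centralized version, not Skyframe's distributed method, and it assumes that smaller values are better in every dimension; the names `dominates` and `skyline` are illustrative.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A point with a large dominating region, such as one that is good in all dimensions, lets a peer discard many candidates without transferring them, which is the kind of pruning the first querying method exploits.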

14.
In this paper we study queries over relational databases with integrity constraints (ICs). The main problem we analyze is OWA query answering, i.e., query answering over a database with ICs under open-world assumption. The kinds of ICs that we consider are inclusion dependencies and functional dependencies, in particular key dependencies; the query languages we consider are conjunctive queries and unions of conjunctive queries. We present results about the decidability of OWA query answering under ICs. In particular, we study OWA query answering both over finite databases and over unrestricted databases, and identify the cases in which such a problem is finitely controllable, i.e., when OWA query answering over finite databases coincides with OWA query answering over unrestricted databases. Moreover, we are able to easily turn the above results into new results about implication of ICs and query containment under ICs, due to the deep relationship between OWA query answering and these two classical problems in database theory. In particular, we close two long-standing open problems in query containment, since we prove finite controllability of containment of conjunctive queries both under arbitrary inclusion dependencies and under key and foreign key dependencies. The results of our investigation are very relevant in many research areas which have recently dealt with databases under an incomplete information assumption: e.g., data integration, data exchange, view-based information access, ontology-based information systems, and peer data management systems.

15.
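Adaptive query scheduling algorithm for online courses   Total citations: 1, self-citations: 1, citations by others: 0
In online course systems, how to map query requests adequately onto limited resources is a much-studied problem. To address it, an adaptive query processor based on system load balancing is designed. The processor takes into account performance indicators such as servers and bandwidth, builds a resource-load-balancing-based expected query cost matrix composed of service resource units and remote query cost units, and combines the strengths of the Min-Min and Max-Min algorithms into a new adaptive query scheduling algorithm (A-MM). Experiments show that A-MM has good execution efficiency and load balancing ability.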
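
The abstract does not describe how A-MM combines the two heuristics, so the sketch below shows only the classic Min-Min and Max-Min schedulers it builds on. The matrix `etc` (expected cost of task i on resource j) and the function name `schedule` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def schedule(etc: List[List[float]], pick_max: bool = False) -> Tuple[Dict[int, int], float]:
    """Min-Min (default) or Max-Min (pick_max=True) over an expected-cost matrix.

    Returns the task-to-resource assignment and the resulting makespan.
    """
    n_res = len(etc[0])
    ready = [0.0] * n_res              # time at which each resource becomes free
    unassigned = set(range(len(etc)))
    assignment: Dict[int, int] = {}
    while unassigned:
        # For every waiting task, find the resource with the earliest completion time.
        best = {}
        for t in unassigned:
            j = min(range(n_res), key=lambda r: ready[r] + etc[t][r])
            best[t] = (ready[j] + etc[t][j], j)
        # Min-Min schedules the task that finishes soonest; Max-Min the one that finishes latest.
        chooser = max if pick_max else min
        t = chooser(sorted(best), key=lambda k: best[k][0])
        finish, j = best[t]
        assignment[t] = j
        ready[j] = finish
        unassigned.remove(t)
    return assignment, max(ready)
```

On the matrix `[[2, 4], [3, 1], [8, 10]]`, Min-Min yields a makespan of 10 and Max-Min a makespan of 8, which shows why placing long tasks early can balance load better when a few tasks dominate.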

16.
The data migration problem is the problem of computing a plan for moving data objects stored on devices in a network from one configuration to another. Load balancing or changing usage patterns might necessitate such a rearrangement of data. In this paper, we consider the case where the objects are fixed-size and the network is complete. Our results are both theoretical and empirical. Our main theoretical results are (1) a polynomial time algorithm for finding a near-optimal migration plan in the presence of space constraints when a certain number of additional nodes is available as temporary storage, and (2) a 3/2-approximation algorithm for the case where data must be migrated directly to its destination. We also run extensive experiments on several algorithms for various data migration problems and show that empirically, many algorithms perform better in practice than their theoretical bounds suggest. We conclude that many of the algorithms we present are both practical and effective for data migration.

17.
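Nearest neighbor queries over moving objects in spatio-temporal databases are a new problem among NN queries. The TP NN Queries algorithm, based on the TPR-TREE index structure, handles the temporal characteristics of objects well, but it queries the same object multiple times. This paper improves the TP NN Queries algorithm by exploiting the spatio-temporal continuity of moving objects: a single query of the TPR-TREE index obtains the distance-change curves between all candidate NN objects and the query object, from which the set of NN objects is derived, reducing the number of queries and spatio-temporal computations. An experimental analysis is given at the end of the paper.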

18.
When planning a new development (a facility or service site), the location decision is always the major issue. In this paper we introduce a novel query, the capacity constrained MaxBRNN query, which can solve the facility location selection problem efficiently. The MaxBRNN (maximizing BRNN) query is based on the bichromatic reverse nearest neighbor (BRNN) query, which uses the number of reverse nearest customers to model the influence of a facility location. The MaxBRNN query has received extensive attention in spatial database research because of its great potential in real-life applications such as market decisions, sensor network clustering, and the design of GSM (global system for mobile communication). Existing research mostly assumes that a service facility's capacity is unlimited. In real cases, however, facilities are inevitably constrained by their designed capacities. For example, if the government wants to select a place for a new emergency center that will take on some of the existing centers’ patients, it needs to know the current emergency centers’ capacities so that it can estimate the new center's scale. Thus, the capacity constrained MaxBRNN query is highly important in planning a new development. As far as we know, the capacity constrained MaxBRNN query has not been studied yet, so we formulate the problem, propose a basic solution, and develop some efficient algorithms for the query. Our major contributions are as follows: (1) we propose a novel query, capacity constrained MaxBRNN, which can solve the facility location selection problem effectively and efficiently; (2) we develop a basic algorithm, CCMB, and two improved algorithms that can find the optimal region for building a new facility, maximize its impact, and deal with the complicated reassignment when adding new facilities to the dataset; (3) we demonstrate the algorithms’ effectiveness and efficiency through extensive experiments using both real and synthetic data sets.
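
The influence measure behind BRNN can be shown with a brute-force sketch. It is not the CCMB algorithm and it ignores capacities entirely; it only counts, for a candidate site, the customers that would be strictly closer to it than to any existing facility. The names `brnn_count` and `best_candidate`, and the restriction to a finite candidate list, are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def brnn_count(candidate: Point, customers: List[Point], facilities: List[Point]) -> int:
    """Count customers whose nearest facility would become the candidate site."""
    count = 0
    for c in customers:
        d_new = math.dist(c, candidate)
        d_old = min(math.dist(c, f) for f in facilities)
        if d_new < d_old:
            count += 1
    return count


def best_candidate(candidates: List[Point], customers: List[Point], facilities: List[Point]) -> Point:
    """Pick the candidate with the largest BRNN count (uncapacitated MaxBRNN by enumeration)."""
    return max(candidates, key=lambda p: brnn_count(p, customers, facilities))
```

A capacity constrained variant would additionally have to decide which customers an overloaded facility hands over, which is the reassignment problem the abstract refers to.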

19.
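A new virtual machine placement strategy optimized by an ant colony algorithm, ACA-VMP (Ant Colony Algorithm based virtual machine placement), is proposed. The objective function of ACA-VMP is to reduce the overall energy consumption of the cloud data center, achieve the best quality of service, and reduce the number of virtual machine migrations. Following ant colony optimization, ACA-VMP adopts pheromone intensity update rules for both the global optimal solution and local optimal solutions. After many iterations of the global optimal solution, the repeated optimization of ant paths ensures that the virtual machine placement optimization completes. Updating the local pheromone intensity parameter supports the search for other locally optimal ant paths, which also lets the ACA-VMP placement optimization algorithm approach the global optimum faster. Simulation results show that the ACA-VMP strategy improves the various performance indicators of the cloud data center, and the experimental results are a useful reference for other enterprises building energy-efficient cloud data centers.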

20.
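Existing approximate k-nearest-neighbor query algorithms for high-dimensional spaces ignore the associations between dimensions when reducing dimensionality. To address this, a method that groups dimensions for reduction based on association rules between dimensions is proposed for the first time. The method reduces information loss by placing associated dimensions in one group for reduction. To handle the data offset introduced by hash-based dimensionality reduction, sign bits are set and results are refined based on the properties of the sign bits. To make mining association rules between dimensions more efficient, a new frequent itemset mining algorithm based on the UFP-tree is proposed. Queries are carried out by mapping data to binary codes, which effectively improves the efficiency of approximate k-nearest-neighbor queries, and coding functions are selected based on information entropy, which improves code quality. During result refinement, the weights of the code bits of the candidate set data are set dynamically based on information entropy, and the final approximate k-nearest-neighbor result is returned by comparing the dynamically weighted Hamming distance and the number of sign-bit collisions. Theoretical and experimental studies show that the proposed method handles approximate k-nearest-neighbor queries in high-dimensional spaces well.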
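
Ranking by weighted Hamming distance, the final refinement step named in this abstract, can be sketched as below. The entropy-based weighting and the sign-bit collision count are not specified in the abstract, so the weights here are placeholder constants, and the names `weighted_hamming` and `knn` are hypothetical.

```python
from typing import List, Sequence


def weighted_hamming(a: Sequence[int], b: Sequence[int], weights: Sequence[float]) -> float:
    """Sum the weights of the bit positions where the two binary codes differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def knn(query: Sequence[int], codes: List[Sequence[int]], weights: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k codes closest to the query under weighted Hamming distance."""
    ranked = sorted(range(len(codes)), key=lambda i: weighted_hamming(query, codes[i], weights))
    return ranked[:k]
```

Giving informative bits a larger weight breaks the many ties that plain Hamming distance produces between short codes, which is the motivation for weighting in the first place.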
