Found 20 similar documents; search took 15 ms
1.
How to reduce the power consumption of data centers has received worldwide attention. By combining an energy-aware data placement policy with a locality-aware multi-job scheduling scheme, we propose a new multi-objective bi-level programming model based on MapReduce to improve the energy efficiency of servers. First, the variation of energy consumption with the performance of servers is taken into account; second, data locality can be adjusted dynamically according to the current network state; last but not least, considering that task-scheduling strategies depend directly on data placement policies, we formulate the problem as an integer bi-level programming model. To solve the model efficiently, specially designed encoding and decoding methods are introduced. Based on these, a new effective multi-objective genetic algorithm based on MOEA/D is proposed. As there are usually tens of thousands of tasks to be scheduled in the cloud, this is a large-scale optimization problem, and a local search operator is designed to accelerate the convergence of the proposed algorithm. Finally, numerical experiments indicate the effectiveness of the proposed model and algorithm.
2.
Bogdan Nicolae, Gabriel Antoniu, Luc Bougé, Diana Moise, Alexandra Carpen-Amarie 《Journal of Parallel and Distributed Computing》2011,71(2):169-184
As data volumes increase at a high speed in more and more application fields of science, engineering, information services, etc., the challenges posed by data-intensive computing gain increasing importance. The emergence of highly scalable infrastructures, e.g. for cloud computing and for petascale computing and beyond, introduces additional issues for which scalable data management becomes an immediate need. This paper makes several contributions. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, we highlight the potentially large benefits of using versioning in this context. Second, based on these principles, we propose a set of versioning algorithms, both for data and metadata, that enable high throughput under concurrency. Finally, we implement and evaluate these algorithms in the BlobSeer prototype, which we integrate as a storage backend in the Hadoop MapReduce framework. We perform extensive microbenchmarks as well as experiments with real MapReduce applications: they demonstrate that applying the principles defended in our approach brings substantial benefits to data-intensive applications.
3.
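Concepts and applications of multithreaded programming  Total citations: 10, self-citations: 0, citations by others: 10
Multithreaded programming techniques have developed rapidly in recent years and are increasingly widely used. This paper introduces the concept of a thread, the basic ideas of multithreaded programming and its application patterns, and discusses the user-level implementation of the Pthread system in the POSIX standard. Finally, a server program is used as an example to illustrate how multithreaded programs are designed.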
4.
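WEI Jiang-shu 《数字社区&智能家居》2008,(17)
Data Structures is a core course of the computer science major. Based on research into and discussion of the teaching of Data Structures, this paper puts forward some teaching approaches for reference, which have achieved good results in teaching practice in recent years.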
5.
The current informal semantics of the simple concurrent object-oriented programming (SCOOP) mechanism for Eiffel is described.
We construct and discuss a model using the process algebra CSP. This model gives a more formal semantics for SCOOP than existed
previously.
We implement the model mechanically via a new tool called CSPsim. We examine two semantic variations of SCOOP: when and how
far to pass locks, and when to wait for child calls to complete. We provide evidence that waiting for child calls to complete
both unnecessarily reduces parallelism without any increase in safety and increases deadlocks involving callbacks.
Through the creation and analysis of the model, we identify a number of ambiguities relating to reservations and the underlying
run-time system and propose means to resolve them.
M. J. Butler
6.
N.H. Gehani 《Computer Languages, Systems and Structures》1982,7(1):21-23
Implementation of Ada's parallel tasks on a multicomputer architecture requires additional communication and naming overhead because tasks can operate on shared data via global variables and pointers. This increases the complexity of implementing Ada and has a negative impact on program understandability.
7.
《Concurrency and Computation》2017,29(12)
Energy efficiency is becoming increasingly important for computing systems, in particular for large-scale High Performance Computing (HPC) facilities. In this work, we evaluate, from a user perspective, the use of Dynamic Voltage and Frequency Scaling techniques, assisted by the power and energy monitoring capabilities of modern processors, to tune applications for energy efficiency. We run selected kernels and a full HPC application on 2 high‐end processors widely used in the HPC context, namely, an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate the available trade‐offs between energy‐to‐solution and time‐to‐solution, attempting a function‐by‐function frequency tuning. We finally estimate the benefits obtainable by running the full code on an HPC multi‐GPU node, with respect to default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need for any additional hardware, and we enable it to change CPU and GPU clock frequencies while running. We analyze our results on the different architectures using a simple energy‐performance model and derive a number of energy saving strategies, which can be easily adopted on recent high‐end HPC systems for generic applications.
8.
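Design and application of an in-memory database in a high-concurrency cluster monitoring system  Total citations: 1, self-citations: 0, citations by others: 1
In a high-concurrency cluster monitoring system with a large number of concurrent connections, a traditional disk database cannot support real-time storage and processing of data because the overhead of swapping between main memory and external storage is too high, so many real-time systems choose an in-memory database as their data support module. Starting from the key techniques of in-memory databases, this paper introduces virtual shadow memory and coarse-grained intention locks to improve data organization and concurrency control respectively, designs and implements an efficient in-memory database module to support a high-concurrency cluster monitoring system, and studies its use in a real system.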
9.
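The skyline query is a typical multi-objective optimization query and is widely used in multi-objective optimization, data mining, and other fields. Most existing skyline query processing algorithms assume that the data set is stored on a single database server, and the algorithms are usually designed as serial algorithms for a single server. With the rapid growth of data volumes, especially in the context of big data, traditional single-machine serial skyline algorithms are far from able to meet users' needs. Based on MapReduce, a popular distributed parallel programming framework, this paper studies parallel skyline query algorithms suitable for large data sets. Considering the factors that affect MapReduce computation, it improves the existing angle-based partitioning strategy and proposes the Balanced Angular partitioning strategy; in addition, to reduce the computation in the Reduce phase, it proposes filtering data in advance on the Map side. Experimental results show that the proposed skyline query algorithm can significantly improve system performance.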
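The map-side filtering idea in this abstract can be illustrated with a small single-process sketch. It is not the paper's algorithm: the partitioner below uses plain equal-width angles for 2-D points with non-negative coordinates rather than the Balanced Angular strategy, "smaller is better" is assumed in every dimension, and all function names are hypothetical.

```python
from collections import defaultdict
from math import atan2, pi


def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def angular_partition(point, num_partitions):
    """Assign a 2-D point to a partition by its angle from the x-axis.
    Equal-width angular slices are an assumption made for illustration."""
    angle = atan2(point[1], point[0])  # in [0, pi/2] for non-negative coordinates
    return min(int(angle / (pi / 2) * num_partitions), num_partitions - 1)


def mapreduce_skyline(points, num_partitions=4):
    # "Map" side: group points by partition and keep only each local skyline,
    # so dominated points are filtered before they reach the reduce step.
    partitions = defaultdict(list)
    for p in points:
        partitions[angular_partition(p, num_partitions)].append(p)
    candidates = [p for part in partitions.values() for p in skyline(part)]
    # "Reduce" side: merge the local skylines into the global skyline.
    return sorted(skyline(candidates))


points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6)]
result = mapreduce_skyline(points)
```

Filtering locally is safe because a point dominated inside its own partition is also dominated globally, so the reduce step only has to compare the surviving candidates.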
10.
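This paper analyzes database concurrency mechanisms in detail and describes how to use the data navigator and dataset components of DELPHI to solve the problem of concurrent access by multiple clients.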
11.
Power optimization in data centers requires either raising the temperature of the cold air supplied by the air conditioner or reducing the power consumption of the servers through careful workload allocation. Both approaches must satisfy a number of constraints, mainly the temperature at the server intakes, which should not exceed a critical threshold, and capacity and response time requirements. To tackle these issues, we formulate an optimization problem in which the total data center power has to be minimized subject to the constraints imposed by performance requirements and thermal specifications of the servers. At the heart of the optimization problem is an analytical model which takes into account the complex relationship between the performance of servers, the allocation of workloads, the temperature of the air supplied by the conditioning unit and the heat distribution in the server room. For the easy evaluation of this relationship, we adopt a simplified yet accurate heat flow model, which we extensively validate using the data collected in several months of Computational Fluid Dynamics simulations. Extensive tests on 90 randomly generated scenarios suggest that the proposed coupled thermal-performance model can lead to a power saving of 21%. Finally, a case study is presented which is based on 1164 workload traces collected from the data center of a large telco operator. The cooling-aware workload placement suggests a saving of 8% with respect to a performance-only based strategy.
12.
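Existing research on load-balanced scheduling for MapReduce considers neither the distribution characteristics of intermediate data nor the cost of network transfer, which leads to extra network transfer cost and reduced system efficiency. To solve these problems, a data-locality-aware load balancing strategy is proposed. It makes full use of the new resource management features in YARN: in the Map phase, statistics are collected while in-memory data is spilled to disk in order to obtain the data distribution, and tasks are scheduled according to the data distribution and the computing capacity of each node, reducing network transfer cost while keeping the load on each node as balanced as possible. In addition, fine-grained partitioning and an adaptive partition splitting strategy are introduced to further improve the performance of the scheduling strategy under data skew. Comparative experiments show that the proposed load-balanced scheduling strategy effectively improves performance while also reducing total network overhead.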
13.
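肖承勇 《数字社区&智能家居》2007,1(1):202-203
Networks are increasingly pervasive, yet in many cases our client applications still cannot obtain a network connection, or need to work offline explicitly. When an occasionally connected smart client cannot reach network resources, it lets the user keep working and then synchronizes data at some later time once a network connection is available. This paper discusses the problems faced in designing and building smart client applications that are occasionally connected to the network, and their solutions. These solutions include monitoring of the network connection state, caching of client data, data synchronization, and handling of concurrent data access.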
14.
15.
John C. Cavouras 《Software》1983,13(9):809-815
Ways to implement coroutines in a block-structured language with no multitasking facilities are presented. Coroutines are implemented as procedures. The reactivation points are kept in global variables, one variable for each procedure. Local variables whose values are required on re-entry are stored as STATIC objects. The variables or data of re-entrant coroutines are stored in an event list associated with each such coroutine. A procedure with several entries is a convenient mechanism to trap the primitive calls issued by the coroutines. This procedure returns to the master program by using a non-local GOTO. The implementation of the above in PL/I and C is described and a comparison is made with sequential Pascal. Ada includes constructs which satisfy most requirements.
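The core trick described in this abstract, a procedure that resumes where it last left off by consulting a reactivation point kept outside the procedure, can be sketched in a few lines. The paper works in PL/I and C; the Python below only mirrors the idea, and the names `_resume_at`, `_counter`, and `producer` are hypothetical. Module-level variables stand in for the global reactivation variable and the STATIC locals.

```python
# Reactivation point and preserved "local" state live outside the procedure,
# playing the role of the global variable and STATIC objects in the abstract.
_resume_at = 0   # which section of the procedure runs on the next call
_counter = 0     # value that must survive between calls


def producer():
    """Each call resumes after the point where the previous call returned."""
    global _resume_at, _counter
    if _resume_at == 0:
        # First activation: initialise state, then yield control to the caller.
        _counter = 1
        _resume_at = 1
        return "start"
    if _resume_at == 1:
        # Later activations: continue the loop body from the saved state.
        _counter += 1
        if _counter >= 3:
            _resume_at = 2  # the next activation falls through to the end
        return f"step {_counter}"
    # Final section: the coroutine has finished its work.
    return "done"


outputs = [producer() for _ in range(4)]
```

Python's own generators make this bookkeeping unnecessary; the explicit dispatch is kept here only because it is what a language without multitasking or generator support would require.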
16.
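The tobacco industry hopes to enhance the core competitiveness of its enterprises through big data applications. By analyzing enterprise business and strategic goals, this paper identifies where big data resides in tobacco commercial enterprises and what the application requirements are, and, based on the types of enterprise big data and the characteristics of big data applications, designs and incrementally implements a big data processing architecture that meets enterprise needs, exploring a path for building future tobacco big data centers.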
17.
《计算机科学》2025,52(1)
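To provide high data availability, modern NewSQL databases usually keep multiple replicas of the data so that when one replica is unavailable the data can be obtained from another. With multiple replicas, data consistency among replicas must also be considered, that is, different clients reading a given data item at a given moment should get the same result; transaction processing mechanisms are therefore introduced. In an interactive transaction containing multiple writes, because the data has multiple replicas, each write must be applied to all primary and backup replicas. However, primary and backup replicas are usually spread across different machines, which introduces the latency of writing to remote replicas and ultimately increases the latency of the whole transaction. To address this problem, a collaborative data persistence scheme is proposed. Its main idea is to have the client cache the transaction's write-operation log locally; when the transaction is finally committed, the client first persists the write-operation log of the transaction and sends the log to the transaction's coordinator node, which distributes the log data, so that the two persist the transaction data collaboratively. Experimental results show that, compared with the synchronous persistence scheme, the collaborative persistence scheme not only reduces the latency of interactive transaction processing but also improves the peak system throughput by about 38%.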
18.
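k-means is one of the most commonly used partition-based clustering algorithms. Traditional centralized k-means algorithms can no longer cope with today's explosively growing data sizes, and designing distributed k-means algorithms has become an urgent problem. Existing distributed k-means algorithms are based on the MapReduce computing framework and do not consider the influence of the initial cluster centers. Because every MapReduce job must read from and write to the distributed file system, MapReduce cannot effectively express the dependencies among multiple jobs; a dataflow-based computing framework is therefore proposed, which is built on top of MapReduce and models data processing as a dataflow graph. On the basis of this framework, an efficient k-means algorithm is proposed that selects initial cluster centers by repeated sampling to achieve load balancing and reduce the number of iterations. Experimental results show that the algorithm scales well and is more efficient than existing algorithms.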
19.
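This paper introduces the meaning and function of each item in the Update Properties of a DataWindow in PB. These properties can be used to set whether the database tables associated with a DataWindow can be updated, the concurrency control, and how keys are updated, so that users not only know how to set them but also understand why.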
20.
Nowadays, we are witnessing the fast production of very large amounts of data, particularly by the users of online systems on the Web. However, processing this big data is very challenging since both space and computational requirements are hard to satisfy. One solution for dealing with such requirements is to take advantage of parallel frameworks, such as MapReduce or Spark, that make it possible to build powerful computing and storage units on top of ordinary machines. Although these key-based frameworks have been praised for their high scalability and fault tolerance, they show poor performance in the case of data skew. There are important cases where a high percentage of processing on the reduce side ends up being done by only one node. In this paper, we present FP-Hadoop, a Hadoop-based system that renders the reduce side of MapReduce more parallel by efficiently tackling the problem of reduce data skew. FP-Hadoop introduces a new phase, denoted intermediate reduce (IR), where blocks of intermediate values are processed by intermediate reduce workers in parallel. With this approach, even when all intermediate values are associated with the same key, the main part of the reducing work can be performed in parallel, taking advantage of the computing power of all available workers. We implemented a prototype of FP-Hadoop, and conducted extensive experiments over synthetic and real datasets. We achieved excellent performance gains compared to native Hadoop, e.g. more than 10 times in reduce time and 5 times in total execution time.
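The intermediate reduce phase described in this abstract can be pictured with a toy example for a single heavily skewed key. This is a sketch of the general idea only, not FP-Hadoop's implementation or API: the function names are hypothetical, the reduce operation is assumed to be a sum (block-wise processing like this requires an associative and commutative reduction), and a thread pool merely stands in for the pool of intermediate reduce workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(values, block_size):
    """Cut the values of one key into fixed-size blocks."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def intermediate_reduce(block):
    """Partial reduction of one block (assumption: the reduce is a sum)."""
    return sum(block)


def final_reduce(partials):
    """Combine the partial results; this step sees far fewer values."""
    return sum(partials)


def reduce_skewed_key(values, block_size=1000, workers=4):
    """Reduce all values of one key via block-wise intermediate reduces."""
    blocks = split_into_blocks(values, block_size)
    # Each block is handed to a worker, so the bulk of the reducing work
    # for this single key is spread over several workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(intermediate_reduce, blocks))
    return final_reduce(partials)


total = reduce_skewed_key(list(range(10_000)), block_size=1000)
```

In CPython the threads here do not give real CPU parallelism; the point of the sketch is the two-level structure, in which the final reducer only combines one partial result per block instead of every raw value.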