20 similar documents found.
1.
Parallel computing software on large-scale clusters must tolerate failures of individual nodes, the network, and other components, and must also be easy to manage, maintain, port, and scale. Targeting the star (master-worker) computing model, we researched and developed a parallel computing framework. Using a variable-granularity decomposer and associated queues inside the scheduling node, the framework achieves system-wide fault tolerance while remaining easy to use, portable, and scalable. The system currently sustains continuous runs of more than 150 h at 300 TFlops of computing capability and has room for further scaling.
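As a rough illustration of the star (master-worker) pattern this abstract refers to, the sketch below has a scheduling rank hand out task indices on demand over MPI. It is a minimal, hypothetical example: the variable-granularity decomposer, the fault-tolerance queues, and the recovery logic of the actual framework are not reproduced here.

```c
/* Minimal MPI master-worker (star) sketch: rank 0 hands out task indices
 * on demand; workers request work until a stop marker (-1) arrives.
 * Hypothetical example, not the framework described in the abstract. */
#include <mpi.h>
#include <stdio.h>

#define NTASKS   100
#define TAG_REQ  1
#define TAG_WORK 2

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* scheduling (master) node */
        int next = 0, done = 0;
        while (done < size - 1) {
            int dummy;
            MPI_Status st;
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQ,
                     MPI_COMM_WORLD, &st);
            int task = (next < NTASKS) ? next++ : -1;  /* -1 = stop */
            if (task < 0) done++;
            MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                     MPI_COMM_WORLD);
        }
    } else {                               /* worker node */
        for (;;) {
            int req = 0, task;
            MPI_Send(&req, 1, MPI_INT, 0, TAG_REQ, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (task < 0) break;
            /* ... compute the task here ... */
        }
    }
    MPI_Finalize();
    return 0;
}
```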
2.
The performance of a conventional parallel application is often degraded by load‐imbalance on heterogeneous clusters. Although it is simple to invoke multiple processes on fast processing elements to alleviate load‐imbalance, the optimal process allocation is not obvious. Kishimoto and Ichikawa presented performance models for high‐performance Linpack (HPL), with which the sub‐optimal configurations of heterogeneous clusters were actually estimated. Their results on HPL are encouraging, whereas their approach is not yet verified with other applications. This study presents some enhancements of Kishimoto's scheme, which are evaluated with four typical scientific applications: computational fluid dynamics (CFD), finite‐element method (FEM), HPL (linear algebraic system), and fast Fourier transform (FFT). According to our experiments, our new models (NP‐T models) are superior to Kishimoto's models, particularly when the non‐negative least squares method is used for parameter extraction. The average errors of the derived models were 0.2% for the CFD benchmark, 2% for the FEM benchmark, 1% for HPL, and 28% for the FFT benchmark. This study also emphasizes the importance of predictability in clusters, listing practical examples derived from our study. Copyright © 2008 John Wiley & Sons, Ltd.
3.
Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.
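A hedged sketch of the core idea of such a hybrid scheme follows: each MPI process receives a chunk of iterations proportional to an assumed performance weight for its node, then processes that chunk with an OpenMP parallel loop. The weights, the weighting formula, and the loop body are placeholders for illustration, not the paper's actual scheduling algorithm.

```c
/* Hybrid MPI + OpenMP sketch: iterations are split in proportion to
 * per-node performance weights (hypothetical values), then each chunk
 * is processed by OpenMP threads on that node's cores. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Placeholder performance weights, e.g. obtained from a benchmark. */
    double w[4] = {1.0, 2.0, 1.5, 0.5};
    double total = 0.0, before = 0.0;
    for (int i = 0; i < size; i++) {
        if (i < rank) before += w[i % 4];
        total += w[i % 4];
    }

    /* Prefix sums of weights give each rank its [start, end) iteration range. */
    int start = (int)(N * before / total);
    int end   = (int)(N * (before + w[rank % 4]) / total);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = start; i < end; i++)
        local += (double)i * 0.5;          /* stand-in for real work */

    double global;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", global);
    MPI_Finalize();
    return 0;
}
```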
4.
César Gómez‐Martín Miguel A. Vega‐Rodríguez José‐Luis González‐Sánchez 《Concurrency and Computation》2015,27(17):5436-5459
Today, in an energy‐aware society, job scheduling is becoming an important task for computer engineers and system analysts that may lead to a performance per Watt trade‐off of computing infrastructures. Thus, new algorithms, and a simulator of computing environments, may help information and communications technology and data center managers to make decisions with a solid experimental basis. There are several simulators that try to address performance and, somehow, estimate energy consumption, but there are none in which the energy model is based on benchmark data that have been countersigned by independent bodies such as the Standard Performance Evaluation Corporation. This is the reason why we have implemented a performance and energy‐aware scheduling (PEAS) simulator for high‐performance computing. Furthermore, to evaluate the simulator, we propose an implementation of the non‐dominated sorting genetic algorithm‐II (NSGA‐II) algorithm, a fast and elitist multiobjective genetic algorithm, for the resource selection. With the help of the PEAS simulator, we have studied whether it is possible to provide an intelligent job allocation policy that may be able to save energy and time without compromising performance. The results of our simulations show a great improvement in response time and power consumption. In most of the cases, NSGA‐II performs better than other ‘intelligent’ algorithms like multiobjective heterogeneous earliest finish time and clearly outperforms the first‐fit algorithm. We demonstrate the usefulness of the simulator for this type of study and conclude that the superior behavior of multiobjective algorithms makes them recommended for use in modern scheduling systems. Copyright © 2015 John Wiley & Sons, Ltd.
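The multiobjective selection this abstract relies on rests on Pareto dominance over the two objectives mentioned (response time and power consumption). The small helper below is an illustrative sketch of that test only; it is not code from the PEAS simulator or a full NSGA‐II implementation.

```c
/* Pareto dominance over two minimisation objectives (time, energy):
 * schedule a dominates b if it is no worse in both objectives and
 * strictly better in at least one. Illustrative helper only. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { double time; double energy; } Objectives;

bool dominates(Objectives a, Objectives b) {
    bool no_worse = (a.time <= b.time) && (a.energy <= b.energy);
    bool strictly = (a.time <  b.time) || (a.energy <  b.energy);
    return no_worse && strictly;
}

int main(void) {
    Objectives s1 = {120.0, 300.0};   /* hypothetical candidate schedules */
    Objectives s2 = {150.0, 310.0};
    printf("s1 dominates s2: %d\n", dominates(s1, s2));  /* prints 1 */
    return 0;
}
```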
5.
Parallel computation model is an abstraction for the performance characteristics of parallel computers, and should evolve with the development of computational infrastructure. The heterogeneous CPU/Graphics Processing Unit (GPU) systems have been and will be important platforms for scientific computing, which introduces an urgent demand for new parallel computation models targeting this kind of supercomputers. In this research, we propose a parallel computation model called HLognGP to abstract the computation and communication features of heterogeneous platforms like TH‐1A. All the substantial parameters of HLognGP are in vector form and deal with the new features in GPU clusters. A simplified version HLog3GP of the proposed model is mapped to a specific GPU cluster and verified with two typical benchmarks. Experimental results show that HLog3GP outperforms the other two evaluated models and can model the new characteristics of GPU clusters well. Copyright © 2015 John Wiley & Sons, Ltd.
6.
Analysis and understanding of large-scale scientific data ultimately depend on visualization, and the way data are stored and organized is a key factor in visualization efficiency. This is especially true for large-scale parallel and distributed visualization and data-analysis systems that integrate many visualization and analysis functions, where the optimizations required by visualization pipelines and data-flow networks place even higher demands on data storage and organization. By analyzing the approaches used for large-scale data visualization, we summarize the storage and organization requirements for the data and related information, and use efficient data organization to speed up the scientific visualization process.
7.
8.
In wireless sensor networks, fault tolerance and efficiency are important measures of network performance, and how to balance both factors in applications has long been a key question in algorithm research. For applications that monitor and process multiple events, especially when the sensing regions of events overlap, we propose AECA, an energy-efficient fault-tolerant event-clustering algorithm. The algorithm first gives a distributed cluster-head election method that considers both the residual energy and the fault tolerance of nodes, and then studies the node-handling strategy when event clusters have overlapping regions. Simulation experiments show that AECA effectively improves the fault tolerance and lifetime of the sensor network and is both reliable and scalable.
9.
Large-scale numerical simulation data pose a challenge for visual analysis, and I/O is an important factor in interactive visualization performance. HDF5 is a storage format widely used in scientific computing. This paper introduces HDF5's abstract data model and its read/write workflow, and measures HDF5 read performance with typical numerical simulation data. The tests show that locating datasets in HDF5 is relatively expensive. Exploiting the fact that the data blocks of numerical simulation data are numbered with regular integers, we improve read performance by adding a block-view object to HDF5. Tests show that this method significantly speeds up data reading.
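A minimal HDF5 C-API sketch of the access pattern discussed above follows: opening datasets whose names encode a regular integer block index and reading them one by one, where each H5Dopen2 call incurs the lookup cost the paper targets. The file name, group layout, dataset naming scheme, and block size are assumptions for illustration; the paper's block-view object is not part of the stock HDF5 API.

```c
/* Reads datasets named "/blocks/block_0000" ... "/blocks/block_0009"
 * from a hypothetical HDF5 file. The regular integer naming is the
 * property exploited above; error handling is minimal. */
#include <hdf5.h>
#include <stdio.h>

int main(void) {
    hid_t file = H5Fopen("simulation.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) return 1;

    for (int i = 0; i < 10; i++) {
        char name[64];
        snprintf(name, sizeof(name), "/blocks/block_%04d", i);

        hid_t dset = H5Dopen2(file, name, H5P_DEFAULT);  /* lookup cost */
        if (dset < 0) continue;

        double buf[1024];                 /* assumes block <= 1024 doubles */
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                H5P_DEFAULT, buf);
        H5Dclose(dset);
    }
    H5Fclose(file);
    return 0;
}
```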
10.
Rodrigo da Rosa Righi Cristiano André da Costa Vinicius Facco Rodrigues Gustavo Rostirolla 《Concurrency and Computation》2016,28(5):1548-1571
A key characteristic of cloud computing is elasticity, automatically adjusting system resources to an application's workload. Both reactive and horizontal approaches represent traditional means to offer this capability, in which rule‐condition‐action statements and upper and lower thresholds are used to instantiate or consolidate compute nodes and virtual machines. Although elasticity can be beneficial for many HPC (high‐performance computing) scenarios, it also imposes significant challenges in the development of applications. In addition to issues related to how we can incorporate this new feature in such applications, there is a problem associated with the performance and resource pair and, consequently, with energy consumption. Further exploring this last difficulty, we must be capable of analyzing elasticity effectiveness as a function of employed thresholds with clear metrics to compare elastic and non‐elastic executions properly. In this context, this article explores elasticity metrics in two ways: (i) the use of a cost function that combines application time with different energy models; (ii) the extension of speedup and efficiency metrics, commonly used to evaluate parallel systems, to cover cloud elasticity. To accomplish (i) and (ii), we developed an elasticity model known as AutoElastic, which reorganizes resources automatically across synchronous parallel applications. The results, obtained with the AutoElastic prototype using the OpenNebula middleware, are encouraging. Considering a CPU‐bound application, an upper threshold close to 70% was the best option for obtaining good performance with a non‐prohibitive elasticity cost. In addition, the value of 90% for this threshold was the best option when we plan an efficiency‐driven execution. Copyright © 2015 John Wiley & Sons, Ltd.
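As a hedged sketch of points (i) and (ii) above, the snippet below combines time and energy into a single cost (an energy-delay-product style placeholder) and computes speedup and efficiency against the time-averaged number of allocated VMs rather than a fixed processor count. The exact formulas of the AutoElastic model are not given in the abstract, so these definitions are generic assumptions.

```c
/* Generic elasticity metrics sketch: cost = time * energy, and
 * efficiency computed against the time-averaged number of allocated
 * VMs instead of a fixed count. Placeholder formulas, not the paper's. */
#include <stdio.h>

double cost(double time_s, double energy_j) {
    return time_s * energy_j;              /* energy-delay-product style */
}

double elastic_speedup(double t_serial, double t_elastic) {
    return t_serial / t_elastic;
}

/* avg_resources: integral of allocated VMs over time, divided by time. */
double elastic_efficiency(double t_serial, double t_elastic,
                          double avg_resources) {
    return elastic_speedup(t_serial, t_elastic) / avg_resources;
}

int main(void) {
    double ts = 1000.0, te = 180.0, avg_vms = 6.4;   /* hypothetical run */
    printf("speedup = %.2f, efficiency = %.2f, cost = %.0f\n",
           elastic_speedup(ts, te),
           elastic_efficiency(ts, te, avg_vms),
           cost(te, 5.2e5));
    return 0;
}
```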
11.
Motivated by the fault-tolerance and energy-efficiency requirements of event clusters in wireless sensor networks, this paper presents MED-FT, a data fault-tolerance scheme for multiple event clusters. The scheme first gives a distributed cluster-head election method based on the product of residual energy and event confidence; it then proposes a node-handling strategy for the overlapping regions of multiple event clusters and establishes a data fault-tolerance compensation mechanism for event clusters. Simulation experiments show that multiple event clusters using this data fault-tolerance scheme not only achieve a longer network lifetime but also better data correctness and fault tolerance.
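The election criterion stated above, the product of residual energy and event confidence, is simple enough to sketch directly; the Node fields and the candidate list below are illustrative assumptions rather than the paper's data structures.

```c
/* Cluster-head election sketch: each candidate is scored by
 * residual_energy * event_confidence and the highest score wins.
 * The Node struct and candidate list are illustrative only. */
#include <stdio.h>

typedef struct {
    int    id;
    double residual_energy;   /* joules remaining */
    double event_confidence;  /* 0.0 .. 1.0 */
} Node;

int elect_cluster_head(const Node *nodes, int n) {
    int best = -1;
    double best_score = -1.0;
    for (int i = 0; i < n; i++) {
        double score = nodes[i].residual_energy * nodes[i].event_confidence;
        if (score > best_score) { best_score = score; best = nodes[i].id; }
    }
    return best;
}

int main(void) {
    Node candidates[] = { {1, 4.2, 0.80}, {2, 5.0, 0.60}, {3, 3.1, 0.95} };
    printf("cluster head: node %d\n", elect_cluster_head(candidates, 3));
    return 0;
}
```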
12.
An increasing number of enterprise applications are intensive in their consumption of IT but are infrequently used. Consequently, either organizations host an oversized IT infrastructure or they are incapable of realizing the benefits of new applications. A solution to the challenge is provided by the large‐scale computing infrastructures of clouds and grids, which allow resources to be shared. A major challenge is the development of mechanisms that allow efficient sharing of IT resources. Market mechanisms are promising, but there is a lack of research in scalable market mechanisms. We extend the multi‐attribute combinatorial exchange mechanism with greedy heuristics to address the scalability challenge. The evaluation shows a trade‐off between efficiency and scalability. There is no statistical evidence for an influence on the incentive properties of the market mechanism. This is an encouraging result as theory predicts heuristics to ruin the mechanism's incentive properties. Copyright © 2015 John Wiley & Sons, Ltd.
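A hedged sketch of the kind of greedy heuristic alluded to above: bids are sorted by a density score (value per unit of requested resource) and accepted while capacity remains. The scoring rule, data layout, and single-resource capacity are assumptions; the paper's multi-attribute combinatorial exchange is considerably richer.

```c
/* Greedy winner-determination sketch: sort bids by value density and
 * accept while the resource capacity lasts. Illustrative only. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { int id; double value; double quantity; } Bid;

static int by_density_desc(const void *a, const void *b) {
    double da = ((const Bid *)a)->value / ((const Bid *)a)->quantity;
    double db = ((const Bid *)b)->value / ((const Bid *)b)->quantity;
    return (da < db) - (da > db);          /* descending by density */
}

int main(void) {
    Bid bids[] = { {1, 50, 10}, {2, 30, 3}, {3, 40, 8} };  /* hypothetical */
    int n = 3;
    double capacity = 15.0;

    qsort(bids, n, sizeof(Bid), by_density_desc);
    for (int i = 0; i < n; i++) {
        if (bids[i].quantity <= capacity) {
            capacity -= bids[i].quantity;
            printf("accept bid %d\n", bids[i].id);
        }
    }
    return 0;
}
```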
13.
F. Fdez‐Riverola D. Glez‐Peña H. López‐Fernández M. Reboiro‐Jato J.R. Méndez 《Software》2012,42(8):1015-1036
This paper presents AIBench (SING group, Ourense, Spain), a JAVA desktop application framework mainly focused on scientific software development, with the goal of improving the productivity of research groups. Following the MVC design pattern, the programmer is able to develop applications using only three types of concepts: operations, data‐types and views. The framework provides the rest of the functionality present in typical scientific applications, including user parameter requests, logging facilities, multithreading execution, experiment repeatability and graphic user interface generation, among others. The proposed framework is implemented following a plugin‐based architecture, which also allows assembling new applications by the reuse of modules from past development projects. Copyright © 2011 John Wiley & Sons, Ltd.
14.
15.
Jahanzeb Sherwani Nosheen Ali Nausheen Lotia Zahra Hayat Rajkumar Buyya 《Software》2004,34(6):573-590
Clusters of computers have emerged as mainstream parallel and distributed platforms for high‐performance, high‐throughput and high‐availability computing. To enable effective resource management on clusters, numerous cluster management systems and schedulers have been designed. However, their focus has essentially been on maximizing CPU performance, but not on improving the value of utility delivered to the user and quality of services. This paper presents a new computational economy driven scheduling system called Libra, which has been designed to support allocation of resources based on the users' quality of service requirements. It is intended to work as an add‐on to the existing queuing and resource management system. The first version has been implemented as a plugin scheduler to the Portable Batch System. The scheduler offers market‐based economy driven service for managing batch jobs on clusters by scheduling CPU time according to user‐perceived value (utility), determined by their budget and deadline rather than system performance considerations. The Libra scheduler has been simulated using the GridSim toolkit to carry out a detailed performance analysis. Results show that the deadline and budget based proportional resource allocation strategy improves the utility of the system and user satisfaction as compared with system‐centric scheduling strategies. Copyright © 2004 John Wiley & Sons, Ltd.
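A hedged sketch of deadline-driven proportional CPU allocation in the spirit of what the abstract describes: each job's required share is its remaining work divided by the time left to its deadline, normalised across the jobs on a node. This formula is an illustrative assumption, not Libra's actual algorithm; budgets and admission control are omitted.

```c
/* Proportional-share sketch: a job needing W remaining seconds of CPU
 * by deadline D gets a share proportional to W / (D - now). */
#include <stdio.h>

typedef struct { int id; double remaining_work; double deadline; } Job;

void assign_shares(const Job *jobs, int n, double now, double *share) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double slack = jobs[i].deadline - now;
        share[i] = (slack > 0.0) ? jobs[i].remaining_work / slack : 1.0;
        total += share[i];
    }
    for (int i = 0; i < n; i++)
        share[i] /= total;                 /* normalise to CPU fractions */
}

int main(void) {
    Job jobs[] = { {1, 100.0, 400.0}, {2, 50.0, 150.0} };   /* hypothetical */
    double share[2];
    assign_shares(jobs, 2, 0.0, share);
    printf("job 1: %.2f of CPU, job 2: %.2f of CPU\n", share[0], share[1]);
    return 0;
}
```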
16.
Rajkumar Buyya 《Software》2000,30(7):723-739
Workstation/PC clusters have become a cost‐effective solution for high performance computing. C‐DAC's PARAM 10000 (or OpenFrame, internal code name) is a large cluster of high‐performance workstations interconnected through low‐latency and high bandwidth networks. The management and control of such a huge system is a tedious and challenging task since workstations/PCs are typically designed to work as a standalone system rather than part of a cluster. We have designed and developed a tool called PARMON that allows effective monitoring and control of large clusters. It supports the monitoring of critical system resource activities and their utilization at three different levels: entire system, node and component level. It also allows the monitoring of multiple instances of the same component; for instance, multiple processors in SMP type cluster nodes. PARMON is a portable, flexible, interactive, scalable, location‐transparent, and comprehensive environment based on client–server technology. The major components of PARMON are parmon‐server—system resource activities and utilization information provider and parmon‐client—a GUI based client responsible for interacting with parmon‐server and users for data gathering in real‐time and presenting information graphically for visualization. The client is developed as a Java application and the server is developed as a multithreaded server using C and POSIX/Solaris threads since Java does not support interfaces to access system internals. PARMON is regularly used to monitor PARAM 10000 supercomputer, a cluster of 48+ Ultra‐4 workstations powered by the Solaris operating system. The recent popularity of Beowulf‐class clusters (dedicated Linux clusters) in terms of price–performance ratio has motivated us to port PARMON to Linux (accomplished by porting system dependent portions of parmon‐server). This enables management/monitoring of both Solaris and Linux‐based clusters (federated clusters) through a single user interface. Copyright © 2000 John Wiley & Sons, Ltd.
17.
To support Web clusters with efficient dispatching mechanisms and policies, we propose a light‐weight TCP connection transfer mechanism, TCP Rebuilding, and use it to develop a content‐aware request dispatching platform, LVS‐CAD, in which the request dispatcher can extract and analyze the content in requests and then dispatch each request by its content or type of service requested. To efficiently support HTTP/1.1 persistent connection in Web clusters, request scheduling should be performed per request rather than per connection. Consequently, multiple TCP Rebuilding, as an extension to normal TCP Rebuilding, is proposed and implemented. On this platform, we also devise fast TCP module handshaking to process the handshaking between clients and the request dispatcher in the IP layer instead of in the TCP layer for faster response times. Furthermore, we also propose content‐aware request distribution policies that consider cache locality and various types of costs for dispatching requests in this platform, which makes the resource utilization of Web servers more effective. Experimental results of a practical implementation on Linux show that the proposed system, mechanisms, and policies can effectively improve the performance of Web clusters. Copyright © 2007 John Wiley & Sons, Ltd.
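A minimal sketch of content-aware dispatch with cache locality follows: hashing the requested URL so that repeated requests for the same object land on the same back-end server. The hash function and server count are illustrative assumptions; LVS-CAD's actual policies also weigh per-request costs, which are not modelled here.

```c
/* Content-aware dispatch sketch: a URL hash pins each object to one
 * back-end server, improving cache locality. Illustrative only. */
#include <stdio.h>

#define NUM_SERVERS 4

unsigned long hash_url(const char *url) {       /* simple djb2-style hash */
    unsigned long h = 5381;
    for (; *url; url++) h = h * 33 + (unsigned char)*url;
    return h;
}

int pick_server(const char *url) {
    return (int)(hash_url(url) % NUM_SERVERS);
}

int main(void) {
    const char *reqs[] = { "/index.html", "/img/logo.png", "/index.html" };
    for (int i = 0; i < 3; i++)
        printf("%-16s -> server %d\n", reqs[i], pick_server(reqs[i]));
    return 0;
}
```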
18.
The small gain condition is sufficient for input‐to‐state stability (ISS) of interconnected systems. However, verification of the small gain condition requires a large amount of computation when the system is large. To facilitate this procedure, we aggregate the subsystems and the gains between the subsystems that belong to certain interconnection patterns (motifs) by using three heuristic rules. These rules are based on three motifs: sequentially connected nodes, nodes connected in parallel, and almost disconnected subgraphs. Aggregation of these motifs keeps the structure of the mutual influences between the subsystems in the network. Furthermore, fulfillment of the reduced small gain condition implies ISS of the large network. Thus, such reduction decreases the number of computations needed to verify the small gain condition. Finally, an ISS‐Lyapunov function for the large network can be constructed using the reduced small gain condition. Application of these rules is illustrated with an example. Copyright © 2013 John Wiley & Sons, Ltd.
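For background, the two-subsystem small-gain condition and its linear-gain special case can be written compactly, along with the cascade (sequential-node) aggregation rule; these are the standard formulations and are given only as an assumed illustration, not as the paper's exact statements of its three heuristic rules.

```latex
% Small-gain condition for two interconnected ISS subsystems with gains
% gamma_12, gamma_21, and its linear-gain special case (standard forms,
% given as an assumed illustration of the setting).
\[
\gamma_{12}\circ\gamma_{21}(s) < s \quad \text{for all } s > 0,
\qquad \text{linear gains } \gamma_{ij}(s) = c_{ij}\,s:\quad c_{12}\,c_{21} < 1 .
\]
\[
\text{Network with gain matrix } \Gamma = (c_{ij}):\quad \rho(\Gamma) < 1,
\qquad
\text{cascade (sequential-node) aggregation:}\quad c_{\mathrm{agg}} = c_2\,c_1 .
\]
```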
19.
Message logging is an attractive solution to provide fault tolerance for message‐passing applications because it is more scalable than coordinated checkpointing. Sender‐based message logging is a well‐known optimization that allows the saving of message payload in the sender memory. Thus, only message reception events have to be logged reliably by using an event logger. This paper proposes solutions to further improve message logging protocol scalability. In existing works on message logging, the event logger has always been considered as a centralized process. We propose a distributed event logger that takes advantage of multi‐core processors by executing in parallel with application processes, leveraging the volatile memory of the nodes to save events reliably. We also propose the combination of our distributed event logger and O2P, an active optimistic message logging protocol using a gossip‐based protocol to disseminate information on new stable events. Our distributed event logger and O2P are implemented in the Open MPI library. Our results show the following: (i) distributed event logging improves message logging protocol scalability and (ii) using O2P with a distributed event logger provides an efficient and scalable fault‐tolerant solution for message‐passing applications. Copyright © 2011 John Wiley & Sons, Ltd.
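A hedged sketch of the sender-based logging idea the abstract builds on: the sender keeps a copy of each outgoing payload in volatile memory, tagged with a sequence number, so that only the small reception event needs to reach the event logger. The structures and functions below are illustrative assumptions, not Open MPI's or O2P's internal API.

```c
/* Sender-based message logging sketch: outgoing payloads are copied
 * into the sender's memory with a sequence number; only the small
 * "message received" event would go to the (distributed) event logger.
 * Illustrative structures only. */
#include <stdlib.h>
#include <string.h>

typedef struct LogEntry {
    unsigned long    seq;        /* send sequence number        */
    int              dest;       /* destination rank            */
    size_t           len;        /* payload length in bytes     */
    void            *payload;    /* copy of the message payload */
    struct LogEntry *next;
} LogEntry;

static LogEntry     *sender_log = NULL;
static unsigned long next_seq   = 0;

/* Call just before the real send: keep a copy for possible replay. */
unsigned long log_send(int dest, const void *buf, size_t len) {
    LogEntry *e = malloc(sizeof *e);
    e->seq     = next_seq++;
    e->dest    = dest;
    e->len     = len;
    e->payload = malloc(len);
    memcpy(e->payload, buf, len);
    e->next    = sender_log;
    sender_log = e;
    return e->seq;
}

/* On failure of 'dest', logged entries for that rank are replayed. */
const LogEntry *find_for_replay(int dest, unsigned long seq) {
    for (const LogEntry *e = sender_log; e; e = e->next)
        if (e->dest == dest && e->seq == seq) return e;
    return NULL;
}
```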
20.
In this paper, we present Jcluster, an efficient Java parallel environment that provides some critical services, in particular automatic load balancing and high‐performance communication, for developing parallel applications in Java on a large‐scale heterogeneous cluster. In the Jcluster environment, we implement a task scheduler based on a transitive random stealing (TRS) algorithm. Performance evaluations show that the scheduler based on TRS can make any idle node obtain a task from another node with far fewer stealing attempts than random stealing (RS), which is a well‐known dynamic load‐balancing algorithm, on a large‐scale cluster. In the performance aspects of communication, with the method of asynchronously multithreaded transmission, we implement a high‐performance PVM‐like and MPI‐like message‐passing interface in pure Java. The evaluation of the communication performance is conducted among the Jcluster environment, LAM‐MPI and mpiJava on LAM‐MPI based on the Java Grande Forum's pingpong benchmark. Copyright © 2005 John Wiley & Sons, Ltd.
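A hedged sketch of the transitive random stealing idea as summarised above: an idle node probes a random victim, and if that victim has no work it redirects the thief to the node from which it last saw work, instead of restarting a blind random probe. The data structures and the single-process simulation below are illustrative assumptions, not Jcluster's implementation.

```c
/* Transitive random stealing sketch (single-process simulation):
 * an idle node probes a random victim; an idle victim redirects the
 * thief to its own last-known source of work. Illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define NODES 8

typedef struct { int tasks; int last_rich; } Node;  /* last_rich: hint */

int steal(Node *nodes, int thief) {
    int victim = rand() % NODES;
    for (int hops = 0; hops < NODES; hops++) {
        if (victim != thief && nodes[victim].tasks > 0) {
            nodes[victim].tasks--;                /* transfer one task   */
            nodes[thief].tasks++;
            nodes[thief].last_rich = victim;      /* remember the source */
            return victim;
        }
        victim = nodes[victim].last_rich;         /* follow victim's hint */
    }
    return -1;                                    /* nothing found */
}

int main(void) {
    Node nodes[NODES] = {0};
    nodes[5].tasks = 4;                           /* hypothetical workload */
    for (int i = 0; i < NODES; i++) nodes[i].last_rich = 5;
    printf("node 2 stole from node %d\n", steal(nodes, 2));
    return 0;
}
```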