Similar literature
20 similar documents found.
1.
In this work, we propose new techniques to analyze the behavior, the performance, and especially the scalability of High Performance Computing (HPC) applications on different computing architectures. Our final objective is to test applications on a wide range of architectures (real or merely designed) and to scale them to any number of nodes or components. This paper presents a new simulation framework for HPC architectures, called SIMCAN. Its main characteristic is that it can be configured to simulate a wide range of possible architectures involving any number of components. SIMCAN is developed to simulate complete HPC architectures, with special emphasis on the storage and network subsystems. The framework can handle complete components (nodes, racks, switches, routers, etc.) as well as key elements of the storage and network subsystems (disks, caches, sockets, file systems, schedulers, etc.). We also propose several methods to implement the behavior of HPC applications, each with its own advantages and drawbacks. To evaluate the possibilities and the accuracy of the SIMCAN framework, we tested it by executing an HPC application called BIPS3D on a hardware-based computing cluster and on a modeled environment that represents the real cluster. We also checked the scalability of the application on this kind of architecture by simulating the same application with an increased number of computing nodes.

2.
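As the computing performance of cluster computer systems enters the era of hundreds of teraflops and petaflops, saving energy and reducing consumption has become one of the important problems that cluster computer systems must face. Starting from system-level energy saving, and combining the system monitoring, job management, IPMI out-of-band power management, and TuxOnIce system hibernation technologies of the Sunway (神威) high-performance cluster computer system, this paper designs and implements a cluster energy-saving system based on resource scheduling. By shutting down or hibernating idle nodes, the system can effectively reduce the energy consumption of the cluster when it is idle, making the Sunway high-performance cluster computer system a truly green computer.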

3.
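High performance computing (HPC) clusters combine the characteristics of a single system and of a distributed system, which poses new challenges for cluster security. Based on the current security situation and requirements of HPC clusters, this paper proposes a distributed mandatory access control model suited to HPC clusters. Following this model, an access control system framework for HPC clusters is designed and implemented on top of the single-node mandatory access control system SE Linux, and a prototype system is built. Finally, the feasibility of mandatory access control for HPC clusters is analyzed and verified. The analysis shows that distributed mandatory access control functionally meets the security requirements of HPC clusters, and that its consumption of system computation and bandwidth is within an acceptable range.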

4.
Prospects for applying virtualization technology to high-performance computing on x64 systems are studied. The principal reasons for performance degradation when parallel programs run in virtual environments are considered. The KVM/QEMU and Palacios virtualization systems are examined in detail, using the HPC Challenge and NAS Parallel Benchmarks suites as benchmarks. A modern computing cluster built on the Infiniband high-speed interconnect is used in testing. The results of the study show that, in general, virtualization is reasonable for a wide class of high-performance applications. Fine tuning of the virtualization systems involved made it possible to reduce overheads from 10–60% to 1–5% on the majority of tests from the HPC Challenge and NAS Parallel Benchmarks suites. The main bottlenecks of virtualization systems are reduced performance of the memory system (which is critical only for a narrow class of problems), costs associated with hardware virtualization, and the increased noise caused by the host operating system and hypervisor. Noise can have a negative effect on the performance and scalability of fine-grained applications (applications with frequent small-scale communications). The influence of noise increases significantly as the number of nodes in the system grows.

5.
Cluster-based solutions are being widely adopted for implementing flexible, scalable, low-cost and high-performance web server platforms. One of the main difficulties in implementing these platforms is the correct dimensioning of the cluster size, so as to satisfy variable and peak demand periods. In this context, virtualization is being adopted by many organizations as a solution not only to provide service elasticity, but also to consolidate server workloads and improve server utilization rates. A virtualized web server can be dynamically adapted to client demand by deploying new virtual nodes when demand increases, and by powering off and consolidating virtual nodes during periods of low demand. Furthermore, the resources of the in-house infrastructure can be complemented with a cloud provider (cloud bursting), so that peak demand periods can be satisfied by deploying cluster nodes in the external cloud on an on-demand basis. In this paper, we analyze the scalability of hybrid virtual infrastructures for two different distributed web server cluster implementations: a simple web cluster serving static files and a multi-tier web server platform running the CloudStone benchmark. Copyright © 2011 John Wiley & Sons, Ltd.

6.
A complete and efficient CUDA-sharing solution for HPC clusters   (total citations: 1; self-citations: 0; citations by others: 1)
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes, and also permits a single node to exploit the whole set of GPUs installed in the cluster. In our proposal, CUDA applications can seamlessly interact with any GPU in the cluster, independently of its physical location. Thus, GPUs can be either distributed among compute nodes or concentrated in dedicated GPGPU servers, depending on the cluster administrator's policy. This proposal leads to savings not only in space but also in energy, acquisition, and maintenance costs. The performance evaluation in this paper, with a series of benchmarks and a production application, clearly demonstrates the viability of this proposal. Concretely, experiments with the matrix-matrix product reveal excellent performance compared with regular executions on the local GPU; on a much more complex application, the GPU-accelerated LAMMPS, we attain up to 11x speedup employing 8 remote accelerators from a single node with respect to a 12-core CPU-only execution. GPGPU service interaction in compute nodes, remote acceleration in dedicated GPGPU servers, and the data transfer performance of similar GPU virtualization frameworks are also evaluated.

7.
This paper presents a convergence of distributed key-value storage systems in clouds and supercomputers. It specifically presents ZHT, a zero-hop distributed key-value store system, which has been tuned for the requirements of high-end computing systems. ZHT aims to be a building block for future distributed systems, such as parallel and distributed file systems, distributed job management systems, and parallel programming systems. ZHT has several important properties: it is lightweight, allows nodes to join and leave dynamically, is fault tolerant through replication, persistent, and scalable, and supports unconventional operations such as append, compare and swap, and callback in addition to the traditional insert/lookup/remove. We have evaluated ZHT's performance on a variety of systems, ranging from a Linux cluster with 64 nodes and an Amazon EC2 virtual cluster of up to 96 nodes to an IBM Blue Gene/P supercomputer with 8K nodes. We compared ZHT against other key-value stores and found that it offers superior performance for the features and portability it supports. This paper also presents several real systems that have adopted ZHT, namely FusionFS (a distributed file system), IStore (a storage system with erasure coding), MATRIX (distributed scheduling), Slurm++ (distributed HPC job launch), and Fabriq (distributed message queue management); all of these real systems have been simplified because of key-value storage systems and have been shown to outperform other leading systems by orders of magnitude in some cases. It is important to highlight that some of these systems are rooted in HPC systems from supercomputers, while others are rooted in clouds and ad hoc distributed systems; through our work, we have shown how versatile key-value storage systems can be in such a variety of environments. Copyright © 2015 John Wiley & Sons, Ltd.
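
As an illustration of the zero-hop idea in the abstract above, the following is a minimal in-process sketch, not ZHT's actual API. It assumes that every client holds the full membership table, so a key can be hashed straight to its home node without intermediate routing. The class name `ZeroHopStore`, its method names, and the use of SHA-256 are all hypothetical choices for this sketch; replication, persistence, callbacks, and dynamic membership are left out.

```python
import hashlib


class ZeroHopStore:
    """Toy model of a zero-hop key-value store (hypothetical, not ZHT's real API)."""

    def __init__(self, num_nodes):
        # Each "node" is a plain dict; the client knows all of them up front.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _home(self, key):
        # A stable hash maps the key directly to its home node (zero hops).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def insert(self, key, value):
        self._home(key)[key] = value

    def lookup(self, key):
        # Returns None when the key is absent.
        return self._home(key).get(key)

    def remove(self, key):
        node = self._home(key)
        if key in node:
            del node[key]
            return True
        return False

    def append(self, key, suffix):
        # Extends the stored string without a separate read by the caller.
        node = self._home(key)
        node[key] = node.get(key, "") + suffix

    def compare_swap(self, key, expected, new):
        # Replaces the value only if it still equals the expected one.
        node = self._home(key)
        if node.get(key) == expected:
            node[key] = new
            return True
        return False
```

A caller would create `ZeroHopStore(4)`, then use `insert`, `append`, and `compare_swap` on string keys; in a real deployment each dict would be a remote server reached over the network.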

8.
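This paper proposes ATMAC, a MAC protocol that automatically adjusts the ratio of node sleep time to active time according to the traffic in a wireless sensor network. Building on the TMAC protocol for wireless sensor networks and targeting low energy consumption and low latency, it mainly uses an adaptive, multi-level duty cycle, an adaptive contention window, and a data priority queue. Nodes spend more time asleep when traffic is light in order to save energy, while under heavy traffic the nodes involved in a transmission can stay active for a relatively long time. High-traffic and low-traffic nodes may use different duty cycles, which saves the energy that low-traffic nodes would spend on idle listening, reduces data transmission latency, and increases network throughput. Simulation results show that the new protocol outperforms TMAC in energy consumption, data latency, and other respects.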

9.
With the recent emergence of cloud computing based services on the Internet, MapReduce and distributed file systems like HDFS have emerged as the paradigm of choice for developing large scale data intensive applications. Given the scale at which these applications are deployed, minimizing the power consumption of these clusters can significantly cut operational costs and reduce their carbon footprint, thereby increasing the utility from a provider's point of view. This paper addresses energy conservation for clusters of nodes that run MapReduce jobs. The algorithm dynamically reconfigures the cluster based on the current workload and turns cluster nodes on or off when the average cluster utilization rises above or falls below administrator-specified thresholds, respectively. We evaluate our algorithm using the GridSim toolkit, and our results show that the proposed algorithm achieves an energy reduction of 33% under average workloads and up to 54% under low workloads.
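
The threshold rule described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `reconfigure`, the default thresholds of 0.3 and 0.8, the node limits, and the choice to add or remove one node per decision are all hypothetical and are not taken from the paper.

```python
def reconfigure(active_nodes, utilization, low=0.3, high=0.8,
                min_nodes=1, max_nodes=64):
    """Return the new number of active nodes.

    utilization is the average cluster utilization in [0, 1];
    low and high stand in for the administrator-specified thresholds
    (the default values here are assumptions of this sketch).
    """
    if utilization > high and active_nodes < max_nodes:
        # Load is above the upper threshold: power one more node on.
        return active_nodes + 1
    if utilization < low and active_nodes > min_nodes:
        # Load is below the lower threshold: power one node off.
        return active_nodes - 1
    # Otherwise leave the cluster as it is.
    return active_nodes
```

Calling `reconfigure(4, 0.9)` grows the cluster to 5 nodes, while `reconfigure(4, 0.1)` shrinks it to 3. Keeping a gap between the two thresholds avoids switching nodes on and off on every small change in load.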

10.
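This paper introduces HPC++, a new parallel programming language. In a network environment made up of multiple interconnected nodes (shared-memory multiprocessors), HPC++ supports not only parallelism between nodes but also thread parallelism within a node. In addition, using CORBA IDL technology, users can invoke member functions of remote objects. The paper also describes its parallel standard template library.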

11.
This special issue on the Pervasive Nature of HPC (PN-HPC) collects extended versions of the most valuable works presented at the sixth Workshop on Models, Algorithms and Methodologies for Hybrid Parallelism in New HPC Systems (MAMHYP-22), held in Gdansk (Poland) in September 2022, jointly with the 14th conference on Parallel Processing and Applied Mathematics (PPAM-22). New original papers related to the workshop themes are also included. The final aim is to provide a glimpse of the current state of knowledge related to the development of efficient methodologies and algorithms for HPC systems with multiple forms of parallelism.

12.
Software rejuvenation is a preventive and proactive fault management technique that is particularly useful for counteracting software aging; it cleans up the internal state of the system to prevent future failures. The increasing interest in combining software rejuvenation with cluster systems has given rise to prolific research activity in recent years. However, so far there have been few reports on the dependency between nodes in cluster systems when software rejuvenation is applied. This paper investigates the software rejuvenation policy for cluster computing systems with dependency between nodes, and reconstructs a stochastic reward net model of software rejuvenation in such cluster systems. Simulation experiments reveal that the software rejuvenation strategy can decrease the failure rate and increase the availability of the cluster system. They also show that the dependency between nodes affects the software rejuvenation policy. Based on the theoretical analysis of the software rejuvenation model, a prototype is implemented on the Smart Platform cluster computing system. Performance measurements on this prototype reveal that software rejuvenation can effectively prevent systems from entering disabled states, and thereby improves the software fault tolerance and the availability of cluster computing systems.

13.
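Enterprise clusters   (total citations: 1; self-citations: 0; citations by others: 1)
Enterprise clusters best meet data center requirements for reliability, flexibility, and scalability. Their nodes are loosely coupled and support business-critical applications. They minimize the negative impact of accidental interruptions or planned downtime caused by operating system and program updates and hardware reconfiguration, and they protect business-critical applications from software and hardware failures. The article introduces two of their components: high-availability disk arrays and MC/ServiceGuard.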

14.
In this paper, the formalism of Relational Transition Systems (RTSes) is used to model data-intensive reactive systems, and four RTS models of reactive systems based on temporal logic programming, production systems, recurrence equations, and Petri nets are presented. The paper also describes different methods of comparing the expressive powers of various RTSes in terms of the trajectories they can generate, and carries out this comparison for the four RTS formalisms. It is shown that these formalisms have the same expressive power in the deterministic case. The paper also compares the expressive powers of non-deterministic production systems and non-deterministic temporal logic programming systems. It is shown that, although the two formalisms are incomparable in the general case, their restricted versions are isomorphic to each other. Received December 7, 1993 / January 26, 1995

15.
Hardware monitoring through performance counters is available on almost all modern processors. Although these counters were originally designed for performance tuning, they have also been used for evaluating power consumption. We propose two approaches for modelling and understanding the behaviour of high performance computing (HPC) systems relying on hardware monitoring counters. We evaluate the effectiveness of our system modelling approach with two target objectives: optimizing the energy usage of HPC systems and predicting the energy consumption of HPC applications. Although hardware monitoring counters are used for modelling the system, other methods, including partial phase recognition and cross platform energy prediction, are used for energy optimization and prediction. Experimental results for energy prediction demonstrate that we can accurately predict the peak energy consumption of an application on a target platform, whereas results for energy optimization indicate that, with no a priori knowledge of the workloads sharing the platform, we can save up to 24% of the overall HPC system's energy consumption under benchmarks and real-life workloads.

16.
It is widely accepted that future HPC systems will be limited by their power consumption. Current HPC systems are built from commodity server processors, designed over years to achieve maximum performance, with energy efficiency being an afterthought. In this paper we advocate a different approach: building HPC systems from low-power embedded and mobile technology parts, designed over time for maximum energy efficiency, which now show promise for competitive performance. We introduce the architecture of Tibidabo, the first large-scale HPC cluster built from ARM multicore chips, and a detailed performance and energy efficiency evaluation. We present the lessons learned for the design and improvement in energy efficiency of future HPC systems based on such low-power cores. Based on our experience with the prototype, we perform simulations to show that a theoretical cluster of 16-core ARM Cortex-A15 chips would increase the energy efficiency of our cluster by 8.7×, reaching an energy efficiency of 1046 MFLOPS/W.

17.
In recent years, we have witnessed a growing interest in high performance computing (HPC) using a cluster of workstations. This growth has made it affordable for individuals to have exclusive access to their own supercomputers. However, one of the challenges in a clustered environment is to keep system failures to a minimum and to achieve the highest possible level of system availability. High-Availability (HA) computing attempts to avoid the problems of unexpected failures through active redundancy and preemptive measures. Since the prices of hardware components are dropping significantly, we propose to combine the HPC and HA concepts and lay out the design of an HA-HPC cluster, considering all possible measures. In particular, we explore the hardware and management layers of the HA-HPC cluster design, and give a more focused study of the parallel-applications layer (i.e. FT-MPI implementations). Our findings show that combining HPC and HA architectures is feasible, yielding an HA cluster that is used for High Performance Computing.

18.
Attributed graphs describe nodes via attribute vectors and relationships between nodes via edges. An effective way to partition nodes into clusters with tighter correlations is to apply clustering techniques to attributed graphs based on criteria such as node connectivity and/or attribute similarity. Even though clusters typically form around nodes with tight edges and similar attributes, existing methods have focused on only one of these two data modalities. In this paper, we treat each node as an autonomous agent and develop an accurate and scalable multiagent system for extracting overlapping clusters in attributed graphs. First, a kernel function with a tunable bandwidth factor δ is introduced to measure the influence of each agent, and the agents with the highest local influence are viewed as the "leader" agents. Then, a novel local expansion strategy is proposed, which each leader agent can apply to absorb the most relevant followers in the graph. Finally, we design the cluster-aware multiagent system (CAMAS), in which agents communicate with each other freely under an efficient communication mechanism. Using the proposed multiagent system, we are able to uncover the optimal overlapping cluster configuration, i.e. nodes within one cluster are not only closely connected with each other but also have similar attributes. Our method is highly efficient, and the computational time is shown to depend nearly linearly on the number of edges when δ ∈ [0.5, 1). Finally, applications of the proposed method to a variety of synthetic benchmark graphs and real-life attributed graphs are demonstrated to verify its performance.

19.
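Applying K-means clustering in a recommender system can effectively reduce dimensionality. However, the clustering quality often depends on the chosen initial centers, and once a target cluster is selected, the recommendation process works only on that cluster and is unrelated to the other clusters. To address these two problems, a parallel recommendation algorithm based on bisecting K-means clustering with a full binary tree is proposed. The algorithm first applies bisecting K-means repeatedly, using intra-cluster cohesion as the splitting threshold during the iterations, to form a full binary tree. It then assigns users to K leaf nodes (clusters) through level-order traversal. Finally, for the K clusters, it uses the MapReduce framework to make recommendation predictions in parallel. Experimental results on MovieLens show that the algorithm can greatly improve the accuracy of the recommender system while also enhancing its scalability.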
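
The tree-building and leaf-collection steps in the abstract above might look roughly like the sketch below. It is an illustration only, under several assumptions: mean squared distance to the centroid stands in for the paper's cohesion measure, the two seeds of each split are chosen deterministically, and all function names are hypothetical. The MapReduce-based recommendation step is not covered.

```python
from collections import deque


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(points):
    """Mean squared distance to the centroid (lower means a tighter cluster)."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points) / len(points)


def two_means(points, iterations=20):
    """Split points into two groups with plain 2-means."""
    # Seeds (an assumption of this sketch): the first point and the
    # point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left, right = [], []
    for _ in range(iterations):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = centroid(left), centroid(right)
    return left, right


def build_tree(points, threshold):
    """Bisect recursively while a cluster's scatter exceeds the threshold."""
    node = {"points": points, "children": []}
    if len(points) > 1 and scatter(points) > threshold:
        left, right = two_means(points)
        if left and right:
            # Every internal node gets exactly two children.
            node["children"] = [build_tree(left, threshold),
                                build_tree(right, threshold)]
    return node


def leaves_level_order(root):
    """Collect leaf clusters by breadth-first (level-order) traversal."""
    leaves, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node["children"]:
            queue.extend(node["children"])
        else:
            leaves.append(node["points"])
    return leaves
```

For example, `leaves_level_order(build_tree(points, 1.0))` on six one-dimensional points grouped near 0 and near 10 returns two leaf clusters. In this sketch the number of leaves follows from the threshold rather than from a fixed K.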

20.
This paper addresses the growing need for mechanisms supporting intra-node application composition in high-performance computing (HPC) systems. It provides a novel shared memory interface that allows composite applications (two or more coupled applications) to share internal data structures without blocking. This allows the applications to make independent progress, so that they can proceed in a parallel, overlapped fashion. Composite applications using in-node shared memory can reduce the amount of data to be communicated between nodes, allowing checkpointing and data reduction or analytics to be performed locally and in parallel. The approach is implemented in Linux and evaluated using benchmarks that represent typical composite applications on a large HPC testbed. The results show that the proposed approach significantly outperforms the traditional ones (up to a 15-fold speed increase on a 200-node machine).

