Similar Literature
20 similar records found
1.
Volunteer computing, which harvests idle cycles of volunteer resources over the Internet, can integrate hundreds to thousands of resources to achieve high computing power. In such an environment the resources are heterogeneous in terms of CPU speed, RAM, disk capacity, and network bandwidth, so finding a suitable resource to run a particular job becomes difficult. The resource discovery architecture is a key factor in the overall performance of peer-to-peer based volunteer computing systems. The main contribution of this paper is a proximity-aware resource discovery architecture for peer-to-peer based volunteer computing systems. The proposed resource discovery algorithm consists of two stages. In the first stage, it selects resources based on the requested quality of service and the current load of peers. In the second stage, a resource is selected among the discovered resources, giving priority to low communication delay. Communication delay between two peers is computed by a network model based on queuing theory, taking the background traffic of the Internet into account. Simulation results show that the proposed resource discovery algorithm improves the response time of users' requests by a factor of 4.04 under moderate load.
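To make the two-stage selection concrete, here is a minimal sketch. It is illustrative only: the paper's actual queueing model and parameters are not given here, so an M/M/1-style delay estimate is assumed, and all class, field, and function names are hypothetical.

```python
# Hypothetical sketch of two-stage, proximity-aware resource discovery.
# Stage 1 filters by QoS and load; stage 2 prefers low communication delay,
# estimated with an assumed M/M/1-style model that includes background traffic.
from dataclasses import dataclass

@dataclass
class Peer:
    cpu_ghz: float          # advertised CPU speed
    ram_gb: float           # available RAM
    load: float             # current load in [0, 1]
    link_capacity: float    # outgoing link service rate (requests/s)
    background_rate: float  # background Internet traffic on the link (requests/s)

def comm_delay(peer: Peer, request_rate: float) -> float:
    """M/M/1-style waiting time; utilisation includes background traffic."""
    arrival = request_rate + peer.background_rate
    if arrival >= peer.link_capacity:
        return float("inf")          # saturated link, effectively unreachable
    return 1.0 / (peer.link_capacity - arrival)

def discover(peers, min_cpu, min_ram, max_load, request_rate):
    # Stage 1: filter by requested quality of service and current load.
    candidates = [p for p in peers
                  if p.cpu_ghz >= min_cpu and p.ram_gb >= min_ram and p.load <= max_load]
    # Stage 2: among candidates, pick the one with the lowest estimated delay.
    return min(candidates, key=lambda p: comm_delay(p, request_rate), default=None)
```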

2.
A number of technology and workload trends motivate us to consider the appropriate resource allocation mechanisms and policies for streaming media services in shared cluster environments. We present MediaGuard – a model-based infrastructure for building streaming media services – that can efficiently determine the fraction of server resources required to support a particular client request over its expected lifetime. The proposed solution is based on a unified cost function that uses a single value to reflect overall resource requirements such as the CPU, disk, memory, and bandwidth necessary to support a particular media stream based on its bit rate and whether it is likely to be served from memory or disk. We design a novel, time-segment-based memory model of a media server to efficiently determine in linear time whether a request will incur memory or disk access, given the history of previous accesses and the behavior of the server's main memory file buffer cache. Using the MediaGuard framework, we design two media services: (1) an efficient and accurate admission control service for streaming media servers that accounts for the impact of the server's main memory file buffer cache, and (2) a shared streaming media hosting service that can efficiently allocate predefined shares of server resources to the hosted media services, while providing performance isolation and QoS guarantees among the hosted services. Our evaluation shows that, relative to a pessimistic admission control policy that assumes that all content must be served from disk, MediaGuard (as well as services built using it) delivers a factor-of-two improvement in server throughput.
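As a rough illustration of the unified-cost-function idea, the toy admission controller below charges each stream a fraction of server capacity that depends on its bit rate and on whether it is predicted to be served from memory or disk. The cost coefficients and the memory-hit predictor are assumptions for illustration, not MediaGuard's actual model.

```python
# Illustrative admission check: a single scalar cost per stream, higher when
# the stream is expected to hit disk rather than the memory file buffer cache.

def stream_cost(bit_rate_kbps: float, served_from_memory: bool) -> float:
    # Cost = fraction of total server capacity; disk-bound streams cost more.
    cost_per_kbps = 0.0001 if served_from_memory else 0.0004   # assumed coefficients
    return bit_rate_kbps * cost_per_kbps

class AdmissionController:
    def __init__(self, capacity: float = 1.0):
        self.capacity = capacity     # normalised server capacity
        self.allocated = 0.0

    def admit(self, bit_rate_kbps: float, served_from_memory: bool) -> bool:
        cost = stream_cost(bit_rate_kbps, served_from_memory)
        if self.allocated + cost <= self.capacity:
            self.allocated += cost
            return True
        return False
```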

3.
Internet computing is emerging as an important new distributed computing paradigm in which resource-intensive computing is integrated over Internet-scale networks. Over these large networks, different users and organizations share their computing resources, and computations take place in a distributed fashion. In such an environment, a framework is needed in which resource providers are given incentives to share their resources. CompuP2P is a lightweight architecture for enabling Internet computing. It uses peer-to-peer networks for sharing computing resources. CompuP2P creates dynamic markets of network-accessible computing resources, such as processing power, memory storage, disk space, etc., in a completely distributed, scalable, and fault-tolerant manner. This paper discusses the system architecture, functionality, and applications of the proposed CompuP2P architecture. We have implemented a Java-based prototype, and our results show that the system is lightweight and can provide an almost perfect speedup for applications that contain several independent compute-intensive tasks.

4.
The widespread adoption of traditional heterogeneous systems has substantially improved the available computing power and, at the same time, raised optimisation issues related to the processing of task streams across both CPU and GPU cores in heterogeneous systems. Similar to the heterogeneous improvement gained in traditional systems, cloud computing has started to add heterogeneity support, typically through GPU instances, to the conventional CPU-based cloud resources. This optimisation of cloud resources will arguably have a real impact when running on-demand computationally-intensive applications. In this work, we investigate the scaling of pattern-based parallel applications from physical, "local" mixed CPU/GPU clusters to a public cloud CPU/GPU infrastructure. Specifically, such parallel patterns are deployed via algorithmic skeletons to exploit a particular parallel behaviour while hiding implementation details. We propose a systematic methodology to exploit approximated analytical performance/cost models, and an integrated programming framework suitable for targeting both local and remote resources, to support the offloading of computations from structured parallel applications to heterogeneous cloud resources, such that performance values not achievable on local resources may actually be achieved with the remote resources. The amount of remote resources necessary to achieve a given performance target is calculated through the performance models, allowing any user to hire the amount of cloud resources needed to reach a given target performance value. It is therefore expected that such models can be used to devise the optimal proportion of computations to be allocated to different remote nodes for Big Data computations. We present different experiments run with a proof-of-concept implementation based on FastFlow on small departmental clusters as well as on a public cloud infrastructure with CPUs and GPUs using the Amazon Elastic Compute Cloud. In particular, we show how CPU-only and mixed CPU/GPU computations can be offloaded to remote cloud resources with predictable performance, and how data-intensive applications can be mapped to a mix of local and remote resources to guarantee optimal performance.

5.
Contemporary long-term storage devices feature powerful embedded processors and sizeable memory buffers. Active Storage Devices (ASD) is the hard disk technology that makes use of these significant resources to not only manage the disk operation but also to execute custom application code on large amounts of data. While prior research has shown that ASDs perform exceedingly well with filter-type algorithms, the evaluation of binary-relational operators has been limited. In this paper, we analyze and evaluate inter-operator parallelism of GRACE-based join algorithms that function atop ASDs. We derive accurate cost expressions for existing algorithms and expose performance bottlenecks; upon these findings we propose Active Hash Join, a new algorithm that exploits all system resources. Through experimentation, we confirm that existing algorithms are best suited for systems with either small or large numbers of ASDs. However, we find that the "adaptive" nature of Active Hash Join yields enhanced parallelism in all cases, especially when the aggregate ASD resources are comparable to the main CPU and main memory.

6.
CodiP2P and DisCoP are two peer-to-peer (P2P) computing overlays aimed at sharing computing resources (CPU, memory, etc.) to execute parallel applications. Their component nodes are basically PCs and a wide range of computer servers, desktops, or laptops. This paper joins these two platforms into a new one, DisCoP2P, to combine the features of both overlays. CodiP2P is highly scalable, and DisCoP has an efficient searching mechanism and the ability to classify computing resources. The new platform takes advantage of these features and uses them to offer new facilities to schedule and execute parallel applications efficiently. This is accomplished at no cost because the platform is made up of nodes that share resources for free. This research field can also be classified as desktop computing. The success of this platform depends greatly on the added overhead, which is produced mainly in searching for resources and in system administration. The results obtained with a preliminary prototype, although not sufficiently conclusive, demonstrate the applicability of DisCoP2P in the real world, i.e., the Internet.

7.
Consistency techniques for continuous constraints
We consider constraint satisfaction problems with variables in continuous, numerical domains. Contrary to most existing techniques, which focus on computing one single optimal solution, we address the problem of computing a compact representation of the space of all solutions admitted by the constraints. In particular, we show how globally consistent (also called decomposable) labelings of a constraint satisfaction problem can be computed. Our approach is based on approximating regions of feasible solutions by 2^k-trees, a representation commonly used in computer vision and image processing. We give simple and stable algorithms for computing labelings with arbitrary degrees of consistency. The algorithms can process constraints and solution spaces of arbitrary complexity, but with a fixed maximal resolution. Previous work has shown that when constraints are convex and binary, path-consistency is sufficient to ensure global consistency. We show that for continuous domains, this result can be generalized to ternary and in fact arbitrary n-ary constraints using the concept of (3,2)-relational consistency. This leads to polynomial-time algorithms for computing globally consistent labelings for a large class of constraint satisfaction problems with continuous variables.

8.
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact the local users. Such resources are generally provided voluntarily and their availability fluctuates highly. Guest jobs may fail unexpectedly, as resources become unavailable. To improve this situation, we consider methods to predict resource availability. This paper presents empirical studies on resource availability in FGCS systems and a prediction method. From studies on resource contention among guest jobs and local users, we derive a multi-state availability model. The model enables us to detect resource unavailability in a non-intrusive way. We analyzed the traces collected from a production FGCS system for 3 months. The results suggest the feasibility of predicting resource availability, and motivate our method of applying semi-Markov Process models for the prediction. We describe the prediction framework and its implementation in a production FGCS system, named iShare. Through the experiments on an iShare testbed, we demonstrate that the prediction achieves an accuracy of 86% on average and outperforms linear time series models, while the computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource availability. We tested the effectiveness of the prediction in a proactive scheduler. Initial results show that applying availability prediction to job scheduling reduces the number of jobs failed due to resource unavailability. This work was supported, in part, by the National Science Foundation under Grants No. 0103582-EIA, 0429535-CCF, and 0650016-CNS. We thank Ruben Torres for his help with the reference prediction algorithms used in our experiments.  相似文献   
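A heavily simplified sketch of semi-Markov-style availability prediction follows. The states, holding-time handling, and Monte Carlo estimator are illustrative assumptions and do not reproduce the paper's exact model or its multi-state definitions.

```python
# Assumed-for-illustration semi-Markov predictor: learn transition counts and
# sojourn times from a monitored trace, then estimate the probability that a
# host stays out of "unavailable" states for a given prediction window.
import random
from collections import defaultdict

class SemiMarkovPredictor:
    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))  # counts s -> s'
        self.holding = defaultdict(list)                          # sojourn times per state

    def fit(self, trace):
        """trace: list of (state, duration_seconds) segments from monitoring logs."""
        for (s, d), (s_next, _) in zip(trace, trace[1:]):
            self.transitions[s][s_next] += 1
            self.holding[s].append(d)

    def _next_state(self, s):
        nxt = self.transitions[s]
        total = sum(nxt.values())
        r, acc = random.uniform(0, total), 0.0
        for s2, c in nxt.items():
            acc += c
            if r <= acc:
                return s2
        return s   # no observed transitions: assume the host stays put

    def p_available(self, start_state, window, unavailable_states, n_samples=1000):
        """Monte Carlo estimate that no unavailable state is entered within `window`."""
        ok = 0
        for _ in range(n_samples):
            s, t = start_state, 0.0
            while t < window and s not in unavailable_states:
                t += random.choice(self.holding[s] or [window])
                if t < window:
                    s = self._next_state(s)
            ok += s not in unavailable_states
        return ok / n_samples
```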

9.
A Memory-Sharing Grid System Based on Memory Services
Chu Rui, Xiao Nong, Lu Xicheng. 《计算机学报》 (Chinese Journal of Computers), 2006, 29(7): 1225-1233
Memory-intensive applications place strict demands on the physical memory of their execution environment; when physical memory is insufficient, they trigger heavy disk I/O and system performance degrades. Traditional network memory tries to solve this problem by sharing the physical memory of idle nodes within a cluster, but it is strongly affected by cluster load and the internal network. By combining network memory with service computing and grid computing techniques, this paper proposes a memory-sharing grid system based on memory services, called the Memory Grid, and analyzes and discusses the key techniques and algorithms for implementing memory services. The Memory Grid makes up for the shortcomings of network memory and extends the range of applications of grid computing. Simulations driven by the runtime behavior of real applications show that the Memory Grid improves performance compared with network memory.

10.
Virtualization technology has been widely adopted in Internet hosting centers and cloud-based computing services, since it reduces the total cost of ownership by sharing hardware resources among virtual machines (VMs). In a virtualized system, a virtual machine monitor (VMM) is responsible for allocating physical resources such as CPU and memory to individual VMs. Whereas CPU and I/O devices can be shared among VMs in a time-sharing manner, main memory is not amenable to such multiplexing. Moreover, it is often the primary bottleneck in achieving higher degrees of consolidation. In this paper, we present VMMB (Virtual Machine Memory Balancer), a novel mechanism to dynamically monitor the memory demand and periodically re-balance the memory among the VMs. VMMB accurately measures the memory demand with low overhead and effectively allocates memory based on the memory demand and the QoS requirement of each VM. It is applicable even to guest OSes whose source code is not available, since VMMB does not require modifying the guest kernel. We implemented our mechanism on Linux and experimented with synthetic and realistic workloads. Our experiments show that VMMB can improve the performance of VMs that suffer from insufficient memory allocation by up to 3.6 times with low performance overhead (below 1%) for monitoring memory demand.
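The following hand-wavy sketch illustrates demand- and QoS-weighted memory re-balancing in the spirit of VMMB; the real system estimates demand through low-overhead monitoring and enforces allocations via the VMM, none of which is modelled here, and all field names and numbers are assumptions.

```python
# Illustrative re-balancing pass: guarantee each VM its QoS floor, then split
# the remaining memory in proportion to weighted unmet demand.

def rebalance(total_mb, vms):
    """vms: list of dicts with 'name', 'demand_mb' (estimated working set),
    'min_mb' (QoS floor) and 'weight' (QoS priority)."""
    # Step 1: everyone gets its guaranteed minimum.
    alloc = {vm['name']: vm['min_mb'] for vm in vms}
    remaining = total_mb - sum(alloc.values())
    # Step 2: distribute the rest in proportion to weighted unmet demand.
    unmet = {vm['name']: max(vm['demand_mb'] - vm['min_mb'], 0) * vm['weight'] for vm in vms}
    total_unmet = sum(unmet.values()) or 1
    for vm in vms:
        alloc[vm['name']] += remaining * unmet[vm['name']] / total_unmet
    return alloc

# Example with made-up numbers: an 8 GB host shared by two VMs.
print(rebalance(8192, [
    {'name': 'vm1', 'demand_mb': 6000, 'min_mb': 1024, 'weight': 2},
    {'name': 'vm2', 'demand_mb': 2000, 'min_mb': 1024, 'weight': 1},
]))
```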

11.
Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large scale computations. In this paper, our target systems are high-end multi-core architectures and we use a message passing interface + open multiprocessing (MPI + OpenMP) hybrid programming model for parallelism. We analyze the performance of a recently proposed implementation of distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We study important features of this implementation and compare with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the 'CPU core hours' metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. We have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topology. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the 'CPU core hours' metric and significantly reduces data movement overheads.
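A minimal serial example below shows why exploiting symmetry roughly halves the stored matrix in SpMVM: each stored off-diagonal entry a_ij (i < j) contributes to both y_i and y_j. The distributed MPI + OpenMP implementation and its communication/computation overlap are not shown; the COO layout here is just for illustration.

```python
# Symmetric SpMV storing only the upper triangle (i <= j) in COO format.
import numpy as np

def symmetric_spmv(n, rows, cols, vals, x):
    y = np.zeros(n)
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:            # mirror the contribution of a_ji = a_ij
            y[j] += a * x[i]
    return y

# 3x3 symmetric matrix [[2,1,0],[1,3,4],[0,4,5]], upper triangle only.
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 1, 2, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
print(symmetric_spmv(3, rows, cols, vals, np.array([1.0, 1.0, 1.0])))  # -> [3. 8. 9.]
```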

12.
A Discussion of the Application Fields and Typical Problems of Peer-to-Peer Computing Technology
With the rapid development of computer networks, more and more resources are involved in them, such as CPU cycles, PCs' disk space, and network bandwidth. Many people believe that peer-to-peer computing can harness this untapped power. In this paper we describe the application fields, research issues, and typical problems of peer-to-peer computing.

13.
The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes providing high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in a batch mode. Existing resource management systems mainly aim at balancing CPU load among server nodes. With the rapid advancement of CPU chips, improvements in memory and disk access speed significantly lag behind the advancement of CPU speed, increasing the penalty for data movement, such as page faults and I/O operations, relative to normal CPU operations. Aiming at reducing the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands are known in advance or are predictable, and 2) memory demands are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect different types of workload execution traces to capture dynamic memory access patterns. Conducting different groups of trace-driven simulations, we show that our proposed policies can effectively improve overall job execution performance by making good use of both CPU and memory resources with known and unknown memory demands.
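As a toy illustration of memory-aware load sharing, the selection rule below avoids nodes whose free memory cannot hold a job's (known or predicted) demand before balancing CPU load; the node fields and the fallback policy are assumptions, not the paper's actual policies.

```python
# Pick a node for a batch job: prefer CPU balancing among nodes where the job
# fits in free memory, otherwise minimise expected paging.

def pick_node(nodes, job_mem_mb):
    """nodes: list of dicts with 'free_mem_mb' and 'cpu_load' (run-queue length)."""
    fitting = [n for n in nodes if n['free_mem_mb'] >= job_mem_mb]
    if fitting:
        # Job fits somewhere: balance CPU load among those nodes.
        return min(fitting, key=lambda n: n['cpu_load'])
    # Otherwise fall back to the node with the most free memory to limit page faults.
    return max(nodes, key=lambda n: n['free_mem_mb'])
```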

14.
This paper presents a cooperative heterogeneous computing framework which enables the efficient utilization of available computing resources of host CPU cores for CUDA kernels, which are designed to run only on GPU. The proposed system exploits at runtime the coarse-grain thread-level parallelism across CPU and GPU, without any source recompilation. To this end, three features including a work distribution module, a transparent memory space, and a global scheduling queue are described in this paper. With a completely automatic runtime workload distribution, the proposed framework achieves speedups of 3.08× in the best case and 1.42× on average compared to the baseline GPU-only processing.
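The sketch below illustrates the general idea of coarse-grain work sharing between CPU workers and a GPU through a global scheduling queue; it is only a schematic, the GPU path is faked with a plain Python function, and the actual framework dispatches real CUDA kernels and manages a transparent memory space.

```python
# Schematic dynamic work distribution: CPU threads and one "GPU" worker pull
# fixed-size chunks from a shared queue until it is empty.
import queue, threading

def run_cooperatively(items, cpu_workers=4, chunk=64):
    tasks = queue.Queue()
    for start in range(0, len(items), chunk):
        tasks.put((start, min(start + chunk, len(items))))
    results = [None] * len(items)

    def worker(process_chunk):
        while True:
            try:
                lo, hi = tasks.get_nowait()
            except queue.Empty:
                return
            results[lo:hi] = process_chunk(items[lo:hi])

    cpu_kernel = lambda xs: [x * x for x in xs]   # stand-in for the CPU code path
    gpu_kernel = lambda xs: [x * x for x in xs]   # stand-in for the CUDA kernel
    threads = [threading.Thread(target=worker, args=(cpu_kernel,)) for _ in range(cpu_workers)]
    threads.append(threading.Thread(target=worker, args=(gpu_kernel,)))  # one "GPU" worker
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_cooperatively(list(range(10)))[:5])   # -> [0, 1, 4, 9, 16]
```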

15.
Modern computing hardware has very good task parallelism, but resource contention between tasks remains high. This wastes large fractions of CPU time and leads to application interference. Even tasks running on dedicated CPU cores can still incur interference from other tasks, most notably because of the caches and other hardware components shared by more than one core. The level of interference depends on the nature of the executed tasks and is difficult to predict. A customer who has been promised that his task will run as if it were alone (e.g., on a CPU core dedicated to a virtual machine) may in fact suffer significant performance degradation due to the time spent waiting for resources occupied by other tasks. Measuring the actual performance of a task or a virtual machine can be difficult. However, even more challenging is estimating what the performance of the task would be if it were running completely in isolation. In this paper, we present a measurement technique, Freeze'nSense. It is based on hardware performance counters and allows measuring the actual performance of a task and estimating its performance as if the task were in isolation, all at runtime. To estimate performance in isolation, the proposed technique performs a short-time freezing of the potentially interfering tasks. Freeze'nSense introduces less than 1% overhead and is confirmed to provide accurate and reliable measurements. In practice, Freeze'nSense becomes a valuable tool that helps to automatically identify tasks that suffer the most in a shared environment and move them to a distant core. The observed performance improvement can be as large as 80–100% for individual tasks, and up to 15–20% for the computing node.

16.
Gadget is a simulation application for N-body and smoothed particle hydrodynamics (SPH) problems in cosmology, and it is widely applied in solving a range of cosmological problems. N-body methods focus on the motion of N interacting particles, and SPH is a fluid simulation algorithm that studies the movement of fluid through particle simulation. Most prior work has focused on accelerating Gadget on multi-core CPU or graphics processing unit (GPU) platforms. However, these efforts did not achieve CPU-GPU hybrid computing, which resulted in a tremendous waste of CPU computing resources. In this paper, we propose a CPU-GPU hybrid parallel strategy to accelerate Gadget-2, a massively parallel structure formation code for cosmological simulations. This strategy uses both the CPU and the GPU to process the calculation of the short-range force. To ensure CPU and GPU workload balance, a dynamic task allocation scheme is proposed according to the difference in computational performance between the CPU and the GPU. Experimental results showed that our CPU-GPU hybrid parallel strategy achieved an overall speedup factor of 18.6 and a partial speedup factor for the short-range force calculation of 28.35 compared with a single-core CPU implementation for particle counts in the millions. Moreover, compared with a GPU platform that contained 12 CPU cores and one GPU, our hybrid parallel strategy obtained overall and partial speedup improvements of 6% and 20%, respectively. Furthermore, the scalability of the hybrid strategy is good: its performance improves as the problem scale increases. The strategy also has a limitation: the performance gain decreases if the ratio of the number of CPU cores to the number of GPU cards is reduced. Finally, with our hybrid strategy the CPU utilization improved by 17.14% or more.
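A simplified, assumption-laden version of ratio-based CPU-GPU workload splitting is sketched below: particles are divided so that both devices finish at roughly the same time, given measured per-device throughputs. The calibration numbers are made up, and the paper's scheme adapts the split dynamically at run time rather than using a single static ratio.

```python
# Split N particles between CPU and GPU in proportion to measured throughput,
# so that both sides of the short-range force computation finish together.

def split_particles(n_particles, cpu_rate, gpu_rate):
    """cpu_rate / gpu_rate: measured particles processed per second on each device."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = int(round(n_particles * gpu_share))
    return n_particles - n_gpu, n_gpu        # (CPU portion, GPU portion)

# e.g. a GPU measured to be ~9x faster than the available CPU cores
print(split_particles(1_000_000, cpu_rate=1.0e5, gpu_rate=9.0e5))   # -> (100000, 900000)
```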

17.
We propose a storage efficient, fast and parallelizable out-of-core framework for streaming computations of high resolution level sets. The fundamental techniques are skewing and tiling transformations of streamed level set computations which allow for the combination of interface propagation, re-normalization and narrow-band rebuild into a single pass over the data stored on disk. When combined with a new data layout on disk, this improves the overall performance when compared to previous streaming level set frameworks that require multiple passes over the data for each time-step. As a result, streaming level set computations are now CPU bound and consequently the overall performance is unaffected by disk latency and bandwidth limitations. We demonstrate this with several benchmark tests that show sustained out-of-core throughputs close to that of in-core level set simulations.  相似文献   

18.
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications.

Program summary

Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel computing clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.

19.
Peer-to-Peer (P2P) Desktop Grids are computing infrastructures that aggregate a set of desktop-class machines in which all the participating entities have the same roles, responsibilities, and rights. In this paper, we present ShareGrid, a P2P Desktop Grid infrastructure based on the OurGrid middleware, that federates the resources provided by a set of small research laboratories to easily share and use their computing resources. We discuss the techniques and tools we employed to ensure scalability, efficiency, and usability, and describe the various applications used on it. We also demonstrate the ability of ShareGrid of providing good performance and scalability by reporting the results of experimental evaluations carried out by running various applications with different resource requirements. Our experience with ShareGrid indicates that P2P Desktop Grids can represent an effective answer to the computing needs of small research laboratories, as long as they provide both ease of management and use, and good scalability and performance.  相似文献   

20.
Advances in wired and wireless networking technologies are making networked computing the most common form of high performance computing. Similarly, software like Mosaic and Netscape have not only unified the networked computing landscape, but they have made it available to the masses in a simple, machine independent way. These developments are changing the way we do computational science, learn, research, collaborate, access information and resources, and maintain local and global relations. We envision a scenario where large scale computational science and engineering applications like virtual classrooms and laboratories are ubiquitous, and information resources are accessible on-demand from anywhere. In this paper we present the design of a user interface that will be appropriate to this scenario. We argue that interfaces modeled on the pen and paper paradigm are suited in this context. Specifically, we present the software architecture of a notebook interface. We lay down the requirements for such an interface and present its implementation using the World Wide Web. A realization of the notebook model is presented for a problem solving environment (PDELab) to support the numerical simulation of PDE based applications on a network of heterogeneous high performance machines. © 1997 John Wiley & Sons, Ltd.  相似文献   
