1.

Violeta Felea Bernard Toursel 《Concurrency and Computation》2006,18(3):305-331

Program environments or operating systems generally leave the decision on the allocation of program entities to the developer, offering either placement directives or tools driven through a graphical interface. These approaches cannot always take into account the dynamic behavior of applications, dynamicity in the execution environment or the heterogeneity of the execution platform. Transparent deployment algorithms are necessary for automating and optimizing application distribution. The Adaptive Distributed Applications in Java (ADAJ) project deals with placement and migration of Java objects. It automatically deploys parallel Java applications on a cluster of workstations using monitoring information about the application behavior. The transparency obtained through the integration of these tools in the middleware makes such an environment easy to use and improves efficiency. Copyright © 2005 John Wiley & Sons, Ltd.

2.

A Least-Connection-First Algorithm Based on Variable Weights (total citations: 3; self-citations: 0; citations by others: 3)

余胜生杨立辉卢嵩周敬利《小型微型计算机系统》2003,24(11):1946-1949

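To meet the requirements of multimedia transmission, and building on a study of existing cluster architectures and of node load-distribution and balancing algorithms, this paper proposes an improved algorithm, the "least-connection-first algorithm based on variable weights", and validates it experimentally. The experiments show that the algorithm achieves good load balancing on a virtual server built around a single central control node.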
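The abstract does not spell out how the weights vary, so the following is only a minimal sketch of the general idea: a least-connection dispatcher whose per-node weight shrinks as the node's reported load rises. The `Node` class, the `load` metric and the weight formula are all illustrative assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    base_weight: float    # static capacity estimate (hypothetical)
    connections: int = 0  # currently active connections
    load: float = 0.0     # reported utilisation in [0, 1] (hypothetical metric)

    def weight(self) -> float:
        # Variable weight: the static weight shrinks as reported load rises.
        # The small floor avoids division by zero for a saturated node.
        return max(self.base_weight * (1.0 - self.load), 1e-6)


def pick_node(nodes):
    # Least-connection-first, normalised by the node's current weight.
    return min(nodes, key=lambda n: n.connections / n.weight())


def dispatch(nodes):
    # Choose a node for a new connection and record the connection on it.
    node = pick_node(nodes)
    node.connections += 1
    return node
```

With two nodes of base weight 1 and 2 holding 2 and 3 connections, the heavier node is chosen while idle; once it reports 50% load its effective weight halves and the lighter node is chosen instead.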

3.

Research on Dynamic Load Balancing Methods for Cluster Systems

胡光《计算机与现代化》2004,(3):10-11,15

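By studying dynamic load balancing algorithms in cluster systems, this paper addresses the high extra overhead caused by process migration during task reallocation and proposes an effective dynamic load balancing algorithm. Analysis of the experimental results shows that the algorithm improves the running performance of parallel programs.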

4.

A. Omar Portillo‐Dominguez Philip Perry Damien Magoni Miao Wang John Murphy 《Software》2016,46(12):1705-1733

Nowadays, clustered environments are commonly used in high‐performance computing and enterprise‐level applications to achieve faster response time and higher throughput than single machine environments. Nevertheless, how to effectively manage the workloads in these clusters has become a new challenge. As a load balancer is typically used to distribute the workload among the cluster's nodes, multiple research efforts have concentrated on enhancing the capabilities of load balancers. Our previous work presented a novel adaptive load balancing strategy (TRINI) that improves the performance of a clustered Java system by avoiding the performance impacts of major garbage collection, which is an important cause of performance degradation in Java. The aim of this paper is to strengthen the validation of TRINI by extending its experimental evaluation in terms of generality, scalability and reliability. Our results have shown that TRINI can achieve significant performance improvements, as well as a consistent behaviour, when it is applied to a set of commonly used load balancing algorithms, demonstrating its generality. TRINI also proved to be scalable across different cluster sizes, as its performance improvements did not noticeably degrade when increasing the cluster size. Finally, TRINI exhibited reliable behaviour over extended time periods, introducing only a small overhead to the cluster in such conditions. These results offer practitioners a valuable reference regarding the benefits that a load balancing strategy, based on garbage collection, can bring to a clustered Java system. Copyright © 2016 John Wiley & Sons, Ltd.
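The idea of layering garbage-collection awareness on top of an ordinary balancing policy can be illustrated with a small sketch. This is not TRINI's implementation: the class name, the `mark` method and the boolean "major GC imminent" flag are hypothetical, and how such a forecast would be produced is outside the sketch.

```python
from itertools import cycle


class GcAwareRoundRobin:
    """Round-robin that skips nodes flagged as about to run a major GC."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._order = cycle(self.nodes)
        # Hypothetical forecast: True means a major GC is expected soon.
        self.gc_imminent = {n: False for n in self.nodes}

    def mark(self, node, imminent):
        # Update the forecast for one node.
        self.gc_imminent[node] = imminent

    def next_node(self):
        # Try each node at most once, in round-robin order.
        for _ in range(len(self.nodes)):
            node = next(self._order)
            if not self.gc_imminent[node]:
                return node
        # Every node is flagged: fall back to plain round-robin.
        return next(self._order)
```

Because the wrapper only filters candidates, the same pattern could in principle sit in front of other base policies, which is the kind of generality the abstract evaluates.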

5.

A Cluster Parallel Computing Method for IP Packet Reassembly

秦勇叶建锋梁活民蔡昭权《计算机工程与应用》2008,44(17):104-106

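Building on a route-cutting scheduling algorithm based on an N-player non-cooperative model, using a cluster to solve large-scale IP packet reassembly is a feasible approach. The paper designs and implements a cluster parallel computing method for IP packet reassembly (which the paper also calls network address translation, NAT, "Network Address Transfer"). An MPI-based SMP cluster for IP packet reassembly was built from ordinary PCs; the paper studies a fairly large-scale IP packet reassembly environment on a typical campus network and discusses two load balancing methods on the compute nodes inside the cluster.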

6.

A Two-Level Parallel Computing and Load Scheduling Method for IP Packet Reassembly

秦勇梁根叶建锋蔡昭权《计算机工程与科学》2008,30(9):8-10

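This paper builds a two-level parallel computing cluster for IP packet reassembly based on MPICH and SMP/CMP and, using a route-cutting scheduling algorithm based on an N-player non-cooperative model, studies two load balancing methods on the compute nodes inside the cluster in a fairly large-scale IP packet reassembly environment. The experiments indicate that using a cluster to solve large-scale IP packet reassembly is a feasible approach.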

7.

A Dynamic Task Scheduling Method for Cluster Systems (total citations: 21; self-citations: 1; citations by others: 21)

傅强郑纬民《软件学报》1999,10(1):19-23

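Task scheduling is one of the key problems to solve for parallel computing on cluster systems. For parallel applications that generate tasks dynamically at run time, accurate task allocation decisions are hard to make, which can unbalance the task load across compute nodes and ultimately degrade overall system performance significantly. Task reallocation is therefore needed to maintain load balance. This paper proposes a task allocation and reallocation method that delays the start of task execution as long as possible, thereby avoiding process migration during reallocation and keeping the added scheduling overhead small. Analysis and experimental results show that the method effectively improves the running performance of parallel programs in many cases.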
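One way to picture why deferring task start avoids process migration: if tasks sit in a pending queue until a worker is ready for them, reallocation only has to move queue entries, never a running process. The sketch below rests on that assumption; the `rebalance` function and the queue layout are illustrative, not the paper's method.

```python
from collections import deque


def rebalance(queues):
    """Move not-yet-started tasks from the longest queue to the shortest.

    `queues` maps a node name to a deque of pending (unstarted) tasks.
    Returns the number of tasks moved. Running tasks are never touched,
    so no process migration is involved.
    """
    moved = 0
    while True:
        longest = max(queues, key=lambda n: len(queues[n]))
        shortest = min(queues, key=lambda n: len(queues[n]))
        # Stop once queue lengths differ by at most one task.
        if len(queues[longest]) - len(queues[shortest]) <= 1:
            return moved
        # Take the most recently queued task, the one furthest from starting.
        queues[shortest].append(queues[longest].pop())
        moved += 1
```

For example, queues of 6, 0 and 1 pending tasks end up with lengths 3, 2 and 2 after three moves.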

8.

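Methods and Practice for Improving the Reliability and Serviceability of Parallel Computing Software Systems on Large-Scale Clusters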

林彦宇陈虎苗军韩佳龙媚赖路双《计算机工程与科学》2015,37(1):1-6

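Parallel computing software on large-scale clusters needs fault tolerance to handle failures of individual nodes, the network and other components, as well as serviceability: it should be easy to manage, maintain, port and extend. A parallel computing framework was researched and developed for the star computing model. Using a variable-granularity decomposer inside the scheduling node, dependency queues and other methods, it achieves system-wide fault tolerance together with good usability, portability and scalability. The system can currently run continuously for more than 150 h at 300 TFlops of computing power, and has room to scale further.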

9.

A Task-Sequence-Intensity-Aware Control Model for Large-Scale Cluster Servers

蔡文伟朱嘉贤张会兵《计算机应用研究》2020,37(12):3753-3756

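Differences in control cost and performance among the various servers in a heterogeneous cloud data center affect its operation and maintenance cost and the QoS game-equilibrium relationship. Given the time-sensitivity of task-sequence intensity, a task-sequence-intensity-aware large-scale task scheduling model is proposed. Based on the intensity of the task sequence currently arriving at the data center and the current state of the servers in the cluster, task scheduling emphasizes saving server operation and maintenance costs and balancing load across servers, and on that basis optimizes the data center's average response time and system throughput for task-sequence processing. Analysis of the experimental results verifies that the confidence of the cluster server control model in task scheduling exceeds 95%. Compared with two widely used and representative algorithms, shortest-task-first and fair dispatching, it performs best of the three, which also verifies the model's effectiveness and feasibility.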

10.

A Priority-Based Task Allocation Strategy for Parallel Processing of Remote Sensing Images

付征叶凡高娟王俊岭《计算机工程》2014,(2):48-51,57

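This paper analyzes the low task-allocation efficiency and load imbalance in parallel computation over large-scale remote sensing images in a cluster environment, builds a multi-machine task allocation model on that basis, and proposes a task allocation algorithm based on compute-node priority. The algorithm takes both the load and the performance of compute nodes into account: at allocation time it collects information from each node in real time, computes each compute node's priority, and assigns tasks in priority order, so that tasks are placed on compute nodes sensibly while load balance across the cluster is maintained. Experimental results show that the algorithm allocates tasks quickly and in real time and distributes them more reasonably and evenly, and that, as the number of tasks grows, its execution efficiency is about 2 times higher than that of round-robin scheduling.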
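The abstract does not give the priority formula, so the sketch below assumes a simple one, performance divided by one plus current load, purely to illustrate allocating each task to the node with the highest priority at that moment. The function names, the unit task cost and the formula are hypothetical.

```python
def priority(performance, load):
    # Hypothetical priority: faster and less loaded nodes rank higher.
    return performance / (1.0 + load)


def allocate(tasks, nodes):
    """Assign each task to the node with the highest current priority.

    `nodes` maps a node name to (performance, initial_load).
    Every task is assumed to add one unit of load to its node.
    Returns a dict mapping node name to its list of tasks.
    """
    perf = {name: p for name, (p, _) in nodes.items()}
    load = {name: l for name, (_, l) in nodes.items()}
    plan = {name: [] for name in nodes}
    for task in tasks:
        # Recompute priorities from the latest load before each assignment.
        best = max(nodes, key=lambda n: priority(perf[n], load[n]))
        plan[best].append(task)
        load[best] += 1.0
    return plan
```

With a node of performance 2.0 and one of performance 1.0, both initially idle, six unit tasks split four to two, whereas plain round-robin would split them three to three regardless of node speed.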

11.

Chao‐Tung Yang Keng‐Yi Chou Kuan‐Chou Lai 《Concurrency and Computation》2011,23(15):1701-1722

Cluster computing is an attractive approach to provide high‐performance computing for solving large‐scale applications. Owing to advances in processor and networking technology, expanding clusters have become heterogeneous systems; thus, it is crucial to dispatch jobs to heterogeneous computing resources for better resource utilization. In this paper, we propose a new job allocation system for heterogeneous multi‐cluster environments named the Adaptive Job Allocation Strategy (AJAS), in which a self‐scheduling scheme is applied in the scheduler to dispatch jobs to the most appropriate computing resources. Our strategy focuses on increasing resource utilization by dispatching jobs to computing nodes with similar performance capacities. By doing so, execution times among all nodes can be equalized. The experimental results show that AJAS can improve the system performance. Copyright © 2011 John Wiley & Sons, Ltd.

12.

Ioana Banicescu Sheikh Ghafoor Vijay Velusamy Samuel H. Russ Mark Bilderback 《Concurrency and Computation》2001,13(2):121-139

Load balancing increases the efficient use of existing resources for parallel and distributed applications. At a coarse level of granularity, advances in runtime systems for parallel programs have been proposed in order to control available resources as efficiently as possible by utilizing idle resources and using task migration. Simultaneously, at a finer granularity level, advances in algorithmic strategies for dynamically balancing computational loads by data redistribution have been proposed in order to respond to variations in processor performance during the execution of a given parallel application. Combining strategies from each level of granularity can result in a system which delivers advantages of both. The resulting integration is systemic in nature and transfers the responsibility of efficient resource utilization from the application programmer to the runtime system. This paper presents the design and implementation of a system that combines an algorithmic fine-grained data parallel load balancing strategy with a systemic coarse-grained task-parallel load balancing strategy, and reports on recent experimental results of running a computationally intensive scientific application under this integrated system. The experimental results indicate that a distributed runtime environment which combines both task and data migration can provide performance advantages with little overhead. It also presents proposals for performance enhancements of the implementation, as well as future explorations for effective resource management. Copyright © 2001 John Wiley & Sons, Ltd.

13.

D. J. Haglin K. R. Mayes A. M. Manning J. Feo J. R. Gurd M. Elliot J. A. Keane 《Concurrency and Computation》2009,21(9):1131-1158

Three parallel implementations of a divide‐and‐conquer search algorithm (called SUDA2) for finding minimal unique itemsets (MUIs) are compared in this paper. The identification of MUIs is used by national statistics agencies for statistical disclosure assessment. The first parallel implementation adapts SUDA2 to a symmetric multi‐processor cluster using the message passing interface (MPI), which we call an MPI cluster; the second optimizes the code for the Cray MTA2 (a shared‐memory, multi‐threaded architecture) and the third uses a heterogeneous ‘group’ of workstations connected by LAN. Each implementation considers the parallel structure of SUDA2, and how the subsearch computation times and sequence of subsearches affect load balancing. All three approaches scale with the number of processors, enabling SUDA2 to handle larger problems than before. For example, the MPI implementation is able to achieve nearly two orders of magnitude improvement with 132 processors. Performance results are given for a number of data sets. Copyright © 2009 John Wiley & Sons, Ltd.

14.

Jahanzeb Maqbool Sangyoon Oh Geoffrey C. Fox 《Concurrency and Computation》2015,27(17):5390-5410

The power consumption of modern high‐performance computing (HPC) systems that are built using power hungry commodity servers is one of the major hurdles for achieving Exascale computation. Several efforts have been made by the HPC community to encourage the use of low‐powered system‐on‐chip (SoC) embedded processors in large‐scale HPC systems. These initiatives have successfully demonstrated the use of ARM SoCs in HPC systems, but there is still a need to analyze the viability of these systems for HPC platforms before a case can be made for Exascale computation. The major shortcomings of current ARM‐HPC evaluations include a lack of detailed insights about performance levels on distributed multicore systems and performance levels for benchmarking in large‐scale applications running on HPC. In this paper, we present a comprehensive evaluation of results that covers major aspects of server and HPC benchmarking for ARM‐based SoCs. For the experiments, we built an unconventional cluster of ARM Cortex‐A9s that is referred to as Weiser and ran single‐node benchmarks (STREAM, Sysbench, and PARSEC) and multi‐node scientific benchmarks (High‐performance Linpack (HPL), NASA Advanced Supercomputing (NAS) Parallel Benchmark, and Gadget‐2) in order to provide a baseline for performance limitations of the system. Based on the experimental results, we claim that the performance of ARM SoCs depends heavily on the memory bandwidth, network latency, application class, workload type, and support for compiler optimizations. During server‐based benchmarking, we observed that when performing memory intensive benchmarks for database transactions, x86 performed 12% better for multithreaded query processing. However, ARM performed four times better for performance to power ratios for a single core and 2.6 times better on four cores. 
We noticed that emulated double precision floating point in Java resulted in three to four times slower performance as compared with the performance in C for CPU‐bound benchmarks. Even though Intel x86 performed slightly better in computation‐oriented applications, ARM showed better scalability in I/O bound applications for shared memory benchmarks. We incorporated the support for ARM in the MPJ‐Express runtime and performed comparative analysis of two widely used message passing libraries. We obtained similar results for network bandwidth, large‐scale application scaling, floating‐point performance, and energy‐efficiency for clusters in message passing evaluations (NPB and Gadget‐2 with MPJ‐Express and MPICH). Our findings can be used to evaluate the energy efficiency of ARM‐based clusters for server workloads and scientific workloads and to provide a guideline for building energy‐efficient HPC clusters. Copyright © 2015 John Wiley & Sons, Ltd.

15.

An MPI-Based Cluster Monitoring System

邢小虎宋安军《计算机辅助工程》2006,15(4):27-30

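To monitor cluster systems effectively, a simple and practical parallel task model is built on the Message Passing Interface (MPI) parallel library. The paper describes in detail the cluster monitoring, the structure of the node load-balance evaluation model, and Linux cluster data collection within this task model. Experiments show that the model is simple to configure, has low resource overhead, and interferes little with the cluster system.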

16.

J. Luitjens M. Berzins T. Henderson 《Concurrency and Computation》2007,19(10):1387-1402

In this paper we consider the scalability of parallel space‐filling curve generation as implemented through parallel sorting algorithms. Multiple sorting algorithms are studied and results show that space‐filling curves can be generated quickly in parallel on thousands of processors. In addition, performance models are presented that are consistent with measured performance and offer insight into performance on still larger numbers of processors. At large numbers of processors, the scalability of adaptive mesh refined codes depends on the individual components of the adaptive solver. One such component is the dynamic load balancer. In adaptive mesh refined codes, the mesh is constantly changing resulting in load imbalance among the processors requiring a load‐balancing phase. The load balancing may occur often, requiring the load balancer to perform quickly. One common method for dynamic load balancing is to use space‐filling curves. Space‐filling curves, in particular the Hilbert curve, generate good partitions quickly in serial. However, at tens and hundreds of thousands of processors serial generation of space‐filling curves will hinder scalability. In order to avoid this issue we have developed a method that generates space‐filling curves quickly in parallel by reducing the generation to integer sorting. Copyright © 2007 John Wiley & Sons, Ltd.

17.

Design of a Cluster Computer for Measurement and Control Systems (total citations: 1; self-citations: 0; citations by others: 1)

郝丽蕊刘晓刚《计算机测量与控制》2012,20(4):935-938

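The complexity of parallel program development is the main obstacle to the wide adoption of parallel systems. Targeting the complex multi-source information of test-range measurement and control systems, this article proposes a parallel computing approach based on a task data-structure table. Cluster parallel computing is realized through decomposition of the complex multi-source information into tasks, description of the task data structures, design of the parallel computing model, and implementation of load balancing. Finally, the system's scalability and availability are analyzed; the analysis indicates that the system can adapt to new tasks, scales well, and can run continuously without interruption, meeting high-availability requirements.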

18.

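Hierarchical Interest Management for Large-Scale Distributed Virtual Environments (total citations: 4; self-citations: 3; citations by others: 4)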

何连跃李思昆曾亮《计算机辅助设计与图形学学报》2000,12(9):711-714

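In large-scale distributed virtual environments, interest management lets each entity send data only to the entities interested in it, greatly reducing system overhead and computation overhead. However, the large differences among the interest domains of the entities taking part in distributed interactive simulation can leave the network communication load extremely unbalanced. Hierarchical interest management is therefore proposed: entities within an interest domain are graded by degree of interest; on that basis, an efficient compression and restoration algorithm for state data packets is given for low-interest entities, and a multi-threshold DR technique is proposed to reduce communication frequency. Hierarchical interest management can alleviate ...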

19.

A Dynamic Load Balancing Algorithm for Heterogeneous Clusters Based on Internet of Things Technology

李娟《计算机与现代化》2021,(4):104-108

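To improve dynamic load balancing in heterogeneous clusters, sensing technology from the Internet of Things is introduced to build a channel transmission model for the heterogeneous cluster; a dynamic weighting method is then used to configure the output channels and complete the decomposition of channel features. Building on a fuzzy channel-reorganization structure model, a noise-interference suppression method is applied to suppress multipath interference on the heterogeneous cluster's communication channels, and baud-spaced equalization sampling is combined with it to control the balance of channel output; through fuzzy-degree equalization configuration and a spatial equalization scheduling process ...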

20.

Zhou Lei Gabrielle Allen Promita Chakraborty Dayong Huang John Lewis Xin Li Christopher D. White 《Concurrency and Computation》2008,20(18):2123-2140

Uncertainty analysis is critical for conducting reservoir performance prediction. However, it is challenging because it requires (1) massive, modeling‐related, geographically distributed data sets at terabyte or even petabyte scale (geoscience and engineering data), (2) the rapid execution of hundreds or thousands of flow simulations, which are identical runs with different models that calculate the impacts of various uncertainty factors, and (3) an integrated, secure, and easy‐to‐use problem‐solving toolkit to assist uncertainty analysis. We leverage Grid computing technologies to address these challenges. We design and implement an integrated problem‐solving environment, ResGrid, to effectively improve reservoir uncertainty analysis. ResGrid consists of data management, execution management, and a Grid portal. Data Grid tools, such as metadata, replica, and transfer services, are used to cope with the massive size and geographical distribution of the data sets. Workflow, task farming, and resource allocation are used to support large‐scale computation. A Grid portal integrates the data management and the computation solution into a unified easy‐to‐use interface, enabling reservoir engineers to specify uncertainty factors of interest and perform large‐scale reservoir studies through a web browser. ResGrid has been used in petroleum engineering. Copyright © 2008 John Wiley & Sons, Ltd.