Similar Documents (20 results)
1.
In a grid computing environment, neighboring hosts engaged in distributed collaborative work repeatedly copy the same remote files, resulting in redundant operations. To address this problem, this paper proposes a two-level caching strategy for the remote data replication process and gives a solution to the read/write consistency problems that may arise. The strategy reduces the communication overhead caused by hosts repeatedly copying remote files during distributed collaborative work and by multiple processes reading input files in a parallel computing environment, improving the effectiveness of each communication.

2.
To address the extra communication overhead incurred during dynamic load balancing in honeynets, this paper first analyzes the characteristics of honeynet dynamic load balancing and establishes a mathematical model based on minimum communication overhead; it then designs and implements a new method that applies a genetic algorithm to the problem. Experiments show that, compared with a greedy algorithm, the genetic algorithm finds load-assignment schemes with lower communication overhead, further reducing the number of load migrations and the extra communication overhead of honeynet dynamic load balancing.
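The abstract does not give the model itself; as a purely illustrative sketch of a minimum-communication-cost assignment model, the fragment below evaluates one candidate assignment the way a genetic-algorithm fitness function might. The load counts, host counts, and cost matrix are invented values.

```c
#include <stdio.h>

#define NLOAD 6   /* number of load units (hypothetical)     */
#define NHOST 3   /* number of honeypot hosts (hypothetical) */

/* cost[i][j]: communication cost of moving load i to host j; 0 on its current host */
static const int cost[NLOAD][NHOST] = {
    {0, 4, 7}, {3, 0, 5}, {6, 2, 0},
    {0, 3, 8}, {5, 0, 2}, {4, 6, 0}
};

/* A chromosome is simply an assignment vector: gene[i] = host assigned to load i. */
static int fitness(const int gene[NLOAD])
{
    int total = 0;
    for (int i = 0; i < NLOAD; i++)
        total += cost[i][gene[i]];   /* migration cost is paid only if the host changes */
    return total;                    /* lower is better */
}

int main(void)
{
    int candidate[NLOAD] = {0, 1, 2, 1, 1, 2}; /* one individual in the population */
    printf("communication cost = %d\n", fitness(candidate));
    return 0;
}
```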

3.
An increasing number of parallel computing problems require multicast (multi-point) communication to be supported directly in the routing protocol in order to reduce communication overhead. This paper proposes a new switching technique supporting multicast communication, stepwise-pipelined tree-based circuit switching, which seeks to resolve the conflict between communication latency and traffic volume common in existing wormhole-switching-based multicast routing algorithms, avoid congestion, reduce the average communication latency under heavy load, and improve throughput.

4.
To achieve efficient distributed parallel computing on large computer clusters, a parallel computing model for heterogeneous nodes based on an improved graph-partitioning model and a quantum genetic algorithm is designed. First, the traditional graph-partitioning model is introduced and its shortcomings are analyzed; the model is then improved with respect to edge direction, communication-cost calculation, and load-balance measurement. Finally, with the goals of minimizing communication overhead and optimizing load balance, an encoding scheme is designed and a quantum genetic algorithm is applied to the improved graph-partitioning model to obtain an optimal task-partitioning scheme. Simulation experiments show that the method parallelizes tasks effectively and, compared with other methods, achieves lower communication overhead and better load balance, demonstrating strong feasibility.

5.
A PC Cluster Computing Environment Based on a LAN and MPI
Off-the-shelf PCs can be used to build inexpensive, practical, high-performance parallel machines consisting of dozens to hundreds of PCs. The experimental system is a cluster computing environment based on TurboLinux and MPI built on an Ethernet LAN of 40 PCs, on which parallel computing experiments and performance tests were carried out. The experiments show that this environment is suitable for medium- or coarse-grained computing tasks in which inter-process communication is infrequent or the communication cost is much smaller than the computation cost.
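As an illustration of the kind of coarse-grained MPI program such an environment favors (a minimal sketch, not taken from the paper), each process below performs a large independent computation and communicates only once to combine the results:

```c
/* Minimal coarse-grained MPI sketch: heavy local computation, a single reduction. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process integrates its own slice of f(x) = 4/(1+x^2) over [0,1]. */
    const long n = 10000000;
    double local = 0.0, h = 1.0 / n;
    for (long i = rank; i < n; i += size) {
        double x = (i + 0.5) * h;
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    /* One communication step per process: its cost is negligible vs. the computation. */
    double pi = 0.0;
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %.12f (computed on %d processes)\n", pi, size);

    MPI_Finalize();
    return 0;
}
```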

6.
To address the extra communication overhead produced during dynamic load balancing, a mathematical model based on minimum communication overhead is established. On this basis, a new strategy applying a genetic algorithm to the problem is proposed. The strategy reduces the number of load migrations and the network traffic generated during dynamic load balancing. Simulation experiments show that it yields assignment schemes with lower communication overhead than a greedy strategy.

7.
Asynchronous Communication Techniques in Distributed Parallel Processing and Their Analysis
In network-based distributed parallel computing, the underlying communication protocol of a LAN is usually Ethernet, which relies on a shared bus and channel contention; communication overhead is therefore typically the dominant problem. Based on the characteristics of Ethernet, a scheme that staggers subtask computation and communication is proposed; it has been applied successfully to a classical computational fluid dynamics problem with good results. The paper focuses on a theoretical analysis of the scheme's speedup, parallel efficiency, and related metrics.
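A common way to stagger computation and communication of this kind is to post non-blocking transfers and compute on data already in hand while they complete. The fragment below is a generic illustration of that pattern; the buffer names, sizes, and 1-D neighbor layout are assumptions, not code from the paper.

```c
/* Sketch: overlap a halo exchange with interior computation via non-blocking MPI. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    static double u[N];
    double halo_in = 0.0, halo_out = 1.0;
    MPI_Request req[2];

    /* 1. Start the exchange of the boundary value. */
    MPI_Irecv(&halo_in,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    /* 2. Compute the interior subtask while the message is in flight. */
    double sum = 0.0;
    for (long i = 1; i < N; i++) { u[i] = 0.5 * (u[i] + u[i - 1]); sum += u[i]; }

    /* 3. Wait, then finish the boundary cell that depends on the received halo. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    u[0] = 0.5 * (u[0] + halo_in);

    if (rank == 0) printf("done, checksum %.3f\n", sum + u[0]);
    MPI_Finalize();
    return 0;
}
```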

8.
To reduce the high communication overhead that traditional task-partitioning methods incur in the task-assignment phase of parallel computation on three-dimensional meshes, a parallel task-assignment strategy based on a multilevel k-way partitioning algorithm is proposed. The three-dimensional mesh is first partitioned with the multilevel k-way algorithm, turning the task-partitioning problem into a graph-partitioning problem; a parallel task-mapping algorithm then assigns the computation tasks to the compute nodes according to the partitioning result. Experiments solving the shortest-path problem on a three-dimensional mesh model on the DeepComp 1800 cluster show that, compared with the traditional row-column partitioning strategy, the proposed strategy preserves load balance while effectively reducing communication overhead, shortening run time, and improving speedup.

9.
To reduce the communication overhead of data aggregation, a cluster-partition-based data aggregation scheme for wireless sensor networks is proposed. Each cluster is divided into three partitions and a report point is designated for each partition; sensors in a partition whose readings equal that of the report point do not transmit during aggregation, which reduces the volume of data transmitted within the cluster. Analysis and experimental results show that the intra-cluster traffic of this scheme is lower than that of related schemes, and the reduction in communication overhead is especially noticeable when data redundancy is high.
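A minimal sketch of the suppression rule described above is given below; the partition layout, sensor IDs, and readings are made-up values used only to show the idea.

```c
/* Sketch: intra-cluster suppression. A sensor stays silent when its reading        */
/* matches that of its partition's report point. All data here is illustrative.     */
#include <stdio.h>

#define NSENSOR 9
#define NPART   3

int main(void)
{
    int partition[NSENSOR] = {0, 0, 0, 1, 1, 1, 2, 2, 2}; /* 3 partitions per cluster */
    int report_pt[NPART]   = {0, 3, 6};                   /* one report point each    */
    int reading[NSENSOR]   = {21, 21, 22, 18, 18, 18, 25, 24, 25};

    int sent = 0;
    for (int s = 0; s < NSENSOR; s++) {
        int rp = report_pt[partition[s]];
        if (s == rp || reading[s] != reading[rp]) {
            printf("sensor %d transmits reading %d\n", s, reading[s]);
            sent++;
        }   /* otherwise suppressed: the report point's value already covers it */
    }
    printf("%d of %d sensors transmitted\n", sent, NSENSOR);
    return 0;
}
```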

10.
To address the two problems of frequent-itemset mining in P2P networks, namely that the complete set of frequent itemsets is too large and that the network communication overhead is high, an algorithm named P2PMaxSet for mining maximal frequent itemsets in P2P networks is proposed. First, the algorithm mines only maximal frequent itemsets, reducing the number of results; second, each node exchanges results only with its neighbors, saving a large amount of communication; finally, adjustment strategies for dynamic changes in the network are discussed. Experimental results show that P2PMaxSet achieves high accuracy with low communication overhead.

11.
Cluster/distributed computing has become a popular, cost-effective alternative to high-performance parallel computers. Many parallel programming languages and related programming models have become widely accepted on clusters. However, high communication overhead is a major shortcoming of running parallel applications in cluster/distributed computing environments. To reduce the communication overhead, and thus the completion time, of a parallel application, this paper introduces and evaluates an efficient Key Message (KM) approach to support parallel computing on cluster computing environments. We briefly present the model and algorithm, and then adopt analytical and simulation methods to evaluate its performance. The results show that as the network background load increases or the computation-to-communication ratio decreases, the KM approach yields a greater improvement in the communication performance of a parallel application than a system that does not use it.

12.
The computational cost of underwater acoustic propagation models is high, making it difficult to meet the demand for real-time, fine-grained underwater acoustic propagation information. Based on a hybrid MPI+OpenMP parallel programming approach, a hybrid parallel computing method for the WKBZ normal-mode model is investigated, realizing two-level hybrid parallel computation of the underwater acoustic field. By passing messages between nodes and sharing memory within each node, the method overcomes the high communication overhead of the pure MPI programming model and the poor scalability of the OpenMP programming environment, and provides fast computation of underwater acoustic propagation. Test results show that the method exploits the multi-level parallelism between and within SMP cluster nodes, combines the respective strengths of the message-passing and shared-memory programming models, greatly reduces the time spent on communication between MPI processes, and effectively improves the scalability and parallel efficiency of the program.
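The two-level structure described (MPI between nodes, OpenMP threads within a node) follows the usual hybrid pattern sketched below. The loop body is a placeholder workload, not the WKBZ model itself, and the range decomposition is an assumption for illustration.

```c
/* Sketch of two-level MPI+OpenMP parallelism: one MPI process per node,            */
/* OpenMP threads inside it. The workload here is a stand-in, not the WKBZ model.   */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Level 1: distribute the range indices across MPI processes (one per node). */
    const int nrange = 12000;
    int per = nrange / size;
    int lo = rank * per, hi = (rank == size - 1) ? nrange : lo + per;

    /* Level 2: OpenMP threads share the node's portion through shared memory. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int r = lo; r < hi; r++)
        local += sin(0.001 * r) * cos(0.002 * r);   /* placeholder field computation */

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("field sum %.6f using %d nodes x %d threads\n",
               total, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```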

13.
One of the major goals in the design of parallel processing machines and algorithms is to achieve robustness and reduce the effects of the overhead introduced when a given problem is parallelized or a fault occurs. A key contributor to overhead is communication time, in particular when a node is faulty and another node is substituting for its operation. Many architectures try to reduce this overhead by minimizing the actual time for a communication to occur, including latency and bandwidth figures. Another approach is to hide communication by overlapping it with computation, assuming that computation is the most prominent factor. This paper presents the mechanisms provided in the Proteus parallel computer and its effective use of communication hiding through overlapping communication/computation techniques, with and without the presence of a fault. These techniques are easily extended for use in compiler support of parallel programming. We also address the complexity (or rather simplicity) of achieving complete exchange on the Proteus machine.

14.
Recently there has been a trend toward using lower-power embedded media processor cores to build future high-end computing machines or supercomputers. However, the embedded solution also faces operating system (OS) design challenges: the thread-invoking overhead is high for fine-grained scientific workloads, message passing among threads is not managed efficiently enough, and the OS does not provide convenient enough services for parallel programming. This paper presents a scheduler for a master-slave real-time operating system (RTOS) to manage thread execution on distributed multi/many-core systems without shared memory. The proposed scheduler exploits the data-driven nature of scientific workloads to reduce the thread-invoking overhead. It also defines two protocols: (1) one between the RTOS and the application program, used to reduce the burden of parallel programming on the programmer; and (2) one between the RTOS and the network-on-chip, used to manage message passing among threads efficiently. The experimental results show that the proposed scheduler can manage thread execution with lower overhead and less storage, thereby improving multi/many-core system performance.

15.
Task-based programming models are beneficial for the development of parallel programs for several reasons. They decouple the specification of parallelism from the scheduling and mapping to the execution resources of a specific hardware platform, thus allowing a flexible and individual mapping. For platforms with a distributed address space, the use of parallel tasks, instead of sequential tasks, offers the additional advantage of structuring the program into communication domains that can help to reduce the overall communication overhead.

16.
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. In an object-based programming system, the natural granularity is to give each object its own lock. Each operation can then make its execution atomic by acquiring and releasing the lock for the object that it accesses. But this fine lock granularity may have high synchronization overhead because it maximizes the number of executed acquire and release constructs. To achieve good performance it may be necessary to reduce the overhead by coarsening the granularity at which the computation locks objects. In this article we describe a static analysis technique—lock coarsening—designed to automatically increase the lock granularity in object-based programs with atomic operations. We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs and used it to improve the generated parallel code. Experiments with two automatically parallelized applications show these algorithms to be effective in reducing the lock overhead to negligible levels. The results also show, however, that an overly aggressive lock coarsening algorithm may harm the overall parallel performance by serializing sections of the parallel computation. A successful compiler must therefore negotiate a trade-off between reducing lock overhead and increasing the serialization.
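As an illustration of the granularity trade-off (not code from the article), the fine-grained version below acquires the object's lock once per operation, while the coarsened version acquires it once around a batch of operations on the same object; the account structure and amounts are invented for the example.

```c
/* Sketch of lock coarsening: per-operation locking vs. one acquire around a batch. */
#include <pthread.h>
#include <stdio.h>

struct account { pthread_mutex_t lock; long balance; };

/* Fine-grained: every operation acquires and releases the object's lock. */
void deposit_fine(struct account *a, long amount)
{
    pthread_mutex_lock(&a->lock);
    a->balance += amount;
    pthread_mutex_unlock(&a->lock);
}

/* Coarsened: the lock is hoisted around the whole loop, paying for one             */
/* acquire/release instead of n, at the cost of a longer critical section           */
/* that may serialize other threads touching the same object.                       */
void deposit_coarse(struct account *a, const long *amounts, int n)
{
    pthread_mutex_lock(&a->lock);
    for (int i = 0; i < n; i++)
        a->balance += amounts[i];
    pthread_mutex_unlock(&a->lock);
}

int main(void)
{
    struct account acct = { PTHREAD_MUTEX_INITIALIZER, 0 };
    long batch[] = {10, 20, 30};
    for (int i = 0; i < 3; i++) deposit_fine(&acct, batch[i]);
    deposit_coarse(&acct, batch, 3);
    printf("balance = %ld\n", acct.balance);
    return 0;
}
```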

17.
Recently, a series of parallel loop self-scheduling schemes have been proposed, especially for heterogeneous cluster systems. However, they employ the MPI programming model to construct applications without considering whether the computing nodes have multicore architectures. As a result, every processor core has to communicate directly with the master node to request new tasks, even though the processor cores on the same node could communicate with one another through the underlying shared memory. To address this problem of excessive communication overhead, in this paper we adopt a hybrid MPI and OpenMP programming model to design two-level parallel loop self-scheduling schemes. At the first level, each computing node runs an MPI process for inter-node communication. At the second level, each processor core runs an OpenMP thread to execute the iterations assigned to its resident node. Experimental results show that our method outperforms previous works.

18.
The different types of messages used by a parallel application program executing in a distributed computing system can each have unique characteristics, so that no single communication network can produce the lowest latency for all messages. For instance, short control messages may be sent with the lowest overhead on one type of network, such as Ethernet, while bulk data transfers may be better suited to a different type of network, such as Fibre Channel or HIPPI. This work investigates how to exploit multiple heterogeneous communication networks that interconnect the same set of processing nodes using a set of techniques we call performance-based path determination (PBPD). The performance-based path selection (PBPS) technique selects the best (lowest-latency) network among several for each individual message to reduce the communication overhead of parallel programs. The performance-based path aggregation (PBPA) technique, on the other hand, aggregates multiple networks into a single virtual network to increase the available bandwidth. We test the PBPD techniques on a cluster of SGI multiprocessors interconnected with Ethernet, Fibre Channel, and HIPPI networks using a custom communication library built on top of the TCP/IP protocol layers. We find that PBPS can reduce communication overhead in applications compared to using either network alone, while aggregating networks into a single virtual network can reduce communication latency for bandwidth-limited applications. The performance of the PBPD techniques depends on the mix of message sizes in the application program and the relative overheads of the networks, as demonstrated in our analytical models.
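The per-message selection idea (PBPS) amounts to choosing, for each message, the network whose modeled latency is lowest. A toy cost model is sketched below; the overhead and bandwidth figures are invented for illustration and are not the paper's measurements.

```c
/* Sketch of performance-based path selection: pick the network with the lowest     */
/* modeled latency for a given message size. All parameters are illustrative only.  */
#include <stdio.h>

struct net { const char *name; double overhead_us; double bandwidth_MBps; };

/* Latency model: fixed per-message overhead plus size/bandwidth.                   */
/* bytes divided by MB/s conveniently yields microseconds.                          */
static double latency_us(const struct net *n, double bytes)
{
    return n->overhead_us + bytes / n->bandwidth_MBps;
}

static const struct net *select_net(const struct net *nets, int count, double bytes)
{
    const struct net *best = &nets[0];
    for (int i = 1; i < count; i++)
        if (latency_us(&nets[i], bytes) < latency_us(best, bytes))
            best = &nets[i];
    return best;
}

int main(void)
{
    struct net nets[] = {
        {"Ethernet",      60.0,   12.0},  /* low per-message overhead, low bandwidth */
        {"Fibre Channel", 180.0, 100.0},  /* higher overhead, faster bulk transfers  */
    };
    double sizes[] = {128, 4096, 262144, 4194304};
    for (int i = 0; i < 4; i++)
        printf("%8.0f bytes -> %s\n", sizes[i], select_net(nets, 2, sizes[i])->name);
    return 0;
}
```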

19.
To address the performance measurement of parallel computing systems, a productivity-based parallel speedup model that integrates system reliability, communication, parallelization control, and cost factors is built on top of a productivity measurement model. The key factors in the model that influence the productivity-based parallel speedup are analyzed and summarized, including the fault-tolerance overhead factor, the communication overhead factor, the parallel-control overhead factor, and the cost overhead factor; simulation experiments on these key factors verify the validity of the model.

20.
The development of a basic scalable preprocessing tool is a key routine for accelerating the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing tool called ParTransgrid is developed to translate general grid formats such as CFD General Notation System into an efficient distributed mesh data format for large-scale parallel computing. Through ParTransgrid, a flexible face-based parallel unstructured mesh data structure designed in Hierarchical Data Format can be obtained to support various cell-centered unstructured CFD solvers. The parallel preprocessing operations include parallel grid I/O, parallel mesh partitioning, and parallel mesh migration, which are linked together to resolve the run-time and memory-consumption bottlenecks of increasingly large grids. An inverted-index search strategy combined with a multi-master-slave communication paradigm is proposed to improve pairwise face-matching efficiency and reduce the communication overhead of constructing the distributed sparse graph in the parallel mesh partitioning phase. We also present a simplified owner-update rule that speeds up the migration of raw partition boundaries and the building of the shared face/node communication mapping list between new sub-meshes, with an order-of-magnitude speed-up. Experimental results show that ParTransgrid scales easily to CFD applications with billions of grid cells, and the preparation time for parallel computing with hundreds of thousands of cores is reduced to a few minutes.
