Multiprogrammed non-blocking checkpoints in support of optimistic simulation on Myrinet clusters
Affiliation:1. College of Computer, National University of Defense Technology, Changsha 410073, China;2. National Supercomputer Center in Changsha, Changsha 410082, China;3. School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China;4. School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510006, China;5. National Supercomputer Center in Guangzhou, Guangzhou 510006, China;6. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;1. TSYS School of Computer Science, Columbus State University, Columbus, GA 31907-5645, USA;2. Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849-5347, USA;3. Department of Computer Science, Sonoma State University, Rohnert Park, CA 94928, USA;4. Department of Computer Science, Earlham College, Richmond, Indiana 47374-4095, USA;5. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Abstract: CCL (Checkpointing and Communication Library) is a software layer supporting optimistic parallel discrete event simulation (PDES) on Myrinet-based COTS clusters. Beyond classical low-latency message delivery, the library implements CPU-offloaded, non-blocking (asynchronous) checkpointing based on the data transfer capabilities of a programmable DMA engine on board Myrinet network cards. These functionalities are unique, since optimistic simulation systems conventionally rely on checkpointing implemented as a synchronous, CPU-based data copy. Releases of CCL up to v2.4 support only monoprogrammed non-blocking checkpoints. This forces re-synchronization between CPU and DMA activities, a potential source of overhead, each time a new checkpoint request is issued at the simulation application level while the previously issued one is still being carried out by the DMA engine. In this paper we present a redesigned release of CCL (v3.0) that exploits the hardware capabilities of more advanced Myrinet clusters to support multiprogrammed non-blocking checkpoints. The multiprogrammed approach allows a higher degree of concurrency between checkpointing and other simulation-specific operations carried out by the CPU, which benefits performance. We also report an experimental evaluation of those benefits for a Personal Communication System (PCS) simulation application, selected as a real-world test-bed.
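The difference between monoprogrammed and multiprogrammed checkpoints can be illustrated with a toy timing model. The sketch below is not CCL code and uses no CCL API: the function name `total_cpu_stall`, its parameters, and the assumption that the DMA engine serves checkpoint requests one at a time in issue order are all hypothetical, chosen only to show why allowing more than one outstanding request can reduce the time the CPU spends re-synchronizing.

```python
def total_cpu_stall(compute_gaps, dma_duration, max_outstanding):
    """Toy model (hypothetical, not the CCL API).

    compute_gaps    : CPU work time preceding each checkpoint request
    dma_duration    : time the DMA engine needs to copy one checkpoint
    max_outstanding : how many checkpoint requests may be pending at once
                      (1 models the monoprogrammed case)
    Returns the total time the CPU waits for a free checkpoint slot.
    """
    completions = []  # completion time of each issued checkpoint, in order
    now = 0           # current CPU time
    stall = 0         # accumulated CPU waiting time

    for gap in compute_gaps:
        now += gap  # CPU does simulation work, then wants a checkpoint

        # A slot is free once the max_outstanding-th most recent request
        # has completed; otherwise the CPU must re-synchronize with the DMA.
        if len(completions) >= max_outstanding:
            must_finish = completions[-max_outstanding]
            if must_finish > now:
                stall += must_finish - now
                now = must_finish

        # Assumption: the DMA engine serves requests serially, in order.
        start = max(now, completions[-1]) if completions else now
        completions.append(start + dma_duration)

    return stall


gaps = [1, 1, 1, 1]  # four checkpoint requests, one time unit of work before each
mono = total_cpu_stall(gaps, dma_duration=3, max_outstanding=1)
multi = total_cpu_stall(gaps, dma_duration=3, max_outstanding=4)
print(mono, multi)
```

In this model a larger `max_outstanding` only helps while request slots remain available; if checkpoints are issued persistently faster than the DMA engine can copy them, the CPU eventually stalls in either mode.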
This article is indexed in ScienceDirect and other databases.