A methodology for detailed performance modeling of reduction computations on SMP machines

Authors:	Ruoming Jin, Gagan Agrawal

Affiliation:	Department of Computer and Information Sciences, Ohio State University, Columbus, OH 43210, USA

Abstract:	In this paper, we revisit the problem of performance prediction on SMP machines, motivated by the need to select a parallelization strategy for random write reductions. Such reductions frequently arise in data mining algorithms. In our previous work, we developed a number of techniques for parallelizing this class of reductions, and showed that each of three techniques (full replication, optimized full locking, and cache-sensitive locking) can outperform the others depending upon problem, dataset, and machine parameters. Therefore, an important question is, “Can we predict the performance of these techniques for a given problem, dataset, and machine?” This paper addresses this question by developing an analytical performance model that captures a two-level cache, coherence cache misses, TLB misses, locking overheads, and contention for memory. The analytical model is combined with results from micro-benchmarking to predict performance on real machines. We have validated our model on two different SMP machines. Our results show that the model effectively captures the impact of the memory hierarchy (two-level cache and TLB) as well as the factors that limit parallelism (contention for locks, memory contention, and coherence cache misses). The difference between predicted and measured performance is within 20% in almost all cases. Moreover, the model is quite accurate in predicting the relative performance of the three parallelization techniques.

Keywords:	Parallel processing; Shared memory; Memory hierarchy; Data mining
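
Illustration:	The abstract names the parallelization techniques without showing what a random write reduction looks like. The following is a minimal sketch, not the authors' implementation. It assumes a reduction object that is a flat array of counters updated at indices known only at run time. All function names and the `elems_per_lock` parameter are hypothetical. Python threads are used only to show the structure of each strategy; they do not reproduce the cache, TLB, or contention effects that the paper models. Setting `elems_per_lock` above 1 only loosely mimics the idea of sharing one lock among the elements of a cache line.

```python
import threading


def sequential_reduction(items, num_bins):
    """Reference result: apply every (index, value) update in order."""
    result = [0] * num_bins
    for idx, val in items:
        result[idx] += val
    return result


def _run_threads(work, num_threads):
    """Start one thread per id in range(num_threads) and wait for all."""
    threads = [threading.Thread(target=work, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def full_replication(items, num_bins, num_threads=4):
    """Each thread updates a private copy; copies are merged at the end."""
    copies = [[0] * num_bins for _ in range(num_threads)]

    def work(tid):
        local = copies[tid]  # private to this thread, so no locking is needed
        for idx, val in items[tid::num_threads]:
            local[idx] += val

    _run_threads(work, num_threads)
    # Merge step: sum the per-thread copies element by element.
    return [sum(column) for column in zip(*copies)]


def locked_reduction(items, num_bins, num_threads=4, elems_per_lock=1):
    """All threads update one shared copy, guarded by locks.

    elems_per_lock=1 gives one lock per element (full locking);
    larger values share one lock among a block of adjacent elements.
    """
    shared = [0] * num_bins
    num_locks = (num_bins + elems_per_lock - 1) // elems_per_lock
    locks = [threading.Lock() for _ in range(num_locks)]

    def work(tid):
        for idx, val in items[tid::num_threads]:
            with locks[idx // elems_per_lock]:
                shared[idx] += val

    _run_threads(work, num_threads)
    return shared
```

All three functions return the same array for the same input. The trade-off the abstract refers to is visible in the structure: replication avoids locks but needs one copy per thread plus a merge step, while locking keeps a single copy but pays for lock acquisition on every update.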
This article is indexed in ScienceDirect and other databases.
