1.
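Efficient LP-to-PP mapping is one of the key techniques for accelerating parallel performance simulation. For parallel applications with regular interaction patterns, the mapping generation method A2-LP3M is designed. It extracts the interaction patterns among LPs from the trace and, with the goal of minimizing communication between physical processes on the host machine while also balancing the computational load, selects a suitable mapping from among block-cyclic mappings. Experiments show that, compared with conventional mapping methods, A2-LP3M reduces parallel simulation time by up to 16.2%.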
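The selection step described above can be illustrated with a minimal sketch. This is not the A2-LP3M implementation: the function names, the `max_imbalance` threshold, and the exhaustive scan over block sizes are all assumptions made for illustration. The sketch enumerates block-cyclic mappings, discards those whose load imbalance is too high, and keeps the one with the least traffic between PPs.

```python
def block_cyclic_map(num_lps, num_pps, block):
    """Block-cyclic mapping: LP i goes to PP (i // block) % num_pps."""
    return [(i // block) % num_pps for i in range(num_lps)]

def cut_volume(comm, mapping):
    """Total traffic between LPs that are placed on different PPs."""
    return sum(vol for (a, b), vol in comm.items() if mapping[a] != mapping[b])

def imbalance(load, mapping, num_pps):
    """Ratio of the heaviest PP load to the mean PP load (1.0 is perfect)."""
    per_pp = [0.0] * num_pps
    for lp, pp in enumerate(mapping):
        per_pp[pp] += load[lp]
    return max(per_pp) / (sum(per_pp) / num_pps)

def choose_block_size(comm, load, num_pps, max_imbalance=1.1):
    """Hypothetical selection rule: among block sizes that keep the load
    imbalance under max_imbalance, pick the one with the smallest cut."""
    num_lps = len(load)
    best = None
    for block in range(1, num_lps // num_pps + 1):
        mapping = block_cyclic_map(num_lps, num_pps, block)
        if imbalance(load, mapping, num_pps) > max_imbalance:
            continue
        cost = cut_volume(comm, mapping)
        if best is None or cost < best[0]:
            best = (cost, block, mapping)
    return best

# Assumed toy trace summary: 8 LPs in a ring, one unit of traffic per
# neighbour pair, uniform compute load, 2 PPs on the host.
ring = {(i, (i + 1) % 8): 1 for i in range(8)}
uniform_load = [1.0] * 8
cost, block, mapping = choose_block_size(ring, uniform_load, num_pps=2)
```

For a nearest-neighbour pattern like this ring, larger blocks keep more neighbours on the same PP, which is why a pure cyclic mapping (block size 1) is the worst candidate here.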
2.
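Trace generation is an indispensable step in trace-driven architecture simulation. Traces not only occupy a large amount of storage; generating them can also perturb the simulated execution of the target application to some degree, introducing errors into the performance data. Because of the way trace-driven parallel performance simulators are designed and implemented, and because of the diversity of the host parallel platforms they run on, the impact of trace generation on them has its own characteristics. This paper takes BigSim, a typical parallel simulator, and several target parallel programs with different computation-to-communication ratios, and designs experiments on three host platforms that support different trace I/O modes to evaluate the impact of trace generation on parallel performance simulation. The results show that trace generation has a considerable impact on both simulation efficiency and accuracy. The paper analyzes how this impact relates to the implementation of the parallel simulator and to the I/O mode of the host platform, and then discusses several feasible improvements, offering guidance for the design, implementation, and use of trace-driven parallel simulators.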
3.
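Based on a hierarchical performance model for new multi-core SMP clusters, this paper implements Sim-MSC, a trace-driven parallel performance simulator for multi-core SMP clusters, on top of the BigSim parallel performance simulator. Tests with the jacobi3D program on a host platform consisting of an InfiniBand multi-core SMP cluster show that Sim-MSC can simulate the execution characteristics of MPI message-passing parallel applications on multi-core SMP clusters and accurately predict system and application performance.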
4.
5.
6.
7.
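Simulators are important tools for computer architecture research. In recent years, the development of parallel computer architectures has posed great challenges for computer simulation. On the one hand, as architectures move toward multi-core and many-core processors, the scale of the target system to be simulated keeps growing as the number of simulated cores increases at the pace of Moore's law. On the other hand, the speed of serial simulation has stagnated because clock-frequency gains on the host machines that run the simulator have slowed. For these two reasons, traditional serial simulation cannot meet the demands of emerging architectures for simulation scale and speed. Taking two architectures, many-core processors and many-core clusters, as examples, the paper shows that parallel simulation techniques are both necessary and feasible for simulating parallel computer architectures. For many-core processor simulation, parallel discrete event simulation is used for acceleration and raises simulation speed by a factor of 10.9 with simulation accuracy unchanged. For many-core cluster simulation, the total scale of the simulated target system reaches 1024 cores, and a runtime environment for hybrid MPI/Pthreads programming is supported.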
8.
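Performance Simulation and Prediction of the HPL Benchmark (total citations: 1; self-citations: 0; citations by others: 1)
HPL is a Linpack benchmark package widely used on large-scale parallel systems. Based on an analysis of the HPL algorithm and on practical tests, a rule for determining the matrix block size NB theoretically is uncovered, ending the long-standing reliance on trial-and-error experiments. The estimate of algorithmic complexity is then refined to the precision of the benchmark's execution time, and an HPL simulation model is built to evaluate execution time in finer detail. In addition, after validation against a large number of real tests, the model is used to predict the benefit that various system performance improvements would bring to the Linpack test, with the aim of providing a reference for the direction of system improvements.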
9.
10.
11.
Bin Xie, Tianzhou Chen, Wei Hu, Xingsheng Tang, Dazhou Wang. The Journal of Supercomputing, 2013, 64(3): 1021-1037
With the development of semiconductor technology, more processors can be integrated onto a single chip. Network-on-Chip (NoC) is an efficient communication solution for many-core systems. However, enhancing performance with lower energy consumption is still a challenge. One critical issue is mapping applications to the NoC. This work proposes an online mapping method that optimizes the task mapping algorithm to reduce communication energy consumption. The communication status of applications at runtime is analyzed first. Then, the algorithm computes the mapping placement dynamically and performs the mapping online in real time. Simulation-based experimental results show that the proposed algorithm achieves more than 20% communication energy savings compared with first-fit mapping and nearest-neighbor mapping. The migration cost caused by the remapping process is also considered and can be calculated at runtime to estimate the effect of remapping.
12.
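UML has become the de facto standard for object-oriented analysis and design modeling. The design output of a UML-based C3I system model is a static description of the system, whereas a C3I system is dynamic in nature. Colored Petri nets (CPN) offer strong descriptive power, a rigorous mathematical foundation, and a variety of analysis techniques, and they can be executed in simulation. A method is proposed for mapping UML products to an executable CPN model that can be used to evaluate the architecture in terms of logic, behavior, and performance. A general description of the UML modeling of the C3I of an electro-optical jamming weapon system is given, and the process of mapping its UML products to CPN to build the executable model is explained. Simulating the executable CPN model allows the C3I of the electro-optical jamming weapon system to be evaluated completely and accurately.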
13.
Sarzamin Khan, Sheraz Anjum, Usman Ali Gulzari, Farruh Ishmanov, Maurizio Palesi, Muhammad Khalil Afzal. Applied Intelligence, 2018, 48(12): 4792-4804
In this paper, we propose an optimized, search-based, near-optimal mapping heuristic, named ONMAP, for mapping real-time embedded application workloads onto 2D on-chip interconnection network platforms. ONMAP exploits NMAP, a well-known and fast nearest-neighbor heuristic algorithm, by using the modular exact optimization method. The proposed hybrid algorithm minimizes the on-chip inter-processor communication energy consumption and optimizes the interconnection network performance parameters. The algorithm inherits the constructive, search-based heuristic nature of the NMAP algorithm, as well as the property of exact optimization for mapping embedded applications onto the target communication architecture. To verify the efficiency and effectiveness of the algorithm, we compared it with NMAP and a random mapping algorithm under similar simulation environments and traffic conditions. The mapping results for exemplary real-world applications such as VOPD, PIP, MPEG4, MWD, MMS and WiFi-80211arx indicate that ONMAP is more efficient than its competitors for most of the performance parameters of on-chip network designs. The algorithm reduces energy consumption by up to 20% and 26% in comparison to the NMAP and random algorithms, respectively. Similarly, the cost is reduced by up to 10% and 60% as compared to the NMAP and random mapping algorithms, respectively.
14.
15.
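To increase the computation speed of molecular dynamics simulation on symmetric multiprocessing (SMP) clusters, a hybrid MPI+TBB parallel programming model is introduced into the parallel molecular dynamics method. Based on this model, a hybrid parallel algorithm is designed and implemented in the molecular dynamics package LAMMPS: MPI with spatial decomposition provides process-level parallelism across nodes, and TBB with critical sections provides thread-level parallelism within a node. Tests on an SMP cluster show that, for large systems and large node counts, the method markedly reduces communication time and raises the speedup by 45% over the pure MPI model. The results indicate that the hybrid MPI+TBB parallel programming model benefits parallel molecular dynamics simulation and markedly improves its efficiency.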
16.
Yomin Hou, Chien-Min Wang, Chiu-Yu Ku, Lih-Hsing Hsu. IEEE Transactions on Parallel and Distributed Systems, 2001, 12(5): 514-527
In this paper, we address the problem of minimizing channel contention of linear-complement communication on wormhole-routed hypercubes. Our research reveals that, for traditional routing algorithms, the degree of channel contention of a linear-complement communication can be quite large. To solve this problem, we propose an alternative approach, which applies processor reordering mapping at compile time. In this compiler approach, processors are logically reordered according to the given communication(s) so that the new communication(s) can be efficiently realized on the hypercube network. It is proved that, for any linear-complement communication, there exists a reordering mapping such that the new communication has minimum channel contention. An O(n^3) algorithm is proposed to find such a mapping for an n-dimensional hypercube. An algorithm based on dynamic programming is also proposed to find an optimal reordering mapping for a set of linear-complement communications. Several computer simulations have been conducted, and the results clearly show the advantage of the proposed approach.
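To make the notion of channel contention concrete, the sketch below counts how many messages share a directed channel when a communication is routed with dimension-order (e-cube) routing, a common traditional scheme on hypercubes. It rests on assumptions the abstract does not state: a linear-complement communication is taken to send node x to Ax XOR c over GF(2), with A given as a list of row bitmasks, and contention is measured as the maximum number of messages on any one channel. The reordering algorithm itself is not reproduced.

```python
from collections import Counter

def linear_complement(x, rows, c):
    """Destination y = A.x XOR c over GF(2); rows[i] is row i of A as a bitmask."""
    y = 0
    for i, row in enumerate(rows):
        bit = bin(row & x).count("1") & 1   # parity of the selected source bits
        bit ^= (c >> i) & 1                 # complement vector
        y |= bit << i
    return y

def ecube_path(src, dst, n):
    """Directed channels (node, dimension) used when the differing address
    bits are corrected from the lowest dimension to the highest."""
    channels = []
    node = src
    for d in range(n):
        if ((node ^ dst) >> d) & 1:
            channels.append((node, d))
            node ^= 1 << d
    return channels

def max_contention(n, dest_of):
    """Largest number of messages that share one directed channel."""
    use = Counter()
    for src in range(1 << n):
        for ch in ecube_path(src, dest_of(src), n):
            use[ch] += 1
    return max(use.values(), default=0)

n = 4
identity_rows = [1 << i for i in range(n)]
reversal_rows = [1 << (n - 1 - i) for i in range(n)]   # bit-reversal permutation

# Complementing every address bit: each channel carries a single message.
complement = max_contention(n, lambda x: linear_complement(x, identity_rows, (1 << n) - 1))
# Bit reversal: some channels are shared by more than one message.
reversal = max_contention(n, lambda x: linear_complement(x, reversal_rows, 0))
```

Comparing `complement` and `reversal` shows how the same routing rule can be contention-free for one communication and congested for another, which is the situation that processor reordering is meant to address.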
17.
18.
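Modern supercomputers have ever more compute nodes, and each node contains multiple processor cores. Because interconnect bandwidth differs, inter-node and intra-node communication form two levels with different performance, with the latter outperforming the former. However, the default process mapping of MPI programs ignores this difference and cannot exploit the better intra-node bandwidth, which severely limits supercomputer performance. To address this problem, this paper designs and implements POM, a tool for automatic optimized process mapping of MPI programs that exploits the difference between communication levels, and provides an efficient, low-overhead method for collecting the communication information of an MPI program. By optimizing how communication is distributed across the communication levels, it improves the communication efficiency of the program and thereby the performance of the application. The paper solves three problems: abstracting the communication hierarchy of the hardware platform, collecting MPI communication information at low overhead, and computing the mapping. First, according to differences in communication capability, the supercomputer is abstracted into two levels: different compute nodes connected by a high-speed interconnect, and multiple processor cores on the same node. Second, a simple method is proposed for converting collective communication into point-to-point communication. Finally, an undirected edge-weighted graph is used to represent the inter-process communication of the MPI program, which turns the process mapping problem into a graph partitioning problem. Experiments on the Dawning 5000A and Dawning 4000A show that POM can significantly improve the performance of MPI programs.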
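The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the POM tool: `expand_bcast` assumes a flat root-to-all expansion of a broadcast (the abstract does not say how collectives are converted), and `greedy_partition` is a simple greedy stand-in for a real graph partitioner. All function names are hypothetical.

```python
from collections import defaultdict

def expand_bcast(root, ranks, nbytes):
    """Assumed flat expansion: the root sends nbytes to every other rank."""
    return [(root, r, nbytes) for r in ranks if r != root]

def build_graph(messages):
    """Undirected weighted graph: weight[(a, b)] with a < b is the total
    number of bytes exchanged between ranks a and b."""
    weight = defaultdict(int)
    for src, dst, nbytes in messages:
        if src != dst:
            weight[(min(src, dst), max(src, dst))] += nbytes
    return weight

def greedy_partition(num_procs, cores_per_node, weight):
    """Greedy stand-in for graph partitioning: fill one node at a time,
    always adding the unplaced rank with the most traffic to that node."""
    def w(a, b):
        return weight.get((min(a, b), max(a, b)), 0)
    unplaced = set(range(num_procs))
    nodes = []
    while unplaced:
        group = [min(unplaced)]
        unplaced.remove(group[0])
        while len(group) < cores_per_node and unplaced:
            nxt = max(sorted(unplaced), key=lambda p: sum(w(p, g) for g in group))
            group.append(nxt)
            unplaced.remove(nxt)
        nodes.append(group)
    return nodes

def inter_node_bytes(nodes, weight):
    """Traffic that has to cross the slower inter-node level."""
    node_of = {p: i for i, group in enumerate(nodes) for p in group}
    return sum(v for (a, b), v in weight.items() if node_of[a] != node_of[b])

# Assumed toy profile: 4 ranks, 2 cores per node, heavy traffic between
# ranks 0 and 2 and between ranks 1 and 3, plus one small broadcast.
messages = [(0, 2, 100), (2, 0, 100), (1, 3, 100)] + expand_bcast(0, range(4), 1)
graph = build_graph(messages)
default_nodes = [[0, 1], [2, 3]]              # ranks placed in rank order
optimized_nodes = greedy_partition(4, 2, graph)
```

Calling `inter_node_bytes` on `default_nodes` and on `optimized_nodes` compares how much traffic each placement pushes onto the inter-node level; a production tool would typically hand the graph to a dedicated partitioner rather than use this greedy rule.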
19.
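Hardware-in-the-loop simulation provides important technical support for the development of communications. A hardware-in-the-loop network simulation model is designed based on the OMNEST simulation software. The address mapping between the simulated network and the physical network in the model is analyzed, interaction between simulated and physical data flows is implemented, and data-transmission performance metrics such as end-to-end delay and bit error rate are statistically analyzed. The simulation results show that this network simulation model can effectively combine the simulated network with the physical network.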