Similar literature
1.
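SIMD architectures effectively exploit the parallelism in multimedia and complex scientific computing, and have become a focus of industrial application and research. In research on large-scale SIMD architectures, an FPGA paged simulation model suited to SIMD architectures is proposed to ease the limit that FPGA chip capacity places on the scale of the simulated system. The model effectively reduces the FPGA compute and storage resources that a SIMD architecture requires and increases the scale at which a SIMD architecture can be verified. Simulation experiments on the MASA stream processor show that, without any simulation optimization, the largest configuration the EP2S180 FPGA can simulate is an 8-cluster MASA; with the paged simulation model, the EP2S180 can simulate up to a 256-cluster MASA, with an acceptable increase in simulation time.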
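
The abstract does not describe the paged model in detail. The following is only a minimal sketch, assuming that "paging" means time-multiplexing a small number of physical cluster slots across many logical clusters and saving each cluster's state between turns. The names `step_cluster`, `simulate_paged`, and `physical_slots` are hypothetical, the per-cluster behaviour is a toy accumulator, and inter-cluster communication is ignored.

```python
def step_cluster(state):
    """Advance one logical cluster by one simulated cycle (toy accumulator)."""
    return {"id": state["id"], "acc": state["acc"] + state["id"]}


def simulate_paged(num_clusters, physical_slots, cycles):
    """Simulate num_clusters logical clusters on physical_slots hardware slots.

    Returns (final_states, page_loads), where page_loads counts how many
    pages of cluster state were swapped into the physical slots.
    """
    # The backing store keeps the state of every logical cluster
    # (assumed here to stand in for memory outside the compute fabric).
    backing_store = [{"id": i, "acc": 0} for i in range(num_clusters)]
    page_loads = 0
    for _ in range(cycles):
        # One simulated cycle is split into passes, one page per pass.
        for base in range(0, num_clusters, physical_slots):
            page = backing_store[base:base + physical_slots]  # load page
            page_loads += 1
            page = [step_cluster(s) for s in page]            # slots run in lockstep
            backing_store[base:base + physical_slots] = page  # write state back
    return backing_store, page_loads


if __name__ == "__main__":
    states, loads = simulate_paged(num_clusters=256, physical_slots=8, cycles=3)
    print(loads, states[5]["acc"])
```

Under this simplified model the final state is the same as simulating all 256 clusters at once; the cost is the extra passes per simulated cycle, which corresponds to the simulation-time increase the abstract mentions.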

2.
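Taking mesh-connected SIMD computers as an example, an object-oriented simulation method for data-parallel computing is proposed. The architecture of mesh-connected SIMD computers is first studied and abstracted into a mathematical model. Then, based on this model and three auxiliary tables, a highly compatible simulator class is designed; it can be instantiated as simulator objects for mesh-connected SIMD computers with different structural parameters and instruction sets. This method can greatly improve the efficiency of developing computer simulation software.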

3.
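To address shortcomings in current research on compilation techniques for two-dimensional SIMD architectures, and considering both the multiplexed data paths and small register counts common in such architectures and the characteristics of application programs, an algorithm for data vector reuse is proposed. The algorithm first uses a representative element of each data vector to compute the data vector reuse information between SIMD instructions, then schedules the SIMD instructions according to that information. It effectively relieves the data-loading pressure when applications execute on a two-dimensional SIMD architecture and improves the parallelism of resource-constrained two-dimensional SIMD architectures. Experimental data show that, across a variety of applications, the algorithm achieves an average speedup of 2.97 and an average SIMD instruction-level parallelism of 3.86.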

4.
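ATP (automatic train protection) is an indispensable core part of modern railway signaling systems and plays an important role in train operational safety and traffic efficiency. This paper discusses the simulation clock advancement strategies and the integration environment used in a distributed ATP simulation and test system for urban rail transit. The system uses hardware-in-the-loop simulation and distributed interactive simulation to give ATP equipment realistic operating conditions, so that the equipment's accuracy and reliability can be tested. To meet the requirements of hardware-in-the-loop and distributed simulation, a hybrid-mechanism event-driven strategy is used in fully digital simulation and a wall-clock-driven strategy in hardware-in-the-loop simulation. Because CORBA has relative advantages in functionality and price over other distributed simulation integration environments, CORBA is used to integrate the subsystems.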

5.
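As problem sizes grow and real-time requirements rise, SIMD vector processors, especially processors with vector arithmetic units, have come into wide use in industry. The run-time state of a program on a processor is generally managed by the compiler through a stack. Existing compiler stack design mechanisms severely degrade overall application performance on SIMD architectures. Based on the characteristics of SIMD architectures, an efficient distributed stack design method, HEDSSA, is proposed. Experimental results show that the HEDSSA stack lets applications access stack data more efficiently during local data access, function calls, interrupts, and dynamic data allocation.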

6.
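Sea trials of shipboard equipment are cumbersome, costly, and slow. Developing a ship simulation test bench with simulation technology can shorten the product test cycle and save development costs. This paper discusses the design of the test bench's measurement system, including the relationship between the measurement system and the overall test bench, angle measurement error analysis and system accuracy design, precise angle measurement by coupling coarse and fine two-stage sensors, synchronous latching of sway-angle data with a time-synchronization signal, and the design of the computer data acquisition and processing software. The system achieves real-time, high-precision measurement for large simulation equipment and provides a powerful experimental tool for shipboard equipment.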

7.
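To enable refined simulation of vehicle-bridge coupled vibration, a complete spatial vehicle model is built with the multibody dynamics software SIMPACK, and a dynamic analysis model of the bridge is built with a finite element method that mixes spatial beam elements and plate-shell elements. The vehicle and bridge subsystems then exchange data at discrete information points on the wheel-rail contact surface, giving a co-simulation analysis of vehicle-bridge coupled vibration. Taking a simply supported girder bridge on a high-speed railway as the object of study, the co-simulation technique combining multibody dynamics and the finite element method is used to compute the spatial coupled vibration response as an EMU train crosses the bridge at different speeds with elastic wheel-rail contact, which demonstrates the feasibility of the method.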

8.
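SIMD technology is used for high-speed vector and matrix computation; its structure consists mainly of a data buffer system and an alignment network. SIMD-based image convolution is a major technique in digital image processing. This paper analyzes the system structure of SIMD image convolution and a fast color image recognition method, in order to explore applications of SIMD technology in digital image processing.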

9.
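Based on the characteristics and requirements of payload data simulation systems, this paper uses HLA simulation technology to propose a federation structure and software framework for a data simulation system oriented to the operation-control workflow. On this basis, it defines the function of each node, describes the development process of the simulation system, and analyzes how the system runs. Finally, it gives the workflow of the whole simulation system. The designed system has good openness, portability, and reusability, can meet general application needs, and provides a sound simulation support environment for further research on payload data simulation systems.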

10.
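System simulation of a hypersonic vehicle is a tightly coupled process involving the aerodynamic flow field, the structural stress field, and the structural temperature field. During simulation, data must be exchanged iteratively across the coupling interfaces; the transferred coupling data mainly include displacement, velocity, pressure, and temperature. The exchange passes parameters between the equations of the coupled fields so that the simulation iterates within a unified simulation time. The technical difficulty of interface data transfer lies in conserving total work at the coupling interface and preserving the accuracy of the multiphysics coupled simulation [1]. Most current research and applications address two-field coupling using local, interpolation-based data transfer methods [1,3]; local interpolation suffers from low global accuracy, large deviations, and slow computation. This work applies the radial basis function (RBF) method to interface data exchange in simulation. Validation results show that the accuracy meets simulation requirements, the equations are solved quickly, and the method performs well in engineering applications.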
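
To illustrate the general idea of RBF-based interface transfer, here is a minimal sketch that interpolates a scalar field (for example, pressure) from the nodes of one interface mesh onto another. It assumes a Gaussian basis with shape parameter `eps`; the abstract does not say which basis function or solver the authors used, and the function names are hypothetical. The sketch does not enforce conservation of total work across the interface.

```python
import math


def gaussian_rbf(r, eps=1.0):
    """Gaussian radial basis function phi(r) = exp(-(eps * r)^2)."""
    return math.exp(-(eps * r) ** 2)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rbf_transfer(src_pts, src_vals, dst_pts, eps=1.0):
    """Interpolate values given at src_pts onto dst_pts with a global RBF fit."""
    n = len(src_pts)
    # Interpolation matrix: phi(|x_i - x_j|) over all pairs of source nodes.
    phi = [[gaussian_rbf(math.dist(src_pts[i], src_pts[j]), eps)
            for j in range(n)] for i in range(n)]
    weights = solve_linear(phi, src_vals)
    # Evaluate the weighted sum of basis functions at each target node.
    return [sum(weights[j] * gaussian_rbf(math.dist(p, src_pts[j]), eps)
                for j in range(n)) for p in dst_pts]


if __name__ == "__main__":
    fluid_nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    pressure = [1.0, 2.0, 3.0, 4.0]
    structure_nodes = [(0.5, 0.5), (0.25, 0.75)]
    print(rbf_transfer(fluid_nodes, pressure, structure_nodes))
```

Because every source node contributes to every target value, the fit is global rather than local, which is the property the abstract contrasts with local interpolation. The cost is a dense linear system whose size equals the number of source nodes.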

11.
Jing Xiaojun (景晓军), Fang Binxing (方滨兴). Journal of Software (《软件学报》), 1996, 7(7): 401-408
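SIMC (SIMDC) is a parallel language that supports SIMD (single instruction multiple data) parallel programming, obtained by extending the syntax of C (without extending its semantics). SIMC can conveniently describe SIMD parallel algorithms, can define SIMD computer architectures, and supports research on parallel algorithms across multiple architectures. A simulated execution system for SIMC has been implemented on a single machine. SIMC serves as the parallel programming language of the authors' simulation environment for SIMD computer programming and performance evaluation, and is used to evaluate the performance of SIMD computer algorithms and architectures.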

12.
SIMD arrays are likely to become increasingly important as coprocessors in domain specific systems as architects continue to leverage RAM technology in their design. The problem this work addresses is the efficient evaluation of SIMD arrays with respect to complex applications while accounting for operating frequency and chip area. The underlying issues include the size of the architecture space, the lack of portability of the test programs, and the inherent complexity of simulating up to hundreds of thousands of processing elements. The overall method we use is to combine architecture level and Electronic Design Automation (EDA) level modeling by using an EDA-based tool to calibrate architectural simulations. The resulting system retains much of the high throughput of the architecture level simulator but it also has accuracy similar to that of an early pass EDA synthesis and circuit simulation. The particular problem of computational cost of the architectural level simulation is addressed with a novel approach to trace-based simulation (we call it trace compilation), which we find to be one to two orders of magnitude faster than instruction level simulation while still retaining much of the accuracy of the model. Furthermore, traces must be generated for only a small fraction of the possible parameter combinations. Using trace compilation also addresses program portability by allowing the user to code in a single data parallel language with a single compiler, regardless of the target architecture. We have used our system to evaluate thousands of potential SIMD array designs with respect to real applications and present some sample results.

13.
High single instruction multiple data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Thousands of threads can be created by programmers and grouped into fixed-size SIMD batches, known as warps. High throughput is then achieved by concurrently executing such warps with minimal control overhead. However, if a branch instruction occurs that assigns different paths to different threads, one warp is broken into multiple warps that have to be executed serially, which reduces the efficiency advantage of SIMD. In this paper, the contemporary fixed-size warp design is abandoned for a hybrid warp size (HWS) mechanism. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. The simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general purpose GPU applications. The paper also integrates HWS with dynamic warp formation (DWF), which is a well-known branch handling mechanism used to improve SIMD utilization by forming new warps out of split warps in real time. The simulation results show that the combination of DWF and HWS generates an average speedup of 1.27 over the DWF-only platform with an estimated area increase of about 1% of DWF.
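
The cost of branch divergence described above can be sketched with a toy model. It assumes that a warp whose threads take k distinct branch paths needs k serial passes, each occupying every lane of the warp, and that each path costs one issue slot. The function names are hypothetical, and this models only the divergence penalty, not the HWS or DWF mechanisms themselves.

```python
def divergent_passes(warp_paths):
    """Serial passes a warp needs: one per distinct branch path taken."""
    return len(set(warp_paths))


def simd_efficiency(thread_paths, warp_size):
    """Fraction of issued SIMD lane slots that do useful work.

    thread_paths[i] is the label of the branch path taken by thread i.
    Threads are grouped into consecutive warps of warp_size lanes.
    """
    useful = 0
    issued = 0
    for base in range(0, len(thread_paths), warp_size):
        warp = thread_paths[base:base + warp_size]
        useful += len(warp)                             # each thread runs its path once
        issued += divergent_passes(warp) * warp_size    # each pass occupies all lanes
    return useful / issued


if __name__ == "__main__":
    # 16 threads: the second group of four takes path "B", the rest take "A".
    paths = ["A"] * 4 + ["B"] * 4 + ["A"] * 8
    print(simd_efficiency(paths, warp_size=8))
    print(simd_efficiency(paths, warp_size=4))
```

In this example the 8-wide grouping wastes a third of its lane slots because one warp holds both paths, while the 4-wide grouping has no divergent warp. That sensitivity to warp size is the motivation the abstract gives for mixing warp sizes.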

14.
Hardware parallelism should be exploited to improve the performance of computing systems. Single instruction multiple data (SIMD) architecture has been widely used to maximize the throughput of computing systems by exploiting hardware parallelism. Unfortunately, branch divergence due to branch instructions causes underutilization of computational resources, resulting in performance degradation of SIMD architecture. The graphics processing unit (GPU) is a representative parallel architecture based on SIMD architecture. In recent computing systems, GPUs can process general-purpose applications as well as graphics applications with the help of convenient APIs. However, unlike graphics applications, general-purpose applications include many branch instructions, resulting in serious performance degradation of the GPU due to branch divergence. In this paper, we propose a concurrent warp execution (CWE) technique to reduce the performance degradation of the GPU in executing general-purpose applications by increasing resource utilization. The proposed CWE enables selecting co-warps to activate more threads in the warp, leading to concurrent execution of combined warps. According to our simulation results, the proposed architecture provides a significant performance improvement (5.85% over PDOM, 91% over DWF) with little hardware overhead.

15.
Parallel Computing, 2013, 39(10): 586-602
Multimedia applications have become increasingly important in daily computing. These applications are composed of heterogeneous regions of code mixed with data-level parallelism (DLP) and instruction-level parallelism (ILP). A standard solution for a multimedia coprocessor integrates single-instruction multiple-data (SIMD) engines into architectures that exploit ILP at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA). However, the ILP regions fail to scale with the increased vector length to achieve high performance in the DLP regions. Furthermore, the register-to-register nature of SIMD instructions causes current SIMD engines to have limitations in handling memory alignment, data reorganization, and control flow. Many supporting instructions, such as data permutations, address generations, and loop branches, are required to aid in the execution of the real SIMD computation instructions. To mitigate these problems, we propose optimized SIMD engines that can combine VLIW or TTA processing with unified scalar and long vector computation as well as efficient SIMD hardware for real computation. Our new architecture is based on TTA and is called multimedia coprocessor (MCP). This architecture includes the following features: (1) a simple coprocessor structure with 8-way TTA, (2) cost-effective SIMD hardware capable of performing floating-point operations, (3) long vector capabilities built upon existing SIMD hardware and a single register file and processor data path for both scalar operands and vector elements, and (4) an optimized SIMD architecture that addresses the SIMD limitations. Our experimental evaluations show that MCP can outperform conventional SIMD techniques by an average of 39% and 12% in performance for multimedia kernels and applications, respectively.

16.
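A two-dimensional SIMD architecture is an array of N×N processing elements connected in a given topology, in which the processing elements in the same row or column operate in SIMD fashion. Two-dimensional SIMD architectures are widely used as multimedia accelerators in SoCs for multimedia processing, so their architectural design is an important factor in achieving high-performance multimedia computing. Taking the characteristics of multimedia applications into account, this work analyzes how different design parameters affect the performance of a two-dimensional SIMD architecture, and designs and implements a performance simulator for it. Experimental results show that the two-dimensional SIMD architecture gives good speedups on multimedia programs and confirm the conclusions of the analysis.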

17.
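Multimedia applications have become the main type of workload on all kinds of computing platforms. As multimedia applications grow more diverse and complex, the shared-main-memory multi-SIMD architecture is becoming the first choice for multimedia accelerators in master-slave multicore designs. This paper summarizes the features of current shared-main-memory multi-SIMD architectures and analyzes in depth the main problems of compiler optimization for them, along with the related compilation techniques.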

18.
Current multimedia extensions provide a mechanism for general-purpose processors to meet the growing performance demand of multimedia applications. However, the computing performance of these extensions is often limited by their single-data-stream design. This paper presents an architecture called “multi-streaming SIMD” that enables current multimedia extensions to simultaneously manipulate multiple data streams. To efficiently and flexibly realize the proposed architecture, an operation cell is designed by fusing the logic gates and the storage cells together. Multiple operation cells are then connected to compose a register file that can perform SIMD operations, called the “Multimedia Operation Storage Unit (MOSU)”. Further, many MOSUs are used to compose a multi-streaming SIMD computing engine that can simultaneously manipulate multiple data streams and exploit the subword parallelism of the elements in each data stream. This paper also designs three instruction modes (global, coupling, and isolated modes) for programmers to dynamically configure the multi-streaming SIMD computing engine at the instruction level to manipulate different numbers of data streams. Simulation results show that when the multi-streaming SIMD architecture has four 4-register MOSUs, it improves performance by a factor of 3.3× to 5.5× over traditional MMX extensions on 12 multimedia kernels.
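
As background for the subword parallelism mentioned above, the following sketch emulates a packed add of eight 8-bit lanes held in one 64-bit word, with wraparound in each lane. It is a generic software illustration of packed subword arithmetic, not the MOSU design from the paper; `pack`, `unpack`, and `packed_add_u8` are hypothetical helper names.

```python
LANES = 8
HIGH_BITS = 0x8080808080808080  # top bit of every 8-bit lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every 8-bit lane


def pack(values):
    """Pack eight integers (taken modulo 256) into one 64-bit word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def packed_add_u8(a, b):
    """Lane-wise wraparound add of two packed words without cross-lane carries."""
    # Adding only the low 7 bits keeps every carry inside its own lane.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # The top bit of each lane is a7 XOR b7 XOR (carry into bit 7).
    return low ^ ((a ^ b) & HIGH_BITS)


if __name__ == "__main__":
    x = pack([250, 1, 2, 3, 128, 128, 255, 0])
    y = pack([10, 1, 2, 3, 128, 127, 1, 0])
    print(unpack(packed_add_u8(x, y)))
```

One word-wide operation here performs eight independent element additions. A multi-streaming design as described in the abstract would apply such packed operations to several data streams at once.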
