Similar Documents
20 similar documents found
1.
Parallel computing provides the finite-difference time-domain (FDTD) method with the computing power and memory needed to simulate electrically large and structurally complex objects. For multi-core PC cluster systems, this paper proposes a high-performance parallel FDTD algorithm that uses Windows Sockets (WinSock) for efficient inter-process communication and multithreading to fully exploit multi-core processors. Tests on a cluster system show that, with 10 processors (30 cores), the algorithm achieves a speedup of 16.0 and a parallel efficiency of 53.3%, outperforming traditional parallel FDTD algorithms based on the Message Passing Interface (MPI) alone and on MPI combined with OpenMP, which under the same test conditions achieve speedups of only 13.7 and 12.2 and parallel efficiencies of 45.8% and 40.7%, respectively.
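The hybrid scheme in the abstract above, message-passing processes across nodes plus threads within each multi-core node, can be sketched in miniature. This is a hedged illustration, not the paper's code: a toy 1D FDTD update with normalized coefficients (the 0.5 update factors are assumptions) split across Python standard-library threads, whereas the paper uses native threads and WinSock for inter-process messages.

```python
# Toy 1D FDTD step parallelized across threads. Within one pass the H update
# reads only E and the E update reads only H, so disjoint index ranges are
# safe to run concurrently; a join acts as the barrier between the passes.
import threading

def update_h(ez, hy, lo, hi):
    # H-field update: each hy[i] depends only on the (read-only) E field.
    for i in range(lo, hi):
        hy[i] += 0.5 * (ez[i + 1] - ez[i])

def update_e(ez, hy, lo, hi):
    # E-field update, likewise independent across i once H is finished.
    for i in range(max(lo, 1), hi):
        ez[i] += 0.5 * (hy[i] - hy[i - 1])

def fdtd_step_threaded(ez, hy, nthreads=2):
    n = len(ez)
    bounds = [(k * (n - 1) // nthreads, (k + 1) * (n - 1) // nthreads)
              for k in range(nthreads)]
    for fn in (update_h, update_e):  # barrier between the two field passes
        ts = [threading.Thread(target=fn, args=(ez, hy, lo, hi))
              for lo, hi in bounds]
        for t in ts: t.start()
        for t in ts: t.join()
```

Because the index ranges are disjoint and each pass only reads the other field, the threaded result matches the serial one exactly regardless of thread count.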

2.
This paper focuses on three issues: the parallel FDTD method for a half-space, the special treatment of the incident source, and load balancing, and presents a simple, practical parallel finite-difference time-domain (FDTD) scheme for computing the radar cross section (RCS) of buried targets. The scheme was implemented and numerically tested on the Liu Hui parallel computing platform at the Electromagnetic Simulation Center of Beijing Institute of Technology. The experiments confirm both the accuracy of the algorithm and the efficiency of the parallel scheme: with 14 processors, the parallel efficiency remains above 80%. The program was then used to compute the RCS of electrically large, typical buried targets that were beyond the reach of earlier serial algorithms.
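The load-balancing issue raised above can be illustrated with a small sketch. The per-cell cost model and the greedy splitting rule below are assumptions for illustration only, not the paper's method: the idea is that half-space (e.g. lossy-soil) cells cost more to update than free-space cells, so the grid is cut into contiguous slabs of roughly equal estimated cost rather than equal size.

```python
# Greedy 1D decomposition: close a slab once its accumulated cost reaches
# the per-process target, so expensive (half-space) regions get fewer cells.
def split_balanced(costs, nprocs):
    total = sum(costs)
    target = total / nprocs
    slabs, start, acc = [], 0, 0.0
    for i, c in enumerate(costs):
        acc += c
        if acc >= target and len(slabs) < nprocs - 1:
            slabs.append((start, i + 1))   # half-open slab [start, i+1)
            start, acc = i + 1, 0.0
    slabs.append((start, len(costs)))      # last process takes the remainder
    return slabs
```

With uniform costs this degenerates to an even split; with a cheap/expensive mix it shifts the cut toward the cheap side.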

3.
During prototype development of a raster display, we used an IBM PC/XT personal computer as the debugging console for the GS (raster) machine. This lets users write application picture programs in a high-level language directly on the PC/XT, saving a great deal of software development time. For a high-level language to communicate with the DOS or BIOS of an IBM personal computer, calling macro-assembly routines is very useful. The macro-assembly interface described in this paper connects GS picture programs written in FORTRAN with the GS system programs.

4.
Wang Wenshan, Electronic Technology (《电子技术》), 1992, 19(7): 26-27
1. Introduction. Single-chip microcomputers are widely used for real-time control and data processing in industrial process automation. Their limited computing power, however, makes complex data processing difficult, so such processing is usually delegated to an IBM PC, while the microcontroller handles the more direct data processing and control functions. To keep control and data processing real-time, fast data communication between the microcontroller and the IBM PC must be solved first. This paper addresses that problem by presenting a DMA data-transfer method between a single-chip microcomputer and an IBM PC. The method uses the DMA resources already present on the IBM PC motherboard and offers a simple interface circuit, easy programming, fast data transfer, and high reliability.

5.
Large-scale data streams demand enormous storage space and run into a serial-processing speed bottleneck; this work addresses both the accuracy of the processed result and data compression. On a parallel platform, a threshold-based sliding-window technique feeds stream segments to the processors, and a function-discovery algorithm based on Gene Expression Programming (GEP) mines a functional model of the data. The paper proposes PFR-GEP (Parallel Function Replace-GEP), a GEP-based parallel function-substitution algorithm for compressing multiple data streams. Experiments on a PC cluster show that the algorithm effectively improves both the compression ratio and the processing speed, and exhibits linear speedup.
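The function-substitution idea above, replacing a window of raw samples with a compact fitted function when the fit is good enough, can be sketched as follows. This is a stand-in, not PFR-GEP: a least-squares line replaces the GEP-evolved expression, and the fixed window size and error tolerance are illustrative assumptions.

```python
# Compress a stream window-by-window: store (intercept, slope, length) when a
# fitted line (stand-in for a GEP-discovered function) matches within `tol`,
# otherwise keep the raw samples.
def compress_stream(xs, window=4, tol=0.01):
    out = []
    for s in range(0, len(xs), window):
        seg = xs[s:s + window]
        n = len(seg)
        mean_i = (n - 1) / 2
        mean_y = sum(seg) / n
        denom = sum((i - mean_i) ** 2 for i in range(n)) or 1.0
        slope = sum((i - mean_i) * (y - mean_y)
                    for i, y in enumerate(seg)) / denom
        icpt = mean_y - slope * mean_i
        err = max(abs(icpt + slope * i - y) for i, y in enumerate(seg))
        out.append(('fn', icpt, slope, n) if err <= tol else ('raw', seg))
    return out
```

A linear ramp compresses to three numbers; an irregular window is kept verbatim, so compression is lossy only up to `tol`.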

6.
In an SMP-cluster parallel programming environment, an MPI+Pthread parallelization model is proposed and used to parallelize Phrap, a DNA sequence assembly program. The paper analyzes each stage of the Phrap workflow and proposes a targeted parallelization scheme for it. The scheme was implemented on the Dawning 3000 high-performance computer and achieved good performance.

7.
To accelerate parallel finite-difference time-domain (FDTD) simulation, a new acceleration method based on the Streaming SIMD Extensions (SSE) is proposed and implemented on a PC with an Intel T2300 processor, yielding a three-level data-parallel algorithm built on the Message Passing Interface (MPI), OpenMP, and the SSE instruction set. To verify the method's efficiency, the radiation of electromagnetic waves in free space was computed, giving a speedup of 2.62. The experimental results show that this acceleration technique substantially improves computational efficiency without any additional hardware investment.

8.
Vendor News
IBM releases two new PC models. IBM Japan recently brought two new PC models to market: the PS/V 2405DME and the PS/V Vision 2408DMA. The two models were jointly developed by IBM Japan and Fujitsu of Japan as part of the two companies' cooperation agreement in the multimedia field. They can run application software developed for Fujitsu's FM Towns CD-ROM PCs, and both come equipped with an FM Towns application card and a CPU card.

9.
Introduction. As IBM PCs have become ever more widespread and ever cheaper, Intel has stopped producing dedicated microcomputer development systems and instead offers development software that runs on the IBM PC. Owners of an IBM PC can obtain the same development capability for a small fraction of the cost of a dedicated development system simply by purchasing the software. This paper describes the full process of developing 86-series products on an IBM PC, along with the features and use of development software such as AEDIT, ASM86, and E8087 running under DOS 3.0 or later. It also highlights our in-house "add-on debugging software", which links the PC to an 86-series prototype machine over a parallel or serial connection.

10.
Communication between a DSP and a PC
A master-slave high-speed digital signal processing system built from a digital signal processor (DSP) and a PC is a common DSP application architecture that makes effective use of the PC's abundant hardware and software resources. Because digital signal processing involves large volumes of data, the communication speed between the DSP and the PC directly affects the efficiency of such a master-slave system. Taking the TMS320C25 as an example, and after comparing several common parallel communication methods between DSPs and PCs, this paper proposes a system model that uses DMA for DSP-PC parallel communication and discusses its hardware and software design in detail. DMA data transfer significantly improves the real-time software performance of master-slave DSP application systems.

11.
Rapid developments in high-performance supercomputers, with upward of 65,536 processors and 32 terabytes of memory, have dramatically changed the landscape in computational electromagnetics; the IBM BlueGene/L supercomputer is one example. Such machines have recently made it possible to solve extremely large problems efficiently: for instance, a simulation that took 52 days on a single Pentium 4 processor runs in only about 10 minutes on 4000 processors of a BlueGene/L supercomputer. In this article, we investigate the performance of a parallel Finite-Difference Time-Domain (FDTD) code on a large BlueGene/L system. We show that the efficiency of the code is excellent, reaching up to 90%. The code has been used to simulate a number of electrically large problems, including a 100 × 100 patch antenna array, a 144-element dual-polarized Vivaldi array, a 40-element helical antenna array, and an electronic packaging problem. The results presented demonstrate the efficiency of the parallelization of the code on the BlueGene/L system. In addition, we also introduce the development of high-performance Beowulf clusters for the simulation of electrically large problems.

12.
Parallel computing is rapidly entering mainstream computing, and multicore processors can now be found in the heart of supercomputers, desktop computers, and laptops. Consequently, applications will increasingly need to be parallelized to fully exploit the multicore processor throughput gains that are becoming available. Unfortunately, writing parallel code is more complex than writing serial code. An introductory parallel computing course aims to introduce students to this technology shift and to explain that parallelism calls for a different way of thinking and new programming skills. The course covers theoretical topics and offers practical experience in writing parallel algorithms on state-of-the-art parallel computers, parallel programming environments, and tools.

13.
We present a new, parallel version of the Numerical Electromagnetics Code (NEC). The parallelization is based on a two-dimensional block-cyclic distribution of matrices on a rectangular processor grid, assuring a theoretically optimal load balance among the processors. The code is portable to any platform supporting message-passing parallel environments such as the Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM), and can even be executed on heterogeneous clusters of computers running different operating systems. The developed parallel NEC was successfully implemented on two parallel supercomputers featuring different architectures to test portability. Large structures containing up to 24,000 segments, which exceed currently available single-machine resources, were successfully analyzed, and timing and memory results are presented. The code is applied to analyze the penetration of electromagnetic fields inside a vehicle. The computed results are validated against other numerical methods and against experimental data obtained with a simplified vehicle model (consisting essentially of the body shell) illuminated by an electromagnetic pulse (EMP) simulator.
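The two-dimensional block-cyclic distribution mentioned above can be sketched directly. This is a generic ScaLAPACK-style mapping, not the paper's code; the block size and processor-grid shape below are illustrative assumptions.

```python
# 2D block-cyclic mapping: matrix entry (i, j) lives in block
# (i // nb, j // nb), and blocks are dealt out cyclically over a
# prow x pcol processor grid -- which is what balances the load.
def block_cyclic_owner(i, j, nb, prow, pcol):
    return ((i // nb) % prow, (j // nb) % pcol)

def load_per_processor(n, nb, prow, pcol):
    # Count how many entries of an n x n matrix each processor owns.
    counts = {}
    for i in range(n):
        for j in range(n):
            p = block_cyclic_owner(i, j, nb, prow, pcol)
            counts[p] = counts.get(p, 0) + 1
    return counts
```

When the matrix dimension is a multiple of `nb * prow` (and of `nb * pcol`), every processor owns exactly the same number of entries, which is the "theoretically optimal load balance" the abstract refers to.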

14.
A domain decomposition technique together with an implicit finite-difference scheme is used to design a parallel algorithm for time-domain electromagnetic scattering from an infinite square metallic cylinder. The implicit difference scheme yields second-order accuracy, unconditional stability, and, at each time step, a large system of linear equations. The domain decomposition technique reduces the solution of this large system to that of many independent smaller subsystems. The algorithm can be easily implemented on coarse-grain parallel vector supercomputers to obtain a speedup close to the number of available central processing units (CPUs).
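Each of the "independent smaller subsystems" above is, for a 1D implicit finite-difference stencil, a tridiagonal linear system, and different subdomains' systems can be solved on different CPUs with no communication. As a hedged sketch (the paper's specific scheme is not reproduced here), the per-subdomain solve can be done with the Thomas algorithm:

```python
# Thomas algorithm for a tridiagonal system:
#   a = sub-diagonal (a[0] unused), b = main diagonal, c = super-diagonal
#   (c[-1] unused), d = right-hand side.  One such small system per
#   subdomain, each solvable independently in parallel.
def thomas(a, b, c, d):
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The solve is O(n) per subdomain, so with p subdomains on p CPUs the speedup approaches p, matching the abstract's claim.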

15.
This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. We cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application libraries, and multidiscipline interactions are investigated to ensure high performance. At the end, we assess the potential of optical and neural technologies for developing future supercomputers.

16.
The supercomputers of the 1980's have already impacted large-scale computation. This paper discusses the status and anticipated impact of supercomputers on finite-element analysis, which is the primary tool for structural analysis and is also very useful in other areas of engineering analysis. The initial impact has been the significant reduction in turnaround time for large problems and the corresponding opportunity to solve heretofore unsolvable problems. In these cases, emphasis has been placed on employing already-proven computing software which was modified to take advantage of vector processing and other forms of parallel operations. This trend is expected to continue because the established usage base of commercially available programs is not likely to be quickly dislodged. The near term will see the further use of design optimization, broader use of nonlinear mechanics, and a closer link between designers and analysts because of improved computer turnaround. The economy of scale suggests that solution techniques will be performed not only faster but cheaper than is possible with scalar processors, which will further encourage the analysis of larger, more complex structures. The supercomputers of the future are expected to offer additional challenges to today's application systems. A primary factor in this will be the effective use of multiprocessors. Additional influence is expected as Artificial Intelligence matures to the point where Expert Systems become a reality for selected engineering and scientific disciplines. In order to compete effectively, today's software companies must address the possibility of significant changes in the architecture and methodology currently embodied in their systems. Improved packaging, most likely in the form of pre- and postprocessors, will be necessary to provide industry- or technology-specific systems solutions.

17.
Efficient Utilization of SIMD Extensions
This paper targets automatic performance tuning of numerical kernels in the presence of multilayered memory hierarchies and single-instruction, multiple-data (SIMD) parallelism. The studied SIMD instruction set extensions include Intel's SSE family, AMD's 3DNow!, Motorola's AltiVec, and IBM's BlueGene/L SIMD instructions. FFTW, ATLAS, and SPIRAL demonstrate that near-optimal performance of numerical kernels across a variety of modern computers featuring deep memory hierarchies can be achieved only by means of automatic performance tuning. These software packages generate and optimize ANSI C code and feed it into the target machine's general-purpose C compiler to maintain portability. The scalar C code produced by performance tuning systems poses a severe challenge for vectorizing compilers. The particular code structure hampers automatic vectorization and, thus, inhibits satisfactory performance on processors featuring short vector extensions. This paper describes special-purpose compiler technology that supports automatic performance tuning on machines with vector instructions. The work described includes: 1) symbolic vectorization of digital signal processing transforms; 2) straight-line code vectorization for numerical kernels; and 3) compiler back ends for straight-line code with vector instructions. Methods from all three areas were combined with FFTW, SPIRAL, and ATLAS to optimize both for memory hierarchy and vector instructions. Experiments show that the presented methods lead to substantial speedups (up to 1.8 for two-way and 3.3 for four-way vector extensions) over the best scalar C codes generated by the original systems as well as roughly matching the performance of hand-tuned vendor libraries.

18.
In many scientific and signal processing applications, there are increasing demands for large-volume and/or high-speed computations which call for not only high-speed computing hardware, but also for novel approaches in computer architecture and software techniques in future supercomputers. Tremendous progress has been made on several promising parallel architectures for scientific computations, including a variety of digital filters, fast Fourier transform (FFT) processors, data-flow processors, systolic arrays, and wavefront arrays. This paper describes these computing networks in terms of signal-flow graphs (SFG) or data-flow graphs (DFG), and proposes a methodology of converting SFG computing networks into synchronous systolic arrays or data-driven wavefront arrays. Both one- and two-dimensional arrays are discussed theoretically, as well as with illustrative examples. A wavefront-oriented programming language, which describes the (parallel) data flow in systolic/wavefront-type arrays, is presented. The structural property of parallel recursive algorithms points to the feasibility of a Hierarchical Iterative Flow-Graph Design (HIFD) of VLSI Array Processors. The proposed array processor architectures, we believe, will have significant impact on the development of future supercomputers.
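The systolic-array idea above, identical cells computing in lockstep while data and partial sums move one cell per cycle, can be simulated cycle by cycle. The sketch below is one of many possible designs (a semi-systolic FIR array with a broadcast input and shifting partial sums), chosen for brevity, not taken from the paper.

```python
# Cycle-level simulation of a semi-systolic FIR array: each cycle the input
# sample is broadcast to all K cells, partial sums shift one cell to the
# right, and cell k adds its stored tap.  Cell k holds w[K-1-k] so the sum
# emerging from the last cell is y[t] = sum_j w[j] * x[t-j].
def systolic_fir(w, xs):
    K = len(w)
    cells = [w[K - 1 - k] for k in range(K)]   # taps, stored reversed
    regs = [0] * K                             # partial-sum registers
    ys = []
    for t, x in enumerate(xs):
        regs = [cells[0] * x] + [regs[k - 1] + cells[k] * x
                                 for k in range(1, K)]
        if t >= K - 1:                         # pipeline full: output valid
            ys.append(regs[K - 1])
    return ys
```

After a K-1 cycle fill latency the array emits one output per cycle, which is the throughput argument for systolic designs.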

19.
An algorithm has been developed that maps pseudo-random number codes onto a set of sample times and stores the results in a bit-wise parallel format. It allows bit-wise parallel software correlation of code-division multiple-access signals to be carried out using long pseudo-random number codes. Bit-wise parallel algorithms speed up software correlation by operating in parallel on multiple samples. The new algorithm uses table look-ups to over-sample a pseudo-random number code. It reduces the PRN code memory requirements of a GPS software receiver by a factor of 3.8 or more, maintains operational efficiency, and keeps code distortion low.
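The core bit-wise parallel correlation trick above can be sketched as follows. This is a hedged illustration of the general technique only (the ±1-to-bit mapping is an assumption, and the paper's table-lookup oversampling is not shown): packing one chip per bit lets XOR compare many samples at once, and a population count converts the result into a correlation value.

```python
# Pack +/-1 chips into an integer, one bit per sample (chip +1 -> bit 1).
def pack_bits(chips):
    word = 0
    for i, c in enumerate(chips):
        if c == 1:
            word |= 1 << i
    return word

# Correlation of two packed +/-1 sequences: XOR marks positions where the
# chips differ; popcount turns that into agreements minus disagreements,
# i.e. sum(a[i] * b[i]), processing n samples per machine word.
def bitwise_correlate(a, b, n):
    diff = bin((a ^ b) & ((1 << n) - 1)).count('1')
    return n - 2 * diff
```

One XOR plus one popcount replaces n multiply-accumulates, which is where the software-correlation speedup comes from.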

20.
Automatic generation of prime length FFT programs
Describes a set of programs for circular convolution and prime length fast Fourier transforms (FFTs) that are relatively short, possess great structure, share many computational procedures, and cover a large variety of lengths. The programs make clear the structure of the algorithms and clearly enumerate independent computational branches that can be performed in parallel. Moreover, each of these independent operations is made up of a sequence of suboperations that can be implemented as vector/parallel operations. This is in contrast with previously existing programs for prime length FFTs: they consist of straight-line code, no code is shared between them, and they cannot be easily adapted for vector/parallel implementations. The authors have also developed a program that automatically generates these programs for prime length FFTs. This code-generating program requires information only about a set of modules for computing cyclotomic convolutions.
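Circular convolution, the building block the generated prime-length FFT programs share, can be written down directly. The O(N²) form below is only a reference definition for clarity, not the paper's optimized generated code:

```python
# Direct circular convolution: y[n] = sum_k x[k] * h[(n - k) mod N].
# Prime-length FFT algorithms (e.g. Rader's) reduce the DFT to exactly
# this operation on N-1 points.
def circular_convolve(x, h):
    N = len(x)
    return [sum(x[k] * h[(n - k) % N] for k in range(N)) for n in range(N)]
```

Convolving with the unit impulse returns the input unchanged, a quick sanity check on the index arithmetic.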

