Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
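When floating-point matrix multiplication is implemented with the Altera floating-point IP core, increasing the matrix order consumes more device resources while system performance actually declines. To address the existing IP core's discontinuous data loading and uneven memory bandwidth, this paper proposes improving the IP core by storing data in parallel and by loading and processing data according to lookup tables. The improved floating-point matrix operation is then implemented on an FPGA. Joint simulation with Quartus and Matlab, followed by comparison of the results, shows an error of no more than one part in ten thousand, together with savings in device resources and improved system performance. The simulation results indicate that the design is feasible and can help speed up floating-point matrix operations in many high-performance fields.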

2.
A class of image processing and analysis tasks is well suited to parallel processor implementation using a simple software architecture. Such a system functions with loosely coupled master and multiple slave processes. As an example, implementation of a viable laboratory instrument for chromosome analysis is described. Its processing hardware consists of several conventional MC68000 processors with dual ported memory on a VME-bus.

3.
The Bees Algorithm is a population-based, computation-bound method inspired by the natural behavior of honey bees that finds a near-optimal solution to a search problem. Recently, many parallel swarm-based algorithms have been developed to run on GPUs (Graphics Processing Units), so developing a parallel Bees Algorithm that runs on the GPU has become important. In this paper, we extend the Bees Algorithm to run on CUDA (Compute Unified Device Architecture); we call the result CUBA (CUDA-based Bees Algorithm). We evaluate the performance of CUBA through experiments on a number of well-known optimization problems. Results show that CUBA significantly outperforms the standard Bees Algorithm on many different optimization problems.

4.
In this paper, we describe a massively parallel implementation of the Splitting Equilibration Algorithm using CM FORTRAN on the Thinking Machines CM-2 system. Numerical results using upwards of 32 768 (32 K) processors on the CM-2 system, the Connection Machine, are presented for both input/output and social accounting matrix estimation problems and compared with those obtained for the same problems on the IBM 3090. Our experiences with the relative ease/difficulty of the implementations on these fine-grain and coarse-grain parallel architectures are also presented and discussed.

5.
Increases in logic density have made it feasible to implement multiprocessor systems able to meet the intensive data processing demands of highly concurrent systems. We describe the research and hardware implementation of a high-performance parallel multicompressor chip. A detailed investigation into the performance of alternative input and output routing strategies for realistic data sets demonstrates that the design of parallel compression devices involves important trade-offs that affect compression performance, latency, and throughput. The most promising approach is implemented in FPGA hardware and is shown to provide a scalable compression solution at throughputs able to cope with the demands of modern high-bandwidth applications.

6.
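Job scheduling in parallel computing involves two aspects of the scheduling system: the scheduling policy and the scheduling algorithm. This article discusses the design of the scheduling policy and the choice of scheduling algorithm, and illustrates them with a real parallel processing system. The approach allows the scheduling system to better meet the job-scheduling needs of the parallel processing system and improves the utilization of system resources.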

7.
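Data scheduling mechanisms in existing P2P streaming media systems place heavy pressure on the network. This paper proposes a scheduling mechanism based on a distributed file system. It features nearest-source scheduling, scheduling of client nodes within a site, and self-scheduling of service nodes between sites, and it effectively reduces data scheduling traffic on the backbone network. Based on this mechanism, the paper uses multicast as an example to describe in detail how streaming media data is distributed using segmentation. Tests show that data buffering in the system is stable and that node update performance is good.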

8.
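Luo Dan, Zhou Bo. Journal of Computer Applications (计算机应用), 2011, 31(2): 562-564
Service-oriented architecture (SOA) offers a solution for reengineering legacy systems, enabling them to support distributed application environments. However, because of outdated technology and architectural limitations, some legacy systems still suffer from a lack of multithreading support, a lack of parallel processing, and memory leaks, which greatly restricts their use. To address these problems, the authors analyze the communication mechanism of Windows Communication Foundation (WCF) in depth and propose a parallel architecture that modifies the basic WCF architecture: a service controller layer is added to the default architecture to pass messages and select services between client and server. The approach resolves these problems and has been applied in a large financial software product.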

9.
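High-quality contour extraction algorithms are computationally expensive and perform poorly in real time. To address this, a parallel image contour computing system based on a field-programmable gate array (FPGA) is proposed. With a suitable hardware structure and corresponding algorithmic improvements, several different forms of parallelism are used to accelerate the computation, achieving high-speed computation of a high-quality contour extraction algorithm, the Pb (Probability Boundary) algorithm. Experimental results show that with the FPGA running at 200 MHz and an image resolution of 481×321, the system reaches a processing speed of 39 frames/s, which makes it practical to apply the Pb algorithm in real systems.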

10.
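Combining grid computing with high-performance computing, this paper presents a parallel computing system based on a grid high-performance computing platform (GPCS), and mainly describes the architecture, functions, design, and implementation of GPCS. Built on general-purpose networks, with grid platform middleware as the bridge, the platform enables interconnection, sharing, and cooperative operation among various high-performance computing resources.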

11.
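The fast implementation of elliptic curve cryptosystems depends on the efficiency of the scalar multiplication algorithm. Building on the traditional (2,3) double-base scalar multiplication algorithm, a new (2,5) double-base scalar multiplication algorithm is proposed. Experimental data show that the algorithm not only retains the advantages of double-base scalar multiplication but also remedies shortcomings of the traditional algorithm, such as long precomputation time and large storage requirements, making it feasible for use in storage-constrained settings such as smart cards.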
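The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind a double-base representation with bases 2 and 5: a scalar is written as a sum of terms 2^a * 5^b, here found with a simple greedy search. The function names (`largest_term`, `double_base_terms`, `evaluate`) are hypothetical, and this unsigned greedy decomposition is an assumption for illustration, not the authors' method.

```python
def largest_term(n, p=2, q=5):
    """Return (a, b, value), where value = p**a * q**b is the largest such product <= n."""
    best = (0, 0, 1)
    q_power, b = 1, 0
    while q_power <= n:
        # For this power of q, multiply by p as often as possible without exceeding n.
        value, a = q_power, 0
        while value * p <= n:
            value *= p
            a += 1
        if value > best[2]:
            best = (a, b, value)
        q_power *= q
        b += 1
    return best


def double_base_terms(n, p=2, q=5):
    """Greedily decompose n into a list of exponent pairs (a, b), one per term p**a * q**b."""
    terms = []
    while n > 0:
        a, b, value = largest_term(n, p, q)
        terms.append((a, b))
        n -= value
    return terms


def evaluate(terms, p=2, q=5):
    """Recombine exponent pairs into the integer they represent."""
    return sum(p**a * q**b for a, b in terms)


print(double_base_terms(127))  # 127 = 5**3 + 2**1
```

In a scalar multiplication, each term would correspond to a sequence of point doublings and quintuplings followed by one addition, so a representation with fewer terms means fewer point additions.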

12.
A fast parallel architecture for the implementation of elliptic curve scalar multiplication over binary fields is presented. The proposed architecture is implemented on a single-chip FPGA device using parallel strategies that trade area requirements for timing performance. The results achieved show that our proposed design is able to compute GF(2^191) elliptic curve scalar multiplication operations in 63 μs.

13.
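Research and Implementation of a Parallel TCP Acceleration System for Wide Area Networks   Cited by: 1 (self-citations: 0, citations by others: 1)
With the rapid growth of Internet services, wide area networks are increasingly unable to meet users' needs for network response speed. In view of this, a WAN acceleration scheme based on parallel TCP is proposed; using a dual-gateway acceleration mode, a parallel TCP acceleration gateway system is designed and implemented. The system implements session admission control and management, as well as interception, processing, accelerated transmission, and flow control of session data. Finally, web page access tests and file transfer tests show that the system greatly improves WAN response speed and delivers a good WAN acceleration effect.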

14.
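Li Xiwu, Mao Xianjun. Computer Engineering and Design (计算机工程与设计), 2007, 28(21): 5183-5185, 5240
With advances in very-large-scale integration (VLSI) technology, the processing power of digital signal processors keeps increasing. This paper introduces the hardware platform on which the system software runs and its functions, describes the software interface technology for the system hardware and how it is implemented, and designs and implements a parallel software system for DSP 21160N processors based on VME bus technology and interconnected through Link ports. The system's performance is tested and analyzed, and the results show that the parallel software system meets the design requirements. Finally, the research and practical work on the system are summarized.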

15.
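After reviewing existing implementation models for parallel databases, a prototype parallel database system is implemented based on the "semi-rewrite transformation" model [1]. Experimental analysis of key operations, including data partitioning/repartitioning, parallel selection, parallel sorting, and parallel join, identifies the shortcomings of the "semi-rewrite transformation" model, and a hybrid improved model is proposed. In theory, for implementing a parallel database system on a cluster architecture, this hybrid model has advantages over any single model.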

16.
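Particle simulation is currently one of the important research tools in chemical engineering, materials science, biology, and other fields. With advances in computer hardware and software and the advent of large-scale parallel clusters, the number of particles that can be simulated keeps growing and the simulated objects are becoming more complex. Preprocessing is the stage that generates the initial data for a particle simulation: it converts the simulated object into a particle system and writes the particle data to files as required by the simulation case. Preprocessing is the link between the simulated object and the simulation computation and is a key step in the particle simulation process. The design proposed in this paper first uses BRLCAD to build a three-dimensional model of the simulated object, then converts the model into a spatial enumeration, then fills the regular blocks of the spatial enumeration with particles while using the cell method to detect conflicts between particles and so ensure the particles are valid, and finally computes the physical properties of the particles from their types and positions and writes the particle data to files. Based on this design and MPI parallel computing, a parallel preprocessing system for large-scale particle simulation is implemented, and a series of tests demonstrates the system's practicality and reliability.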

17.
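Zhang Xiaoyan, Wang Guoxin, Yan Bo. Computer Engineering and Design (计算机工程与设计), 2011, 32(10): 3321-3324, 3328
To simulate network behavior under different stimuli and to evaluate how network protocols respond to changes in network structure, a parallel network environment simulation system is proposed and implemented, based on a study of router and routing protocol simulation techniques and on the high-performance computer simulation architecture HSA. The system is discussed in detail through descriptions of its general-purpose router nodes, plug-in routing protocols, replaceable traffic nodes, and network topology generation. Finally, a ring topology is used to verify that the parallel network simulation system has significant advantages over network simulation software implemented on a single machine.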

18.
19.
Multiplication of polynomials of large degree is the predominant operation in lattice-based cryptosystems in terms of execution time. This motivates the study of fast and efficient hardware implementations. Also, applications such as those using homomorphic encryption need to operate with polynomials of different parameter sets. This calls for the design of configurable hardware architectures that can support multiplication of polynomials of various degrees and coefficient sizes. In this work, we present the design and an FPGA implementation of a run-time configurable and highly parallelized NTT-based polynomial multiplication architecture, which proves to be effective as an accelerator for lattice-based cryptosystems. The proposed polynomial multiplier can also be used to perform Number Theoretic Transform (NTT) and Inverse NTT (INTT) operations. It supports 6 different parameter sets, which are used in lattice-based homomorphic encryption and/or post-quantum cryptosystems. We also present a hardware/software co-design framework, which provides high-speed communication between the CPU and the FPGA, connected by a standard PCIe interface provided by the RIFFA driver [1]. As a proof of concept, the proposed polynomial multiplier is deployed in this framework to accelerate the decryption operation of the Brakerski/Fan-Vercauteren (BFV) homomorphic encryption scheme implemented in the Simple Encrypted Arithmetic Library (SEAL) by the Cryptography Research Group at Microsoft Research [2]. In the proposed framework, the polynomial multiplication in BFV decryption is offloaded to the accelerator in the FPGA via the PCIe bus, while the rest of the decryption operations are executed in software running on an off-the-shelf desktop computer. The hardware part of the proposed framework targets a Xilinx Virtex-7 FPGA device, and the framework achieves a speedup of almost 7× in latency for the offloaded operations compared to their pure software implementations, excluding I/O overhead.
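To make the idea of NTT-based polynomial multiplication concrete, here is a minimal software sketch: transform both inputs, multiply pointwise, and transform back. The modulus 998244353 with primitive root 3 is a common NTT-friendly choice assumed here for illustration; it is not one of the paper's parameter sets, and the function names are hypothetical. The sketch computes a plain polynomial product by zero-padding, whereas lattice schemes typically also reduce modulo x^n + 1, which is omitted.

```python
MOD = 998244353  # NTT-friendly prime: 119 * 2**23 + 1 (assumed for this sketch)
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """Recursive radix-2 NTT of a list whose length is a power of two.

    With invert=True the inverse roots are used; the final scaling by 1/n
    is left to the caller.
    """
    n = len(a)
    if n == 1:
        return a[:]
    w_n = pow(ROOT, (MOD - 1) // n, MOD)  # primitive n-th root of unity
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)      # modular inverse via Fermat's little theorem
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD          # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % MOD  # butterfly: lower half
        w = w * w_n % MOD
    return out


def poly_multiply(f, g):
    """Multiply two coefficient lists modulo MOD using forward NTT, pointwise product, inverse NTT."""
    result_len = len(f) + len(g) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    prod = ntt([x * y % MOD for x, y in zip(fa, ga)], invert=True)
    inv_size = pow(size, MOD - 2, MOD)
    return [c * inv_size % MOD for c in prod][:result_len]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))
```

The butterfly loop is the part that a hardware design can replicate across parallel processing elements, since the butterflies within one stage are independent of each other.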

20.
A parallel implementation of an algorithm for solving the one-dimensional, Fourier transformed Vlasov-Poisson system of equations is documented, together with the code structure, file formats and settings to run the code. The properties of the Fourier transformed Vlasov-Poisson system are discussed in connection with the numerical solution of the system. The Fourier method in velocity space is used to treat numerical problems arising due to the filamentation of the solution in velocity space. Outflow boundary conditions in the Fourier transformed velocity space remove the highest oscillations in velocity space. A fourth-order compact Padé scheme is used to calculate derivatives in the Fourier transformed velocity space, and spatial derivatives are calculated with a pseudo-spectral method. The parallel algorithms used are described in more detail, in particular the parallel solver of the tri-diagonal systems occurring in the Padé scheme.

Program summary

Title of program: vlasov
Catalogue identifier: ADVQ
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVQ
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Operating system under which the program has been tested: Sun Solaris; HP-UX; Red Hat Linux
Programming language used: FORTRAN 90 with Message Passing Interface (MPI)
Computers: Sun Ultra Sparc; HP 9000/785; HP IPF (Itanium Processor Family) ia64 Cluster; PCs cluster
Number of lines in distributed program, including test data, etc.: 3737
Number of bytes in distributed program, including test data, etc.: 18 772
Distribution format: tar.gz
Nature of physical problem: Kinetic simulations of collisionless electron-ion plasmas.
Method of solution: A Fourier method in velocity space, a pseudo-spectral method in space and a fourth-order Runge-Kutta scheme in time.
Memory required to execute with typical data: Typically uses of the order of 10^5-10^6 double precision numbers.
Restriction on the complexity of the problem: The program uses periodic boundary conditions in space.
Typical running time: Depends strongly on the problem size; typically a few hours if only electron dynamics is considered and longer if both ion and electron dynamics are important.
Unusual features of the program: No
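The abstract for item 20 mentions tri-diagonal systems arising from the compact Padé scheme. As a point of reference, a serial tri-diagonal solve can be sketched with the Thomas algorithm, shown below. This is a generic textbook method given under assumptions, not the parallel solver or the FORTRAN 90 code described in the paper; the function name and the argument layout are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tri-diagonal system with the Thomas algorithm (no pivoting).

    Row i reads: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. Assumes the matrix is well conditioned
    for elimination without pivoting (for example, diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination sweep.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution sweep.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Small diagonally dominant example with known solution [1, 2, 3].
solution = solve_tridiagonal(
    lower=[0.0, 1.0, 1.0],
    diag=[4.0, 4.0, 4.0],
    upper=[1.0, 1.0, 0.0],
    rhs=[6.0, 12.0, 14.0],
)
print(solution)
```

Both sweeps carry a dependency from one row to the next, which is why a distributed-memory code needs a dedicated parallel strategy for these systems.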
