The complexity of software is ever increasing, and it requires more and more computational resources for its execution. One way to meet these requirements is to use vector instructions, which operate on fixed-length vectors of data elements of the same type. A method is proposed for representing the vector instructions of one processor architecture in terms of the vector instructions of another architecture during dynamic binary translation. An implementation of this method that covers the translation of vector addition and vector memory access increased the performance of the QEMU emulator by more than a factor of three on a synthetic example and by 12% on a real-world application.

《Computers & Structures》1986,24(4):625-635
Linear and nonlinear finite element software development considerations for vector processors are presented. Areas of discussion include performance measurement, data management, element-level calculations, and nonlinear problem solution. An example problem that demonstrates software performance is also presented. Incorporating the methods presented in this paper can lead to finite element software that requires approximately one tenth of the CPU time and as little as one hundredth of the I/O effort of conventional software.

The architecture of a biologically motivated visual-information processor that can perform a variety of tasks associated with the early stages of machine vision is described. The computational operations performed by the processor emulate the spatiotemporal information-processing capabilities of certain neural-activity fields found along the human visual pathway. The state-space model of the neurovision processor is a two-dimensional neural network of densely interconnected nonlinear processing elements (PEs). An individual PE represents the dynamic activity exhibited by a spatially localized population of excitatory and inhibitory nerve cells. Each PE may receive inputs from an external signal space as well as from the neighboring PEs within the network. The information embedded in the signal space is extracted by the feedforward subnet. The feedback subnet of the neurovision processor generates useful steady-state and temporal-response characteristics that can be used for spatiotemporal filtering, short-term visual memory, spatiotemporal stabilization, competitive feedback interaction, and content-addressable memory. To illustrate the versatility of the multitask processor design for machine-vision applications, a computer simulation of a simplified vision system for filtering, storing, and classifying noisy gray-level images is presented.

The preconditioned conjugate gradient method is well established for solving linear systems of equations that arise from the discretization of partial differential equations. Point and block Jacobi preconditioning are both common preconditioning techniques. Although it is reasonable to expect that block Jacobi preconditioning is more effective, block preconditioning requires the solution of triangular systems of equations that are difficult to vectorize. We present an implementation of block Jacobi for vector computers, especially for the Cray Y-MP/264, and discuss several techniques to improve vectorization. We present these in a progression to show the effect on performance. For the model problem, resulting from a self-adjoint operator, the final implementation of one block Jacobi step uses almost the same amount of time as one point Jacobi step on the Cray Y-MP/264 despite the solution of triangular systems.
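To show where the triangular solves enter a block Jacobi step, here is a minimal pure-Python sketch of preconditioned CG. It assumes a dense symmetric positive definite matrix and factors each diagonal block with Cholesky; neither the storage nor the factorization is taken from the paper, and all function names are hypothetical. With a block size of 1 the preconditioner reduces to point Jacobi (division by the diagonal).

```python
import math

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cholesky(B):
    """Lower-triangular L with B = L L^T (B symmetric positive definite)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, r):
    """Solve L L^T z = r by one forward and one backward triangular solve.
    Each z[i] depends on the previously computed entries, a recurrence that
    does not vectorize directly."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def block_jacobi(A, block):
    """Return a function applying M^{-1}, M = block diagonal of A.
    Each diagonal block is factored once; block=1 gives point Jacobi."""
    n = len(A)
    factors = []
    for s in range(0, n, block):
        e = min(s + block, n)
        factors.append((s, e, cholesky([row[s:e] for row in A[s:e]])))

    def apply(r):
        z = [0.0] * n
        for s, e, L in factors:
            z[s:e] = chol_solve(L, r[s:e])
        return z

    return apply

def pcg(A, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, k + 1
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def poisson_1d(n):
    """Tridiagonal model problem: 2 on the diagonal, -1 next to it."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

For example, `A = poisson_1d(16)` followed by `pcg(A, [1.0] * 16, block_jacobi(A, 4))` solves the model system with blocks of four unknowns. Choosing a block size equal to the matrix order makes the preconditioner exact, so CG finishes in one iteration; the cost moves entirely into the triangular solves.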

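This paper proposes an improved DMA architecture for a 32-bit floating-point DSP processor. By adopting a two-stage data pipeline, the data transfer rate between peripherals and internal memory is doubled compared with the original design. The design was coded and simulated in Verilog HDL; the simulation results show an operating frequency above 250 MHz, which meets the design requirements.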

Design of a Thread Manager in a Polymorphic Parallel Processor   Total citations: 2; self-citations: 2; citations by others: 2
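A hardware thread manager is proposed for a polymorphic parallel processor. It supports eight thread-management operations in MIMD mode and, through the SC controller of SIMD mode, manages both operating modes in a unified way, thereby achieving thread-level parallel computation. It can monitor the working status of each thread as well as the state of the nearest-neighbor communication registers and the router. During communication it can stop, switch, and start threads and record each thread's working state, while avoiding the waiting caused by data blocking, so that the execution efficiency of a single processor is maximized.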

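The design of a DC/DC switching regulated power-supply system is described. The power stage uses a full-bridge topology with current-doubler synchronous rectification. An on-board power-supply product was designed to power industrial processors: the power devices were selected, the main power losses that affect supply efficiency were analyzed, and the PCB design of the product was completed. The final analysis shows that the product's electrical performance parameters meet the customer's expectations, and it has been successfully deployed in power-supply equipment for industrial processors.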

A new type of high-performance array processor system is presented in this paper. Unlike conventional host-peripheral array processor systems, this system is designed with a functionally distributed approach. The design philosophy is described first. Then the hardware organizations of two concrete systems, namely the 150-AP and the GF-10/12, including the communication between processors, are shown. Some attractive system performance figures for user programs are also given.

The raw performance of vector processors such as the CDC CYBER-205 has been well documented. The ability to apply this raw power to ever more complex algebraic algorithms has been reported in [9]. The final step in making computers of this class truly the revolutionary tools they are claimed to be is to develop whole applications that perform at a significant fraction of the raw power. This involves two distinct subclasses of problems. On the one hand, there are those pre-existing applications that must be mapped onto vector processors in such a way that not only is performance maintained, but also a (sometimes vague) set of computational boundary conditions of the user community is satisfied. On the other hand, there are those models which are developed ab initio with machines such as the CYBER-205 in mind. The development of solutions to problems in the former class involves psychology and politics as well as mathematics and computer science. We limit ourselves here to reporting on an example of the latter class, viz. a model to study a particular fluid-dynamic phenomenon, that was specifically designed with the CYBER-205 in mind.

Hardware and software codesign and flexibility requirements often necessitate embedded application-specific instruction-set processors in system-on-chip designs. Spaceman, a reusable stack-processor virtual component, offers a customer-configurable instruction set; parameterizable bus widths, stack depths, and stack access ranges; and selectable bus interfaces.

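The multicore simultaneous multithreading processor SMT-PAAG is a multicore processor for graphics, image, and digital signal processing. A hardware thread scheduler is proposed for this processor. The scheduler uses simultaneous multithreading, can execute up to four threads at the same time, and supports fast context switching among eight threads in blocking mode. This avoids the waiting caused by blocking and effectively improves the processor's efficiency and resource utilization. Performance was evaluated by running graphics-processing algorithms on the processor. The results show that, by exploiting both instruction-level and thread-level parallelism, the SMT-PAAG processor improves performance by 69.25%.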
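The switching policy in the abstract (eight resident threads, at most four issuing, a blocked thread replaced by a ready one) can be modeled in a few lines of software. This is a toy sketch under our own assumptions: the class and method names are hypothetical, FIFO ordering of ready threads is assumed, and the real scheduler is hardware that performs the swap without saving registers to memory.

```python
from collections import deque

class ThreadScheduler:
    """Toy model: `contexts` resident threads, at most `issue_width` of them
    issuing at the same time (8 and 4 in the abstract)."""

    def __init__(self, contexts=8, issue_width=4):
        self.issue_width = issue_width
        self.ready = deque(range(contexts))  # resident threads waiting for a slot
        self.active = []                     # threads currently issuing (SMT)
        self.blocked = set()                 # threads stalled, e.g. on data
        self._fill()

    def _fill(self):
        # Hand every free issue slot to the oldest ready thread (FIFO order).
        while len(self.active) < self.issue_width and self.ready:
            self.active.append(self.ready.popleft())

    def block(self, tid):
        """tid stalls: park it and switch a ready context into its slot."""
        self.active.remove(tid)
        self.blocked.add(tid)
        self._fill()

    def wake(self, tid):
        """tid's blocking condition has cleared: it becomes ready again."""
        self.blocked.remove(tid)
        self.ready.append(tid)
        self._fill()
```

Starting with threads 0 to 3 active, `block(1)` switches thread 4 into the freed slot, and a later `wake(1)` only queues thread 1, because all four slots are still busy. Keeping all eight contexts resident is what makes the switch cheap in this model: it only changes which thread owns a slot.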

《Computer Networks》2003,41(5):623-640
We present a design methodology for a modular network processor architecture that leads to a balanced, service-defined mix between programmable processor cores, configurable hardware assists, and specialized coprocessors. Whereas the processor cores address the flexibility and extendibility needs of the networking market, the hardware components offload the processors, or even allow them to be bypassed for certain network processor-typical tasks to optimize chip area, performance, and power efficiency. We describe the rationale behind the selected functional partitioning in hardware and software components and discuss the challenges of designing the hardware components, and of organizing and integrating the programmable cores. We quantify our approach with a performance evaluation of the overall system.

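To address the complex structure and poor parallel scalability of existing FFT algorithms, an improved FFT algorithm is proposed, and on this basis a floating-point FFT processor is designed and verified by simulation. The results show that the new algorithm greatly simplifies the system structure, reduces hardware overhead, is very easy to implement in parallel, and significantly improves computational efficiency: an N-point FFT takes only N/2 clock cycles, which fully meets the requirements of real-time signal processing.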
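The abstract does not describe the improved algorithm itself, so the sketch below is only a point of reference: the standard iterative radix-2 decimation-in-time Cooley-Tukey FFT, which is not the paper's method. It makes visible the property that parallel FFT hardware exploits: every stage consists of N/2 independent butterflies.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Reorder the input into bit-reversed index order.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev ^= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]

    # log2(n) stages; each stage performs n/2 butterflies that do not depend
    # on one another, so hardware can evaluate them in parallel.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle factor for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v          # butterfly: sum output
                a[start + k + half] = u - v   # butterfly: difference output
                w *= w_step
        size *= 2
    return a
```

For instance, `fft_radix2([0, 1, 0, 0])` returns approximately `[1, -1j, -1, 1j]`, the DFT of a unit impulse delayed by one sample.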

Design of a Digital Signal Processor for a Pulse-Doppler Radar   Total citations: 1; self-citations: 0; citations by others: 1
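To meet the signal-processing requirements of a certain type of pulse-Doppler radar, a fully digital signal processor was designed. It uses an "ADC + FPGA + DSP + memory" architecture and offers small size, light weight, low power consumption, and high reliability. The paper focuses on the methods and implementation of data acquisition, pulse integration, and target detection in the signal-processing chain.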
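The abstract names pulse integration and target detection without saying which methods are used. As an illustration only, the sketch below assumes non-coherent integration (summing power over pulses) followed by cell-averaging CFAR detection, a common pairing; the function names and the `guard`, `train`, and `scale` parameters are hypothetical, and on the actual processor these steps would run on the FPGA/DSP rather than in Python.

```python
def integrate_pulses(pulses):
    """Non-coherent integration: sum |sample|^2 of each range cell over pulses.
    pulses[k][c] is the (possibly complex) sample of pulse k in range cell c."""
    n_cells = len(pulses[0])
    return [sum(abs(p[c]) ** 2 for p in pulses) for c in range(n_cells)]

def ca_cfar(power, guard=1, train=4, scale=5.0):
    """Cell-averaging CFAR: report cells whose power exceeds `scale` times the
    mean of up to `train` cells on each side, skipping `guard` cells next to
    the cell under test."""
    n = len(power)
    detections = []
    for c in range(n):
        lo, hi = max(0, c - guard - train), min(n, c + guard + train + 1)
        window = [power[i] for i in range(lo, hi) if abs(i - c) > guard]
        if window and power[c] > scale * sum(window) / len(window):
            detections.append(c)
    return detections
```

With eight pulses of unit-amplitude background and an echo of amplitude 4 in range cell 20, integration gives 8 in the background cells and 128 in cell 20, and `ca_cfar` reports only cell 20. The guard cells keep the echo from inflating its own noise estimate.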

Computations involving symmetric, positive definite band matrices are kernel operations in the numerical treatment of many models arising in science and engineering. It is desirable to achieve a high level of performance when such operations are carried out on a vector processor. If the operations are performed by rows or columns (as in the EXTENDED BLAS subroutines), then the loops are vectorized, but the speed of computation, measured in Mflops, is not very high, because the arrays involved are normally short. Therefore the computations should be organized by diagonals. Furthermore, special devices should be applied to unroll the loops. Finally, one should be careful with the storage scheme. It is demonstrated that if (i) the computations are organized by diagonals, (ii) the main loops are unrolled, and (iii) the storage scheme is such that work with zero elements is avoided, then the speed of computation is nearly the same as that obtained in computations with dense matrices. If a particular vector machine is in use (in our case a CRAY X-MP computer), then the speed can be increased further by (iv) coding some basic operations in machine language and (v) using the different processors of the vector computer in parallel. The efficiency gained by exploiting the special features of the particular computer is also illustrated by numerical examples.

Kernel subroutines performing matrix-vector multiplications are described. Representative tests are used to demonstrate the efficiency of these kernels.
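The diagonal-wise organization described above can be sketched for a symmetric band matrix-vector product. In this minimal example the storage layout is our assumption: `diags[d][i]` holds `A[i][i+d]`, so only the main diagonal and the upper diagonals are stored and no zeros outside the band are touched. Each inner loop runs over a whole diagonal (length n - d) instead of a short row; loop unrolling and machine-language coding from the paper are not shown.

```python
def sym_band_matvec(diags, x):
    """y = A x for a symmetric band matrix stored by diagonals:
    diags[0] is the main diagonal, diags[d][i] == A[i][i + d] for d >= 1."""
    n = len(x)
    y = [a * xi for a, xi in zip(diags[0], x)]  # main diagonal: one loop of length n
    for d in range(1, len(diags)):
        dd = diags[d]
        # Upper part: y[i] += A[i][i+d] * x[i+d], a long loop of length n - d.
        for i in range(n - d):
            y[i] += dd[i] * x[i + d]
        # Lower part by symmetry: y[i+d] += A[i+d][i] * x[i], same stored diagonal.
        for i in range(n - d):
            y[i + d] += dd[i] * x[i]
    return y

def to_dense(diags, n):
    """Expand the diagonal storage to a full matrix (for checking only)."""
    A = [[0.0] * n for _ in range(n)]
    for d, dd in enumerate(diags):
        for i, v in enumerate(dd):
            A[i][i + d] = v
            A[i + d][i] = v
    return A
```

For a half-bandwidth w, the row-oriented product works on rows of at most 2w + 1 entries, whereas here every loop has length close to n; that difference is what the abstract credits for approaching dense-matrix speed.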

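Design of Data Communication and a Router for a Polymorphic Parallel Processor   Total citations: 2; self-citations: 1; citations by others: 2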
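With the development of multicore technology, inter-core communication faces new challenges, and inter-core communication performance determines the performance of the whole multicore processor. By analyzing the data-communication requirements of multicore processors, a data-communication architecture suited to the polymorphic parallel processor is proposed. The architecture uses two communication modes: nearest-neighbor inter-core communication through adjacent shared registers, and remote communication through a hardware-accelerated router. The router for remote communication uses input buffering and computes routes with the classic deterministic XY routing algorithm; multicast and fault-tolerance support are added, and a dedicated arbitration mechanism reduces design complexity. These improvements lower inter-core communication latency and power consumption and raise the performance of the polymorphic parallel processor.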
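XY routing, the deterministic algorithm named above, first moves a packet along the X dimension until the column matches and then along Y. A minimal sketch on a 2D mesh follows; the coordinate convention (y grows toward NORTH) and the port names are our assumptions, and the paper's multicast, fault-tolerance, and arbitration extensions are not modeled.

```python
# Hypothetical port names; in this sketch y grows toward NORTH.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}

def xy_next_port(cur, dst):
    """Dimension-order (XY) routing: resolve the X offset fully, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return "EAST" if dx > cx else "WEST"
    if cy != dy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"  # arrived: deliver to the attached core

def xy_route(src, dst):
    """List of routers visited from src to dst, both inclusive."""
    path, cur = [src], src
    while (port := xy_next_port(cur, dst)) != "LOCAL":
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path
```

For example, `xy_route((0, 0), (2, 1))` visits `(1, 0)` and `(2, 0)` before turning north to `(2, 1)`. Because a packet never turns from Y back to X, no cyclic channel dependency can form on a mesh, which is why XY routing is deadlock-free and needs only a comparison per hop.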

Wei Min, Liu Yi'an, Wu Hongyan. 《Journal of Computer Applications》2015,35(5):1290-1295
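To address the need to process large amounts of raw data in real time during enterprise production, a local processor based on a custom architecture was designed and implemented. Hadoop's parallel architecture served as the initial reference, and the working principle and caching approach of MapReduce were analyzed. On this basis, a "multi-class thread cooperative processing" program architecture was designed for the actual production environment, supported by two custom data-caching schemes, which ensure concurrency and correctness in the receiving, computing, and uploading stages of each local processor in the distributed system. The system has been in production use continuously for more than a year and has met its goal of processing, in real time, the raw data generated by several workshops, showing good stability, effectiveness, and scalability. The practical results show that the custom program architecture and the effective caching schemes enable large volumes of data to be processed and analyzed concurrently.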
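The receive, compute, and upload stages can be illustrated with one thread per stage connected by bounded queues. This is a minimal sketch under our own assumptions, not the paper's architecture: the two custom cache types are replaced by standard `queue.Queue` buffers, "receiving" reads from an iterable, "uploading" appends to a list, and all function names are hypothetical.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker passed down the pipeline

def receive(records, out_q):
    """Stage 1: accept raw records and buffer them for processing."""
    for rec in records:
        out_q.put(rec)  # blocks while the bounded queue is full (backpressure)
    out_q.put(_DONE)

def compute(in_q, out_q, fn):
    """Stage 2: process each buffered record."""
    while (item := in_q.get()) is not _DONE:
        out_q.put(fn(item))
    out_q.put(_DONE)

def upload(in_q, sink):
    """Stage 3: hand results on (here: append them to a list)."""
    while (item := in_q.get()) is not _DONE:
        sink.append(item)

def run_pipeline(records, fn, maxsize=64):
    raw_q = queue.Queue(maxsize=maxsize)
    result_q = queue.Queue(maxsize=maxsize)
    uploaded = []
    threads = [
        threading.Thread(target=receive, args=(records, raw_q)),
        threading.Thread(target=compute, args=(raw_q, result_q, fn)),
        threading.Thread(target=upload, args=(result_q, uploaded)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return uploaded
```

`run_pipeline(range(10), lambda v: v * v)` returns the ten squares in input order, since each stage has a single thread. The bounded queues let all three stages overlap while keeping a slow upload stage from letting unprocessed data grow without limit.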
