Similar articles
1.
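Design and implementation of a reconfigurable computing system   Cited by: 3 (self-citations: 1, citations by others: 3)
Reconfigurable computing is a new approach to building computing systems. It addresses the shortcomings of both general-purpose processors and dedicated hardware: it remains programmable after fabrication while delivering high computing performance and computing density. After briefly introducing the architecture of reconfigurable computing systems, this paper presents one way to implement such a system, using an embedded real-time control system as an example.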

2.
Reconfigurable computing tries to balance the high efficiency of custom computing against the flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling, with the goal of one iteration per cycle, together with implementation techniques that support decoupled synchronization between the token generator and the collector. This paper also introduces techniques for exploiting both intra- and inter-iteration data dependences, with the help of two instructions for special data reuse in loop-carried dependences. The experimental results show that the number of memory accesses is on average 3% of that of a RISC processor simulator with no memory optimization. In a practical image matching application, the LEAP architecture achieves a speedup of about 34 times in execution cycles compared with general-purpose processors. Supported by the National Natural Science Foundation of China (Grant No. 60633050, 60621003) and the National High Technology Research and Development Program of China (Grant No. 2007AA01Z06)

3.
MorphoSys reconfigurable hardware for cryptography: the Twofish case   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents the mapping and performance analysis of the Twofish algorithm on MorphoSys. MorphoSys is a reconfigurable architecture that can provide high performance compared to custom hardware and yet preserves a level of flexibility compared to general-purpose processors. With today’s high demand for secure data transfer media, including wired and wireless networks, there is a growing demand for real-time implementation of cryptographic algorithms. Twofish, one of the five AES finalists, was chosen because it is a computationally intensive algorithm. It requires lookup tables and logical and arithmetic computations that demand high flexibility and performance, which makes it a well-suited algorithm to map in order to evaluate such hardware.

4.
The abundant hardware resources on current reconfigurable computing systems provide new opportunities for high-performance parallel implementations of scientific computations. In this paper, we study designs for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications, on reconfigurable computing systems. We first analyze design trade-offs in implementing this kernel. These trade-offs are caused by the inherent parallelism of matrix multiplication and the resource constraints, including the number of configurable slices, the size of on-chip memory, and the available memory bandwidth. We propose three parameterized algorithms which can be tuned according to the problem size and the available hardware resources. Our algorithms employ a linear array architecture with simple control logic. This architecture effectively utilizes the available resources and reduces routing complexity. The processing elements (PEs) used in our algorithms are modular, so it is easy to embed floating-point units into them. Experimental results on a Xilinx Virtex-II Pro XC2VP100 show that our algorithms achieve good scalability and high sustained GFLOPS performance. We also implement our algorithms on the Cray XD1, a high-end reconfigurable computing system that employs both general-purpose processors and reconfigurable devices. Our algorithms achieve a sustained performance of 2.06 GFLOPS on a single node of the XD1.

5.
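A power model is built and implemented according to the architecture of a reconfigurable processor. The model abstracts the circuit-level characteristics of the processor, estimates static peak power from architecture-level attributes and process parameters, collects dynamic power statistics using a performance simulator, and implements gating under three conditional-clocking schemes. Compared with a superscalar general-purpose microprocessor, the reconfigurable processor achieves an average speedup of 3.59 in performance, while its power consumption shows an average increase ratio of only 1.48. Experiments also show that the simple CC1 gating technique effectively reduces the power consumption and hardware complexity of the reconfigurable system. The model lays a foundation for low-power design of reconfigurable processors and for research on compiler-level low-power optimization.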

6.
This work aims to pave the way for an efficient open system architecture for embedded electronic applications that processes computationally complex algorithms in real time and at low cost. The target is to define a standard architecture that improves on the performance-cost trade-off delivered by alternatives currently on the market, such as general-purpose multi-core processors. Our approach, based on hardware/software (HW/SW) co-design and run-time reconfigurable computing, is synthesizable in SRAM-based programmable logic. As a proof of concept, a run-time partially reconfigurable field-programmable gate array (FPGA) is used to carry out a computationally demanding application, an automatic fingerprint authentication system (AFAS). Biometric personal recognition is a good example of a compute-intensive algorithm composed of a series of image processing tasks executed in sequential order. In our pioneering design, these tasks are first partitioned and synthesized into a series of coprocessors, which are then instantiated and executed, multiplexed in time, on a partially reconfigurable region of the FPGA. Benchmarking the AFAS implemented either as a pure software approach on a PC platform with a dual-core processor (Intel Core 2 Duo T5600 at 1.83 GHz) or as a reconfigurable FPGA co-design (the identical algorithm partitioned into HW/SW tasks operating at 50 or 100 MHz on the second smallest device of the Xilinx Virtex-4 LX family) shows a speed-up of one order of magnitude in favor of the FPGA alternative. These results point to biometric recognition as a sensible killer application for run-time reconfigurable computing, mainly in terms of efficiently balancing computational power, functional flexibility and cost. Such features, reached through partial reconfiguration, are easily portable today to a broad range of embedded applications with identical system architecture.

7.
Custom-instruction selection is an essential phase in instruction set extension for reconfigurable processors. It determines the most profitable custom-instruction candidates for implementing in the reconfigurable fabric of a reconfigurable processor. In this paper, a practical computing model is proposed for the custom-instruction selection problem that takes into account the area constraint of the reconfigurable fabric. Based on the new computing model, two heuristic algorithms and an exact algorithm are proposed. The first heuristic algorithm, denoted as HEA, dynamically assigns priorities to the custom instruction candidates and incorporates efficient strategies to select custom instructions with the highest priority. The second heuristic algorithm, denoted as TSA, employs an efficient tabu search algorithm to refine the results of HEA to near-optimal ones. Also, a branch-and-bound algorithm (BnB) is proposed to produce exact solutions for relatively small-sized problems or problems with stringent area-constraints. Experimental results show that HEA can produce more specific approximate solutions with a difference of only about 3% when compared to the optimal solutions produced by BnB. This difference is further reduced to about 0.6% by TSA. In addition, for large-sized problems where the exact algorithm becomes prohibitive, HEA and TSA can still produce solutions within reasonable time.
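
As a rough illustration of why both heuristic and exact selectors are useful under an area constraint, the sketch below treats selection as a 0/1 knapsack problem. This modelling choice is an assumption, not the paper's computing model, and the code is not HEA, TSA, or BnB: `greedy_select` is a plain gain-per-area greedy heuristic, `exact_select` is a brute-force search that is only practical for small inputs, and the candidate names, gains, and areas are hypothetical.

```python
from itertools import combinations

# Each candidate is (name, gain, area); all values below are hypothetical.


def greedy_select(candidates, area_budget):
    """Pick candidates in order of gain per unit area while they still fit."""
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for name, gain, area in ranked:
        if used + area <= area_budget:
            chosen.append(name)
            used += area
    return chosen


def exact_select(candidates, area_budget):
    """Try every subset and keep the feasible one with the largest total gain."""
    best, best_gain = [], 0
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            area = sum(c[2] for c in subset)
            gain = sum(c[1] for c in subset)
            if area <= area_budget and gain > best_gain:
                best, best_gain = [c[0] for c in subset], gain
    return best


def total_gain(candidates, names):
    """Sum the gains of the named candidates."""
    return sum(gain for name, gain, _ in candidates if name in names)


candidates = [("ci_a", 60, 10), ("ci_b", 100, 20), ("ci_c", 120, 30)]
greedy = greedy_select(candidates, area_budget=50)
exact = exact_select(candidates, area_budget=50)
print(greedy, total_gain(candidates, greedy))
print(exact, total_gain(candidates, exact))
```

On this toy input the greedy choice fills the budget with the best-ratio candidates and misses the better pair, which is the kind of gap a refinement step or an exact search is meant to close.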

8.
With advances in reconfigurable hardware, especially field-programmable gate arrays (FPGAs), it has become possible to use reconfigurable hardware to accelerate complex applications such as those in scientific computing. There has been a resulting development of reconfigurable computers, that is, computers that have both general-purpose processors and reconfigurable hardware, as well as memory and high-performance interconnection networks. In this paper, we describe the acceleration of molecular dynamics simulations with reconfigurable computers. We evaluate several design alternatives for the implementation of the application on a reconfigurable computer. We show that a single node accelerated with reconfigurable hardware, utilizing fine-grained parallelism in the reconfigurable hardware design, is able to achieve a speedup of about two times over the corresponding software-only simulation. We then parallelize the application and study the effect of acceleration on performance and scalability. Specifically, we study strong scaling, in which the problem size is fixed. We find that the unaccelerated version actually scales better, because it spends more time in computation than the accelerated version does. However, we also find that a cluster of P accelerated nodes gives better performance than a cluster of 2P unaccelerated nodes.

9.
《Computer》1998,31(11):24-32
In the past few years, two important trends have evolved that could change the shape of computing: multimedia applications and portable electronics. Together, these trends will lead to a personal mobile-computing environment, a small device carried all the time that incorporates the functions of the pager, cellular phone, laptop computer, PDA, digital camera, and video game. The microprocessor needed for these devices is actually a merged general-purpose processor and digital-signal processor, with the power budget of the latter. Yet for almost two decades, architecture research has focused on desktop or server machines. We are designing processors of the future with a heavy bias toward the past. To design successful processor architectures for the future, we first need to explore future applications and match their requirements in a scalable, cost-effective way. The authors describe Vector IRAM, an initial approach in this direction, and challenge others in the very successful computer architecture community to investigate architectures with a heavy bias for the future.

10.
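Hash functions are an effective means of ensuring data integrity in cryptography, and performance requirements mean that some applications must implement them in hardware. By analyzing the algorithmic similarities among commonly used hash functions, this paper designs dedicated reconfigurable units and couples them into a transport triggered architecture, yielding TTAH, a reconfigurable hash function processor. Mapping results for common hash algorithms on TTAH show that, compared with fine-grained reconfigurable structures, it is faster and uses resources more efficiently; compared with an ASIC, it effectively supports multiple common hash functions at a small additional cost.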

11.
To improve the performance of embedded processors, an effective technique is collapsing critical computation subgraphs as application-specific instruction set extensions and executing them on custom functional units. The problem with this approach is the immense cost and the long times required to design a new processor for each application. As a solution to this issue, we propose an adaptive extensible processor in which custom instructions (CIs) are generated and added after chip-fabrication. To support this feature, custom functional units are replaced by a reconfigurable matrix of functional units (FUs). A systematic quantitative approach is used for determining the appropriate structure of the reconfigurable functional unit (RFU). We also introduce an integrated framework for generating mappable CIs on the RFU. Using this architecture, performance is improved by up to 1.33, with an average improvement of 1.16, compared to a 4-issue in-order RISC processor. By partitioning the configuration memory, detecting similar/subset CIs and merging small CIs, the size of the configuration memory is reduced by 40%.

12.
How multimedia workloads will change processor design   Cited by: 1 (self-citations: 0, citations by others: 1)
Diefendorff  K. Dubey  P.K. 《Computer》1997,30(9):43-45
Workloads drive architecture design and will change in the next two decades. For high-performance, general-purpose processors, there is a consensus that multimedia will continue to grow in importance. The authors predict these processors will incorporate more media processing capabilities, eventually bringing about the demise of specialized media processors, except perhaps, in embedded applications. These enhanced general-purpose processor capabilities will arise from multimedia applications that require real-time response, continuous-media data types and significant fine-grained data parallelism.

13.
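Reconfigurable computing systems remain programmable after fabrication while delivering high computing performance and computing density. After outlining the basic concepts and composition of reconfigurable computing systems, this paper uses an example to present a rapid way of implementing such a system. Compared with other systems of the same functionality, this system is more flexible and efficient.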

14.
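Application of pipelined configuration technology in reconfigurable processors   Cited by: 1 (self-citations: 1, citations by others: 0)
This paper proposes a pipelined configuration technique for reconfigurable processors that effectively reduces configuration time and speeds up application execution. The reconfigurable processor comprises a general-purpose processor and a coarse-grained reconfigurable array. The array handles the loops that account for most of an application's execution time; these loops are decomposed into rows that execute on the array in a pipelined fashion. The technique was verified on an FPGA verification system. The applications verified include the integer discrete cosine transform and motion estimation from the H.264 benchmark. Compared with the conventional reconfigurable processors PipeRench and MorphoSys and with TI's TMS320DM642 DSP, it achieves a performance improvement of about 3.5 times.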

15.
FPGAs combine the programmability of processors with the performance of custom hardware. As they become more common in critical embedded systems, new techniques are necessary to manage security in FPGA designs. This article discusses FPGA security problems and current research on reconfigurable devices and security, and presents security primitives and a component architecture for building highly secure systems on FPGAs.

16.
This paper first proposes a complete recommender system implemented on reconfigurable hardware with the purpose of testing on-chip, low-energy embedded collaborative filtering applications. Although the computing time is lower than the one obtained from usual multicore microprocessors, this proposal has the advantage of providing an approach to solve any prediction problem based on collaborative filtering by using an off-line, highly-portable light computing environment. This approach has been successfully tested with state-of-the-art datasets. Next, as a result of improving certain tasks related to the on-chip recommender system, we propose a custom, fine-grained parallel circuit for quick matrix multiplication with floating-point numbers. This circuit was designed to accelerate the predictions from the model obtained by the recommender system, and tested with two small datasets for experimental purposes. The accelerator is built from two levels of parallelism. On the one hand, several predictions run in parallel through the simultaneous multiplication of different vectors of two matrices. On the other hand, the operation of each vector is executed in parallel by multiplying pairs of floating-point values to later add the corresponding results in parallel as well. This circuit was compared with other approaches designed for the same purpose: circuits built using automated high-level synthesis tools, a general-purpose microprocessor, and high-performance graphics processing units. The performance of the prediction accelerator in terms of time surpassed that of the other approaches. We also evaluated the scalability of the circuit to practical problems using the high-level synthesis approach, and confirmed that implementations based on reconfigurable hardware allow acceptable speedups of multi-core processors.
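
The two levels of parallelism described above can be mirrored in a small software sketch. This is a sequential Python model of the dataflow only, not the authors' circuit: the function names (`tree_sum`, `dot`, `predict`) and the matrix layout are assumptions made for illustration. Each list comprehension marks work that a hardware design could perform in parallel: all pairwise products at once, then pairwise additions level by level in an adder tree, with one such dot product per prediction.

```python
def tree_sum(values):
    """Add values pairwise, level by level, as an adder tree would."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # All additions within one level are independent of each other.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
    return level[0]


def dot(row, col):
    """Inner parallelism: independent products followed by a tree reduction."""
    products = [a * b for a, b in zip(row, col)]
    return tree_sum(products)


def predict(p, q):
    """Outer parallelism: every (row, column) pair is an independent prediction.

    p is a (users x k) matrix and q is a (k x items) matrix, as lists of rows.
    """
    cols = list(zip(*q))
    return [[dot(row, col) for col in cols] for row in p]


print(predict([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

A tree reduction over k products needs about log2(k) addition levels rather than k - 1 sequential additions, which is why the pairing of multiplications and additions matters in hardware. Note that floating-point tree summation can round differently from left-to-right summation.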

17.
The design of a new high-performance computing platform to model biological neural networks requires scalable, layered communications in both hardware and software. SpiNNaker’s hardware is based upon Multi-Processor System-on-Chips (MPSoCs) with flexible, power-efficient, custom communication between processors and chips. The architecture scales from a single 18-processor chip to over 1 million processors and to simulations of billion-neuron, trillion-synapse models, with tens of trillions of neural spike-event packets conveyed each second. The communication networks and overlying protocols are key to the successful operation of the SpiNNaker architecture, designed together to maximise performance and minimise the power demands of the platform. SpiNNaker is a work in progress, having recently reached a major milestone with the delivery of the first MPSoCs. This paper presents the architectural justification, which is now supported by preliminary measured results of silicon performance, indicating that it is indeed scalable to a million-plus processor system.

18.
Athanas  P.M. Abbott  A.L. 《Computer》1995,28(2):16-25
The authors explore the utility of custom computing machinery for accelerating the development, testing, and prototyping of a diverse set of image processing applications. We chose an experimental custom computing platform called Splash-2 to investigate this approach to prototyping real time image processing designs. Custom computing platforms are emerging as a class of computers that can provide near application specific computational performance. We developed a real time image processing system called VTSplash, based on the Splash-2 general-purpose platform. Splash-2 is an attached processor featuring programmable processing elements (PEs) and communication paths. The Splash-2 system uses arrays of RAM based field programmable gate arrays (FPGAs), crossbar networks, and distributed memory to accomplish the needed flexibility and performance tasks. Such platforms let designers customize specific operations for function and size, and data paths for individual applications.

19.
Most conventional object tracking algorithms are implemented in software on general-purpose processors because of the great flexibility this offers. However, real-time performance is hard to achieve because of the inherently sequential processing of these processors. To tackle this issue, a reconfigurable system-on-chip (rSoC) platform with microprocessors and FPGAs is applied in this paper. To simplify the hardware/software interface, a Belief–Desire–Intention (BDI)-based multi-agent architecture is proposed as the unified framework. Then an agent-based task graph and two heuristic partitioning methods are proposed to partition the hardware and software on an rSoC platform. Compared to the module-based architecture, this BDI-based multi-agent architecture provides more efficiency, flexibility, autonomy, and scalability for real-time tracking systems. A particle swarm optimization (PSO)-based object detection and tracking algorithm is applied to evaluate the proposed architecture. Extensive experimental results of object tracking demonstrate that the proposed architecture is efficient and highly robust with real-time performance.

20.
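FPGA implementation of high-precision floating-point transcendental functions based on the CORDIC algorithm   Cited by: 2 (self-citations: 1, citations by others: 2)
A new hardware architecture for an input/output floating-point processing unit is proposed. It converts data from the internal format of the CORDIC algorithm to the IEEE 754 standard floating-point format supported by the processor. Floating-point input data can be supplied directly in either of two angle units, and the hardware module also directly accepts large angles exceeding 360°. The floating-point hardware module was implemented as a user-defined custom instruction in an Altera Nios II processor system, and its correctness was verified with a C program.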
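
For readers unfamiliar with CORDIC, the following is a minimal floating-point software sketch of rotation-mode CORDIC with range reduction for angles beyond 360°. It is not the paper's hardware design or its internal data format: the iteration count, the function name `cordic_sin_cos`, and the choice of degrees and radians as the two input units are assumptions made for illustration. A hardware version would use fixed-point shifts and adds in place of the multiplications by powers of two.

```python
import math

# Number of CORDIC iterations; an assumed value chosen for illustration.
ITERATIONS = 32
# Elementary rotation angles atan(2^-i).
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Pre-computed correction for the gain accumulated by the micro-rotations.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(angle, degrees=False):
    """Return (sin, cos) of angle using rotation-mode CORDIC."""
    theta = math.radians(angle) if degrees else angle
    # Range reduction, step 1: fold any angle into [-pi, pi).
    theta = math.fmod(theta, 2.0 * math.pi)
    if theta >= math.pi:
        theta -= 2.0 * math.pi
    elif theta < -math.pi:
        theta += 2.0 * math.pi
    # Step 2: fold into [-pi/2, pi/2], where the iteration converges.
    # Shifting by pi negates both sine and cosine, hence the sign flip.
    flip = 1.0
    if theta > math.pi / 2:
        theta -= math.pi
        flip = -1.0
    elif theta < -math.pi / 2:
        theta += math.pi
        flip = -1.0
    # Start from the gain-corrected unit vector and rotate towards theta.
    x, y, z = GAIN, 0.0, theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return flip * y, flip * x


# 750 degrees is 30 degrees plus two full turns.
print(cordic_sin_cos(750.0, degrees=True))
print(math.sin(math.radians(30.0)), math.cos(math.radians(30.0)))
```

The two printed lines should agree to within the accuracy of the chosen iteration count.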
