首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
多片FPGA系统互连结构研究   总被引:3,自引:1,他引:2  
本文在分析现有多FPGA系统互连拓朴结构的基础上,指出其最佳形式,即硬布线和可编程布线相结合,优势互补,并提出了一种新的拓朴结构——最大权生成树与交叉开关相结合,详述了其设计流程和算法。  相似文献   

2.
Multi-FPGA Boards (MFBs) have been in use for more than a decade for implementing systems requiring high performance and for emulation/prototyping of multimillion gate chips. It is important to develop an MFB architecture which can be used for emulation or prototyping of a large number of circuits. A key feature of an MFB is its routing architecture defined by its inter-Field-Programmable Gate Array (FPGA) connections. There are two types of inter-FPGA connections, namely–fixed connections (FCs) connecting a pair of FPGAs through dedicated wires and programmable connections (PCs) which connect a pair of FPGAs through a programmable switch. An architecture which has a mix of both these type of connections is called a hybrid routing architecture. It has been shown in the literature [7] that a hybrid MFB architecture is more efficient for emulation than an architecture with only one type of connections. The cost of an MFB and delay of the emulated circuit on it depends on the number of PCs used for emulation. An objective of a designer of an MFB for circuit emulation is to minimize the required number of PCs. In this paper, we describe algorithms to evaluate the requirement of PCs for many hybrid routing architectures.The requirement of PCs can be reduced if some programmable connections are replaced by a connection using only FCs by routing through FPGAs. Such a routing is called multi-hop routing. We present an optimal and a heuristic algorithm for estimation of PCs when limited number of hops through FPGAs are permitted. The unique feature of our evaluation scheme is that it is generic and treat routing architecture as a parameter. We have used benchmark circuits as well as synthetic cloned circuits for testing our algorithms. Our heuristic algorithm is very fast and gives optimal results most of the time. Our algorithms can be used for actual routing during circuit emulation.  相似文献   

3.
4.
As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.  相似文献   

5.
Logic emulation is so far the fastest method to verify the system functionality in the gate level before chip fabrication. Field-programmable gate array (FPGA)-based logic emulator with large gate capacity generally comprises a large number of FPGAs or special processors connected in mesh or crossbar topology. However, gate utilization of FPGAs and speed of emulation are limited by the number of signal pins among FPGAs and the interconnection architecture of the logic emulator. This paper first describes a new interconnection architecture called TOMi (Time-multiplexed, Off-chip, Multicasting interconnection) and proposes a circuit partitioning algorithm called ATOMi (Algorithm for TOMi) for multi-FPGA system incorporating four to eight FPGAs where FPGAs are interconnected through TOMi. ATOMi reduces the number of off-chip signal transfers to optimize the performance for multi-FPGA system implemented by TOMi. Experimental results using Partitioning93 benchmarks show that, by adopting the proposed TOMi interconnection architecture along with ATOMi, the pin count is reduced to 14.4%–88.6% while the critical path delay is reduced to 66.1%–90.1% compared to traditional architectures including mesh, crossbar, and VirtualWire architecture.  相似文献   

6.
Field programmable gate arrays (FPGAs) with supply voltage (Vdd) programmability have been proposed recently to reduce FPGA power, where the Vdd-level can be customized for FPGA circuit elements and unused circuit elements can be power-gated. In this paper, we first design novel Vdd-programmable and Vdd-gateable interconnect switches with minimal number of configuration SRAM cells. We then evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to the baseline architecture similar to the leading commercial architecture, our best architecture reduces the minimal energy-delay product by 54.39% with 17% more area and 3% more configuration SRAM cells. Our evaluation results also show that LUT size 4 gives the lowest energy consumption, and LUT size 7 leads to the highest performance, both for all evaluated architectures.  相似文献   

7.
8.
Asynchronous serial transceivers have been recently used for data serializing in large on-chip systems to alleviate the routing congestion and improve the routability. FPGAs have considerable potential for using the asynchronous serial transmission but they have serious challenges to use this technology. In this paper, we present a new FPGA architecture corresponding with a new routing algorithm to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (18.81% and 48.73%, respectively) by using the asynchronous data serializing without any performance degradation in cost of reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs.  相似文献   

9.
The Realizer, is a logic emulation system that automatically configures a network of field-programmable gate arrays (FPGAs) to implement large digital logic designs, is presented. Logic and interconnect are separated to achieve optimum FPGA utilization. Its interconnection architecture, called the partial crossbar, greatly reduces system-level placement and routing complexity, achieves bounded interconnect delay, scales linearly with pin count, and allows hierarchical expansion to systems with hundreds of thousands of FPGA devices in a fast and uniform way. An actual multiboard system has been built, using 42 Xilinx XC3090 FPGAs for logic. Several designs, including a 32-b CPU datapath, have been automatically realized and operated at speed. They demonstrate very good FPGA utilization. The Realizer has applications in logic verification and prototyping, simulation, architecture development, and special-purpose execution  相似文献   

10.
Antifuse field programmable gate arrays   总被引:1,自引:0,他引:1  
An antifuse is an electrically programmable two-terminal device with small area and low parasitic resistance and capacitance. Field-programmable gate arrays (FPGAs) using antifuses in a segmented channel routing architecture now offer the digital logic capabilities of an 8000-gate conventional gate array and system speeds of 40-60 MHz. A brief survey of antifuse technologies is provided. the antifuse technology, routing architecture, logic module, design automation, programming, testing and use of ACT antifuse FPGAs are described. Some inherent tradeoffs involving the antifuse characteristics, routing architecture and logic module are illustrated  相似文献   

11.
This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3%  相似文献   

12.
There is currently great interest in using fixed arrays of FPGAs for logic emulators, custom computing devices, and software accelerators. An important part of designing such a system is determining the proper routing topology to use to interconnect the FPGAs. This topology can have a great effect on the area and delay of the resulting system. Crossbar, Hierarchical Crossbar, and Mesh interconnection schemes have all been proposed for use in FPGA-based systems. In this paper, we examine Mesh interconnection schemes, and propose several constructs for more efficient topologies. These reduce interchip delays by more than 60% over the basic four-way Mesh  相似文献   

13.
Architecture of field-programmable gate arrays   总被引:8,自引:0,他引:8  
A survey of field-programmable gate array (FPGA) architectures and the programming technologies used to customize them is presented. Programming technologies are compared on the basis of their volatility, size parasitic capacitance, resistance, and process technology complexity. FPGA architectures are divided into two constituents: logic block architectures and routing architectures. A classification of logic blocks based on their granularity is proposed, and several logic blocks used in commercially available FPGAs are described. A brief review of recent results on the effect of logic block granularity on logic density and performance of an FPGA is then presented. Several commercial routing architectures are described in the context of a general routing architecture model. Finally, recent results on the tradeoff between the flexibility of an FPGA routing architecture, its routability, and its density are reviewed  相似文献   

14.
Fabrication cost of application-specific integrated circuits (ASICs) is exponentially rising in deep submicron region due to rapidly rising non-recurring engineering cost. Field programmable gate arrays (FPGAs) provide an attractive alternative to ASICs but consume an order of magnitude higher power. There is a need to explore ways of reducing FPGA power consumption so that they can also be employed in ultra low power (ULP) applications instead of ASICs. Subthreshold region of operation is an ideal choice for ULP low-throughput FPGAs. The routing of an FPGA consumes most of the chip area and primarily determines the circuit delay and power consumption. There is a need to design moderate-speed ULP routing switches for subthreshold FPGA. This article proposes a novel subthreshold FPGA routing switch box (SB) that utilises the leakage voltage through transistor as biasing voltage which shows 69%, 61.2% and 30% improvement in delay, power delay product and delay variation, respectively, over conventional routing SB.  相似文献   

15.
This paper describes the Transmogrifier-2 (TM-2), a second-generation multifield programmable gate array (FPGA) rapid-prototyping system. The largest version of the system will comprise 16 boards that each contain two Altera 10K50 FPGA's, four I-Cube interconnect chips, and up to 8 Mbytes of memory. The inter-FPGA routing architecture of the TM-2 uses a novel interconnect structure, a nonuniform partial crossbar, that provides a constant delay between any two FPGA's in the system. The TM-2 architecture is modular and scalable, meaning that systems of various sizes can be constructed from copies of the same board, while maintaining routability and the constant delay feature. Other features include a system-level programmable clock that allows single-cycle access to off-chip memory, and programmable clock waveforms with edge resolution of 10 ns. The first Transmogrifier-2 boards have been manufactured and are functional. They have recently been used successfully in some simple graphics acceleration applications  相似文献   

16.
With the current technology, all-optical networks require nonblocking switch architectures for building optical cross-connects. The crossbar switch has been widely used for building an optical cross-connect due to its simple routing algorithm and short path setup time. It is known that the crossbar suffers from huge signal loss and crosstalk. The Clos network uses a crossbar as building block and reduces switch complexity, but it does not significantly reduce signal loss and crosstalk. Although the Spanke's network eliminates the crosstalk problem, it increases the number of switching elements required considerably (to 2N 2 - 2N). In this paper, we propose a new architecture for building nonblocking optical switching networks that has much lower signal loss and crosstalk than the crossbar without increasing switch complexity. Using this architecture we can build non-squared nonblocking networks that can be used as building block for the Clos network. The resulting Clos network will then have not only lower signal loss and crosstalk but also a lower switch complexity.  相似文献   

17.
This paper describes GlitchLess, a circuit-level technique for reducing power in field-programmable gate arrays (FPGAs) by eliminating unnecessary logic transitions called glitches. This is done by adding programmable delay elements to the logic blocks of the FPGA. After routing a circuit and performing static timing analysis, these delay elements are programmed to align the arrival times of the inputs of each lookup table (LUT), thereby preventing new glitches from being generated. Moreover, the delay elements also behave as filters that eliminate other glitches generated by upstream logic or off-chip circuitry. On average, the proposed implementation eliminates 87% of the glitching, which reduces overall FPGA power by 17%. The added circuitry increases the overall FPGA area by 6% and critical-path delay by less than 1%. Furthermore, since it is applied after routing, the proposed technique requires little or no modifications to the routing architecture or computer-aided design (CAD) flow.   相似文献   

18.
陈星  王丽云  王元  吴方  王健  陈利光  来金梅 《电子学报》2011,39(5):1165-1168
传统的可编程互联结构在短距离互连上往往采用单管、中距离上有双向线,这使得在CLB中查找表(LUT)数目变大后,互连上的延迟会随线长增加而呈指数增长.本文提出了一种改进的高性能互连结构,改进了短、中和长距离互连,使得其在CLB中LUT数目增加的情况下让芯片拥有更好的互连延迟特性,通过对这种互连结构和传统的互连结构进行建模...  相似文献   

19.
Field programmable gate arrays usage has been growing steadily for years now. Their popularity stems from the fact that they can be reprogrammed to implement any function, with any amount of parallelism. Unfortunately, exactly due to their flexibility, FPGAs require a huge amount of resources, in the form of LUTs and routing switches, and these can take up to 90% of the chip area. In this paper we present the development of a low-power full CMOS multiple-valued logic to build a LUT for FPGAs. Several circuits are mapped to quaternary LUTs and compared to their binary counterpart. Results show great improvements in terms of area and power consumption. Moreover, we show the positive impact of the proposed architecture in the global reduction of routing switches and wiring, and hence in the total FPGA area.  相似文献   

20.
FPGA布线通道分布对面积效率的影响研究   总被引:2,自引:0,他引:2  
该文提出了现场可编程门阵列(FPGA)布线通道不均匀分布对芯片面积的影响。引入几个典型的数学分布函数(高斯,正弦和三角分布),实现通道容量随函数分布变化的新FPGA结构。将这些结构的FPGA与传统的布线通道均匀分布的FPGA作比较,结果表明按照数学分布变化的布线通道分布结构比均匀分布情况下的面积效率要高。亦即通道分布的变化趋势是峰值位置位于芯片中央,即通道容量最大,从中间位置向边缘按函数变化趋势逐渐变小。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号