Similar literature
 Found 20 similar documents; search took 648 ms
1.
Field-programmable gate arrays (FPGAs) are becoming an increasingly important implementation medium for digital logic. One of the most important keys to using FPGAs effectively is a complete, automated software system for mapping onto the FPGA architecture. Unfortunately, many of the necessary tools require different techniques than traditional circuit implementation options, and these techniques are often developed specifically for only a single FPGA architecture. In this paper we describe automatic mapping tools for Triptych, an FPGA architecture with improved logic density and performance over commercial FPGAs. These tools include a simulated-annealing placement algorithm that handles the routability issues of fine-grained FPGAs, and an architecture-adaptive routing algorithm that can easily be retargeted to other FPGAs. We also describe extensions to these algorithms for mapping asynchronous circuits to Montage, the first FPGA architecture to completely support asynchronous and synchronous interface applications.
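As a rough illustration of simulated-annealing placement in general (not the Triptych tool itself, whose cost function and move set the abstract does not give), the sketch below places cells on a grid and minimizes half-perimeter wirelength. The names `wirelength` and `anneal_placement`, the cooling schedule, and the swap-only move set are all assumptions made for the example.

```python
import math
import random


def wirelength(placement, nets):
    """Sum of half-perimeter bounding boxes over all nets."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(cells, nets, width, height, t_start=5.0, t_end=0.01,
                     cooling=0.95, moves_per_temp=200, seed=0):
    """Hypothetical annealer: needs len(cells) <= width * height and >= 2 cells."""
    rng = random.Random(seed)
    slots = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(slots)
    # Random initial placement, one cell per grid slot.
    placement = {c: slots[i] for i, c in enumerate(cells)}
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Simplification: a move only swaps two placed cells, so slots
            # left empty by the initial placement are never used.
            a, b = rng.sample(cells, 2)
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = wirelength(placement, nets)
            delta = new_cost - cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                # Rejected: undo the swap.
                placement[a], placement[b] = placement[b], placement[a]
        t *= cooling
    return best, best_cost
```

A routability-aware placer of the kind the abstract describes would presumably add a congestion term to the cost; the sketch uses wirelength alone.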

2.
Field programmable gate arrays (FPGAs) with supply voltage (Vdd) programmability have been proposed recently to reduce FPGA power, where the Vdd-level can be customized for FPGA circuit elements and unused circuit elements can be power-gated. In this paper, we first design novel Vdd-programmable and Vdd-gateable interconnect switches with a minimal number of configuration SRAM cells. We then evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to a baseline architecture similar to the leading commercial architecture, our best architecture reduces the minimal energy-delay product by 54.39% with 17% more area and 3% more configuration SRAM cells. Our evaluation results also show that LUT size 4 gives the lowest energy consumption, and LUT size 7 leads to the highest performance, both for all evaluated architectures.

3.
In this paper, we present a novel, high-throughput field-programmable gate array (FPGA) architecture, PITIA, which combines the high performance of application-specific integrated circuits (ASICs) and the flexibility afforded by the reconfigurability of FPGAs. The new architecture, which targets datapath circuits, uses the concepts of wave steering and pipelined interconnects. We discuss the FPGA architecture and show results for performance, power consumption, clock network performance, and routability. Results for some commonly used datapath designs are encouraging with throughputs in the neighborhood of 625 MHz in 0.25-μm 2.5-V CMOS technology. Results for random benchmark circuits are also shown. We characterize designs according to their Rent's exponents and argue that designs with predominantly local interconnects are the best fit in PITIA. We also show that as technology scales down toward deep submicron, PITIA shows an increasing throughput performance.

4.
In this paper, we analyze algorithmic and architectural characteristics of a class of particle filters known as Gaussian Particle Filters (GPFs). GPFs approximate the posterior density of the unknowns with a Gaussian distribution, which limits the scope of their applications in comparison with the universally applied sample-importance resampling filters (SIRFs) but allows for their implementation without the classical resampling procedure. Since there is no need for resampling, we propose a modified GPF algorithm that is suitable for parallel hardware realization. Based on the new algorithm, we propose an efficient parallel and pipelined architecture for GPF that is superior to similar architectures for SIRF in the sense that it requires no memories for storing particles and it has a very low amount of data exchange through the communication network. We analyze the GPF on the bearings-only tracking problem and the results are compared with results obtained by SIRF in terms of computational complexity, potential throughput, and hardware energy. We consider implementation on FPGAs and we perform a detailed comparison of the GPF and SIRF algorithms implemented in different ways on this platform. GPFs that are implemented in parallel pipelined fashion on FPGAs can support higher sampling rates than SIRFs and as such they might be a more suitable candidate for real-time applications.
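To make the "no resampling" point concrete, here is a minimal sketch of one step of a generic Gaussian particle filter on a scalar model. It is not the modified algorithm or the hardware architecture proposed in the paper. The model (x_k = a·x_{k-1} + process noise, y_k = x_k + measurement noise), the function name `gpf_step`, and the parameters `a`, `q`, `r` are assumptions chosen for illustration.

```python
import math
import random


def gpf_step(mu_pred, var_pred, y, n_particles, rng, a=0.9, q=0.1, r=0.5):
    """One GPF step for the assumed model x' = a*x + N(0, q), y = x + N(0, r)."""
    # Measurement update: sample from the Gaussian prior, weight each
    # particle by the likelihood of y, and refit a Gaussian.
    xs = [rng.gauss(mu_pred, math.sqrt(var_pred)) for _ in range(n_particles)]
    ws = [math.exp(-0.5 * (y - x) ** 2 / r) for x in xs]
    total = sum(ws)
    ws = [w / total for w in ws]
    mu_post = sum(w * x for w, x in zip(ws, xs))
    var_post = sum(w * (x - mu_post) ** 2 for w, x in zip(ws, xs))

    # Time update: sample fresh particles from the fitted posterior (this
    # replaces resampling), push them through the dynamics, refit again.
    xs = [a * rng.gauss(mu_post, math.sqrt(var_post))
          + rng.gauss(0.0, math.sqrt(q)) for _ in range(n_particles)]
    mu_next = sum(xs) / n_particles
    var_next = sum((x - mu_next) ** 2 for x in xs) / n_particles
    return mu_post, var_post, mu_next, var_next
```

Only the mean and variance are carried between steps, which is why particles need not be stored across iterations. The lists here exist for readability; a streaming implementation could accumulate the sums on the fly.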

5.
6.
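Xu Ke, Lin Chuang, Wu Jianping. 《电子学报》 (Acta Electronica Sinica), 2001, 29(11): 1449-1453
In addition to forwarding IP packets, a programmable router must also perform computational tasks. How to schedule the CPU processing capacity of a programmable router is an important problem. This paper first establishes a general software architecture for programmable routers and, on that basis, proposes a CPU scheduling algorithm based on buffer queue length thresholds. Stochastic Petri nets are used to model, analyze, and evaluate the algorithm. The results show that the scheduling algorithm can simultaneously guarantee the computational requirements of both best-effort flows and QoS flows in a programmable router.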

7.
Asynchronous serial transceivers have recently been used for data serializing in large on-chip systems to alleviate routing congestion and improve routability. FPGAs have considerable potential for using asynchronous serial transmission, but they face serious challenges in adopting this technology. In this paper, we present a new FPGA architecture together with a new routing algorithm to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (18.81% and 48.73%, respectively) by using asynchronous data serializing without any performance degradation, at the cost of reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs.

8.
Microelectronics Journal, 2015, 46(6): 551-562
Most commercial Field Programmable Gate Arrays (FPGAs) have limitations in terms of density, speed, configuration overhead and power consumption, mostly due to the use of SRAM cells in Look-Up Tables (LUTs), configuration memory and programmable interconnects. Also, hardwired Application Specific Integrated Circuit (ASIC) blocks designed for high-performance arithmetic circuits in FPGAs reduce the area available for reconfiguration. In this paper, we propose a novel generalized hybrid CMOS-memristor based architecture using stateful-NOR gates as basic building blocks for implementation of logic functions. These logic functions are implemented on memristor nanocrossbar layers, while the CMOS layer is used for selection and connection of memristors. The proposed pipelined architecture combines the features of ASIC, FPGA and microprocessor based designs. It has high density due to the use of the nanocrossbar layer and high throughput, especially for arithmetic circuits. The proposed architecture for a three-input, one-output logic block is compared with a conventional LUT-based Configurable Logic Block (CLB) having the same number of inputs and outputs, and shows 1.82× area saving, 1.57× speedup and 3.63× less power consumption. The automation algorithm to implement any logic function using the proposed architecture is also presented.

9.
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools was developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than on the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture, the proportion of hard-wired connections versus programmable connections, to determine its best value.

10.
11.
One of the major reasons why IP multicast has not been widely deployed is the complexity of IP multicast routing. Since existing IP multicast routing protocols have been designed independently of IP unicast routing protocols, a router must maintain routing tables for both IP multicast and unicast routing. This is, in particular, a big burden for an inter-domain router. In addition, with existing IP multicast routing protocols, we cannot realize an application in which a sending host outside the designated domain sends IP multicast packets only towards the designated domain. To resolve the above issues, we propose a new architecture for IP multicast, which is called Domain Constrained Multicast (DCM). In this architecture, IP multicast packets are forwarded to a border router of the designated domain using IP unicast routing. Then, IP multicast packets are delivered inside the designated domain using IP multicast. We propose an address format for realizing the DCM architecture using IPv6. We describe the extension of the DCM architecture for applying it to inter-domain IP multicast routing. Finally, we compare the DCM architecture for inter-domain routing with existing inter-domain IP multicast routing protocols such as MSDP and BGMP.

12.
With the increasing scale of Field Programmable Gate Arrays (FPGAs), the architecture of interconnect resources (IRs) in FPGAs is becoming more and more complicated. IR testing plays an important role in guaranteeing the correct functionality of FPGAs. Usually, the architecture of Global IRs is regular, while the architecture of Local IRs is more complicated. In this paper, a generic IR model revealing the connection relationships for both Global and Local IRs in Xilinx series FPGAs is studied. A routability-aware algorithm based on the generic IR model is also presented. Test configurations (TCs) can be automatically generated by the proposed algorithm. Thus, both Global and Local IRs can be tested with an identical method. Further, the algorithm is generic and independent of the type and size of FPGAs. The algorithm is evaluated on Virtex series FPGAs. Experimental results demonstrate that the routing algorithm is applicable to Virtex series FPGAs with higher IR coverage achieved.

13.
In submicron technology, factors such as lithography and lens defects during the fabrication process can change some of the physical parameters of transistors and interconnects. This change can modify transistor electrical characteristics such as current, threshold voltage and gate capacitance, and thus causes variation in the power, delay and performance of the circuit. Process variation has become one of designers' challenges, to the point that below 45 nm technology it is considered the most important issue in reliability. Power consumption and transistor variation are limiting factors to physical scalability. In this paper, we propose two approaches to reduce the effects of D2D and WID variations on digital CMOS circuits at design time. The first approach is a variation-aware algorithm capable of extracting optimal design parameters to decrease variation and power. The second approach, using transistor stacking, helps further reduce variation and power. Applying the algorithm to a digital design and according to the behavior of parameters in the presence of variation, we extract for each parameter the value that will lead to power and variation reduction. With the stacking approach, on the other hand, only basic gates are considered and gate configurations that reduce power and variation are subsequently proposed. The proposed approaches can be used identically for synchronous and asynchronous circuits. To prove this claim, we apply our approaches to a network-on-chip asynchronous router and a circuit from the ISCAS85 benchmark. All simulations are done in 32 nm technology using the HSPICE tool. The proposed algorithm achieves the same results as Monte Carlo simulation, but with lower execution time. The application of the stacking approach to the asynchronous router and the ISCAS85 circuit reduces variation effects by up to 40.9% and 13.35%, respectively.

14.
Multi-FPGA Boards (MFBs) have been in use for more than a decade for implementing systems requiring high performance and for emulation/prototyping of multimillion gate chips. It is important to develop an MFB architecture which can be used for emulation or prototyping of a large number of circuits. A key feature of an MFB is its routing architecture, defined by its inter-Field-Programmable Gate Array (FPGA) connections. There are two types of inter-FPGA connections, namely fixed connections (FCs), which connect a pair of FPGAs through dedicated wires, and programmable connections (PCs), which connect a pair of FPGAs through a programmable switch. An architecture which has a mix of both these types of connections is called a hybrid routing architecture. It has been shown in the literature [7] that a hybrid MFB architecture is more efficient for emulation than an architecture with only one type of connection. The cost of an MFB and the delay of the emulated circuit on it depend on the number of PCs used for emulation. An objective of a designer of an MFB for circuit emulation is to minimize the required number of PCs. In this paper, we describe algorithms to evaluate the requirement of PCs for many hybrid routing architectures. The requirement of PCs can be reduced if some programmable connections are replaced by a connection using only FCs by routing through FPGAs. Such routing is called multi-hop routing. We present an optimal and a heuristic algorithm for estimation of PCs when a limited number of hops through FPGAs is permitted. The unique feature of our evaluation scheme is that it is generic and treats the routing architecture as a parameter. We have used benchmark circuits as well as synthetic cloned circuits for testing our algorithms. Our heuristic algorithm is very fast and gives optimal results most of the time. Our algorithms can be used for actual routing during circuit emulation.

15.
Optical Network-on-Chip (ONoC) is becoming a promising solution for high-performance on-chip interconnection and draws much attention from researchers. ONoC combined with 3D integration technology can address some issues of two-dimensional ONoC, such as long distance and limited scalability, and has been shown to be an effective way to further improve the performance of ONoC. However, most existing routers have four or five ports, which poses a problem in 3D optical interconnects because seven-port optical routers are required in 3D networks. To solve this problem, in this paper we propose a 3D multilayer optical network on chip (3D MONoC) based on Votex, a non-blocking optical router with seven ports. We describe the optical router and the 3D network in detail. The proposed router architecture not only realizes 3D interconnection and can be utilized in most 3D ONoCs, but can also help achieve smaller area and lower cost for ONoC. We compare Votex with the traditional 7×7 optical router based on a crossbar, which indicates that Votex can save cost. Moreover, we compare 3D MONoC employing Votex against its 2D counterpart. Simulation results show that the performance of 3D MONoC, including ETE delay and throughput, can be improved.

16.
Modern FPGAs have a great market share in hardware prototyping, massively parallel systems and reconfigurable architectures. Although the field-programmability of FPGAs is an effective feature in the growth and diversity of their applications, it has caused security concerns for IPs/designs on FPGAs. Recent research shows that a reliable mechanism is required to protect the IPs/applications on FPGAs against malicious manipulations during all stages of the design lifecycle, especially when they are operating in the field. In this paper, we propose a new tamper-resistant design methodology (Security Path methodology) and a revised security-aware FPGA architecture. This methodology protects the configured design against tampering attacks in parallel with the normal operation of the circuit. When an attack is discovered, the normal data flow is obfuscated and the circuit is blocked. Experimental results show that this methodology provides near full coverage in tampering detection with an overhead of 12.32% in power, 12% in delay and 38% in area.

17.
Deficit round-robin scheduling for input-queued switches   Total citations: 3, self-citations: 0, citations by others: 3
We address the problem of fair scheduling of packets in Internet routers with input-queued switches. The goal is to ensure that packets of different flows leave a router in proportion to their reservations under heavy traffic. First, we examine the problem when fair queuing is applied only at the output link of a router, and verify that this approach is ineffective. Second, we propose a flow-based iterative deficit-round-robin (iDRR) fair scheduling algorithm for the crossbar switch that supports fair bandwidth distribution among flows, and achieves asymptotically 100% throughput under uniform traffic. Since the flow-based algorithm is hard to implement in hardware, we finally propose a port-based version of iDRR (called iPDRR) and describe its hardware implementation.
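For readers unfamiliar with the building block, the sketch below shows classic deficit round-robin on a single output link, which is the idea that iDRR extends to crossbar matching. It is not the iterative iDRR or iPDRR algorithm from the paper. The function name `deficit_round_robin` and its data layout (a dict of per-flow deques of packet sizes, a dict of per-flow quanta) are assumptions for illustration.

```python
from collections import deque


def deficit_round_robin(queues, quanta, rounds):
    """Serve per-flow queues of packet sizes for a number of rounds.

    queues: dict mapping flow id -> deque of packet sizes (mutated in place)
    quanta: dict mapping flow id -> bytes of credit added per round
    Returns the list of (flow, size) pairs in transmission order.
    """
    deficit = {flow: 0 for flow in queues}
    sent = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                # An idle flow may not bank credit for later.
                deficit[flow] = 0
                continue
            deficit[flow] += quanta[flow]
            # Send head-of-line packets while the credit covers them.
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size
                sent.append((flow, size))
            if not q:
                deficit[flow] = 0
    return sent
```

With backlogged flows, the bytes sent per flow track the ratio of the quanta, which corresponds to the "in proportion to their reservations" goal stated in the abstract.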

18.
Recently, Content-Centric Networking (CCN) has received more and more attention. The modeling of CCN, an important research point, is the foundation of the architecture. Past work on cache network modeling always assumes that the virtual round trip time (VRTT) is zero for simplicity. However, this assumption isn't practical and results in model error, especially in CCN. A CCN router can aggregate content requests during the VRTT to avoid delivering the same content repeatedly. Thus, to model CCN data transfer, as well as to understand how it should be managed, the VRTT shouldn't be ignored. In this paper, we model the data transfer in CCN, and propose a multi-cache with aggregation approximation (MCAA) algorithm to obtain the content miss rate and VRTT at each router. Simulation results show the validity of our MCAA algorithm.
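The request aggregation that the abstract refers to can be pictured with a toy router model. While a request for some content is outstanding (that is, during the VRTT), later requests for the same name are held rather than forwarded again. This is only an illustration of the aggregation behavior, not the MCAA algorithm. The class `CcnRouter`, its method names, and the unbounded cache are assumptions.

```python
class CcnRouter:
    """Toy CCN router with a content cache and a pending-request table."""

    def __init__(self):
        self.cache = {}      # content name -> data (unbounded, for simplicity)
        self.pending = {}    # content name -> requesters waiting for the data
        self.forwarded = 0   # number of requests sent upstream

    def on_request(self, name, requester):
        if name in self.cache:
            return "hit"
        if name in self.pending:
            # A fetch is already in flight: aggregate instead of forwarding.
            self.pending[name].append(requester)
            return "aggregated"
        self.pending[name] = [requester]
        self.forwarded += 1
        return "forwarded"

    def on_data(self, name, data):
        # Data arrives from upstream: cache it and release all waiters.
        self.cache[name] = data
        return self.pending.pop(name, [])
```

In this toy model, three requests for the same name arriving before the data returns cause a single upstream request, which is the effect a zero-VRTT model cannot capture.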

19.
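Application of a hierarchical scheduling algorithm in routing switches   Total citations: 2, self-citations: 0, citations by others: 2
This paper proposes a two-level hierarchical scheduling model, focuses on the concrete implementation of its algorithm, and analyzes the algorithm's application in routing switches. Introducing the hierarchical scheduling algorithm ensures that, under the existing Internet architecture, best-effort service can be combined well with real-time services that have QoS requirements. Simulation experiments show that the hierarchical scheduling algorithm is feasible and effective.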

20.
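To break through the "electronic bottleneck", the evolution of the Internet toward wavelength-division multiplexing (WDM) all-optical networks is an inevitable trend. In the optical Internet, the optical router is one of the most critical devices. This article proposes a structure for implementing an optical router. The structure is centered on optical burst label switching and achieves edge-to-edge data transmission and processing entirely in the optical domain, without repeated O/E/O conversions. The article also analyzes in detail the key issues in implementing the structure, namely the format and assembly of optical burst data, header extraction and recognition, routing and optical label switching, and congestion, and proposes feasible solutions. Finally, an experimental system is designed and the corresponding experimental results are given.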
