Found 20 similar documents; search took 31 ms
1.
Design Trade-offs in Customized On-chip Crossbar Schedulers (total citations: 1; self-citations: 0; citations by others: 1)
In this paper, we present a design and an analysis of customized crossbar schedulers for reconfigurable on-chip crossbar networks.
In order to alleviate the scalability problem in a conventional crossbar network, we propose adaptive schedulers on customized
crossbar ports. Specifically, we present a scheduler with a weighted round robin arbitration scheme that takes into account
the bandwidth requirements of specific applications. In addition, we propose the sharing of schedulers among multiple ports
in order to reduce the implementation cost. The proposed schedulers arbitrate on-demand (at design time) interconnects and
adhere to the link bandwidth requirements, where physical topologies are identical to logical topologies for given applications.
Considering conventional crossbar schedulers as reference designs, a comparative performance analysis is conducted. The hardware
scheduler modules are parameterized. Experiments with practical applications show that our custom schedulers occupy up to
83% less area, and maintain better performance compared to the reference schedulers.
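The weighted round robin arbitration mentioned in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware scheduler: the class name `WeightedRoundRobinArbiter`, the per-port integer weights, and the work-conserving policy (an idle port gives up its turn immediately) are all hypothetical choices made for illustration.

```python
from typing import List, Optional


class WeightedRoundRobinArbiter:
    """Software model: port i may win up to weights[i] consecutive grants.

    Hypothetical illustration; the abstract does not specify the
    arbitration details of the proposed schedulers.
    """

    def __init__(self, weights: List[int]) -> None:
        if not weights or any(w <= 0 for w in weights):
            raise ValueError("weights must be positive integers")
        self.weights = list(weights)
        self.current = 0               # port currently holding priority
        self.credit = self.weights[0]  # grants that port may still win

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the port granted this cycle, or None if nobody requests."""
        n = len(self.weights)
        if len(requests) != n:
            raise ValueError("one request flag per port is required")
        if self.credit == 0 or not requests[self.current]:
            # Hand priority to the next requesting port in circular order.
            # Offset n wraps back to the current port, so a lone requester
            # simply has its credit reloaded.
            for offset in range(1, n + 1):
                port = (self.current + offset) % n
                if requests[port]:
                    self.current = port
                    self.credit = self.weights[port]
                    break
            else:
                return None
        self.credit -= 1
        return self.current
```

With weights `[2, 1]` and both ports requesting continuously, port 0 receives two grants for every grant to port 1, which is how a weight can stand in for a bandwidth requirement.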
2.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(8):997-1007
3.
《Microelectronics Journal》2014,45(11):1533-1541
The crossbar array is a promising nanoscale architecture that can be used for logic circuit implementation. In this work, a graphene nanoribbon (GNR) based crossbar architecture is proposed. This design uses parallel GNRs as the device channel and metal as the gate, source and drain contacts. Schottky-barrier type graphene nanoribbon field-effect transistors (SB-GNRFETs) are formed at the cross points of the GNRs and the metallic gates. Benchmark circuits are implemented using the proposed crossbar, Si-CMOS and multi-gate Si-CMOS approaches to evaluate the performance of the crossbar architecture compared to the conventional CMOS logic design. The compact SPICE model of the SB-GNRFET was used to simulate crossbar-based circuits. The CMOS circuits are also simulated using 16 nm technology parameters. Simulation results of benchmark circuits using the SIS synthesis tool indicate that the GNR-based crossbar circuits outperform conventional CMOS circuits in low power applications. Area optimized cell libraries are implemented based on the asymmetric crossbar architecture. The area of the circuit can be further reduced using this architecture at the expense of higher delay. The crossbar cells can be combined with CMOS cells to exhibit better performance in terms of EDP.
4.
Topology/Floorplan/Pipeline Co-Design of Cascaded Crossbar Bus (total citations: 1; self-citations: 0; citations by others: 1)
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(8):1034-1047
5.
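In multi-core CPUs, contention arises when multiple processor cores need to access memory and input/output ports, and a conventional bus then degrades system performance. Using a non-blocking CPU-Cache crossbar network for point-to-point transfers largely resolves this problem. This paper compares the crossbar with the conventional shared bus and presents a full-custom circuit design of the crossbar.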
6.
No aspect of computer design is sacred, not even the system bus, which is giving way to switches in multiprocessing systems, where performance is key. System buses, which string computers together out of circuit boards, have come to strangle system performance in many cases. Another interconnection architecture, though, can free a system from the bus's clutch. Variously known as a switch, crossbar, or crosspoint, it has long been used in speciality computers and is now making its way into lower-cost machines. Meanwhile, silicon and packaging technology have been refined to the point that the crossbar architecture can vie with the system bus for a place in low-cost multiprocessors. More specifically, the crossbar is well suited to use in distributed memory systems, where there is a need for broad pathways for communications between the chunks of memory themselves. The roots of such an approach go deep. In fact, it may be said to have started with an idea for keeping as much data traffic as possible out of general circulation: cache memory.
7.
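The macro SIMD short-vector management unit is an important component of high-performance general-purpose microprocessors and media processors. This paper proposes a crossbar-based design of a macro SIMD short-vector management unit for processing multimedia data such as audio, video, and network communications. The design overcomes the mismatch between data structures and system hardware in conventional SIMD architectures and meets the requirements of next-generation high-performance computing.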
8.
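The crossbar is a key part of a network-on-chip router. A crossbar can be implemented with tri-state flip-flops or with multiplexers. For several different crossbar implementation schemes, this paper compares the area and power overheads, and also designs a crossbar scheduling mechanism based on the iSLIP algorithm. Compared with a crossbar implemented with tri-state gates, a crossbar implemented with multiplexers built from basic logic gates has a considerable advantage in power and area. The network-on-chip crossbar implemented with the iSLIP algorithm has the highest upper limit on operating frequency.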
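For readers unfamiliar with iSLIP, one scheduling iteration can be modelled in a few lines. The sketch below is a plain software model, not the hardware mechanism designed in the paper; it assumes an N×N switch with virtual output queues, and the function name `islip_iteration` and its argument layout are hypothetical.

```python
from typing import Dict, List


def islip_iteration(
    requests: List[List[bool]],
    grant_ptr: List[int],
    accept_ptr: List[int],
) -> Dict[int, int]:
    """Run one request-grant-accept iteration of iSLIP.

    requests[i][j] is True when input i has a cell queued for output j.
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in
    place only for grants that are accepted.
    Returns a mapping {input: output} of matched pairs.
    """
    n = len(requests)

    # Grant phase: each output grants the first requesting input found
    # at or after its grant pointer.
    grants: Dict[int, List[int]] = {}
    for out in range(n):
        for k in range(n):
            inp = (grant_ptr[out] + k) % n
            if requests[inp][out]:
                grants.setdefault(inp, []).append(out)
                break

    # Accept phase: each input accepts the first granting output found
    # at or after its accept pointer, then both pointers move one past
    # the matched partner.
    matches: Dict[int, int] = {}
    for inp, outs in grants.items():
        for k in range(n):
            out = (accept_ptr[inp] + k) % n
            if out in outs:
                matches[inp] = out
                accept_ptr[inp] = (out + 1) % n
                grant_ptr[out] = (inp + 1) % n
                break
    return matches
```

On a fully loaded 2×2 switch with all pointers at zero, the first call matches only one pair because both outputs grant input 0; the pointer update then desynchronizes the outputs, and the second call matches both inputs.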
9.
Feld D.A. Van Duzer T. Yuh P. Kaplan S.B. 《Applied Superconductivity, IEEE Transactions on》1996,6(3):113-124
We present the design and experimental demonstration of a 5-b serial-to-parallel decoder for a crossbar application. A serial train of seven bits is provided at the input with the first five being the code for selecting one of 32 output lines. The last two constitute the code that determines if the selected decoder in the crossbar switch should generate an output. Several circuit innovations were needed to meet the severe restrictions on power, current, and area required for the crossbar application. Operation of the circuit was demonstrated at 2 Gb
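The framing described in this abstract (five address bits followed by two control bits) can be made concrete with a behavioural model. This is a hedged sketch only: the abstract does not state the bit order or which two-bit code enables the output, so the most-significant-bit-first ordering and the `ENABLE_CODE` value below are assumptions, and `decode_frame` is a hypothetical name.

```python
from typing import List, Sequence

# Assumption: the abstract does not say which 2-bit code enables the
# output, so (1, 1) is chosen purely for illustration.
ENABLE_CODE = (1, 1)


def decode_frame(bits: Sequence[int], msb_first: bool = True) -> List[int]:
    """Decode a 7-bit serial frame into 32 one-hot output lines.

    bits[0:5] select one of 32 lines; bits[5:7] decide whether the
    selected line is asserted. The bit order is an assumption.
    """
    if len(bits) != 7 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected exactly seven bits, each 0 or 1")

    address_bits = list(bits[:5])
    if not msb_first:
        address_bits.reverse()

    # Shift the serial address bits into an integer in the range 0..31.
    address = 0
    for b in address_bits:
        address = (address << 1) | b

    outputs = [0] * 32
    if tuple(bits[5:]) == ENABLE_CODE:
        outputs[address] = 1
    return outputs
```

For example, under these assumptions the frame `[0, 0, 1, 0, 1, 1, 1]` asserts line 5 only, while the same address with control bits `0, 1` leaves all 32 lines low.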
10.
Khalid M.A.S. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):30-39
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools was developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture (the proportion of hard-wired connections versus programmable connections) to determine its best value.
11.
12.
13.
Memristive device based passive crossbar arrays hold great promise for high-density and non-volatile memories. A significant challenge of ultra-high density integration of these crossbars is unwanted sneak-path currents. The most common way of addressing this issue today is an integrated or external selecting device to block unwanted current paths. In this paper, we use a memristive device with intrinsic rectifying behavior to suppress sneak-path currents in the crossbar. We systematically evaluate the read operation performance of large-scale crossbar arrays with regard to read margin and power consumption for different crossbar sizes, nanowire interconnect resistances, ON and OFF resistances, and rectification ratios under different read schemes. Outcomes of this study allow improved understanding of the trade-off between read margin, power consumption and read schemes. Most importantly, this study provides a guideline for circuit designers to improve the performance of oxide-based resistive memory (RRAM) based cross-point arrays. Overall, self-rectifying behavior of the memristive device efficiently improves the read operation performance of large-scale selectorless cross-point arrays.
14.
This paper presents a finite state analytical model and supporting simulation for performance analysis of a partially blocking, packet-switched, multistage communication network whose crossbar switches are output queued, non-lossy, and have an internal bandwidth (BW) such that 1⩽BW⩽a, where a is the number of inputs to the crossbar. To the knowledge of the authors, this is the only analytical model in the current literature that addresses this problem without making at least one of the following simplifying assumptions: (1) infinite number of inputs, (2) infinite number of buffers, (3) BW=a, (4) use of only a single crossbar (as opposed to multiple stages). The analytical model presented herein gives a set of closed-form equations which lead to an iterative solution for normalized bandwidth and normalized delay. The model provides results which are quite accurate (as shown by simulation) over a large range of parameter values (e.g., crossbar size, number of buffers in each queue, etc.).
15.
The buffered crossbar switch is a promising switching architecture that plays a crucial role in providing quality of service (QoS) in computer networks. A sufficient amount of resources (bandwidth and buffer space) must be allocated in buffered crossbar switches for QoS provision. Resource allocation based on deterministic QoS objectives might be too conservative in practical network operations. To improve resource utilization in buffered crossbar switches, we study the problem of resource allocation for statistical QoS provision in this paper. First, we develop a model and techniques for analyzing the probabilistic delay performance of buffered crossbar switches, which is described by the delay upper bound with a prescribed violation probability. Then, we determine the required amounts of bandwidth and buffer space to achieve the probabilistic delay objectives for different traffic classes in buffered crossbar switches. In our analysis, we apply the effective arrival envelope to specify traffic load in a statistical manner and characterize switch service capacity by using the service curve technique. Instead of just focusing on one specific type of scheduler, the model and techniques developed in this paper are very flexible and can be used for analyzing buffered crossbar switches with a wide variety of scheduling algorithms. Copyright © 2007 John Wiley & Sons, Ltd.
16.
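To address the bottleneck of excessive cluster area in the development of FPGAs with an And-Inverter Cone (AIC) cluster architecture, this paper studies in depth the design optimization of the input crossbar. At the level of the evaluation and optimization flow, it proposes for the first time a packed-netlist statistics method to analyze the usage of input and feedback resources in AIC clusters, which guides the design and optimization of the input crossbar structure so that optimized parameters can be obtained more efficiently. For the input crossbar module, at the level of structural parameter design, it proposes for the first time designing the connectivity ratios of pin inputs and output feedback separately and independently, and obtains the best combination of connectivity ratios through extensive experiments. At the level of circuit implementation, it exploits the structural characteristics of the AIC logic cone circuit and proposes for the first time a dual-phase input crossbar circuit implementation. Compared with existing AIC cluster structures, the AIC cluster obtained with the proposed optimization method is 21.21% smaller in area, which markedly eases the area constraint. When implementing the MCNC and VTR benchmark circuit sets, an AIC-architecture FPGA with the proposed input crossbar structure reduces the average area-delay product by 48.49% and 26.29%, respectively, compared with Altera's Stratix IV FPGA (LUT architecture), and by 28.48% and 28.37%, respectively, compared with a conventional AIC-architecture FPGA, significantly improving overall FPGA performance.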
17.
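Design of a New Architecture for Radar Signal Processing Systems (total citations: 2; self-citations: 2; citations by others: 0)
To meet the high-bandwidth requirements of radar signal processing, modern radar signal processing system architectures widely adopt switch-based, point-to-point interconnect structures. Signal processing systems based on an interconnect switch structure offer good scalability, superior performance, and relatively low cost. Because low-voltage differential transmission is used, data rates between crossbar switches can reach the gigabit level, making this the direction in which advanced radar signal processing systems will develop. Following the RapidIO interconnect protocol specification, this paper proposes a radar signal processing system architecture based on multiple digital signal processors and switch interconnect, analyzes the crossbar switch model and its performance, and finally discusses the software and hardware implementation methods of the signal processing system.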
18.
J. Martin-Martinez C.G. Almudever A. Crespo-Yepes R. Rodriguez M. Nafria A. Rubio 《Microelectronics Reliability》2014
In this work, a new technological strategy that allows implementing large crossbars formed with memFETs, a new device concept, is introduced. This memFET is an electrically reconfigurable field effect and resistive switching device that can be used to implement logic functions and memory blocks in a crossbar structure, allowing the dynamic logic configuration of the crossbar and simplifying both the design and the implementation of computing hardware. Moreover, taking advantage of the reconfiguration capability of such a technology and architecture, we introduce a novel technique to design evolvable hardware where not only the logic functions are changeable (as is the case for the Field Programmable Gate Array, FPGA) but also the physical position of the components on the surface of the integrated circuit. This technology and principle lead towards a new computing paradigm based on what we name Shape Shifting Digital Hardware (SSDH).
19.
Fujita S. Nomura K. Abe K. Lee T.H. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(11):2472-2479
We have proposed three nanoarchitectures with a carbon nanotube-based nano-electromechanical systems (CNT-NEMS) switch with a floating gate. It is shown that logic based on them has the potential to replace CMOS using process technology of less than 45 nm. Furthermore, CNT-NEMS-based 3-D circuits realize extremely high bandwidth of over 10 petabyte/s with very low latency of less than several tens of ps. The most effective applications are the 3-D on-chip crossbar bus and the future on-chip network, which will largely determine the performance of future microchips. The performance of the 3-D on-chip crossbar based on CNT-NEMS is also compared with that based on CNT-transistors.
20.
Input-buffered replicated networks are considered for broadband switching applications. They are characterized by many design parameters such as the replication factor, the traffic management policy, and input buffer location and length. To show the influence of these parameters on switching performance, an analytical model is defined based on a Markov chain representation of the input buffer. This model is suitable for application to input-buffered architectures having different routing network choices. The results, expressed in terms of throughput, packet delay, and packet loss probability, outline the performance improvements with respect to other well-known networks with input buffers, such as banyan and crossbar, reached through the flexibility offered by this architectural solution.