Similar literature
Found 20 similar documents; search took 0 ms
1.
Asynchronous switching is proposed to achieve a low-power Network on Chip. Asynchronous switching reduces the power dissipation of the network if the activity factor of the data transfer between two ports, α_data, is less than Aα_c + Bα_clk. Closed-form expressions for the power dissipation of different network topologies are provided for both synchronous and asynchronous switching. The expressions are technology independent and are used for power estimation. Asynchronous switching is compared with synchronous switching for different network densities N/(Lc × Lc). The area of the asynchronous switch is 50% greater than the area of the synchronous switch. However, the power dissipation of asynchronous switching decreases by up to 70.8% compared with conventional synchronous switching for the Butterfly Fat Tree (BFT) topology. Asynchronous switching is more efficient in the CLICHE topology than in both the BFT and Octagon topologies, achieving a higher power reduction of 75.7%. Asynchronous switching becomes more efficient as technology advances and network density increases. The reduction in power dissipation reaches 82.3% for 256 IPs with the same chip size. Even with clock gating, asynchronous switching achieves a significant power reduction of 77.7% for a 75% clock activity factor.

2.
Real-time streaming signal processing systems typically require high throughput and low latency. Many such systems can be modeled as synchronous data flow (SDF) graphs. In this paper, we address the problem of multi-objective mapping of SDF graphs onto heterogeneous multiprocessor platforms, where we account for the overhead of bus-based inter-processor communication. The primary contributions include (1) an integer linear programming (ILP) model that globally optimizes throughput, latency, and cost; and (2) low-complexity two-stage heuristics that combine an evolutionary algorithm with an ILP to generate either a single sub-optimal mapping solution or a Pareto front for design space optimization. In our simulations, the proposed heuristic shows up to 12x run-time efficiency compared to the global ILP while maintaining a 10^-6 optimality gap in throughput.
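
As a rough illustration of what a Pareto front over throughput, latency, and cost means here, the sketch below filters a list of candidate mappings down to the non-dominated ones. It is not the ILP or the evolutionary heuristic from the abstract; the names `dominates` and `pareto_front` and the sample scores are hypothetical.

```python
from typing import List, Tuple

# Each candidate mapping is scored as (throughput, latency, cost).
# Assumption: throughput is maximized; latency and cost are minimized.
Score = Tuple[float, float, float]

def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better

def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep only the scores that no other score dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

# Made-up scores, for illustration only.
candidates = [(10.0, 5.0, 3.0), (8.0, 4.0, 3.0), (8.0, 6.0, 4.0), (10.0, 5.0, 2.0)]
front = pareto_front(candidates)
print(front)
```

The quadratic scan is adequate for small candidate sets; a search that produces many candidates would likely maintain the front incrementally instead.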

3.
Modern Networks-on-Chip (NoCs) must accommodate a diversity of temporal requirements, e.g., providing guarantees for real-time senders while reducing the adverse performance impact on best-effort (BE) traffic. In this work, we propose a protocol-based adaptive congestion control. By selectively detouring real-time or BE traffic (i.e., load balancing) and dynamically throttling BE traffic, we improve NoC performance without costly hardware extensions. The introduced method offers safe and efficient integration of mixed-critical workloads by coupling flow control mechanisms with path selection based on the current NoC state. The requested real-time reliability of the network is achieved through predictable synchronization with control messages, supported by a formal analysis and an experimental evaluation.

4.
5.
Mortazavi, Seyed Hassan; Akbar, Reza; Safaei, Farshad; Rezaei, Amin 《Wireless Networks》2019,25(6):3675-3687
The combination of traditional wired links for regular transmissions and express wireless paths for long distance communications is a promising solution to prevent multi-hop...

6.
Networks-on-Chip (NoCs) for real-time systems require solutions for safe and predictable sharing of network resources between transmissions with different quality of service requirements. In this work, we present a mechanism for global and dynamic admission control in NoCs dedicated to real-time systems. It introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. We present a formal worst-case timing analysis for the proposed mechanism and demonstrate that this solution not only exhibits higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than TDM for realistic levels of system utilization. Our mechanism does not require the modification of routers and can therefore be used together with any architecture utilizing non-blocking routers.

7.
This paper presents two high-throughput, low-latency converters that can be used to convert a synchronous communication protocol to an asynchronous one and vice versa. We have designed these two hardware components for use in a Globally Asynchronous Locally Synchronous clustered Multi-Processor System-on-Chip communicating through a fully asynchronous Network-on-Chip. The proposed architecture is rather generic and allows the system designer to make various trade-offs between latency and robustness, depending on the selected synchronizer. We have physically implemented the two converters with the portable ALLIANCE CMOS standard cell library and evaluated the architectures by SPICE simulation for a 90 nm CMOS fabrication process.

8.
In this paper, we investigate the problem of providing efficient communication primitives across domains of wireless sensor network (WSN) applications. We argue both qualitatively and quantitatively that group communication among sensors in geographic proximity is one of the basic building blocks of many WSN applications. Furthermore, group communication awareness needs to be embedded and implemented at the MAC layer because of the broadcast nature of the wireless medium. We devise a MAC protocol, called LGC-MAC, to enable efficient single-hop one-to-many and many-to-one communication. We present case studies of two example applications using LGC-MAC, acoustic target tracking and propagation of information with feedback, and demonstrate that LGC-MAC can improve the response time, alleviate channel contention, and provide better fault tolerance to packet collisions and wireless errors.
Rong Zheng

9.
赵运筹  贾浩  丁建峰  张磊  付鑫  杨林 《半导体学报》2016,37(11):114008-6
With the continuous development of integrated circuits, the performance of the processor has improved steadily. Integrating more cores in one processor is an effective way to improve its performance, since performance can no longer be improved by increasing the clock frequency alone. For a processor with multiple integrated cores, performance is determined not only by the number of cores but also by the communication efficiency between them. As more processor cores are integrated on a chip, larger bandwidths are required for communication among them. The traditional electrical interconnect has gradually become a bottleneck for improving the performance of multi-core processors because of its limited bandwidth, high power consumption, and long latency. The optical interconnect is considered a potential way to solve this issue. The optical router is the key device for realizing the optical interconnect. Its basic function is to route and switch data between the local node and the multi-node. In this paper, we present and demonstrate a five-port optical router for mesh photonic networks-on-chip, composed of eight thermally tuned silicon Mach-Zehnder optical switches. The experimental spectral responses indicate that the optical signal-to-noise ratios of the optical router are over 13 dB in the wavelength range of 1525-1565 nm for all of its 20 optical links. Each optical link can manipulate 50 wavelength channels with a channel spacing of 100 GHz and a data rate of 32 Gbps per wavelength channel in the same wavelength range. The lowest energy efficiency of the optical router is 43.4 fJ/bit.
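
The channel figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below converts the 1525-1565 nm band to a frequency span, counts how many 100 GHz channels fit, and multiplies by the 32 Gbps per-channel rate. The function name `channels_in_band` is hypothetical, and the aggregate per-link rate is derived here for illustration; the abstract does not state it.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def channels_in_band(lambda_min_nm: float, lambda_max_nm: float, spacing_ghz: float) -> int:
    """Number of whole channels of the given spacing that fit in the band."""
    f_high_hz = C / (lambda_min_nm * 1e-9)  # shorter wavelength, higher frequency
    f_low_hz = C / (lambda_max_nm * 1e-9)
    return int((f_high_hz - f_low_hz) / (spacing_ghz * 1e9))

channels = channels_in_band(1525.0, 1565.0, 100.0)
per_link_gbps = channels * 32  # 32 Gbps per wavelength channel, from the abstract
print(channels, per_link_gbps)
```

Under these assumptions the band spans roughly 5 THz, which matches the 50 channels reported and gives an aggregate of 1600 Gbps per optical link.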

10.
The power overhead of Networks-on-Chip (NoCs) becomes tremendous in high-density Multiprocessor Systems-on-Chip (MPSoCs). Especially in hard real-time and safety-critical systems, power management mechanisms must be developed that adhere to real-time requirements. However, state-of-the-art solutions typically induce a high timing overhead, thus challenging safety, or have limited power saving capabilities. Additionally, current power-gating mechanisms do not provide an upper bound on the latency overhead, and thus offer no timing guarantees. We propose a safe and enhanced approach to power gating that allows global and dynamic power management under timing guarantees, i.e., all deadlines of critical tasks are met. It introduces a control layer to save power on the NoC data layer using multiple Power-Aware Traffic-Monitor (PATM) units, which apply knowledge of the global state of the system to efficiently save power on NoC routers even at high NoC utilization. To safely apply the PATMs in hard real-time systems while meeting the deadlines, we provide a formal worst-case timing analysis to derive an upper bound on the PATMs' latency overhead. Experimental results show that our approach efficiently reduces static power consumption and scales well while inducing very small area overhead.

11.
Reducing NoC power is critical for scaling up the number of nodes in future many-core systems. Most NoC designs adopt packet switching to benefit from its high throughput and excellent scalability. These benefits, however, come at the price of the power consumption and latency overheads of routers. Circuit switching, on the other hand, significantly reduces the power and latency of communication by directing data over pre-established circuits, but the relatively large circuit setup time and low resource utilization of this switching mechanism are often prohibitive. In this paper, we address one of the major problems of circuit switching, i.e., the circuit setup time overhead, with an efficient and fast algorithm based on the time-division multiplexing (TDM) scheme. We then further improve performance by reserving circuits for anticipated messages, and hence completely hide the circuit setup time. To address the low resource utilization problem, we integrate the proposed circuit switching into a packet-switched NoC and use unused circuit resources to transfer packet-switched data. Evaluation results show a considerable reduction in NoC power consumption and packet latency.
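
To make the TDM circuit idea concrete, the sketch below shows one common way TDM slot tables are reserved along a path: a flit that leaves the first link in slot s uses slot (s + i) mod S on the i-th link, so setup amounts to finding a start slot that is free on every hop. This is a generic illustration under that assumption, not the setup algorithm proposed in the abstract; `find_start_slot`, `reserve`, and the slot-table layout are hypothetical.

```python
from typing import List, Optional

def find_start_slot(link_tables: List[List[bool]]) -> Optional[int]:
    """link_tables[i][s] is True when slot s on the i-th link of the path is taken.

    Returns the first start slot whose shifted slot is free on every hop, or None.
    """
    if not link_tables:
        return None
    num_slots = len(link_tables[0])
    for start in range(num_slots):
        if all(not table[(start + hop) % num_slots]
               for hop, table in enumerate(link_tables)):
            return start
    return None

def reserve(link_tables: List[List[bool]], start: int) -> None:
    """Mark the shifted slot as taken on every link of the path."""
    num_slots = len(link_tables[0])
    for hop, table in enumerate(link_tables):
        table[(start + hop) % num_slots] = True

# A three-hop path with four slots per link; two slots are already taken.
path = [
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, False],
]
first = find_start_slot(path)
reserve(path, first)
second = find_start_slot(path)
print(first, second)
```

Because the reserved slots advance by one per hop, data on an established circuit needs no buffering or arbitration at intermediate routers, which is where circuit switching saves power and latency.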

12.
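Based on the loss characteristics of silicon waveguides, cross-shaped waveguide crossings, and waveguide-microring-based optical switching elements, a loss model is established for torus-structured optical networks-on-chip. The model is used to analyze the loss characteristics of the optical network-on-chip at the optical device level, the optical router level, and the network level, and an automatic loss analysis system for optical networks-on-chip is built. With this system, the maximum loss at different network scales can be obtained, and the loss characteristics of torus networks based on crossbar, Cygnus, and Crux routers are analyzed. The results show that the transmission loss increases as the network scales up, and that the minimum transmission loss occurs when M = N. In addition, the optical network-on-chip built from Crux routers has the lowest transmission loss, about 5 dB lower than that of the network built from Cygnus routers.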

13.
The authors introduce a circuit partitioning method based on analysis of reconvergent fan-out. A corolla is defined as a set of overlapping reconvergent fan-out regions. The authors partition the circuit into a set of disjoint corollas and use the corollas to resynthesize the circuit. The authors develop the notion of the resynthesis potential of a logic circuit and use it to select the corollas that resynthesize with the most gain. It is shown that resynthesis of large benchmark circuits using the corollas consistently reduces transistor pairs and layout area while improving delay and testability. The use of don't cares to further minimize the corollas in the local context and the global context is explored.

14.
For real-time communication services to achieve widespread usage, it is important that network managers be allowed to control the services effectively. An important management capability concerns resource partitioning. Resource partitioning is useful for a number of applications, including the creation of virtual private subnetworks and of mechanisms for advance reservation of real-time network services, fast establishment of real-time connections, and mobile computing with real-time communication. In previous work, the authors presented a scheme for resource partitioning in a guaranteed performance networking environment with EDD-based packet scheduling disciplines. The present paper gives the results of research in resource partitioning, with admission control tests for resource partitioned servers for four representative scheduling disciplines: FIFO, WFQ, RCSP, and EDD. The simulations confirm the intuition that resource fragmentation losses due to resource partitioning are small and that resource partitioning reduces the admission control computation overhead. An interesting result from the simulation experiments is that, under circumstances that arise naturally in multi-party communication scenarios, resource partitioning results in a higher overall connection acceptance rate. The authors also present experiences with implementing resource partitioning in the second generation of Tenet real-time protocols; this implementation, with resource partitioned servers, runs on multiple platforms, including Sun workstations under SunOS, DEC workstations under Ultrix, and PCs under BSDI Unix.

15.
Kasik, D.J. 《Multimedia, IEEE》2004,11(1):32-41
Graphic display technologies have traditionally targeted devices with midsize screens. However, devices with small and large screen sizes are gaining popularity, with users increasingly attempting to access complex images using small-screen devices. To display high-quality images irrespective of screen size, new methods of visualization become necessary. This article starts to address the difficult problem of finding acceptable methods to deal with screen-size variations. Instead of approaching the problem from a technology perspective, my techniques preserve graphic communication impact for specific visual analysis tasks without extensively modifying existing images or overly constraining authors of new images. Adapting visual content to retain graphic communication impact is essential to maximize the effectiveness of new devices.

16.
An architecture-synthesis technique for the low-power implementation of real-time applications is presented. The technique uses algorithm partitioning to preserve locality in the assignment of operations to hardware units. This results in reduced usage of long high-capacitance buses, fewer accesses to multiplexors and buffers, and more compact layouts. Experimental results show average reductions in bus and multiplexor power of 57.8% and 56.0%, respectively, resulting in an average reduction of 25.8% in total power. In addition, we analyze the effect of varying levels of partitioning on power consumption and present models for estimating bus capacitance.

17.
The authors introduce a new bit-serial algorithm for stack filtering, designated the bit-serial window partitioning algorithm. It is shown that the proposed algorithm can achieve substantial savings in computation time over the conventional bit-serial binary-tree search algorithm. This improved efficiency is obtained by evaluating the Boolean function at thresholds corresponding to the sample values within the filter window, and by taking advantage of the ordering information associated with the threshold sequences.
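
The remark about evaluating the Boolean function only at thresholds equal to the window's sample values can be illustrated with a plain threshold-decomposition sketch. For a positive Boolean function f, the stack filter output is the largest threshold t at which f applied to the thresholded window is 1, and only the sample values themselves need to be tried. This is not the bit-serial window partitioning algorithm itself; the function names are hypothetical, and `majority` is chosen because it makes the filter a median filter.

```python
from typing import Callable, List, Sequence

def majority(bits: Sequence[int]) -> int:
    """Positive Boolean function: 1 when more than half of the bits are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0

def stack_filter_window(window: Sequence[int], f: Callable[[Sequence[int]], int]) -> int:
    """Largest sample value t for which f(window >= t) is 1."""
    for t in sorted(set(window), reverse=True):
        if f([1 if v >= t else 0 for v in window]):
            return t
    # Assumption: if f is 0 at every threshold, fall back to the window minimum.
    return min(window)

def stack_filter(signal: Sequence[int], width: int,
                 f: Callable[[Sequence[int]], int]) -> List[int]:
    """Slide the window over the signal without padding the borders."""
    return [stack_filter_window(signal[i:i + width], f)
            for i in range(len(signal) - width + 1)]

filtered = stack_filter([3, 1, 2, 9, 4], 3, majority)
print(filtered)
```

With `majority` as f and a window of three samples, each output equals the median of its window, so the impulse 9 is suppressed.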

18.
Partitioning is an important step in the top-down design of large, complicated integrated circuits. In this paper, a simple yet effective partitioning technique is described. It is based on the clustering of “closely” connected cells and the gradual enforcement of size constraints. At the beginning, clusters are formed in a bottom-up fashion to reduce the problem size. Then the clusters are partitioned using several different parameters to find a good starting point. The best result achieved during the cluster partitioning is used as the initial solution for the lower-level partitioning. The gradual constraint enforcement technique is used to cope with the local minimum problem. It allows cells or clusters to move with more freedom among the subsets during earlier iterations and thus may effectively find a near-optimum solution. Several experimental results show that the new partitioning technique produces favorable results. In particular, the method outperforms the F&M method by more than 60% in the number of crossing nets on average.

19.
In this paper, we demonstrate the use of finite-dimension linear programming to maximize the number of partial-good multicore processor chips in a symmetric multiprocessing (SMP) node of a given logical size and physical footprint. It is asserted that, to first order, the cost of a productized processor chip will be proportional to the scrap of processor chips that contain good cores but are unusable for the implementation of an SMP node. Therefore, the tradeoff between the number of processing units (PUs) on a chip and the total number of PUs on an SMP node is examined. This paper shows that an optimized SMP offering can be found so that the total chip cost of a high-end system can be minimized. However, such cost reduction will limit the SMP node size for a given processor chip yield. It will also be shown that, as the chip yield improves, the SMP node size that can be profitably implemented will increase.

20.
In a typical distributed/parallel database system, a request mostly accesses a subset of the entire database. It is, therefore, natural to organize commonly accessed data together and to place them on nearby, preferably the same, machine(s)/site(s). For this reason, data partitioning and data allocation are performance-critical issues in distributed database application design. We focus on data partitioning, which requires the use of clustering. Although many clustering algorithms have been proposed, their performance has not been extensively studied. Moreover, the special problem structure in clustering is rarely exploited. We explore the use of a genetic search-based clustering algorithm for data partitioning to achieve high database retrieval performance. By formulating the underlying problem as a traveling salesman problem (TSP), we can take advantage of this particular structure. Three new operators for GAs are also proposed, and experimental results indicate that they outperform other operators in solving the TSP. The proposed GA is applied to solve the data-partitioning problem. Our computational study shows that our GA performs well for this application.
