Evolving submicron technology makes decentralized designs particularly attractive. A common form of decentralization adopted in processors is to partition the execution core into multiple clusters. Each cluster has a small instruction window and a set of functional units. A number of algorithms have been proposed for distributing instructions among the clusters. The first part of this paper analyzes (qualitatively as well as quantitatively) the effect of various hardware parameters such as the type of cluster interconnect, the fetch size, the cluster issue width, the cluster window size, and the number of clusters on the performance of different instruction distribution algorithms. The study shows that the relative performance of the algorithms is very sensitive to these hardware parameters and that the algorithms that perform relatively better with four or fewer clusters are generally not the best ones for a larger number of clusters. This is important, given that with an imminent increase in the transistor budget, more clusters are expected to be integrated on a single chip. The second part of the paper investigates alternative interconnects that provide scalable performance as the number of clusters is increased. In particular, it investigates two hierarchical interconnects (a single ring of crossbars and multiple rings of crossbars) as well as instruction distribution algorithms that take advantage of these interconnects. Our study shows that these new interconnects, with the appropriate distribution techniques, achieve an IPC (instructions per cycle) that is 15-20 percent better than the most scalable existing configuration and within 2 percent of that achieved by a hypothetical ideal processor having a 1-cycle latency crossbar interconnect. These results confirm the utility and applicability of hierarchical interconnects and hierarchical distribution algorithms in clustered processors.

A delay model for router microarchitectures   (total citations: 1; self-citations: 0; citations by others: 1)
This article introduces a router delay model that takes into account the pipelined nature of contemporary routers and proposes pipelines matched to the specific flow control method employed. Given the type of flow control and router parameters, the model returns router latency in technology-independent units and the number of pipeline stages as a function of cycle time. We apply this model to derive realistic pipelines for wormhole and virtual-channel routers and compare their performance. Contrary to the conclusions of previous models, our results show that the latency of a virtual-channel router does not increase as we scale the number of virtual channels up to 8 per physical channel. Our simulation results also show that a virtual-channel router gains throughput of up to 40% over a wormhole router.

Complete formal verification has thus far never been achieved for a state-of-the-art, high-performance commercial microprocessor. However, this article presents a completion functions methodology, based on theorem proving, that has been applied successfully to a large variety of example pipelined architectures.

The vertical migration of complex application code into horizontal microcode makes traditional methods of handwritten and hand-optimized microcode with primitive assembly languages impractical. Higher-level languages that permit abstraction from low-level timing and concurrency details are considered a major step toward alleviating the problem. This approach is feasible only if compilers for these languages exist that can produce high-quality microcode and that can be targeted to new machines with modest effort and high reliability. An overview is provided of the Horizon retargetable microcode compiler, which facilitates the production of highly optimized microcode and the targeting of the compiler to specific machines.

We present a Quality of Service (QoS)-supported on-chip communication that increases the shared communication resources for multi-processor systems on chip. Time-critical embedded systems require tight guaranteed services in terms of throughput, latency, etc. in order to comply with hard real-time constraints. Typically, guaranteed-service schemes require dedicated/reserved resources (i.e. links) for communication and thus suffer from low resource utilization. Improving bandwidth utilization by fairly sharing the unused bandwidth among the other competing transactions is therefore an important issue. To the best of our knowledge, we present the first approach for on-chip communication that provides high resource utilization under a transaction-specific, flexible communication scheme. It provides tight time-related guarantees through our bounded arbitration scheme, which considers the lower and upper bounds for each type of transaction. We demonstrate its advantages by means of a complete MPEG4 video decoder case study analysis and achieve, under certain constraints, a bandwidth utilization of up to 100% and 97% on average with a guaranteed 100% bandwidth. Thus, we provide an on-chip communication scheme that offers high bandwidth utilization together with tight guarantees. Large parts of this work have previously been published at the IEEE International Conference on Hardware/Software Codesign and System Synthesis (Codes+ISSS), 2006.

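Research on network-on-chip architecture, mapping, flow control, and Quality of Service (QoS) urgently needs an accurate traffic model for latency analysis and for test and verification, so that design performance can be guaranteed. Existing short-range-dependent models based on Markov and regression models cannot accurately describe the burstiness and fractal characteristics of traffic, and they are unsuitable for pipelined System on Chip (SoC) devices for communication signal processing. To solve this problem, the influence of network topology, task flow graph, and mapping on traffic self-similarity is studied by combining theory and experiment. A data-correlation model for the Multi-core Processing System on Chip (MPSoC) is built from the signal-processing characteristics of communication systems. Modeling experiments on a typical DSP system use measured traffic Hurst parameters to fit an empirical function relating the parameters of the data-correlation model to the Hurst parameter, which yields a method for predicting and estimating the traffic Hurst parameter with the MPSoC data-correlation model. Experiments show that the Hurst parameter estimated with this traffic model deviates only slightly from its true value, so the model describes traffic self-similarity fairly accurately.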

The authors describe the two architectures they have developed that use and demonstrate free-space optical interconnects for digital logic: a high-performance optoelectronic computing module and a second-generation digital optoelectronic computer. The low-power, high-performance optoelectronic computing (HPOC) module was designed for switching and data processing applications. Its architecture uses global, free-space, smart optical interconnects rather than electronic interconnects. The HPOC modules, which incorporate arrays of microlasers, diffractive optical interconnect elements and detectors, are currently undergoing prototype fabrication. By extending the high fan-in and fan-out capabilities of free-space optical interconnects, HPOC modules, when properly configured, offer a significant reduction in power while maintaining good algorithmic efficiency and high noise margin.

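Microprocessors are trending toward multi-core designs and are now moving toward many cores. The performance of the interconnect structure largely determines the performance of the whole system, so efficient mechanisms for connecting multiple cores need to be studied. When a large number of cores are connected, the commonly used bus becomes the bottleneck of overall system performance because of contention. The crossbar switch has the least contention and is one of the most efficient ways to connect cores. This article designs a five-stage pipelined crossbar switch for inter-core connection. Both its input and output ports are buffered...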

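To address the over-constraint problem in path tracking for four-wheel-steering (4WS) unmanned vehicles, this paper proposes a two-point tracking control strategy that decouples front- and rear-wheel steering. A single-track kinematic model of the 4WS vehicle is established, the front and rear steering angular rates are constrained, and a clothoid reference pose sequence with continuous curvature is planned and decoupled into two reference trajectories, one for the front-axle center and one for the rear-axle center. Taking the front and rear wheel center points as control points, a preview method with nonlinear feedback control yields a steering control law for each point, and the two-point tracking errors converge exponentially to 0. Simulation and real-vehicle results show that the proposed two-point tracking control strategy reduces the standard deviation of the lateral error by 0.2 m and the standard deviation of the yaw-angle error by 3.0°, and that it offers a larger front/rear steering-angle control domain and higher tracking accuracy.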
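
The single-track kinematic model mentioned in this abstract can be illustrated with a minimal sketch. This is the textbook bicycle model with both axles steered, not the authors' formulation: the function name `step_4ws`, the choice of a reference point located `lf` behind the front axle and `lr` ahead of the rear axle, and the explicit-Euler integration are all assumptions made for illustration.

```python
import math


def step_4ws(x, y, psi, v, delta_f, delta_r, lf, lr, dt):
    """One explicit-Euler step of a single-track 4WS kinematic model.

    Hypothetical helper for illustration only.
    x, y     : position of the reference point [m]
    psi      : yaw angle [rad]
    v        : speed of the reference point [m/s]
    delta_f  : front steering angle [rad]
    delta_r  : rear steering angle [rad]
    lf, lr   : distances from the reference point to front/rear axle [m]
    dt       : time step [s]
    """
    wheelbase = lf + lr
    # Side-slip angle of the reference point caused by steering both axles.
    beta = math.atan(
        (lr * math.tan(delta_f) + lf * math.tan(delta_r)) / wheelbase
    )
    # Yaw rate from the difference between front and rear steering.
    yaw_rate = v * math.cos(beta) * (math.tan(delta_f) - math.tan(delta_r)) / wheelbase
    x_next = x + v * math.cos(psi + beta) * dt
    y_next = y + v * math.sin(psi + beta) * dt
    psi_next = psi + yaw_rate * dt
    return x_next, y_next, psi_next
```

With equal front and rear angles the yaw rate is zero and the vehicle translates diagonally ("crab" motion); with opposite angles the turning radius shrinks. These two extra degrees of freedom are what make separate control points on the two axles possible.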

谢长生, 李博, 陆晨, 王芬. 《计算机科学》 (Computer Science), 2010, 37(7): 296-300
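SSDs have gradually become a research hotspot in the storage industry. This paper proposes SBAST, a flash translation layer design based on on-chip SRAM, which improves the efficiency of SSD random writes by caching updated pages in SRAM and reduces unnecessary erase operations. Simulation experiments with SSDsim demonstrate the effectiveness of the design, and follow-up plans are given.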

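This article proposes an on-line debugging module for an embedded microprocessor. With little hardware overhead, the module implements several powerful debugging functions: responding to hardware and software triggers and providing a start/stop debug mode; single-step debugging; program execution tracing; and reading and writing code memory, external data memory, SFRs, and internal data memory. The article first introduces the design principles of the debuggable module for the embedded microprocessor, then describes the structural design of on-line debugging, and finally gives conclusions and analysis.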

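To address the increasingly severe debugging challenges of embedded systems, an on-chip debugging and real-time trace architecture based on a 32-bit superscalar DSP core is proposed and implemented. By designing a dedicated trace interface and other hardware resources, and by extending the JTAG port, the memory protection logic, and the pipeline control logic, the architecture supports, at low hardware cost, typical features such as real-time run control of the core, non-intrusive access to internal registers and memory, breakpoints and watchpoints with complex trigger conditions, hardware single-stepping, and real-time tracing of program flow. It can meet the development and debugging needs of the vast majority of embedded systems.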

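To facilitate the development and debugging of software and application systems, a reusable on-chip debugging method for microprocessors is proposed. By designing a general debug instruction set, adding a debug module, and extending the functions of the processor core, the method implements on-chip debugging functions such as setting and clearing breakpoints, pipeline-stage-accurate control of core execution, access to core resources, and counting of special events during the execution of any program segment. The method has been implemented on the independently developed SuperV_EF01 DSP. Synthesis results in a 90 nm CMOS process show that the added on-chip debugging functions do not affect the critical-path timing of the SuperV_EF01 DSP, while the total chip area increases by only 3.87%.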

The paper discusses the clustered failover configuration, which connects two independent file-server appliances via a nonuniform-memory-access (NUMA) network. Combining NUMA interconnects and a proprietary, log-structured file system results in file service that survives hardware faults with minimal disruption to clients.

A. Louri, Hongki Sung. Computer, 1994, 27(10): 27-37
Metal-based communication between subsystems and chips has become the limiting factor in high-speed computing. Maturing optics-based technologies offer advantages that may unplug this bottleneck. Optical interconnects offer high-speed computers key advantages over metal interconnects. These include (1) high spatial and temporal bandwidths, (2) high-speed transmission, (3) low crosstalk independent of data rates, and (4) high interconnect densities. Although faster device switching speeds will eventually be necessary for future massively parallel computing systems, the deciding factor in determining system performance and cost will be subsystem communications rather than device speed. Free-space optical interconnects, by virtue of their inherent parallelism, high data bandwidth, small size and power requirement, and relative freedom from mutual interference of signals, already show great promise in replacing metal interconnects to solve communication problems.

This paper reviews several optical connecting devices that are based on microelectromechanical systems (MEMS) components. In this paper, we divide optical connecting devices into two categories. The first category includes MEMS-based optical switches developed for optical fiber communication, which perform optical switching, wavelength division multiplexing (WDM) routing, and/or optical cross connection. The other category consists of MEMS-based optical interconnects that have been constructed primarily for use in rack-to-rack, board-to-board, chip-to-chip, card-to-card and/or intra-chip interface connections. Working principles of these MEMS optical connecting devices will also be discussed in this paper.

This paper presents a Lie group setting for the problem of control of formations, as a natural outcome of the analysis of a planar two-vehicle formation control law. The vehicle trajectories are described using the planar Frenet–Serret equations of motion, which capture the evolution of both the vehicle position and orientation for unit-speed motion subject to curvature (steering) control. The set of all possible (relative) equilibria for arbitrary G-invariant curvature controls is described (where G=SE(2) is a symmetry group for the control law), and a global convergence result for the two-vehicle control law is proved. An n-vehicle generalization of the two-vehicle control law is also presented, and the corresponding (relative) equilibria for the n-vehicle problem are characterized. Work is ongoing to discover stability and convergence results for the n-vehicle problem.
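
For reference, the planar Frenet–Serret equations for unit-speed motion under curvature control are commonly written as below. The symbols are assumptions, since the abstract does not name them: $\mathbf{r}$ is the vehicle position, $\mathbf{x}$ the unit tangent (heading), $\mathbf{y}$ the unit normal, and $u$ the curvature (steering) control.

```latex
\begin{aligned}
\dot{\mathbf{r}} &= \mathbf{x}, \\
\dot{\mathbf{x}} &= u\,\mathbf{y}, \\
\dot{\mathbf{y}} &= -u\,\mathbf{x}.
\end{aligned}
```

Because $\mathbf{x}$ and $\mathbf{y}$ stay orthonormal, the pair formed by the rotation $[\mathbf{x}\ \mathbf{y}]$ and the translation $\mathbf{r}$ is an element of SE(2), which is how the group setting of the abstract arises.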


In a conventional steering system for a multi-axle crane, the steering angle of each axle is determined according to Ackermann’s steering principle, which minimizes the slip angles of the tires. The role of optimal steering control in improving a driver’s steering efficiency is hardly considered in Ackermann’s principle. To address this problem, this paper proposes a control strategy for determining the optimal steering angles for a multi-axle crane and thereby improving a driver’s steering efficiency by applying the model predictive control (MPC) algorithm and defining a driver’s intentions. A simplified crane model for the steering system was developed using a bicycle model, and a comparative study was carried out via simulation to analyze steering performance for the conventional (Ackermann) and proposed steering control systems in the all-wheel steering and road steering modes. The simulation results show that the proposed steering control system reduces both the minimum turning radius and the driver’s steering effort more than the conventional system does, and that the proposed control strategy therefore yields better steering performance.
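
The conventional baseline in this abstract, Ackermann steering on a bicycle (single-track) model, can be sketched as follows. It illustrates only the geometric rule, not the proposed MPC strategy. The function name `ackermann_axle_angles`, the sign convention, and the use of signed longitudinal offsets measured from the line through the turning center are assumptions made for illustration.

```python
import math


def ackermann_axle_angles(axle_offsets, turn_radius):
    """Single-track Ackermann steering angles for a multi-axle vehicle.

    Hypothetical helper for illustration only.
    axle_offsets : signed longitudinal distance [m] of each axle from the
                   line that passes through the turning center at right
                   angles to the vehicle's longitudinal axis
                   (positive = ahead of that line)
    turn_radius  : distance [m] from the turning center to the vehicle's
                   longitudinal axis, must be positive
    Returns one steering angle [rad] per axle. Each wheel plane is
    perpendicular to the line joining its axle to the turning center,
    so that ideally no tire slips sideways.
    """
    if turn_radius <= 0:
        raise ValueError("turn_radius must be positive")
    return [math.atan2(offset, turn_radius) for offset in axle_offsets]
```

An axle that lies on the turning-center line gets a zero angle, and axles behind it steer in the opposite direction to those ahead of it. An optimizing controller can depart from this fixed geometric relationship.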

