首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
随着互联网与云计算的发展,越来越多的应用被从本地迁移到云端,这些应用最终被运行在共享的数据中心.受到数据中心应用复杂并且需求多变特征的影响,传统体系结构中的部分硬件部件(如共享末级缓存、内存控制器、I/O控制器等)固定功能的设计不能很好地满足这些混合多应用的场景需求.为满足这类应用场景的需求,计算机体系结构需要提供一种可编程硬件机制,使得硬件功能能够根据应用需求的变化进行调整.提出了一种可编程数据平面方法:通过在现有硬件部件中增加可编程处理器,使用执行固件代码的方式对硬件的请求进行处理,并通过更新数据平面处理器固件的方式实现硬件功能的扩展.该方法在FPGA原型系统中进行验证,其结果表明,该方法并没有给系统性能带来严重的影响,只使用有限的资源即可为硬件增加更为灵活的可编程能力,使其能够适应应用需求复杂多变的场景.  相似文献   

2.
A single-chip multiprocessor for multimedia: the MVP   总被引:2,自引:0,他引:2  
The multimedia video processor (MVP) architecture, which incorporates a variety of parallel processing techniques to deliver very high performance to a wide range of imaging and graphics applications, is described. The MVP combines, on a single semiconductor chip, multiple fully programmable processors with multiple data streams connected to shared RAMs through a crossbar network. Each of the independent processors can execute many operations in parallel every cycle. The architecture is scalable and supports different numbers of processors to meet the cost and performance requirements of different markets. MVP's target environment and the development of MVP are outlined  相似文献   

3.
数字信号处理器(DSP)结构设计及发展趋势   总被引:4,自引:0,他引:4  
高速信息化的时代需要更高性能的数字信号处理器(DSP),以满足网络通信和3G移动通信等方面的要求。该文分析了早期DSP处理器的结构特点和当今最先进的体系结构,结合应用背景着重探讨了不同DSP体系结构和它们各自的优势和劣势,在研究了数字信号处理新应用领域的特点后,根据今后的半导体制造工艺和微处理器体系结构设计的发展,指出了DSP处理器在微结构设计方面的发展趋势。  相似文献   

4.
实时微处理器体系结构综述   总被引:1,自引:0,他引:1       下载免费PDF全文
实时应用已经成为嵌入式应用中一类快速崛起的典型应用。作为实时系统的核心部件,实时微处理器体系结构是微处理器领域的一个重要研究方向。与通用处理器追求最大吞吐量不同,实时处理器要求具有紧凑且可计算的最坏执行时间。传统的实时处理器往往采用较为简单的处理器结构,避免复杂结构引入执行时间的不确定性。随着实时应用对处理器性能需求越来越高,实时处理器正逐渐向多线程与多核结构发展。在多线程与多核处理器中,共享资源竞争导致实时系统的确定性变差,对实时处理器体系结构带来了更大挑战。对实时微处理器体系结构进行综述,首先从指令集、微体系结构、存储、I/O、任务调度等多个方面对传统实时处理器进行分析;然后分别对采用多线程与多核结构的高性能实时处理器展开分析;最后对几种商用实时处理器结构进行比较,总结实时处理器发展现状与未来发展趋势。  相似文献   

5.
赵姗  杨秋松  李明树 《软件学报》2019,30(4):1164-1190
为了满足应用程序的多样化需求,异构多核处理器出现并逐渐进入市场,其中的处理核心(core)具有不同的微架构或者指令集架构(ISA),为应用提供多样化特性支持,比如指令级并行(ILP)、内存级并行(MLP),这些核心协同工作满足整个计算系统的优化目标,比如高性能、低功耗或者良好的能效.然而,目前主流的调度技术主要是针对传统同构处理器架构设计,没有考虑异构硬件能力的差异性.在异构多核处理器环境下,调度技术如何感知硬件的异构特性,为不同类型的应用程序提供更加合适和匹配的硬件资源,这是值得探索的问题.对近年来在该研究领域的成果进行了综述研究,特别是在性能非对称多核处理器架构下,异构调度技术面临的优化目标、分析模型、调度决策和算法评估等主要问题进行了分析和描述,并依次对相关技术进行了系统的总结,最后从软硬件融合的角度对今后的研究工作进行了展望.  相似文献   

6.
嵌入式系统对处理器功耗开销有严格的限制,异步电路技术可以作为设计低功耗处理器的有效方法之一。针对嵌入式多媒体应用,本文设计实现了一款低功耗异步微处理器——腾越-Ⅱ。处理器中包含一个异步TTA微处理器内核、一个同步TTA微处理器内核、两个存储控制器和多个外部通信接口。异步内核通过基于宏单元的异步电路设计方法实现,其它部分通过基于标准单元的半定制设计流程实现。处理器芯片采用UMC0.18μmCMOS工艺实现,基片面积为4.89×4.89mm2,工作电压为1.8V。经测试,处理器工作主频达到200MHz,且异步内核的功耗开销低于同步内核的50%。  相似文献   

7.
配置流驱动计算体系结构指导下的ASIP设计   总被引:1,自引:0,他引:1  
为了兼顾嵌入式处理器设计中的灵活性与高效性,提出配置流驱动计算体系结构.在体系结构设计中将软/硬件界面下移,使功能单元之间的互连网络对编译器可见,并由编译器来完成传输路由,从而支持复杂但更为高效的互连网络.在该体系结构指导下,提出一种支持段式可重构互连网络的专用指令集处理器(ASIP)设计方法.该方法应用到密码领域的3类ASIP设计中表明,与简单总线互连相比,在不影响性能的前提下,可平均节约53%的互连功耗和38.7%的总线数量,从而达到减少总线数量、降低互连功耗的目的.  相似文献   

8.
异构多核处理器体系结构设计研究   总被引:2,自引:0,他引:2  
多核技术成为当今处理器发展的重要方向,异构多核处理器由于可将不同类型的计算任务分配到不同类型的处理器核上并行处理,从而为不同需求的应用提供更加灵活、高效的处理机制而成为当今研究的热点.本文从体系结构的角度探讨了异构多核处理器设计中的关键点,从内核结构、互连方式、存储系统、操作系统支持、测试与验证、动态电压调节等方面分析...  相似文献   

9.
New Products     
Michalopouios  D.A. 《Computer》1978,11(5):92-98
EAI's new hybrid computer system, called Hyshare, is aimed at the multi-user, multi-task application demands of larger- scale simulation and scientific computation laboratories. Consisting of an EAI 3200 digital computer and up to six EAI analog processors, the analog/digital and digital/ analog communications interface employs on-line, dynamic resource allocation techniques which allow analog processors to be assigned to separate tasks or linked together to meet specific application requirements.  相似文献   

10.
In this paper, a closed queuing network model with both single and multiple servers has been proposed to model dataflow in a multi-threaded architecture. Multi-threading is useful in reducing the latency by switching among a set of threads in order to improve the processor utilization. Two sets of processors, synchronization and execution processors exist. Synchronization processors handle load/store operations and execution processors handle arithmetic/logic and control operations. A closed queuing network model is suitable for large number of job arrivals. The normalization constant is derived using a recursive algorithm for the given model. State diagrams are drawn from the hybrid closed queuing network model, and the steady-state balance equations are derived from it. Performance measures such as average response times and average system throughput are derived and plotted against the total number of processors in the closed queuing network model. Other important performance measures like processor utilizations, average queue lengths, average waiting times and relative utilizations are also derived.  相似文献   

11.
For embedded applications with data-level parallelism, a vector processor offers high performance at low power consumption and low design complexity. Unlike superscalar and VLIW designs, a vector processor is scalable and can optimally match specific application requirements.To demonstrate that vector architectures meet the requirements of embedded media processing, we evaluate the Vector IRAM, or VIRAM (pronounced "V-IRAM"), architecture developed at UC Berkeley, using benchmarks from the Embedded Microprocessor Benchmark Consortium (EEMBC). Our evaluation covers all three components of the VIRAM architecture: the instruction set, the vectorizing compiler, and the processor microarchitecture. We show that a compiler can vectorize embedded tasks automatically without compromising code density. We also describe a prototype vector processor that outperforms high-end superscalar and VLIW designs by 1.5x to 100x for media tasks, without compromising power consumption. Finally, we demonstrate that clustering and modular design techniques let a vector processor scale to tens of arithmetic data paths before wide instruction-issue capabilities become necessary.  相似文献   

12.
以图计算为代表的数据密集型应用获得越来越广泛的关注,而传统的高性能计算机处理这类应用的效率较低.面向未来高性能计算机体系结构要有效支持数据密集型计算,深入研究以广度优先搜索(breadth-first search, BFS)算法为代表的图计算的典型特征,设计实现轻量级启发式切换BFS算法,该算法通过基本搜索方式的自动切换,避免冗余内存访问,提高搜索效率;针对BFS算法的离散随机数据访问特征以及众核处理器执行机制,建立面向BFS算法的众核处理器体系结构分析模型;全面、深入研究了BFS算法在典型众核处理器上的运行特征和性能变化趋势.测试结果表明:Cache命中率、内存带宽、流水线利用效率等相关参数均处于较低水平,无法完全满足BFS算法的需求,因此需要能够支持大量离散随机访问和简单执行机制的新型众核处理器体系结构.  相似文献   

13.
Advances in computer technology are now so profound that the arithmetic capability and repertoire of computers can and should be expanded. Nowadays the elementary floating-point operations +, −, ×, / give computed results that coincide with the rounded exact result for any operands. Advanced computer arithmetic extends this accuracy requirement to all operations in the usual product spaces of computation: the real and complex vector spaces as well as their interval correspondents. This enhances the mathematical power of the digital computer considerably. A new computer operation, the scalar product, is fundamental to the development of advanced computer arithmetic.This paper studies the design of arithmetic units for advanced computer arithmetic. Scalar product units are developed for different kinds of computers like personal computers, workstations, mainframes, super computers or digital signal processors. The new expanded computational capability is gained at modest cost. The units put a methodology into modern computer hardware which was available on old calculators before the electronic computer entered the scene. In general the new arithmetic units increase both the speed of computation as well as the accuracy of the computed result. The circuits developed in this paper show that there is no way to compute an approximation of a scalar product faster than the correct result.A collection of constructs in terms of which a source language may accommodate advanced computer arithmetic is described in the paper. The development of programming languages in the context of advanced computer arithmetic is reviewed. The simulation of the accurate scalar product on existing, conventional processors is discussed. Finally the theoretical foundation of advanced computer arithmetic is reviewed and a comparison with other approaches to achieving higher accuracy in computation is given. Shortcomings of existing processors and standards are discussed.  相似文献   

14.
Traditionally, mechanically steered dishes or analog phased array beamforming systems have been used for radio frequency receivers, where strong directivity and high performance were much more important than low-cost requirements. Real-time controlled digital phased array beamforming could not be realized due to the high computational requirements and the implementation costs. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beamforming. With the continuously decreasing price per transistor, high performance signal processing has become available by using multi-processor architectures. More and more applications are using beamforming to improve the spatial utilization of communication channels, resulting in many dedicated digital architectures for specific applications. By using a reconfigurable architecture, a single hardware platform can be used for different applications with different processing needs.In this article, we show how a reconfigurable multi-processor system-on-chip based architecture can be used for phased array processing, including an advanced tracking mechanism to continuously receive signals with a mobile satellite receiver. An adaptive beamformer for DVB-S satellite reception is presented that uses an Extended Constant Modulus Algorithm to track satellites. The receiver consists of 8 antennas and is mapped on three reconfigurable Montium TP processors. With a scenario based on a phased array antenna mounted on the roof of a car, we show that the adaptive steering algorithm is robust in dynamic scenarios and correctly demodulates the received signal.  相似文献   

15.
随着工艺技术的发展以及嵌入式实时应用范围的不断扩大和需求的不断提升,多核处理器必将凭其高性能和低功耗特性应用到嵌入式实时领域中。但是,多核处理器体系结构很难甚至无法满足实时系统的实时限制和对WCET的可预测性要求。从多核中的共享资源入手,分析多核中的片上共享资源(共享Cache、片上互连)和片外共享资源(片外存储)对WCET分析的影响,探讨了各种干扰下的WCET分析方法。介绍了两种多核WCET分析模型:多核静态WCET分析模型和多核混合WCET分析模型;同时,针对嵌入式实时应用提出了多核设计原则。  相似文献   

16.
《Real》2002,8(5):345-356
In this article, we present a new reconfigurable parallel architecture oriented to video-rate computer vision applications. This architecture is structured with a two-dimensional (2D) array of FPGA/DSP-based reprogrammable processors Pij. These processors are interconnected by means of a systolic 2D array of FPGA-based video-addressing units which allow video-rate links between any two processors in the net to overcome the associated restrictions in classic crossbar systems such as those which occur with butterfly connections. This architecture has been designed to deal with parallel/pipeline procedures, performing operations which handle various simultaneous input images, and cover a wide range of real-time computer vision applications from pre-processing operations to low-level interpretation. This proposed open architecture allows the host to deal with final high-level interpretation tasks. The exchange of information between the linked processorsPij of the 2D net lies in the transfer of complete images, pixel by pixel, at video-rate. Therefore, any kind of processor satisfying such a requirement can be integrated. Furthermore, the whole architecture has been designed host-independent.  相似文献   

17.
李诚  李华伟 《计算机工程》2007,33(2):252-254
随着网络带宽的飞速增长和各种新的网络应用不断涌现,原有的基于通用处理器和ASIC的互联网架构已经不能满足新的需求。兼具强大处理能力和灵活可编程配置能力的网络处理器逐渐得到广泛的应用。高性能的网络处理器通常采用多个并发的处理单元进行数据平面的快速处理,这些处理单元在网络处理器中居于核心的地位。该文讨论了网络处理器中处理单元设计需要考虑的因素,设计了一种较为灵活有效的处理单元架构,并进行了FPGA原型验证,证实了该结构的可行性。  相似文献   

18.
熊庭刚  马中 《计算机工程与设计》2005,26(9):2400-2403,2406
为了满足应用对性能、可靠性和成本的严格要求,适度并行计算机系统必须具有很高的系统可用度.CPCI总线热切换规范作为一个工业标准,为实现高可用的适度并行计算机系统提供了良好的基础.提出了一种基于热切换技术实现高可用适度并行计算机系统的体系结构.运用该结构设计的适度并行计算机系统能够实现高效的容错并行计算,系统的可用性服务机制具有标准结构,较好地满足了应用的要求.  相似文献   

19.
现场可编程门阵列(FPGA)在计算机视觉应用领域有着广阔的前景,然而FPGA有限的片上存储器资源难以满足应用场景下性能、尺寸和功率的需求。针对这个问题,研究片上存储器的资源分配,在最小化片上资源使用和整体功耗的前提下提出一种易于实现的分区平衡算法。实验结果表明,与商用FPGA高级综合工具相比,本文算法的利用率提高达60%,且动态功耗降低了约70%。在高级算法MeanShift跟踪的实验中,实验结果显示,分区算法可以在不影响关键性能的前提下降低总功耗高达30%。  相似文献   

20.
Designing highly efficient embedded programs requires efficient tools to support performance monitoring and tuning of embedded software. Several such tools are available for various embedded processors. To effectively meet the energy consumption requirements of embedded systems, programmers try to understand the energy and power consumption of embedded systems as a high-priority monitoring target. The paper discusses SES, a highly integrated tool that delivers cycle-by-cycle power consumption data for optimizing embedded programs  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号