首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
为了以最优的资源、最少的带宽,实现龙芯系统中可用的显示控制器,介绍一种集成了显示控制器的龙芯2E北桥及显示控制器的设计和实现,并讨论了测试数据分析和性能优化工作。测试结果表明,集成了简单的显示控制器的龙芯2E系统完全能满足低端应用的要求。  相似文献   

2.
龙芯2号同时多线程处理器的软硬件接口设计   总被引:1,自引:0,他引:1  
随着生产工艺的提高,芯片上能集成越来越多的晶体管,多线程技术也逐步成为一种主流的处理器体系结构技术,而多线程处理器的软硬件接口也就成为急需解决的问题.在分析同时多线程的软件需求的基础上,提出龙芯2号同时多线程处理器的软硬件接口协同设计解决方案,给出相应的操作系统实现方案.同时,在Linux 2.4.20的基础上实现了龙芯2号同时多线程处理器相应的操作系统.通过运行SPEC CPU2000等测试程序进行性能评测,充分说明实现软硬件接口的龙芯2号同时多线程处理器极大地提高了多进程负载的性能.分析和设计方案不仅适用于同时多线程处理器,而且对于片内多核处理器的设计也有借鉴作用.  相似文献   

3.
Although the design of many kinds of microprocessors has been under developing for several decades,the computer architecture R&D community lacks well documented lessons and experiences about design decisions in the research literature.In this paper,we systematically present the design decisions we made during the designing and prototyping of Godson-2 series processors.The 250MHz Godson-2B,450MHz Godson-2C,and 1GHz Godson-2E processors that implement 64-bit,four-issue,out-of-order architecture were taped out in 2003,2004,and 2005,respectively.Each processor triples its predecessor in the SPEC CPU2000 rates.Our first-hand experiences and lessons gained from these designs would provide unique perspectives and insights that are not available in any existing text books and/or published papers.We summarize 10 critical lessons and experiences based on hundreds of our attempts at architectural and design optimizations for performance improvement of Godson-2 series processors.The issues include silicon-simulation correlation,design balancing,performance optimizing,and pico-architecture tuning.We conclude that persistent improvement,attitude towards work-on-silicon design, and insightful understanding of software and fabrication process are the three most important factors for designing a high performance processor with low energy consumption.  相似文献   

4.
计算机系统整体性能的提高不仅仅依赖于处理器计算能力的提升也需要高性能芯片组的有力支持.芯片组承担着CPU和外围设备通信的重任,而且目前大多数系统中采用把内存控制器集成在北桥中的方法,这更加突出了北桥在访存性能以至于在整个系统中的关键作用.以高性能为目标,龙芯2C处理器配套北桥芯片NB2005的设计和优化采用了很多新的方法和技术,其中包括根据程序行为进行动态Page管理的内存控制电路,一种与内存控制电路状态相结合的预取策略和具备高吞吐量低延迟的PCI通道设计等.性能测试和分析表明,搭配NB2005的龙芯2C系统访存带宽要比搭配Marvell GT64240北桥的系统提高40%以上,运行SPEC CPU2000浮点和定点程序的性能分别提高了12.2%和2.5%,磁盘I/O的性能也提高了30%.  相似文献   

5.
This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, which is a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. The adoption of the aggressive out-of-order execution and memory hierarchy techniques help Godson-2E to achieve high performance. The Godson-2E processor has been physically designed in a 7-metal 90nm CMOS process using the cell-based methodology with some bitsliced manual placement and a number of crafted cells and macros. The processor can be run at 1GHz and achieves a SPEC CPU2000 rate higher than 500.  相似文献   

6.
龙芯2号处理器的同时多线程设计   总被引:1,自引:0,他引:1  
提出了适合龙芯2号处理器的同时多线程处理器模型,并介绍了具体的微体系结构设计以及相应的Linux操作系统的实现方案.通过在设计的龙芯2号同时多线程处理器上启动Linux操作系统,并运行应用程序,例如SPEC CPU2000,进行性能评测.结果表明,龙芯2号同时多线程处理器通过挖掘线程级并行性,将龙芯2号处理器的性能提高了31.1%.  相似文献   

7.
基于SimpleScalar的龙芯CPU模拟器Sim-Godson   总被引:6,自引:1,他引:6  
现代高性能通用处理器的设计越来越复杂,模拟器在处理器设计中所起的作用越来越大.龙芯2号是中国科学院计算技术研究所研制的高性能通用处理器.最早开发的龙芯2号的模拟器ICT-Godson是信号级模拟器,它模拟了处理器的所有细节,十分准确,但速度和灵活性有较大限制.文章基于SimpleScalar工具集,设计并实现了龙芯2号的模拟器Sim-Godson.Sim-Godson具有高速度和高灵活性的优点,且准确性也很高.在3.0GHz的Pentium4微机上,Sim-Godson速度约为500K指令/s.大部份测试程序在Sim-Godson上的IPC(Instruction Per Cycle)与ICT-Godson相差不到5%,达到了很高的准确性.Sim-Godson在龙芯2号的性能分析工作中发挥了重要作用.  相似文献   

8.
The Godson-3B processor is a powerful processor designed for high performance servers including Dawning Servers.It offers significantly improved performance over previous Godson-3 series CPUs by incorporating eight CPU cores and vector computing units.It contains 582.6 M transistors within 300 mm2 area in 65 nm technology and is implemented in parallel with full hierarchical design flows.In Godson-3B,advanced clock distribution mechanisms including GALS (Globally Asynchronous Locally Synchronous) and clock mesh are adopted to obtain an OCV tolerable clock network.Custom-designed de-skew modules are also implemented to afford further latency balance after fabrication.The power reduction of Godson-3B is maintained by MLMM (Multi Level Multi Mode) clock gating and multi-threshold-voltage cells substitution schemes.The highest frequency of Godson-3B is 1.05 GHz and the peak performance is 128 GFlops (double-precision) or 256 GFlops (single-precision) with 40 W power consumption.  相似文献   

9.
Microarchitecture of the Godson-2 Processor   总被引:23,自引:3,他引:23       下载免费PDF全文
The Godson project is the first attempt to design high performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor which is a 64-bit, 4-issue, out-of-order execution RISC processor that implements the 64-bit MlPS-like instruction set. The adoption of the aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as non-blocking cache, load speculation, dynamic memory disambiguation) helps the Godson-2 processor to achieve high performance even at not so high frequency. The Godson-2 processor has been physically implemented on a 6-metal 0.18μm CMOS technology based on the automatic placing and routing flow with the help of some crafted library cells and macros. The area of the chip is 6,700 micrometers by 6,200 micrometers and the clock cycle at typical corner is 2.3ns.  相似文献   

10.
The Godson-3A microprocessor is a quad-core version of the scalable Godson-3 multi-core series.It is physically implemented based on the 65 nm CMOS process.This 174 mm~2 chip consists of 425 million transistors.The maximum frequency is 1GHz with a maximum power consumption of 15 W.The main challenges of Godson-3A physical implementation include very large scale,high frequency requirement,sub-micron technology effects and aggressive time schedule.This paper describes the design methodology of the physical...  相似文献   

11.
龙芯2号处理器设计和性能分析   总被引:16,自引:4,他引:16  
介绍龙芯2号处理器设计及其性能测试结果.龙芯2号采用四发射超标量超流水结构。片内一级指令和数据高速缓存各64KB,片外二级高速缓存最多可达8MB.为了充分发挥流水线的效率,龙芯2号实现了先进的转移猜测、寄存器重命名、动态调度等乱序执行技术以及非阻塞的Cache访问和load Speculation等动态存储访问机制.龙芯2号处理器采用0.18gm的CMOS工艺实现,在正常电压下的最高工作频率为500MHz,500MHz时的实测功耗为3~5W.龙芯2号单精度峰值浮点运算速度为20亿a/秒,双精度浮点运算速度为10亿a/秒,SPECCPU2000的实测性能是龙芯1号的8~10倍,综合性能已经达到PentiumⅢ的水平.目前芯片样机能流畅运行完整的64位中文Linux操作系统,全功能的Mozilla浏览器、多媒体播放器和OpenOffice办公套件,可以满足绝大多数桌面应用的要求.  相似文献   

12.
针对卫星数字电视接收的低成本应用,提出一种基于龙芯的DVB-S卫星数字电视接收系统方案。采用龙芯2E平台的PCI总线,充分应用龙芯处理器对MPEG-2的高效解码特性,结合特定前端调谐器和后端TS流捕获芯片设计整个系统。结果表明,该系统符合DVB- S/MPEG-2标准,且结构简单、便于实现、成本低廉,对拓展龙芯处理器产业化应用有重要工程应用价值。  相似文献   

13.
Multithreaded technique is the developing trend of high performance processor. Memory consistency model is essential to the correctness, performance and complexity of multithreaded processor. The chip multithreaded consistency model adapting to multithreaded processor is proposed in this paper. The restriction imposed on memory event ordering by chip multithreaded consistency is presented and formalized. With the idea of critical cycle built by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion of correct execution of sequential consistency model. Chip multithreaded consistency model provides a way of achieving high performance compared with sequential consistency model and easures the compatibility of software that the execution result in multithreaded processor is the same as the execution result in uniprocessor. The implementation strategy of chip multithreaded consistency model in Godson-2 SMT processor is also proposed. Godson-2 SMT processor supports chip multithreaded consistency model correctly by exception scheme based on the sequential memory access queue of each thread.  相似文献   

14.
如何有效地利用处理器消耗的能量而得到尽可能高的性能成为了目前体系结构研究的热点,在研究中,结构级的功耗评估工具无疑具有重要的作用.在现有的结构级功耗模拟器中,往往只考虑了动态电路以及全定制实现方法下的功耗刻画,而忽略了以静态电路和标准单元设计为主的ASIC设计方法对处理器功耗带来的影响.由此,结合一款高性能、低功耗通用处理器--龙芯2号的具体实现,对其设计特点和功耗特性进行分析,实现了以龙芯2号处理器为基本研究对象的结构级功耗评估方法.该评估方法充分考虑了CMOS静态电路的结构级功耗刻画方法,因此更加适合目前以ASIC设计方法为主的高性能处理器结构的功耗评估.该结构级功耗评估方法与RTL级的功耗评估方法相比,具有速度快和灵活性好的优点.在2.4GHz的Intel Xeon上,该功耗评估方法的速度约为300K/s,是RTL级的评估方法的5000倍,而且误差很小.  相似文献   

15.
多核处理器的性能与系统软件有着密切的联系:操作系统是处理器与应用程序之间的接口,对于充分利用处理器特性和提高应用程序的性能起着极其重要的作用;编译器与处理器体系结构密切相关,一方面要产生处理器支持的二进制代码,另一方面还要结合处理器特性产生高效运行的代码,其性能好坏直接影响着系统的整体性能.为了提高龙芯3A系统的实际性能,从操作系统和编译器着手,结合龙芯3A微结构特征,进行了一系列有效的优化.这些措施包括CC-NUMA多核操作系统的实现、操作系统二级Cache锁机制、操作系统调度共享二级Cache分配、自动向量化编译和支持预取机制的编译等.实验结果表明,在系统软件中增加对处理器特性的支持,能够充分挖掘体系结构的优势,对系统性能有较大的好处.其性能优化技术对于其他处理器的优化也有一定的借鉴价值.  相似文献   

16.
矩阵乘法作为高性能计算中的关键组成部分,是一种具有计算和访存密集特点的典型应用,因此优化矩阵乘法的性能对通用处理器是非常重要的.为了提高矩阵乘法的性能,本文提出了一种性能模型,用于预测通用处理器上矩阵乘法的执行时间.该模型反映了矩阵乘法执行时间与通用处理器的运算部件、访存带宽、寄存器个数等结构参数之间的关系,可以指导处理器结构的优化来平衡计算和访存能力、提高执行速度.基于该模型本文给出了在一个优化的通用处理器结构中,寄存器个数和访存带宽应满足的理论下界.本文在Godson-3B处理器平台上对该性能模型进行了验证,实验结果表明矩阵乘法执行时间的预测精确度达到95%以上.基于该模型,本文还提出了一种对Godson-3B结构进行优化的方法,使矩阵乘法的执行时间减少了50%左右.  相似文献   

17.
随着生产工艺的提高,芯片上能集成越来越多的晶体管,多线程技术也逐步成为一种主流的处理器体系结构技术.提出一种融合同时多线程技术和微线程技术的新型体系结构同时多微线程(simultaneous multi-microthreading,SMMT),并给出同时多微线程体系结构的实现方案.SMMT有效结合同时多线程技术硬件代价小和微线程技术能够加速单进程应用的优点,通过软硬件协同的方式充分挖掘单进程程序的微线程级并行性.通过在设计的龙芯2号同时多微线程处理器上进行性能评测,结果表明,同时多微线程体系结构能够有效地加速单进程的程序,以很小的硬件代价显著地提高了处理器的性能.  相似文献   

18.
Godson-3: A Scalable Multicore RISC Processor with x86 Emulation   总被引:2,自引:0,他引:2  
The Godson-3 microprocessor aims at high-throughput server applications, high-performance scientific computing, and high-end embedded applications. It offers a scalable network on chip, hardware support for x86 emulation, and a reconfigurable architecture. The four-core Godson-3 chip is fabricated with 65-nm CMOS technology. Eight- and 16-core Godson-3 chips are in development.  相似文献   

19.
根据龙芯2号处理器体系结构的特点,引入浮点乘加、条件move和预取等一系列特殊指令,并且对开源编译器GCC进行修改使其支持这些特殊指令,同时对生成对应指令的算法进行了调整和优化.实践中已经证明,特殊指令的引入和相应的优化比较好的提升了应用程序的性能,达到了预期的效果.  相似文献   

20.
龙芯2号微处理器的功能验证   总被引:12,自引:0,他引:12  
开发龙芯2号这样的高性能通用处理器是一项极其复杂的艰巨任务.龙芯2号处理器的设计规模和复杂度比龙芯1号增加了许多倍,如何保证设计的正确性是一个重大挑战.简单的系统级测试已经不能满足设计的需要,这就要求采用多种有效的、先进的验证方法和工具帮助设计者尽可能早的发现和改正设计错误.主要介绍了在龙芯2号处理器的设计开发过程中采用的功能验证流程和主要验证方法.模拟仿真是主要的验证手段,新的形式化验证方法也应用到了验证流程当中.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号