首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
众核处理器设计在芯片面积上受到了巨大挑战,如何将有限的芯片面积投入到运算能力中,是众核处理器体系结构研究的热点。聚焦众核处理器的指令缓存结构设计,研究通过在多核核心之间共享一级指令缓存,以获取指令系统及处理器流水线性能的提升。给出了共享指令缓存的结构设计,对该结构进行了节拍级精确的性能模拟,并通过RTL级代码的综合得到了面积开销和时序指标。测试结果表明,共享指令缓存可以降低11%~27%的缓存脱靶率,提升4%~7%的流水线性能。  相似文献   

2.
提出了一种超椭圆曲线密码处理器并行结构设计.处理器由多个具有相同结构的核组成,每个核由一个控制器、一个寄存器文件、一个运算单元组成.多个独立的核之间通过寄存器共享进行通信来协作完成复杂运算.每个运算单元执行自定义多操作数指令A(B+C)+D,并在指令产生过程和执行时对指令进行灵活配置.该设计可以实现核之间的指令级并行处理和不同指令执行阶段的流水线处理.在FPGA上的实验结果表明,与以往研究相比,该设计可以实现对超椭圆曲线密码点乘运算更高的加速.  相似文献   

3.
Pipelining and bypassing in a VLIW processor   总被引:1,自引:0,他引:1  
This short note describes issues involved in the bypassing mechanism for a very long instruction word (VLIW) processor and its relation to the pipeline structure of the processor. The authors first describe the pipeline structure of their processor and analyze its performance and compare it to typical RISC-style pipeline structures given the context of a processor with multiple functional units. Next they study the performance effects of various bypassing schemes in terms of their effectiveness in resolving pipeline data hazards and their effect on the processor cycle time  相似文献   

4.
为解决嵌入式领域对处理器不同性能面积的需求,以及对重排序缓冲区阻塞,保留站派遣长短周期指令时导致的吞吐率不平衡及堵塞问题,设计并优化了一种简便配置的参数化流水线超标量处理器.通过定制化流水线中的分支预测,缓存与运算单元,将RISC-V指令划分5大类处理,对不同周期的执行单元采用级联与并行的混合分布方式,将充当排序缓存中...  相似文献   

5.
基于MIPS32 4k系列的处理器架构,提出一种AES算法的SIMD指令集扩展方法,利用处理器流水线对齐级和AES数据访问单元,实现64 bit数据位宽的并行处理操作。对不同实现方式的性能进行比较,结果表明,该方法的加解密运算性能有较大提高,硬件代价相对较小,且具有编程灵活性。  相似文献   

6.
In this paper we present a design for a general-purpose fuzzy processor, the core of which is based on an analog-numerical approach combining the inherent advantages of analog and digital implementations, above all as regards noise margins. The architectural model proposed was chosen in such a way as to obtain a processor capable of working with a considerable degree of parallelism. The internal structure of the processor is organized as a cascade of pipeline stages which perform parallel execution of the processes into which each inference can be decomposed. A particular feature of the project is the definition of a `fuzzy-gate', which executes elementary fuzzy computations, on which construction of the whole core of the processor is based. Designed using CMOS technology, the core can be integrated into a single chip and can easily be extended. The performance obtainable, in the order of 50 Mega fuzzy rules per second, is of a considerable level  相似文献   

7.
Implementation of a RISC microprocessor for programmable logic controllers   总被引:2,自引:0,他引:2  
A special purpose RISC (reduced instruction set computer) microprocessor for programmable logic controllers (PLC), named PLCRISC, is proposed. To develop an optimal PLCRISC, we analysed existing PLC programs currently used in factories, with special attention to the instruction execution characteristics and features required for a high performance PLC processor. Based on this analysis, an optimal RISC-style instruction set and an architecture suitable for the required features are suggested. In particular, the instruction format, the instruction pipeline, and the detailed internal architecture are the significant characteristics of the proposed PLCRISC. The performance enhancement achieved with a PLCRISC is seen from a straightforward evaluation. ASIC implementation with VHDL is also discussed. The PLCRISC is under fabrication in a 0.8 μm CMOS technology.  相似文献   

8.
This paper presents the design of a VLSI fuzzy processor, which is capable of dealing with complex fuzzy inference systems, i.e., fuzzy inferences that include rule chaining. The architecture of the processor is based on a computational model whose main features are: the capability to cope effectively with complex fuzzy inference systems; a detection phase of the rule with a positive degree of activation to reduce the number of rules to be processed per inference; parallel computation of the degree of activation of active rules; and representation of membership functions based on α-level sets. As the fuzzy inference can be divided into different processing phases, the processor is made up of a number of stages which are pipelined. In each stage several inference processing phases are performed parallelly. Its performance is in the order of 2 MFLIPS with 256 rules, eight inputs, two chained variables, and four outputs and 5.2 MFLIPS with 32 rules, three inputs, and one output with a clock frequency of 66 MHz  相似文献   

9.
Fuzzy inference, a data processing method based on the fuzzy theory that has found wide use in the control field, is reviewed. Consumer electronics, which accounts for most current applications of this concept, does not require very high speeds. Although software running on a conventional microprocessor can perform these inferences, high-speed control applications require much greater speeds. A fuzzy inference date processor that operates at 200000 fuzzy logic inferences per second and features 12-b input and 16-b output resolution is described  相似文献   

10.
Miki  T. Yamakawa  T. 《Micro, IEEE》1995,15(4):8-18
Our analog fuzzy processor features an inference speed of more than 1 million fuzzy logical inferences per second, excluding defuzzification. A rule chip processes fuzzy inferences while a second chip handles defuzzification, a functional division that facilitates flexible system configuration. The chips are compact fuzzy systems that save chip area and are suitable for built-in applications. They process high-speed fuzzy logic operations in parallel mode and, during execution of fuzzy inferences, feature an adaptable fuzzy system based on a rule set  相似文献   

11.
能效比是未来高性能计算机需要解决的重要问题.众核处理器作为高性能计算机的重要实现手段,其微结构的优化设计对能效比提升尤为关键.提出了1种面向众核处理器的流水线紧耦合的指令循环缓存设计,以较小的L0指令缓存提供更加高能效的指令取指.作为体系结构研究同硬件可实现性紧密结合的1次尝试,设计始终考虑了硬件实现代价这一关键约束.为了控制L0指令缓存对流水线性能的影响,指令缓存采用了循环出口预取技术,以此保证指令缓存提供的低功耗的指令取指能够最终转化为流水线能效比的提升.在gem5模拟器上实现了对指令循环缓存的模拟.对SPEC2006的测试结果表明,在不影响流水线性能的前提下,设计的典型配置可以减少27%的指令取指功耗以及31.5%的流水线前段部件动态功耗.  相似文献   

12.
This paper discusses the inherent parallelism limits on several applications for vector computers, the parallel capabilities of several architectures and two ways (traditional instruction control flow and data control flow) by which the capabilities can be used. Then a scheme for a pipelined vector processor of multi-processing units is presented. The basic system structure and its function on highly sparse vector processing are described. A vector cache system and a distributed main memory are also considered, which are intended to sustain extremely high access rates for the processor. A microprocessor based vector processor is constructed, which can simulate the high performance version of the processor.  相似文献   

13.
Sharangpani  H. Arora  H. 《Micro, IEEE》2000,20(5):24-43
The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of operating systems and platforms. The processor employs EPIC (explicitly parallel instruction computing) design concepts for a tighter coupling between hardware and software. In this design style the hardware-software interface lets the software exploit all available compilation time information and efficiently deliver this information to the hardware. It addresses several fundamental performance bottlenecks in modern computers, such as memory latency, memory address disambiguation, and control flow dependencies  相似文献   

14.
YHFT-DX是国防科技大学设计的一款高性能定点DSP。论文设计并实现了YHFT-DX指令控制流水线,提出了在YHFT-DX 超长指令字结构中跨取指包边界派发和指令预取的方法,有效提升了流水线的性能。对指令流水线进行了高频结构优化,将派发部件的关键路径延时压缩40%,满足了600 MHz频率的设计目标。  相似文献   

15.
In the high-speed free-form surface machining, the real-time motion planning and interpolation is a challenging task. This paper presents the design and implementation of a dedicated processor for the interpolation task in computerized numerical control (CNC) machine tools. The jerk-limited look-ahead motion planning and interpolation algorithm has been integrated in the interpolation processor to achieve smooth motion in the high-speed machining. The processor features a compactly designed floating-point parallel computing architecture, which employs a 3-stage pipelined reduced instruction set computer (RISC) core and a very long instruction word (VLIW) floating-point arithmetic unit. A new asynchronous execution mechanism has been employed in the processor to allow multi-cycle instructions to be performed in parallel. The proposed processor has been verified on a low-cost field programmable gate array (FPGA) chip in a prototype controller. Experimental result has demonstrated the significant improvement of the computing performance with the interpolation processor in the free-form surface machining.  相似文献   

16.
李勇  胡慧俐  杨焕荣 《计算机应用》2014,34(4):1005-1009
数字信号处理软件中循环程序在执行时间上占有很大比例,用指令缓冲器暂存循环代码可以减少程序存储器的访问次数,提高处理器性能。在VLIW处理器指令流水线中增加一个支持循环指令的缓冲器,该缓冲器能够缓存循环程序指令,并以软件流水的形式向功能部件派发循环程序指令。这样循环程序代码只需访存一次而执行多次,大大减少了访存次数。在循环指令运行期间,缓冲器发出信号使程序存储器进入睡眠状态可以降低处理器功耗。典型的应用程序测试表明,使用了循环缓冲后,取指流水线空闲率可达90%以上,处理器整体性能提高10%左右,而循环缓冲的硬件面积开销大约占取指流水线的9%。  相似文献   

17.
This paper presents a possible solution for the text inference problem-extracting information unstated in a text, but implied. Text inference is central to natural language applications such as information extraction and dissemination, text understanding, summarization, and translation. Our solution takes advantage of a semantic English dictionary available in electronic form that provides the basis for the development of a large linguistic knowledge base. The inference algorithm consists of a set of highly parallel search methods that, when applied to the knowledge base, find contexts in which sentences are interpreted. These contexts reveal information relevant to the text. Implementation, results, and parallelism analysis are discussed  相似文献   

18.
This paper presents the structural design and the functional characteristics of a RISC processor called Hermes-RISC. The design of the Hermes-RISC processor is based on the study and evaluation of a variety of assembly instruction sets. The Hermes-RISC is a pipeline superscalar RISC processor with four superscalar units. The first stage evaluation of the Hermes-RISC performance is also presented here. This evaluation is based on the execution of a set of primitive processing tasks, such as summation, multiplication of numbers, multiplication of matrices, sorting, finding maximum values among a set of numbers, procedure calls, etc. Moreover, the performance of Hermes-RISC is compared with a variety of RISC processors.  相似文献   

19.
Japan's fifth generation computer systems (FGCS) project aims at the research and development of new computer technology for knowledge information processing system (KIPS) that will be required in 1990s. In this project, logic programming is adopted for the base for software and hardware system to be developed. As a primitive operation of logic programming is syllogistic inference, machines studied and built in the project are called inference machines.

One of the project's target machines is a parallel inference machine (PIM) having about 1000 processing elements. Smaller scale PIMs are also planned as intermediate targets. In addition to PIMs, sequential inference machines (SIMs) have been developed for a software development tool. A personal type SIM is called PSI which is a logic programming workstation. For research and development of parallel software systems, especially, an operating system for PIM (PIMOS), a multi-PSI system which consists of several CPUs of PSI connected with a high-speed network, is also under development. In the intermediate stage plan of the project, parallel software research is emphasized and conducted more systematically.

This paper describes research and development plans for the parallel inference machine in conjunction with the parallel software research.  相似文献   


20.
Crawford  J.H. 《Micro, IEEE》1990,10(1):27-36
The author discusses the design goals of the i486 development program, which were to ensure binary compatibility with the 386 microprocessor and the 387 math coprocessor, increase performance by two to three times over a 386/387 processor system at the same clock rate, and extend the IBM PC standard architecture of the 386 CPU with features suitable for minicomputers. A cache integrated into the instruction pipeline lets this 386-compatible processor achieve minicomputer performance levels. The design and performance of the on-chip cache and the instruction pipeline are examined in detail  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号