1.

张昆 刘骁 郑方 谢向辉 《计算机工程与科学》 (Computer Engineering & Science) 2017,39(5):834-840

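Many-core processor design faces severe constraints on chip area, and how to devote the limited area to compute capability is an active topic in many-core architecture research. Focusing on the instruction cache organization of many-core processors, this work studies sharing the level-1 instruction cache among cores in order to improve the performance of the instruction system and the processor pipeline. It presents the structural design of a shared instruction cache, evaluates it with cycle-accurate performance simulation, and obtains area overhead and timing figures by synthesizing RTL code. Test results show that the shared instruction cache reduces the cache miss rate by 11% to 27% and improves pipeline performance by 4% to 7%.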
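The intuition behind sharing an L1 instruction cache can be illustrated with a toy model. The sketch below is not the paper's design or simulator: it assumes fully associative LRU caches, two cores running the same loop in lockstep, and hypothetical sizes (`lines_per_core`, `footprint`) picked so that the code does not fit in one private cache but does fit in the pooled one. Real workloads sit far from this extreme case.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement (a simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.accesses = 0
        self.misses = 0

    def access(self, address):
        self.accesses += 1
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return
        self.misses += 1
        self.lines[address] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


def miss_rate(shared, cores=2, lines_per_core=4, footprint=6, passes=10):
    """Miss rate when every core repeatedly fetches the same loop of `footprint` lines."""
    if shared:
        pooled = LRUCache(cores * lines_per_core)  # same total capacity, one cache
        caches = [pooled] * cores
    else:
        caches = [LRUCache(lines_per_core) for _ in range(cores)]
    for _ in range(passes):
        for address in range(footprint):
            for cache in caches:  # cores fetch in lockstep
                cache.access(address)
    unique = list({id(c): c for c in caches}.values())
    return sum(c.misses for c in unique) / sum(c.accesses for c in unique)


private_rate = miss_rate(shared=False)
shared_rate = miss_rate(shared=True)
```

In this contrived setting the private caches thrash on every fetch while the shared cache misses only on the first touch of each line, which is why `shared_rate` comes out much lower than `private_rate`.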

2.

A Parallel Architecture Design for a Hyperelliptic Curve Cryptography Processor

方跃坚 沈晴霓 吴中海 《计算机研究与发展》 (Journal of Computer Research and Development) 2013,50(11):2383-2388

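This paper proposes a parallel architecture for a hyperelliptic curve cryptography processor. The processor consists of multiple identically structured cores, each comprising a controller, a register file, and an arithmetic unit. The independent cores communicate through shared registers and cooperate to complete complex operations. Each arithmetic unit executes a custom multi-operand instruction, A(B+C)+D, which can be flexibly configured both when the instruction is generated and when it is executed. The design supports instruction-level parallelism across cores and pipelining across the different stages of instruction execution. Experimental results on an FPGA show that the design achieves a higher speedup for hyperelliptic curve point multiplication than previous work.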
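As a rough illustration of why a single A(B+C)+D instruction is convenient, the sketch below evaluates it over a prime field and shows how fixing some operands recovers plain addition, multiplication, and multiply-add. The prime field, the modulus `P`, and the function name `fused_op` are assumptions made for illustration; the abstract does not state which field or configuration mechanism the processor uses.

```python
def fused_op(a: int, b: int, c: int, d: int, p: int) -> int:
    """Evaluate A(B+C)+D modulo a prime p as one fused operation."""
    return (a * ((b + c) % p) + d) % p


P = 97  # hypothetical small prime, chosen only for readability

full = fused_op(3, 10, 20, 5, P)      # 3 * (10 + 20) + 5 mod 97
add = fused_op(1, 10, 20, 0, P)       # A = 1, D = 0 gives B + C
mul = fused_op(3, 10, 0, 0, P)        # C = 0, D = 0 gives A * B
mul_add = fused_op(3, 10, 0, 5, P)    # C = 0 gives A * B + D
```

Reading the "flexible configuration" of the instruction as operand selection of this kind is one plausible interpretation, not a description of the actual hardware.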

3.

Pipelining and bypassing in a VLIW processor (Total citations: 1; self-citations: 0; citations by others: 1)

Abnous A. Bagherzadeh N. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(6):658-664

This short note describes issues involved in the bypassing mechanism for a very long instruction word (VLIW) processor and its relation to the pipeline structure of the processor. The authors first describe the pipeline structure of their processor, analyze its performance, and compare it to typical RISC-style pipeline structures in the context of a processor with multiple functional units. Next they study the performance effects of various bypassing schemes in terms of their effectiveness in resolving pipeline data hazards and their effect on the processor cycle time.

4.

Optimized Design of a Parameterized Superscalar Processor Based on RISC-V

刘有耀 潘宇晨 《计算机工程与应用》 (Computer Engineering and Applications) 2022,58(5):66-74

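To meet the embedded domain's varying requirements for processor performance and area, and to address reorder buffer blocking as well as the throughput imbalance and stalls that arise when reservation stations dispatch long- and short-latency instructions, an easily configurable, parameterized, pipelined superscalar processor is designed and optimized. The branch prediction, caches, and arithmetic units in the pipeline are customized; RISC-V instructions are divided into five classes; execution units with different latencies are arranged in a hybrid of cascaded and parallel organization; and, in the structure acting as the ordering buffer, ...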

5.

An SIMD Instruction Set Extension Method for the AES Algorithm and Its Implementation

卢仕听 王帅 韩军 曾晓洋 《计算机工程》 (Computer Engineering) 2011,37(6):121-123

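Based on the MIPS32 4K series processor architecture, an SIMD instruction set extension method for the AES algorithm is proposed. It uses the alignment stage of the processor pipeline and an AES data access unit to process data in parallel at a width of 64 bits. A comparison of different implementations shows that the method substantially improves encryption and decryption performance at relatively low hardware cost while retaining programming flexibility.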

6.

A VLSI fuzzy inference processor based on a discrete analog approach

Catania V. Puliafito A. Russo M. Vita L. 《Fuzzy Systems, IEEE Transactions on》1994,2(2):93-106

In this paper we present a design for a general-purpose fuzzy processor, the core of which is based on an analog-numerical approach combining the inherent advantages of analog and digital implementations, above all as regards noise margins. The architectural model proposed was chosen in such a way as to obtain a processor capable of working with a considerable degree of parallelism. The internal structure of the processor is organized as a cascade of pipeline stages which perform parallel execution of the processes into which each inference can be decomposed. A particular feature of the project is the definition of a "fuzzy-gate", which executes elementary fuzzy computations and on which the whole core of the processor is built. Designed using CMOS technology, the core can be integrated into a single chip and can easily be extended. The performance obtainable, on the order of 50 mega fuzzy rules per second, is considerable.

7.

Implementation of a RISC microprocessor for programmable logic controllers (Total citations: 2; self-citations: 0; citations by others: 2)

Gab Seon Rho Kyeong-hoon Koo Naehyuck Chang Jaehyun Park Yeong-gi Kim Wook Hyun Kwon 《Microprocessors and Microsystems》1995,19(10):599-608

A special-purpose RISC (reduced instruction set computer) microprocessor for programmable logic controllers (PLC), named PLCRISC, is proposed. To develop an optimal PLCRISC, we analysed existing PLC programs currently used in factories, with special attention to the instruction execution characteristics and features required for a high-performance PLC processor. Based on this analysis, an optimal RISC-style instruction set and an architecture suitable for the required features are suggested. In particular, the instruction format, the instruction pipeline, and the detailed internal architecture are the significant characteristics of the proposed PLCRISC. The performance enhancement achieved with a PLCRISC is seen from a straightforward evaluation. ASIC implementation with VHDL is also discussed. The PLCRISC is under fabrication in a 0.8 μm CMOS technology.

8.

VLSI hardware architecture for complex fuzzy systems

Ascia G. Catania V. Russo M. 《Fuzzy Systems, IEEE Transactions on》1999,7(5):553-570

This paper presents the design of a VLSI fuzzy processor, which is capable of dealing with complex fuzzy inference systems, i.e., fuzzy inferences that include rule chaining. The architecture of the processor is based on a computational model whose main features are: the capability to cope effectively with complex fuzzy inference systems; a detection phase that identifies the rules with a positive degree of activation, reducing the number of rules to be processed per inference; parallel computation of the degree of activation of active rules; and representation of membership functions based on α-level sets. As the fuzzy inference can be divided into different processing phases, the processor is made up of a number of pipelined stages. In each stage several inference processing phases are performed in parallel. Its performance is on the order of 2 MFLIPS with 256 rules, eight inputs, two chained variables, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output, at a clock frequency of 66 MHz.
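The active-rule detection idea can be sketched in software. The example below is a hedged illustration, not the processor's algorithm: it assumes triangular membership functions and the common min operator for combining antecedents, and the rule names, rule base `RULES`, and input values are all hypothetical. It computes each rule's degree of activation and keeps only rules whose degree is positive, so later stages would have fewer rules to process.

```python
def triangular(x, left, peak, right):
    """Triangular membership function returning a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rule base: each rule lists (input index, triangle parameters).
RULES = {
    "r_low": [(0, (0.0, 2.0, 4.0)), (1, (0.0, 1.0, 2.0))],
    "r_mid": [(0, (2.0, 4.0, 6.0)), (1, (1.0, 2.0, 3.0))],
    "r_high": [(0, (6.0, 8.0, 10.0)), (1, (2.0, 3.0, 4.0))],
}


def active_rules(inputs):
    """Return {rule name: activation degree} for rules with positive activation."""
    result = {}
    for name, antecedents in RULES.items():
        # AND of the antecedents modeled with min (an assumption).
        degree = min(triangular(inputs[i], *params) for i, params in antecedents)
        if degree > 0.0:
            result[name] = degree
    return result


activations = active_rules([3.0, 1.5])
```

For these inputs only `r_low` and `r_mid` are active; `r_high` is filtered out because its first antecedent has zero membership.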

9.

Fuzzy inference and fuzzy inference processor

Nakamura K. Sakashita N. Nitta Y. Shimomura K. Tokuda T. 《Micro, IEEE》1993,13(5):37-48

Fuzzy inference, a data processing method based on fuzzy theory that has found wide use in the control field, is reviewed. Consumer electronics, which accounts for most current applications of this concept, does not require very high speeds. Although software running on a conventional microprocessor can perform these inferences, high-speed control applications require much greater speeds. A fuzzy inference data processor that operates at 200,000 fuzzy logic inferences per second and features 12-bit input and 16-bit output resolution is described.

10.

Fuzzy inference on an analog fuzzy chip

Miki T. Yamakawa T. 《Micro, IEEE》1995,15(4):8-18

Our analog fuzzy processor features an inference speed of more than 1 million fuzzy logical inferences per second, excluding defuzzification. A rule chip processes fuzzy inferences while a second chip handles defuzzification, a functional division that facilitates flexible system configuration. The chips are compact fuzzy systems that save chip area and are suitable for built-in applications. They process high-speed fuzzy logic operations in parallel mode and, during execution of fuzzy inferences, feature an adaptable fuzzy system based on a rule set.

11.

Design of a Pipeline-Tightly-Coupled Instruction Loop Cache for Many-Core Processors

张昆 过锋 郑方 谢向辉 《计算机研究与发展》 (Journal of Computer Research and Development) 2017,54(4):813-820

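Energy efficiency is a key problem that future high-performance computers must solve. Many-core processors are an important means of building such computers, and optimizing their microarchitecture is especially critical to improving energy efficiency. This paper proposes an instruction loop cache tightly coupled with the pipeline for many-core processors, using a small L0 instruction cache to make instruction fetch more energy-efficient. As an attempt to closely combine architecture research with hardware implementability, the design treats hardware implementation cost as a key constraint throughout. To limit the impact of the L0 instruction cache on pipeline performance, the cache prefetches at loop exits, ensuring that the low-power instruction fetch it provides translates into improved pipeline energy efficiency. The loop cache is modeled in the gem5 simulator. Results on SPEC2006 show that, without degrading pipeline performance, a typical configuration of the design reduces instruction fetch power by 27% and the dynamic power of pipeline front-end components by 31.5%.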

12.

Design of a vector processor

Qi Lin 《计算机科学技术学报》 (Journal of Computer Science and Technology) 1986,1(1):26-34

This paper discusses the inherent parallelism limits on several applications for vector computers, the parallel capabilities of several architectures, and two ways (traditional instruction control flow and data control flow) by which the capabilities can be used. Then a scheme for a pipelined vector processor with multiple processing units is presented. The basic system structure and its function in highly sparse vector processing are described. A vector cache system and a distributed main memory are also considered, which are intended to sustain extremely high access rates for the processor. A microprocessor-based vector processor is constructed, which can simulate the high-performance version of the processor.

13.

Itanium processor microarchitecture

Sharangpani H. Arora H. 《Micro, IEEE》2000,20(5):24-43

The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of operating systems and platforms. The processor employs EPIC (explicitly parallel instruction computing) design concepts for a tighter coupling between hardware and software. In this design style the hardware-software interface lets the software exploit all available compilation time information and efficiently deliver this information to the hardware. It addresses several fundamental performance bottlenecks in modern computers, such as memory latency, memory address disambiguation, and control flow dependencies.

14.

Design and Optimization of the Instruction Control Pipeline of the YHFT-DX High-Performance DSP

郭阳 甄体智 李勇 《计算机工程与应用》 (Computer Engineering and Applications) 2010,46(7):69-71

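YHFT-DX is a high-performance fixed-point DSP designed at the National University of Defense Technology. This paper designs and implements the YHFT-DX instruction control pipeline and proposes methods for dispatching across fetch-packet boundaries and for instruction prefetching in the YHFT-DX VLIW architecture, which effectively improve pipeline performance. The instruction pipeline is also restructured for high-frequency operation: the critical-path delay of the dispatch unit is reduced by 40%, meeting the 600 MHz design target.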

15.

An FPGA-based low-cost VLIW floating-point processor for CNC applications

《Microprocessors and Microsystems》2017

In high-speed free-form surface machining, real-time motion planning and interpolation is a challenging task. This paper presents the design and implementation of a dedicated processor for the interpolation task in computerized numerical control (CNC) machine tools. A jerk-limited look-ahead motion planning and interpolation algorithm has been integrated in the interpolation processor to achieve smooth motion in high-speed machining. The processor features a compactly designed floating-point parallel computing architecture, which employs a 3-stage pipelined reduced instruction set computer (RISC) core and a very long instruction word (VLIW) floating-point arithmetic unit. A new asynchronous execution mechanism has been employed in the processor to allow multi-cycle instructions to be performed in parallel. The proposed processor has been verified on a low-cost field programmable gate array (FPGA) chip in a prototype controller. Experimental results demonstrate a significant improvement in computing performance with the interpolation processor in free-form surface machining.

16.

Design and Implementation of a Loop Instruction Buffer for VLIW Processors

李勇 胡慧俐 杨焕荣 《计算机应用》 (Journal of Computer Applications) 2014,34(4):1005-1009

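Loops account for a large share of execution time in digital signal processing software. Holding loop code in an instruction buffer reduces the number of program memory accesses and improves processor performance. A buffer supporting loop instructions is added to the instruction pipeline of a VLIW processor; it caches the instructions of a loop and dispatches them to the functional units in software-pipelined form. Loop code is thus fetched from memory once and executed many times, greatly reducing memory accesses. While loop instructions are running, the buffer signals the program memory to enter a sleep state, which lowers processor power consumption. Tests with typical applications show that, with the loop buffer, the idle rate of the fetch pipeline exceeds 90%, overall processor performance improves by about 10%, and the buffer's hardware area overhead is about 9% of the fetch pipeline.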
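The "fetch once, execute many times" effect can be made concrete with a back-of-the-envelope count. The sketch below is a simplified model under stated assumptions: the loop body fits entirely in the buffer, one memory fetch per instruction, and hypothetical function names and parameters. It is not a model of the actual buffer hardware or of the 90% figure reported in the abstract.

```python
def program_memory_fetches(loop_body_len, iterations, use_loop_buffer):
    """Count program memory fetches for a loop of `loop_body_len` instructions."""
    if use_loop_buffer:
        # The body is fetched once, then replayed from the buffer.
        return loop_body_len
    return loop_body_len * iterations


def fetch_idle_fraction(loop_body_len, iterations):
    """Fraction of fetch slots in which program memory can stay idle."""
    without_buffer = program_memory_fetches(loop_body_len, iterations, False)
    with_buffer = program_memory_fetches(loop_body_len, iterations, True)
    return 1 - with_buffer / without_buffer


# Hypothetical loop: 8 instructions executed 100 times.
baseline = program_memory_fetches(8, 100, use_loop_buffer=False)
buffered = program_memory_fetches(8, 100, use_loop_buffer=True)
idle = fetch_idle_fraction(8, 100)
```

Under these assumptions the idle fraction depends only on the iteration count (1 - 1/iterations), which suggests why long-running DSP loops benefit most.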

17.

A parallel system for text inference using marker propagations

Harabagiu S.M. Moldovan D.I. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):729-747

This paper presents a possible solution for the text inference problem: extracting information that is unstated in a text, but implied. Text inference is central to natural language applications such as information extraction and dissemination, text understanding, summarization, and translation. Our solution takes advantage of a semantic English dictionary available in electronic form that provides the basis for the development of a large linguistic knowledge base. The inference algorithm consists of a set of highly parallel search methods that, when applied to the knowledge base, find contexts in which sentences are interpreted. These contexts reveal information relevant to the text. Implementation, results, and parallelism analysis are discussed.

18.

Design of the Hermes-RISC Processor

《Journal of Microcomputer Applications》1995,18(3):233-259

This paper presents the structural design and the functional characteristics of a RISC processor called Hermes-RISC. The design of the Hermes-RISC processor is based on the study and evaluation of a variety of assembly instruction sets. The Hermes-RISC is a pipelined superscalar RISC processor with four superscalar units. The first-stage evaluation of the Hermes-RISC performance is also presented here. This evaluation is based on the execution of a set of primitive processing tasks, such as summation, multiplication of numbers, multiplication of matrices, sorting, finding maximum values among a set of numbers, procedure calls, etc. Moreover, the performance of Hermes-RISC is compared with a variety of RISC processors.

19.

Parallel inference machines at ICOT

Shunichi Uchida 《Future Generation Computer Systems》1987,3(4):245-252

Japan's fifth generation computer systems (FGCS) project aims at the research and development of new computer technology for the knowledge information processing systems (KIPS) that will be required in the 1990s. In this project, logic programming is adopted as the basis for the software and hardware systems to be developed. As the primitive operation of logic programming is syllogistic inference, the machines studied and built in the project are called inference machines.

One of the project's target machines is a parallel inference machine (PIM) having about 1000 processing elements. Smaller scale PIMs are also planned as intermediate targets. In addition to PIMs, sequential inference machines (SIMs) have been developed for a software development tool. A personal type SIM is called PSI which is a logic programming workstation. For research and development of parallel software systems, especially, an operating system for PIM (PIMOS), a multi-PSI system which consists of several CPUs of PSI connected with a high-speed network, is also under development. In the intermediate stage plan of the project, parallel software research is emphasized and conducted more systematically.

This paper describes research and development plans for the parallel inference machine in conjunction with the parallel software research.

20.

The i486 CPU: executing instructions in one clock cycle

Crawford J.H. 《Micro, IEEE》1990,10(1):27-36

The author discusses the design goals of the i486 development program, which were to ensure binary compatibility with the 386 microprocessor and the 387 math coprocessor, increase performance by two to three times over a 386/387 processor system at the same clock rate, and extend the IBM PC standard architecture of the 386 CPU with features suitable for minicomputers. A cache integrated into the instruction pipeline lets this 386-compatible processor achieve minicomputer performance levels. The design and performance of the on-chip cache and the instruction pipeline are examined in detail.