
1.

Gather/scatter hardware support for accelerating Fast Fourier Transform

Anderson Kuei-An Ku Jingling Xue Yong Guan 《Journal of Systems Architecture》2010,56(12):667-684

As we enter the multi-core era, seeking methods to boost the performance of single-threaded applications remains critical. Achieving gains in processor performance by increasing the operating frequency has begun to meet more obstacles. However, significant performance improvements can be achieved by extending the capability of the processor with the addition of hardware support, which makes much more effective use of the available transistors. This paper presents a novel hardware support called DistTree to speed up processor performance. The DistTree hardware automates gather and scatter operations for applications with complex but predictable memory access patterns, such as the Fast Fourier Transform (FFT). With this hardware support integrated into a modern microprocessor (the Alpha architecture in our experiments), FFT performance more than doubles compared with the FFTW library, a state-of-the-art implementation. The DistTree hardware support enables the processor to spend the majority of its cycles executing the computations of an algorithm by reducing both the arithmetic and address computation overhead. Therefore, the performance of many single-threaded applications can be significantly increased.
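
To make "complex but predictable memory access patterns" concrete, the sketch below gathers an array in bit-reversed index order, the reordering that radix-2 FFTs commonly use, and then scatters it back. It is a software illustration only and does not model the DistTree hardware; the function names `bit_reverse`, `gather_indices`, `gather`, and `scatter` are hypothetical and do not come from the paper.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def gather_indices(n):
    """Bit-reversed index sequence for a power-of-two length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


def gather(data, indices):
    """Collect data[indices[k]] into a contiguous list."""
    return [data[i] for i in indices]


def scatter(values, indices, size):
    """Write values[k] to position indices[k] of a new list."""
    out = [None] * size
    for value, i in zip(values, indices):
        out[i] = value
    return out
```

The index sequence depends only on the transform length, never on the data, which is what makes such a pattern a candidate for automation outside the main instruction stream.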

2.

Design of a Portable Electromagnetic Wave Detection Device for Ore

苏维嘉 宋宇宁 许琢 《计算机系统应用》2011,20(12):75-78

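Electromagnetic emission (EME) is the process by which a rock mass radiates electromagnetic energy outward as it deforms and fractures under load, and it is closely related to the loading conditions and to the deformation and fracture process. On this basis, a portable electromagnetic wave detection device for ore was designed. The hardware system consists mainly of an LPC2132 processor, an electromagnetic wave detection module, a signal strength processing module, and a strength indicator module. The software system consists of the firmware program, an automatic signal transmit/receive module, and related components.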

3.

Design of a Hardware Pipeline Processor

刘文波 潘雪增 《计算机工程》2011,37(8):278-280

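This paper uses idle resources inside a core to accelerate single-threaded program execution: parallelizable code is scheduled onto idle units within the core so that code blocks are pipelined inside the core. On this basis, a hardware pipeline processor with loop acceleration capability is designed, and compiler support can be obtained by changing the fetch structure and the register allocation logic. Results show that the execution performance of the spec2000 benchmark programs improves by 40% with this processor.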

4.

Fuzzy logic microcontroller

Costa A. De Gloria A. Giudici F. Olivieri M. 《Micro, IEEE》1997,17(1):66-74

We propose an architecture dedicated mainly to medium-range applications that demand computational power combined with low cost for the resulting hardware system (chip and board). Our architecture is a 16-bit processor with dedicated instructions and hardware for efficient support of fuzzy logic. To make the architecture effective for control applications developed with a traditional approach or with fuzzy logic, we equipped the processor with a microcontroller's general features. Our design accounts for application characteristics to provide efficient hardware support for fuzzy logic. To achieve this, we first analyzed fuzzy control algorithms and derived a general model for fuzzy computation. In defining the model, we considered the large spectrum of possible inference methods, fuzzification and defuzzification mechanisms, and the operators used in control applications. On this basis, we defined the instruction set that supports this computational model and a proper architectural solution. We tested the system (composed of the software model and its hardware support) by simulating different sets of general-purpose and fuzzy control benchmarks.

5.

Itanium processor microarchitecture

Sharangpani H. Arora H. 《Micro, IEEE》2000,20(5):24-43

The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of operating systems and platforms. The processor employs EPIC (explicitly parallel instruction computing) design concepts for a tighter coupling between hardware and software. In this design style, the hardware-software interface lets the software exploit all available compilation-time information and efficiently deliver this information to the hardware. It addresses several fundamental performance bottlenecks in modern computers, such as memory latency, memory address disambiguation, and control flow dependencies.

6.

Hardware/Software Co-simulation of a MIPS-Core-Based SoC

王江 刘佩林 陈颖琪 《计算机工程》2006,32(16):247-249

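For a high-definition TV decoding SoC based on a MIPS-family processor core, a hardware/software co-simulation environment is built. The VMC model of the MIPS processor core is connected to the RTL model of the SoC, and test assembly programs are run by exploiting the VMC model's support for the MIPS instruction set. This enables synchronized debugging of the SoC's hardware and software and effectively improves the efficiency of system verification.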

7.

Research on Virtualization Technology of the OpenSparc T2 Processor

冯华 唐宏伟 卢凯 刘勇鹏 《计算机工程与科学》2010,32(7):72-75

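The OpenSparc T2 processor is the open version of Sun's UltraSparc T2 processor and provides hardware support for virtualization. Starting from the hardware features of the T2 processor, this paper describes the hardware mechanisms by which it supports virtualization, introduces the system software structure of the T2 platform, and explains in detail how virtualization is implemented on the T2 platform in terms of memory, interrupts, and devices.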

8.

Hardware/Software Co-simulation of an ARM9-Core-Based SoC

DONG Ke-wen LIN Ping-fen 《数字社区&智能家居》2008,(1)

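For a WiMAX terminal SoC based on an ARM9-family processor core, a hardware/software co-simulation environment is built. The simulation model of the ARM926ejs processor core is connected to the RTL model of the SoC, and the MAC-layer firmware of the WiMAX terminal SoC is run by exploiting the simulation model's support for the ARM instruction set. This enables synchronized debugging of the SoC's hardware and software, improves the efficiency of system integration and verification, and shortens system development time.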

9.

Porting uC/OS-Ⅱ to ARM Processors

蒋利军 陈庆荣 《现代计算机》2006,(8):88-90

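Owing to its hardware architecture, the ARM processor provides substantial hardware support for porting operating systems to it. This paper briefly discusses how the uC/OS-Ⅱ operating system is ported to ARM processors and how the port is applied.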

10.

Guest Editors' Introduction: Interaction of Many-Core Computer Architecture and Operating Systems

Cho Sangyeun Li Tao Mutlu Onur 《Micro, IEEE》2008,28(3):2-5

Rapid changes in platform hardware resources with the evolution of many-core architectures will require a fundamental reexamination of mainstream system-software design decisions to support multiple cores and to efficiently manage on-chip hardware resources shared among the multiple cores. In turn, the evolution of many-core processor architectures will be successfully sustained by the new capabilities and features added to the system software, perhaps while requiring substantial support from hardware. The guest editors introduce five articles on the interaction of computer architecture and operating systems for this special issue of IEEE Micro.

11.

Design of a Memory Management Unit for Embedded Systems

朱贺飞 陆超 周晓方 闵昊 周电 《计算机工程与应用》2007,43(1):96-99

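A memory management unit (MMU) for a 32-bit RISC embedded processor is implemented, targeting the Linux operating system. A pre-comparison circuit added to the instruction TLB improves address translation efficiency when the processor accesses the same virtual page consecutively. On a TLB miss, dedicated hardware performs the page table lookup and the TLB refill, which is markedly faster than handling the miss in software. The MMU designed in the paper works well with Linux, performing address mapping and memory permission management.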

12.

Hardware Design and Timing Study of a DSP-Based Sound Effects Processor

战岳祥 宋占伟 徐刚 《电子技术应用》2007,33(10):52-55

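For sound field reconstruction in a specific environment (the interior of a vehicle), a hardware design for a sound effects processor is presented. TI's TMS320VC5402 digital signal processor is the hardware core and performs the algorithmic processing of the sound signal; the AD1870 serial 16-bit analog-to-digital converter captures the audio signal; and the AD1858 serial digital-to-analog converter outputs the processed audio signal. The connections and timing between the TMS320VC5402 and the AD1870 and AD1858 are described in detail. Experiments show that the system is reliable and stable, provides hardware support for real-time encoding and decoding of audio signals, and makes it possible to play high-quality stereo sound effects in non-standard spaces.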

13.

Architecture Design for Wavelet Transforms

欧阳玲 《数字社区&智能家居》2010,6(10):2529-2530

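The wavelet transform has found increasingly wide application thanks to its superior performance. At present, however, processor support for wavelet algorithms relies mainly on dedicated hardware accelerators, which both increases design overhead and hinders extension to new applications. With the continued progress of integrated-circuit technology, supporting wavelets at the architecture level of a single-chip processor has become feasible. Taking this view, the paper analyzes the core operations, data flow, and parallelism of the 9/7 lifting wavelet transform and the 9/7 integer wavelet transform, and proposes an architecture design that supports wavelet transforms at the processor level.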

14.

Research Demonstration of a Hardware Reference-Counting Heap

David S. Wise Brian Heck Caleb Hess Willie Hunt Eric Ost 《LISP and Symbolic Computation》1997,10(2):159-181

A hardware self-managing heap memory (RCM) for languages like Lisp, Smalltalk, and Java has been designed, built, tested and benchmarked. On every pointer write from the processor, reference-counting transactions are performed in real time within this memory, and garbage cells are reused without processor cycles. A processor allocates new nodes simply by reading from a distinguished location in its address space. The memory hardware also incorporates support for off-line, multiprocessing, mark-sweep garbage collection. Performance statistics are presented from a partial implementation of Scheme over five different memory models and two garbage collection strategies, from main memory (no access to RCM) to a fully operational RCM installed on an external bus. The performance of the RCM memory is more than competitive with main memory.
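
The bookkeeping the abstract describes (counts adjusted on every pointer write, dead cells returned for reuse) can be sketched in software as follows. This is a minimal model under stated assumptions, not the RCM design: each cell holds a single pointer field, roots are counted explicitly, and the class and method names (`RefCountHeap`, `write_pointer`, `add_root`, `drop_root`) are hypothetical.

```python
class RefCountHeap:
    """Toy reference-counted heap; each cell has one pointer field."""

    def __init__(self, size):
        self.refcount = [0] * size
        self.fields = [None] * size      # pointer field of each cell
        self.free = list(range(size))    # indices of reusable cells

    def allocate(self):
        """Take a cell from the free list (caller must reference it)."""
        if not self.free:
            raise MemoryError("heap exhausted")
        cell = self.free.pop()
        self.refcount[cell] = 0
        self.fields[cell] = None
        return cell

    def _incref(self, cell):
        if cell is not None:
            self.refcount[cell] += 1

    def _decref(self, cell):
        # Iterative, so releasing a long chain does not recurse deeply.
        while cell is not None:
            self.refcount[cell] -= 1
            if self.refcount[cell] > 0:
                return
            child = self.fields[cell]
            self.fields[cell] = None
            self.free.append(cell)       # cell becomes reusable at once
            cell = child                 # its outgoing reference is dropped

    def add_root(self, cell):
        self._incref(cell)

    def drop_root(self, cell):
        self._decref(cell)

    def write_pointer(self, cell, target):
        """Store `target` in `cell`, updating both reference counts."""
        old = self.fields[cell]
        self._incref(target)             # increment first: safe if old == target
        self.fields[cell] = target
        self._decref(old)
```

Counting alone never reclaims cyclic structures, which is one reason a design of this kind would pair it with a tracing collector such as the mark-sweep support the abstract mentions.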

15.

iWatcher: simple, general architectural support for software debugging

Pin Zhou Feng Qin Wei Liu Yuanyuan Zhou Torrellas J. 《Micro, IEEE》2004,24(6):50-56

We propose Intelligent Watcher (iWatcher), a combination of hardware and software support that can detect a wide variety of software bugs with only modest hardware changes to current processor implementations. iWatcher lets programmers associate specified functions with "watched" memory locations or objects. Access to any such location automatically triggers the monitoring function in the hardware. Relative to other approaches, iWatcher detects many real bugs at a fraction of the execution-time overhead.

16.

PipeRench: a reconfigurable architecture and compiler (Cited by: 1; self-citations: 0; citations by others: 1)

Goldstein S.C. Schmit H. Budiu M. Cadambi S. Moe M. Taylor R.R. 《Computer》2000,33(4):70-77

With the proliferation of highly specialized embedded computer systems has come a diversification of workloads for computing devices. General-purpose processors are struggling to efficiently meet these applications' disparate needs, and custom hardware is rarely feasible. According to the authors, reconfigurable computing, which combines the flexibility of general-purpose processors with the efficiency of custom hardware, can provide the alternative. PipeRench and its associated compiler comprise the authors' new architecture for reconfigurable computing. Combined with a traditional digital signal processor, microcontroller or general-purpose processor, PipeRench can support a system's various computing needs without requiring custom hardware. The authors describe the PipeRench architecture and how it solves some of the pre-existing problems with FPGA architectures, such as logic granularity, configuration time, forward compatibility, hard constraints and compilation time.

17.

A High-Performance Workstation Using a Closely Coupled Architecture

Hamilton B.E. Fischer M.A. 《Computer Graphics and Applications, IEEE》1984,4(4):67-70

The Syte workstation architecture closely couples the graphics system and the processor to improve interactive performance and reduce hardware and software overhead without added support mechanisms.

18.

Efficient pipelined architecture for competitive learning

Hui-Ya Li Chia-Lung Hung Wen-Jyi Hwang Yi-Tsan Hung 《Journal of Parallel and Distributed Computing》2011,71(2):236-244

This paper presents a novel pipelined architecture for competitive learning (CL). The architecture is implemented on a field programmable gate array (FPGA). It is used as a hardware accelerator in a system on programmable chip (SOPC) to reduce computation time. In the architecture, a novel codeword swapping scheme is adopted so that neuron competitions for different training vectors can proceed concurrently. The neuron updating process is based on a hardware divider with simple table lookup operations. The divider performs finite-precision calculations to reduce area cost at the expense of a slight degradation in training performance. The CPU time of the NIOS processor executing CL training with the proposed architecture as an accelerator is measured. Experimental results show that the NIOS processor with the proposed architecture as an accelerator can achieve a speedup of up to 254 over its software counterpart running on a general-purpose Pentium IV processor without hardware support.

19.

The QC-2 parallel Queue processor architecture

Ben A. Abderazek Arquimedes Canedo Tsutomu Yoshinaga Masahiro Sowa 《Journal of Parallel and Distributed Computing》2008

A queue-based instruction set architecture processor offers an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.

20.

Multiprocessor performance-measurement instrumentation (Cited by: 1; self-citations: 0; citations by others: 1)

Mink A. Carpenter R.J. Nacht G.G. Roberts J.W. 《Computer》1990,23(9):63-75

Performance measurement for loosely and tightly coupled multiple-instruction multiple-data multiprocessor systems is addressed. For the paradigm of multiple processors solving a single problem faster, a taxonomy of hardware-supported measurement approaches is presented and critiqued. A hybrid measurement system, that is, software with hardware support, is presented. The system, called the trace measurement system (Trams), initially consisted of a memory-mapped device using software triggering and hardware sampling of time and processor identification. A VLSI chip set that integrates the Trams functions of software triggering and hardware sampling with hardware counters is described, and its application is discussed. The tool introduces little perturbation and provides physically small and affordable performance-measurement support.