1.
As we enter the multi-core era, seeking methods to boost the performance of single-threaded applications remains critical. Achieving gains in processor performance by increasing the operating frequency has begun to meet more obstacles. However, significant performance improvements can be achieved by extending the capability of the processor with the addition of hardware support, which makes much more effective use of the available transistors. This paper presents a novel hardware support mechanism called DistTree to speed up processor performance. The DistTree hardware automates gather and scatter operations for applications with complex but predictable memory access patterns like the Fast Fourier Transform (FFT). With this hardware support integrated with a modern microprocessor (the Alpha architecture in our experiments), FFT performance more than doubles compared with the FFTW library, a state-of-the-art implementation. The DistTree hardware support enables the processor to spend the majority of processor cycles on executing the computations of an algorithm by reducing both the arithmetic and address computation overhead. Therefore, the performance of many single-threaded applications can be significantly increased.
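The abstract names the FFT as an example of a complex but predictable access pattern without saying which patterns DistTree automates. As an illustration only, the sketch below performs one such pattern in software, the bit-reversal gather used by radix-2 FFTs. The function names are hypothetical, and this is not the DistTree mechanism itself.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def gather_bit_reversed(data):
    """Gather the elements of data into bit-reversed index order.

    The length must be a power of two, as in a radix-2 FFT.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Every index is computed in software here; this address arithmetic is
    # the kind of overhead that gather/scatter hardware is meant to remove.
    return [data[bit_reverse(i, bits)] for i in range(n)]
```

Applying the gather twice restores the original order, which makes it easy to check.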
2.
3.
4.
We propose an architecture dedicated mainly to medium-range applications that demand computational power combined with low cost for the resulting hardware system (chip and board). Our architecture is a 16-bit processor with dedicated instructions and hardware for efficient support of fuzzy logic. To make the architecture effective for control applications developed with a traditional approach or with fuzzy logic, we equipped the processor with a microcontroller's general features. Our design accounts for application characteristics to provide efficient hardware support for fuzzy logic. To achieve this we first analyzed fuzzy control algorithms and derived a general model for fuzzy computation. In defining the model, we considered the large spectrum of possible inference methods, fuzzification and defuzzification mechanisms, and the operators used in control applications. On this basis, we defined the instruction set that supports this computational model and a proper architectural solution. We tested the system (composed of the software model and its hardware support) by simulating different sets of general-purpose and fuzzy control benchmarks.
5.
The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of operating systems and platforms. The processor employs EPIC (explicitly parallel instruction computing) design concepts for a tighter coupling between hardware and software. In this design style the hardware-software interface lets the software exploit all available compilation time information and efficiently deliver this information to the hardware. It addresses several fundamental performance bottlenecks in modern computers, such as memory latency, memory address disambiguation, and control flow dependencies.
6.
7.
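The OpenSPARC T2 processor is the open version of Sun's UltraSPARC T2 processor and provides hardware support for virtualization. Starting from the hardware features of the T2 processor, this paper describes the hardware mechanisms through which the T2 supports virtualization, introduces the system software structure of the T2 platform, and explains in detail how virtualization is implemented on the T2 platform with respect to memory, interrupts, and devices.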
8.
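A hardware/software co-simulation environment is built for a WiMAX terminal SoC based on an ARM9-family processor core. The simulation model of the ARM926EJ-S processor core is connected to the RTL model of the SoC, and the model's support for the ARM instruction set is used to run the MAC-layer firmware of the WiMAX terminal SoC. This allows the hardware and software of the SoC to be debugged together, improves the efficiency of system integration and verification, and shortens system development time.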
9.
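The hardware structure of the ARM processor gives strong hardware support to operating systems ported to it. This paper briefly discusses how the uC/OS-II operating system is ported to the ARM processor and how the port is applied.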
10.
Rapid changes in platform hardware resources with the evolution of many-core architectures will require a fundamental reexamination of mainstream system-software design decisions to support multiple cores and to efficiently manage on-chip hardware resources shared among the multiple cores. In turn, the evolution of many-core processor architectures will be successfully sustained by the new capabilities and features added to the system software, perhaps while requiring substantial support from hardware. The guest editors introduce five articles on the interaction of computer architecture and operating systems for this special issue of IEEE Micro.
11.
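A memory management unit (MMU) for a 32-bit RISC embedded processor is implemented, targeting the Linux operating system. A pre-comparison circuit added to the instruction TLB improves address translation efficiency when the processor accesses the same virtual page consecutively. On a TLB miss, dedicated hardware performs the page table lookup and the TLB refill, which is markedly faster than handling the miss in software. The MMU designed in this paper works well with Linux, carrying out address mapping and memory access permission management.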
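The translation path described above can be sketched as a small software model. This is a minimal sketch under stated assumptions, not the paper's design: the 4 KiB page size, the unbounded dictionary standing in for the TLB, and the name SimpleMMU are all hypothetical. It shows the order of checks: compare against the last translated page first, then the TLB, then walk the page table and refill.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class SimpleMMU:
    """Toy model of a TLB with a last-page pre-comparison fast path."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # unbounded here; a real TLB has few entries
        self.last_vpn = None          # register holding the last translated page
        self.last_pfn = None
        self.fast_hits = 0
        self.tlb_hits = 0
        self.walks = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn == self.last_vpn:
            # Pre-comparison: same page as the previous access, skip the TLB lookup.
            self.fast_hits += 1
            pfn = self.last_pfn
        elif vpn in self.tlb:
            self.tlb_hits += 1
            pfn = self.tlb[vpn]
        else:
            # TLB miss: walk the page table and refill the TLB.
            # A KeyError here stands in for a page fault.
            self.walks += 1
            pfn = self.page_table[vpn]
            self.tlb[vpn] = pfn
        self.last_vpn, self.last_pfn = vpn, pfn
        return (pfn << PAGE_SHIFT) | offset
```

Sequential instruction fetches mostly stay within one page, so in this model they take the fast path and never reach the TLB lookup.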
12.
13.
欧阳玲 《数字社区&智能家居》2010,6(10):2529-2530
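Wavelet transforms are increasingly widely used thanks to their superior performance. At present, however, processors support wavelet algorithms mainly through dedicated hardware accelerator units, which both increases design cost and makes it harder to extend the range of applications. With the continuing advance of integrated circuit technology, supporting wavelets at the architecture level of a single-chip processor has become feasible. Taking this view, the paper analyzes the characteristics of the 9/7 lifting wavelet transform and the 9/7 integer wavelet transform in terms of core operations, data flow, and parallelism, and proposes an architecture design that supports wavelet transforms at the processor level.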
14.
David S. Wise Brian Heck Caleb Hess Willie Hunt Eric Ost 《LISP and Symbolic Computation》1997,10(2):159-181
A hardware self-managing heap memory (RCM) for languages like Lisp, Smalltalk, and Java has been designed, built, tested and benchmarked. On every pointer write from the processor, reference-counting transactions are performed in real time within this memory, and garbage cells are reused without processor cycles. A processor allocates new nodes simply by reading from a distinguished location in its address space. The memory hardware also incorporates support for off-line, multiprocessing, mark-sweep garbage collection. Performance statistics are presented from a partial implementation of Scheme over five different memory models and two garbage collection strategies, from main memory (no access to RCM) to a fully operational RCM installed on an external bus. The performance of the RCM memory is more than competitive with main memory.
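The bookkeeping that the abstract attributes to the memory can be modeled in software. The sketch below is only an illustration: the class RefCountHeap, its single pointer field per cell, and the add_root/drop_root helpers for processor-held references are assumptions, not details of the RCM hardware. It shows a pointer write adjusting both counts and a freed cell returning to the free list.

```python
class RefCountHeap:
    """Toy reference-counted heap; each cell has one pointer field for brevity."""

    def __init__(self, size):
        self.count = [0] * size
        self.fields = [None] * size
        self.free = list(range(size))

    def allocate(self):
        """Hand out a free cell; stands in for reading the distinguished address."""
        if not self.free:
            raise MemoryError("heap exhausted")
        return self.free.pop()

    def add_root(self, cell):
        """Record a reference held outside the heap (hypothetical helper)."""
        self.count[cell] += 1

    def drop_root(self, cell):
        self._decref(cell)

    def _decref(self, cell):
        # Iterative, so that freeing a long chain does not recurse deeply.
        while cell is not None:
            self.count[cell] -= 1
            if self.count[cell] > 0:
                return
            successor = self.fields[cell]
            self.fields[cell] = None
            self.free.append(cell)  # the garbage cell becomes reusable
            cell = successor

    def write_pointer(self, cell, target):
        """Store target into cell's field, updating both reference counts."""
        old = self.fields[cell]
        if target is not None:
            self.count[target] += 1  # increment first so re-storing the same target is safe
        self.fields[cell] = target
        self._decref(old)
```

Pure reference counting cannot reclaim cycles, which is consistent with the abstract's mention of additional mark-sweep support.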
15.
We propose Intelligent Watcher (iWatcher), a combination of hardware and software support that can detect a wide variety of software bugs with only modest hardware changes to current processor implementations. iWatcher lets programmers associate specified functions with "watched" memory locations or objects. Access to any such location automatically triggers the monitoring function in the hardware. Relative to other approaches, iWatcher detects many real bugs at a fraction of the execution-time overhead.
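The programming model in the abstract, monitoring functions attached to watched locations and triggered on access, can be mimicked in software. The sketch below is a hypothetical model: WatchedMemory and its methods are illustrative names, not the iWatcher interface, and a software dispatch like this does not have the low overhead the abstract attributes to the hardware.

```python
class WatchedMemory:
    """Toy memory in which accesses to watched addresses trigger monitors."""

    def __init__(self, size):
        self.cells = [0] * size
        self.watchers = {}  # address -> list of monitor callables

    def watch(self, addr, monitor):
        """Associate a monitoring function with an address."""
        self.watchers.setdefault(addr, []).append(monitor)

    def unwatch(self, addr):
        self.watchers.pop(addr, None)

    def _trigger(self, addr, kind, value):
        for monitor in self.watchers.get(addr, ()):
            monitor(addr, kind, value)

    def load(self, addr):
        value = self.cells[addr]
        self._trigger(addr, "load", value)
        return value

    def store(self, addr, value):
        # Monitors run before the write, so a check could raise to block it.
        self._trigger(addr, "store", value)
        self.cells[addr] = value
```

A monitor might, for example, raise an exception when a store hits an address that should be read-only at that point in the program.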
16.
PipeRench: a reconfigurable architecture and compiler
With the proliferation of highly specialized embedded computer systems has come a diversification of workloads for computing devices. General-purpose processors are struggling to efficiently meet these applications' disparate needs, and custom hardware is rarely feasible. According to the authors, reconfigurable computing, which combines the flexibility of general-purpose processors with the efficiency of custom hardware, can provide the alternative. PipeRench and its associated compiler comprise the authors' new architecture for reconfigurable computing. Combined with a traditional digital signal processor, microcontroller or general-purpose processor, PipeRench can support a system's various computing needs without requiring custom hardware. The authors describe the PipeRench architecture and how it solves some of the pre-existing problems with FPGA architectures, such as logic granularity, configuration time, forward compatibility, hard constraints and compilation time.
17.
The Syte workstation architecture closely couples the graphics system and the processor to improve interactive performance and reduce hardware and software overhead without added support mechanisms.
18.
Hui-Ya Li Chia-Lung Hung Wen-Jyi Hwang Yi-Tsan Hung 《Journal of Parallel and Distributed Computing》2011,71(2):236-244
This paper presents a novel pipelined architecture for competitive learning (CL). The architecture is implemented on a field-programmable gate array (FPGA). It is used as a hardware accelerator in a system on programmable chip (SOPC) to reduce computation time. In the architecture, a novel codeword swapping scheme is adopted so that neuron competitions for different training vectors can be operated concurrently. The neuron updating process is based on a hardware divider with simple table lookup operations. The divider performs finite precision calculations for area cost reduction at the expense of slight degradation in training performance. The CPU time of the NIOS processor executing the CL training with the proposed architecture as an accelerator is measured. Experimental results show that the NIOS processor with the proposed architecture as an accelerator can achieve a speedup of up to 254 over its software counterpart running on a general-purpose Pentium IV processor without hardware support.
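For readers unfamiliar with competitive learning, the software baseline that such an accelerator replaces can be sketched as a winner-take-all loop. The abstract does not give the update rule, so this sketch assumes a common choice: the winning codeword moves toward the training vector with learning rate 1/(number of wins), which is one place where a division arises. The function name and this rule are assumptions, not the paper's exact algorithm.

```python
def train_competitive(vectors, codebook):
    """One pass of winner-take-all competitive learning.

    Returns the updated codebook and the number of wins per neuron.
    """
    codebook = [list(w) for w in codebook]
    wins = [0] * len(codebook)
    for x in vectors:
        # Competition: the neuron with the smallest squared distance wins.
        dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in codebook]
        j = dists.index(min(dists))
        # Update: move only the winner toward x (assumed rate 1 / wins).
        wins[j] += 1
        rate = 1.0 / wins[j]
        codebook[j] = [wi + rate * (xi - wi) for xi, wi in zip(x, codebook[j])]
    return codebook, wins
```

With this rate, each codeword ends up as the running mean of the vectors it has won. In this loop each training vector is processed only after the previous update has finished.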
19.
Ben A. Abderazek Arquimedes Canedo Tsutomu Yoshinaga Masahiro Sowa 《Journal of Parallel and Distributed Computing》2008
Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.
20.
Multiprocessor performance-measurement instrumentation
Performance measurement for loosely and tightly coupled multiple-instruction multiple-data multiprocessor systems is addressed. For the paradigm of multiple processors solving a single problem faster, a taxonomy of hardware-supported measurement approaches is presented and critiqued. A hybrid measurement system, that is, software with hardware support, is presented. The system, called the trace measurement system (Trams), initially consisted of a memory-mapped device using software triggering and hardware sampling of time and processor identification. A VLSI chip set that integrates the Trams functions of software triggering and hardware sampling with hardware counters is described, and its application is discussed. The tool introduces little perturbation and provides physically small and affordable performance-measurement support.