
1.

Low-Power, High-Performance Architecture of the PWRficient Processor Family

Tse-Yu Yeh 《Micro, IEEE》2007,27(2):69-78

The dual-core PA6T-1682M system on chip (SoC) is the first design in the PWRficient family of high-performance, low-power processor designs that target server-class performance with low power consumption. The heart of the PA6T-1682M is the PA6T core, which implements the 64-bit IBM Power Architecture. The SoC implements extensive features that support embedded and mobile low-power applications.

2.

A High-Performance, Pipelined, FPGA-Based Genetic Algorithm Machine

Barry Shackleford Greg Snider Richard J. Carter Etsuko Okushi Mitsuhiro Yasuda Katsuhiko Seo Hiroto Yasuura 《Genetic Programming and Evolvable Machines》2001,2(1):33-60

Accelerating a genetic algorithm (GA) by implementing it in a reconfigurable field programmable gate array (FPGA) is described. The implemented GA features: random parent selection, which conserves selection circuitry; a steady-state memory model, which conserves chip area; survival of fitter child chromosomes over their less-fit parent chromosomes, which promotes evolution. A net child chromosome generation rate of one per clock cycle is obtained by pipelining the parent selection, crossover, mutation, and fitness evaluation functions. Complex fitness functions can be further pipelined to maintain a high-speed clock cycle. Fitness functions with a pipeline initiation interval of greater than one can be plurally implemented to maintain a net evaluated-chromosome throughput of one per clock cycle. Two prototypes are described: The first prototype (c. 1996 technology) is a multiple-FPGA chip implementation, running at a 1 MHz clock rate, that solves a 94-row × 520-column set covering problem 2,200× faster than a 100 MHz workstation running the same algorithm in C. The second prototype (Xilinx XCV300) is a single-FPGA chip implementation, running at a 66 MHz clock rate, that solves a 36-residue protein folding problem in a 2-D lattice 320× faster than a 366 MHz Pentium II. The current largest FPGA (Xilinx XCV3200E) has circuitry available for the implementation of 30 fitness function units, which would yield an acceleration of 9,600× for the 36-residue protein folding problem.
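The selection and replacement scheme named in this abstract (random parent selection, a steady-state population, and survival of a fitter child over a less-fit parent) can be sketched in software. The Python below is a minimal sequential sketch, not the authors' pipelined hardware design. The OneMax fitness function, the population size, the mutation rate, the single-point crossover, and the choice to overwrite the weaker of the two parents are all assumptions made for illustration.

```python
import random


def one_max(chromosome):
    """Hypothetical fitness function: the number of 1 bits."""
    return sum(chromosome)


def steady_state_ga(fitness, n_bits=32, pop_size=64, steps=20000,
                    mutation_rate=0.02, seed=0):
    """Steady-state GA with random parent selection (illustrative parameters)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    scores = [fitness(c) for c in population]
    for _ in range(steps):
        # Random parent selection: no fitness-based selection logic is needed.
        i, j = rng.sample(range(pop_size), 2)
        # Single-point crossover followed by bit-flip mutation.
        cut = rng.randrange(1, n_bits)
        child = population[i][:cut] + population[j][cut:]
        child = [bit ^ 1 if rng.random() < mutation_rate else bit
                 for bit in child]
        child_score = fitness(child)
        # Steady-state memory model: the child overwrites a parent in place,
        # and only when it is fitter than that parent.
        weaker = i if scores[i] <= scores[j] else j
        if child_score > scores[weaker]:
            population[weaker] = child
            scores[weaker] = child_score
    best = max(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]
```

Because a slot is overwritten only by a fitter chromosome, the best score in the population never decreases. The projected 9,600× figure in the abstract is consistent with 30 fitness function units each contributing the measured 320× (30 × 320 = 9,600).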

3.

A Software-Configurable Processor Architecture

《Micro, IEEE》2006,26(5):42-51

A software-configurable processor combines a traditional RISC processor with a field-programmable instruction extension unit that lets the system designer tailor the processor to a particular application. To add application-specific instructions to the processor, the programmer adds a pragma before a C or C++ function declaration, and the compiler then turns the function into a single instruction.

4.

Rock: A High-Performance Sparc CMT Processor

Chaudhry S. Cypher R. Ekman M. Karlsson M. Landin A. Yip S. Zeffer H. Tremblay M. 《Micro, IEEE》2009,29(2):6-16

Rock, Sun's third-generation chip-multithreading processor, contains 16 high-performance cores, each of which can support two software threads. Rock uses a novel checkpoint-based architecture to support automatic hardware scouting under a load miss, speculative out-of-order retirement of instructions, and aggressive dynamic hardware parallelization of a sequential instruction stream. It is also the first processor to support transactional memory in hardware.

5.

Research on an FPGA Paging Simulation Model for SIMD Architectures

何义任巨文梅杨乾明伍楠张春元郭敏《计算机研究与发展》2011,48(1)

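SIMD architectures effectively exploit the parallelism in multimedia and complex scientific computing, and have become a focus of industrial application and research. In research on large-scale SIMD architectures, FPGA chip capacity limits the scale of the simulated system. To ease this limit, the paper proposes an FPGA paging simulation model for SIMD architectures, which effectively reduces the FPGA compute and storage resources that a SIMD architecture requires and increases the scale of SIMD architecture that can be verified. Simulation experiments on the MASA stream processor show that, without any simulation optimization, the EP2S180 FPGA can simulate a MASA of at most 8 clusters. With the paging simulation model, the maximum rises to a 256-cluster MASA, and the increase in simulation time is acceptable.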

6.

A High-Performance Clustered Superscalar Microprocessor Architecture

甘初晖杨兵喻明艳《微处理机》2008,29(6)

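As the instruction issue width of superscalar microprocessors grows, the hardware complexity of each pipeline component and the wire lengths increase rapidly. In particular, as process feature sizes shrink, wire delay becomes the bottleneck that limits processor performance. We propose a clustered superscalar processor architecture that, while keeping the issue width unchanged, effectively reduces hardware complexity, shortens wires, and reduces delay. By simulating the clustered processors and estimating the delay and area of their physical register files, we find that for a 2×4 clustered structure, register file area decreases by 12% while processor performance improves by at least 16%.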

7.

An FPGA-Based Network Intrusion Detection Architecture (cited by: 1 total; 0 self-citations; 1 by others)

Das A. Nguyen D. Zambreno J. Memik G. Choudhary A. 《Information Forensics and Security, IEEE Transactions on》2008,3(1):118-132

Network intrusion detection systems (NIDSs) monitor network traffic for suspicious activity and alert the system or network administrator. With the onset of gigabit networks, current generation networking components for NIDS will soon be insufficient for numerous reasons; most notably because the existing methods cannot support high-performance demands. Field-programmable gate arrays (FPGAs) are an attractive medium to handle both high throughput and adaptability to the dynamic nature of intrusion detection. In this work, we design an FPGA-based architecture for anomaly detection in network transmissions. We first develop a feature extraction module (FEM) which aims to summarize network information to be used at a later stage. Our FPGA implementation shows that we can achieve significant performance improvements compared to existing software and application-specific integrated-circuit implementations. Then, we go one step further and demonstrate the use of principal component analysis as an outlier detection method for NIDSs. The results show that our architecture correctly classifies attacks with detection rates exceeding 99% and false alarm rates as low as 1.95%. Moreover, using extensive pipelining and hardware parallelism, it can be shown that for realistic workloads, our architectures for FEM and outlier analysis achieve 21.25- and 23.76-Gb/s core throughput, respectively.

8.

Anteater: A Service-Oriented Architecture for High-Performance Data Mining (cited by: 1 total; 0 self-citations; 1 by others)

《Internet Computing, IEEE》2006,10(4):36-43

Data mining focuses on extracting useful information from large volumes of data, and thus has been the center of much attention in recent years. Building scalable, extensible, and easy-to-use data mining systems, however, has proved to be difficult. In response, the authors developed Anteater, a service-oriented architecture for data mining that relies on Web services to achieve extensibility and interoperability, offers simple abstractions for users, and supports computationally intensive processing on large amounts of data through massive parallelism.

9.

SODA: A High-Performance DSP Architecture for Software-Defined Radio (cited by: 1 total; 0 self-citations; 1 by others)

Lin Y. Lee H. Woh M. Harel Y. Mahlke S. Mudge T. Chakrabarti C. Flautner K. 《Micro, IEEE》2007,27(1):114-123

Software-defined radio (SDR) belongs to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile terminal. The authors developed the signal-processing on-demand architecture (SODA), a fully programmable architecture that supports SDR, by examining two widely differing protocols, W-CDMA and 802.11a. It meets power-performance requirements by separating control and data processing and by employing ultrawide SIMD execution.

10.

Array Processor Architecture

Theis D.J. 《Computer》1981,14(9):8-9

Today's array processors provide a cost-effective tool for increasing the speed at which highly computation-bound processing jobs can be carried out. They are maturing and expanding with greatly improved hardware and software, improvements that are primarily a result of the accumulated experience of the vendors and users of these machines. All in all, it is a very competitive environment.

11.

A High-Performance Architecture with a Macroblock-Level-Pipeline for MPEG-2 Coding

《Real》1996,2(6):331-340

A high-performance parallel and pipelined architecture (MViP) has been proposed for MPEG-2 coding. A macrocell for use in an ASIC has been designed and implemented using ES2 0.7 μm dual-layer-metal CMOS technology. This macrocell consists of about 120,000 equivalent gates and is able to execute, in real time, the loop of an MPEG-2 coder for main profile/main level (MP@ML) resolution when running at 40 MHz. MViP is made up of several specific-purpose units (SPUs), a RISC core processor, banks of internal memory, and an optimized crossbar network that lets the pipelined SPUs and the RISC core work in parallel in a macroblock-level pipeline, greatly increasing silicon efficiency.

12.

A High-Performance Workstation Using a Closely Coupled Architecture

Hamilton B.E. Fischer M.A. 《Computer Graphics and Applications, IEEE》1984,4(4):67-70

The Syte workstation architecture closely couples the graphics system and the processor to improve interactive performance and reduce hardware and software overhead without added support mechanisms.

13.

A High-Performance Online Game Server Architecture Design (cited by: 1 total; 0 self-citations; 1 by others)

杨玲《网络安全技术与应用》2010,(4):59-61

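Online games generally use a client/server (C/S) structure, and server architecture design is key to developing an online game successfully. This paper studies in depth the architecture design of high-performance online game servers. The physical structure of online game servers is generally divided into regions and groups, and one region contains multiple server groups. The server group architecture designed in this paper comprises LoginGate, LoginServer, GameGate, GameServer, DBServer, MServer, and other servers.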

14.

An Architecture for a Neural Network Parallel Processor

钱艺李占才李昂王沁《小型微型计算机系统》2007,28(10):1902-1906

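The more kinds of neural network model a neural network processing system can implement, the more general it is and the wider its range of application. This paper proposes an architecture for a neural network parallel processor that implements, with a high degree of parallelism, the algorithms of a typical feedforward network (the BP network) and a typical feedback network (the Hopfield network). The processor uses SIMD (Single Instruction Multiple Data) as its main computing structure and, drawing on the characteristics of the two network algorithms, includes a one-dimensional systolic array and a fully connected interconnection network, so that data can be shared between processing elements conveniently and flexibly. Experimental results show that the architecture effectively increases the running speed of neural networks.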

15.

An Energy-Efficient Processor Architecture for Embedded Systems (cited by: 1 total; 0 self-citations; 1 by others)

Balfour James Dally William Black-Schaffer David Parikh Vishal Park JongSoo 《Computer Architecture Letters》2008,7(1):29-32

We present an efficient programmable architecture for compute-intensive embedded applications. The processor architecture uses instruction registers to reduce the cost of delivering instructions, and a hierarchical and distributed data register organization to deliver data. Instruction registers capture instruction reuse and locality in inexpensive storage structures that are located near the functional units. The data register organization captures reuse and locality in different levels of the hierarchy to reduce the cost of delivering data. Exposed communication resources eliminate pipeline registers and control logic, and allow the compiler to schedule efficient instruction and data movement. The architecture keeps a significant fraction of instruction and data bandwidth local to the functional units, which reduces the cost of supplying instructions and data to large numbers of functional units. This architecture achieves an energy efficiency that is 23× greater than an embedded RISC processor.

16.

A SystemC-Based Method for Modeling Network Processor Architectures

杨勃航张晓明王勇军《计算机工程与应用》2006,42(7):98-101

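As new network services continue to emerge, network processors are being used ever more widely. This paper uses a SystemC-based system design method to build a platform for modeling network processor architectures. The platform consists of an extensible library of heterogeneous resources and an architecture constructor. The designer only needs to submit a table of configuration information, and the architecture constructor automatically generates a model instance; the resulting model can readily be refined and evaluated for performance. The method makes it convenient to model the architectures of various network processors and facilitates their optimized design.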

17.

Research on Cache Structures for Embedded Processors (cited by: 5 total; 0 self-citations; 5 by others)

陈章龙《小型微型计算机系统》2004,25(7):1204-1206

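In view of the characteristics of embedded processor architectures, this paper examines the structure, performance, and implementation of virtual caches, discusses the feasibility of using cache locking to improve the cache's round-robin replacement algorithm, and introduces the cache structure features of embedded processors based on the ARM architecture.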

18.

Design of New Optimized Architecture Processor for DWT

《Real》2000,6(4):297-312

This paper presents a VLSI implementation of the one-dimensional direct discrete wavelet transform (1-D DWT). The DWT can be viewed as a multiresolution decomposition of a signal: it decomposes a signal into its components in different frequency bands (octave bands). We propose a new architecture using parallel filters and consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. It consists of three basic units: one register bank, four filters, and a control unit. The filters are of different lengths, with new coefficients derived from Daubechies filter coefficients. The designed processor architecture requires no interface circuitry for interconnection to a standard communication bus. The architecture can compute the DWT at a data rate of 12×10⁶ samples/s, corresponding to a typical clock speed of 12 MHz. The architecture is simulated at the gate level in VLSI.
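The three-level octave-band decomposition described in this abstract can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's architecture: it uses the two-tap Haar filter pair in place of the Daubechies-derived coefficients, which the abstract does not list, and it runs in floating point rather than 16-bit fixed point. The function names are hypothetical.

```python
import math


def haar_step(signal):
    """One analysis level: low-pass (approximation) and high-pass (detail) halves."""
    s = 1 / math.sqrt(2)
    approx = [(signal[k] + signal[k + 1]) * s for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) * s for k in range(0, len(signal), 2)]
    return approx, detail


def dwt(signal, levels=3):
    """Multi-level 1-D DWT: repeatedly split the approximation band."""
    if len(signal) % (1 << levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)  # details[0] is the highest-frequency band
    return approx, details
```

For an 8-sample input and three levels, the result is one approximation coefficient and detail bands of 4, 2, and 1 coefficients. Each level halves the number of samples passed on, which is why the bands are spaced in octaves.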

19.

A High-Speed JPEG Decoding Algorithm for Multi-Core Processor Architectures

章承科《单片机与嵌入式系统应用》2006,(1):44-47

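This paper implements a JPEG decoding algorithm for multi-core processor architectures. Parallelizing the JPEG algorithm so that it runs on multiple processor cores at once, and optimizing memory reads and other operations for the multi-core architecture, greatly increases JPEG decoding speed. Measurements show that on a processor integrating 4 cores, the average decoding period for a JPEG image is about 28% of that on a single-core processor.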
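The 28% figure reported in this abstract can be restated as a speedup and a parallel efficiency. The short Python sketch below does that arithmetic. It assumes the reported decoding period is proportional to wall-clock decoding time, and the function name is hypothetical.

```python
def speedup_and_efficiency(time_ratio, cores):
    """Convert a multi-core/single-core time ratio into speedup and efficiency."""
    speedup = 1.0 / time_ratio
    efficiency = speedup / cores
    return speedup, efficiency
```

With a time ratio of 0.28 on 4 cores, this gives a speedup of roughly 3.6× and a parallel efficiency of roughly 89%, which is close to the ideal 4× for this core count.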

20.

A Median Filter Processor Based on Transport Triggered Architecture

白松辉史再峰郭炜魏继增《微处理机》2012,33(1):71-74

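To address the slow speed of the median filtering algorithm, a dedicated processor based on the transport triggered architecture was designed, which greatly increases median filtering speed. Its data access unit uses two-dimensional addressing, which, compared with a general-purpose processor, uses fewer add and multiply instructions during addressing and speeds up data access. A dedicated sorting functional unit was also designed, which uses fewer compare and jump instructions than a general-purpose processor. Simulation and verification results show that for image median filtering, the processor is considerably more efficient than a conventional general-purpose RISC processor.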
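For reference, the computation that such a processor accelerates can be written as a plain software 3×3 median filter. The Python below is a minimal sketch, not the paper's design: the 3×3 window size, the flat row-major pixel buffer, and the choice to copy border pixels unchanged are assumptions, and the `load` helper and the `sorted` call only stand in for the two-dimensional addressing unit and the dedicated sorting unit.

```python
def median_filter_3x3(pixels, width, height):
    """3x3 median filter over a flat, row-major list of pixel values."""
    if len(pixels) != width * height:
        raise ValueError("pixel count must equal width * height")

    def load(row, col):
        # Stand-in for a 2-D addressed load: the row * width + col
        # arithmetic is hidden behind a single access.
        return pixels[row * width + col]

    out = list(pixels)  # border pixels are copied unchanged (an assumption)
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            window = [load(row + dr, col + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            # Stand-in for a dedicated sorting unit: median of nine values.
            out[row * width + col] = sorted(window)[4]
    return out
```

On a general-purpose processor, every `load` costs a multiply and an add for the address, and the sort costs a chain of compares and branches. Those are the instruction counts the abstract says the dedicated units reduce.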