Tse-Yu Yeh 《Micro, IEEE》2007,27(2):69-78
The dual-core PA6T-1682M system on chip (SoC) is the first design in the PWRficient family of high-performance, low-power processor designs that target server-class performance with low power consumption. The heart of the PA6T-1682M is the PA6T core, which implements the 64-bit IBM Power Architecture. The SoC implements extensive features that support embedded and mobile low-power applications.
Barry Shackleford Greg Snider Richard J. Carter Etsuko Okushi Mitsuhiro Yasuda Katsuhiko Seo Hiroto Yasuura 《Genetic Programming and Evolvable Machines》2001,2(1):33-60
Accelerating a genetic algorithm (GA) by implementing it in a reconfigurable field programmable gate array (FPGA) is described. The implemented GA features: random parent selection, which conserves selection circuitry; a steady-state memory model, which conserves chip area; survival of fitter child chromosomes over their less-fit parent chromosomes, which promotes evolution. A net child chromosome generation rate of one per clock cycle is obtained by pipelining the parent selection, crossover, mutation, and fitness evaluation functions. Complex fitness functions can be further pipelined to maintain a high-speed clock cycle. Fitness functions with a pipeline initiation interval of greater than one can be plurally implemented to maintain a net evaluated-chromosome throughput of one per clock cycle. Two prototypes are described: the first prototype (c. 1996 technology) is a multiple-FPGA chip implementation, running at a 1 MHz clock rate, that solves a 94-row × 520-column set covering problem 2,200× faster than a 100 MHz workstation running the same algorithm in C. The second prototype (Xilinx XCV300) is a single-FPGA chip implementation, running at a 66 MHz clock rate, that solves a 36-residue protein folding problem in a 2-D lattice 320× faster than a 366 MHz Pentium II. The current largest FPGA (Xilinx XCV3200E) has circuitry available for the implementation of 30 fitness function units, which would yield an acceleration of 9,600× for the 36-residue protein folding problem.
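The three GA features named in the abstract above (random parent selection, a steady-state population, and a child surviving only when it is fitter than a parent) can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: the function name, the bit-string encoding, single-point crossover, the mutation rate, and the rule "the child replaces the less-fit of its two parents" are all assumptions made for illustration.

```python
import random


def steady_state_ga(fitness, n_bits=16, pop_size=32, steps=2000,
                    p_mut=0.05, seed=0):
    """Hypothetical software sketch of a steady-state GA.

    fitness: function mapping a list of 0/1 bits to a number (higher is better).
    """
    rng = random.Random(seed)
    # Steady-state model: one fixed-size population, updated in place.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(c) for c in pop]

    for _ in range(steps):
        # Random parent selection: no ranking or tournament is needed.
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)

        # Single-point crossover followed by per-bit mutation (assumed operators).
        cut = rng.randrange(1, n_bits)
        child = pop[i][:cut] + pop[j][cut:]
        child = [b ^ 1 if rng.random() < p_mut else b for b in child]
        f_child = fitness(child)

        # Survival rule (assumed reading): the child overwrites the less-fit
        # parent only when the child is strictly fitter than that parent.
        k = i if fit[i] <= fit[j] else j
        if f_child > fit[k]:
            pop[k], fit[k] = child, f_child

    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    # OneMax toy problem: fitness is the number of 1 bits.
    chromosome, score = steady_state_ga(sum)
    print(chromosome, score)
```

Because a population slot is only ever overwritten by a fitter child, the best fitness in the population never decreases. In hardware each loop iteration would correspond to one stage-overlapped pipeline pass, which is how the abstract reaches one child per clock cycle.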
《Micro, IEEE》2006,26(5):42-51
A software-configurable processor combines a traditional RISC processor with a field-programmable instruction extension unit that lets the system designer tailor the processor to a particular application. To add application-specific instructions to the processor, the programmer adds a pragma before a C or C++ function declaration, and the compiler then turns the function into a single instruction.
Chaudhry S. Cypher R. Ekman M. Karlsson M. Landin A. Yip S. Zeffer H. Tremblay M. 《Micro, IEEE》2009,29(2):6-16
Rock, Sun's third-generation chip-multithreading processor, contains 16 high-performance cores, each of which can support two software threads. Rock uses a novel checkpoint-based architecture to support automatic hardware scouting under a load miss, speculative out-of-order retirement of instructions, and aggressive dynamic hardware parallelization of a sequential instruction stream. It is also the first processor to support transactional memory in hardware.
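SIMD architectures can effectively exploit the parallelism in multimedia and complex scientific computing, and have become a focus of both industrial application and research. In research on large-scale SIMD architectures, an FPGA paged simulation model suited to SIMD architectures is proposed to ease the limit that FPGA chip capacity places on the scale of the simulated system. The model effectively reduces the FPGA computing and storage resources that a SIMD architecture requires and increases the scale at which SIMD architectures can be verified. Simulation experiments on the MASA stream processor show that, without any simulation optimization, the largest configuration the EP2S180 FPGA can simulate is an 8-cluster MASA; with the paged simulation model, this increases to a 256-cluster MASA, and the increase in simulation time is acceptable.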
An FPGA-Based Network Intrusion Detection Architecture (total citations: 1; self-citations: 0; citations by others: 1)
Das A. Nguyen D. Zambreno J. Memik G. Choudhary A. 《Information Forensics and Security, IEEE Transactions on》2008,3(1):118-132
Network intrusion detection systems (NIDSs) monitor network traffic for suspicious activity and alert the system or network administrator. With the onset of gigabit networks, current generation networking components for NIDS will soon be insufficient for numerous reasons; most notably because the existing methods cannot support high-performance demands. Field-programmable gate arrays (FPGAs) are an attractive medium to handle both high throughput and adaptability to the dynamic nature of intrusion detection. In this work, we design an FPGA-based architecture for anomaly detection in network transmissions. We first develop a feature extraction module (FEM) which aims to summarize network information to be used at a later stage. Our FPGA implementation shows that we can achieve significant performance improvements compared to existing software and application-specific integrated-circuit implementations. Then, we go one step further and demonstrate the use of principal component analysis as an outlier detection method for NIDSs. The results show that our architecture correctly classifies attacks with detection rates exceeding 99% and false alarm rates as low as 1.95%. Moreover, using extensive pipelining and hardware parallelism, it can be shown that for realistic workloads, our architectures for FEM and outlier analysis achieve 21.25- and 23.76-Gb/s core throughput, respectively.
Lin Y. Lee H. Woh M. Harel Y. Mahlke S. Mudge T. Chakrabarti C. Flautner K. 《Micro, IEEE》2007,27(1):114-123
Software-defined radio (SDR) belongs to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile terminal. The authors developed the signal-processing on-demand architecture (SODA), a fully programmable architecture that supports SDR, by examining two widely differing protocols, W-CDMA and 802.11a. It meets power-performance requirements by separating control and data processing and by employing ultrawide SIMD execution.
《Internet Computing, IEEE》2006,10(4):36-43
Data mining focuses on extracting useful information from large volumes of data, and thus has been the center of much attention in recent years. Building scalable, extensible, and easy-to-use data mining systems, however, has proved to be difficult. In response, the authors developed Anteater, a service-oriented architecture for data mining that relies on Web services to achieve extensibility and interoperability, offers simple abstractions for users, and supports computationally intensive processing on large amounts of data through massive parallelism.
Today's array processors provide a cost-effective tool for increasing the speed at which highly computation-bound processing jobs can be carried out. They are maturing and expanding with greatly improved hardware and software, improvements that are primarily a result of the accumulated experience of the vendors and users of these machines. All in all, it is a very competitive environment.
A high-performance parallel and pipelined architecture (MViP) has been proposed for MPEG-2 coding. A macrocell for use in an ASIC has been designed and implemented using ES2 0.7 μm dual-layer-metal CMOS technology. This macrocell consists of about 120,000 equivalent gates and is able to execute, in real time, the loop of an MPEG-2 coder for main profile/main level (MP@ML) resolution when running at 40 MHz. MViP is made up of several specific-purpose units (SPUs), a RISC core processor, banks of internal memory, and an optimized crossbar network that lets these pipelined SPUs and the RISC core work in parallel in a macroblock-level pipeline, greatly increasing silicon efficiency.
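Limited by the long pipeline latency of floating-point operations and by the number of ports on FPGA on-chip RAM, the throughput of a conventional FFT processor typically reaches only one complex result per cycle. This paper designs and implements on an FPGA a high-throughput IEEE 754 single-precision floating-point FFT processor. By improving the structure of the butterfly computation unit and reorganizing accesses to the FPGA on-chip RAM, the processor outputs about two complex results per cycle on average, roughly twice the throughput of a conventional FFT processor. A 1024-point FFT completes in (512+10)*10 = 5220 cycles.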
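The cycle count quoted in the abstract above can be reproduced with a small calculation. This is a sketch under an assumed reading of the formula, which the abstract does not spell out: a radix-2 FFT of N points has log2(N) stages, each stage emits N complex results at two results per cycle (512 cycles for N = 1024), and the added 10 is treated as a fixed per-stage overhead such as pipeline fill. The function and parameter names are hypothetical.

```python
def fft_cycles(n_points, results_per_cycle=2, stage_overhead=10):
    """Estimate total cycles for a radix-2 FFT under the assumed model.

    n_points:          transform length, must be a power of two
    results_per_cycle: complex results written per cycle (assumed 2)
    stage_overhead:    extra cycles per stage (assumed reading of the "+10")
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    stages = n_points.bit_length() - 1          # log2(n_points)
    cycles_per_stage = n_points // results_per_cycle + stage_overhead
    return stages * cycles_per_stage


if __name__ == "__main__":
    print(fft_cycles(1024))                       # 5220, matching the abstract
    print(fft_cycles(1024, results_per_cycle=1))  # 10340 for one result per cycle
```

Under the same assumed overhead, a one-result-per-cycle design would need 10340 cycles, a ratio of about 1.98, which is consistent with the abstract's "roughly twice" claim.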
A High-Performance Online Game Server Architecture Design (total citations: 1; self-citations: 0; citations by others: 1)
杨玲 《网络安全技术与应用》2010,(4):59-61
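Online games generally use a client/server (C/S) structure, and server architecture design is key to successfully developing an online game. This paper studies the architecture design of high-performance online game servers in depth. The physical structure of online game servers is generally divided into regions and groups, with one region containing multiple server groups. The server-group architecture designed in this paper comprises servers such as LoginGate, LoginServer, GameGate, GameServer, DBServer, and MServer.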
The Syte workstation architecture closely couples the graphics system and the processor to improve interactive performance and reduce hardware and software overhead without added support mechanisms.
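The more types of neural network models a neural network processing system can implement, the more general it is and the wider its range of application. This paper proposes an architecture for a parallel neural network processor that implements, with a high degree of parallelism, the algorithms of a typical feedforward network, the BP network, and a typical feedback network, the Hopfield network. The processor uses SIMD (Single Instruction Multiple Data) as its main computing structure and, based on the characteristics of the two network algorithms, incorporates a one-dimensional systolic array and a fully connected interconnection network, which allow data to be shared between processing elements conveniently and flexibly. Experimental results show that the architecture effectively increases the running speed of neural networks.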
An Energy-Efficient Processor Architecture for Embedded Systems (total citations: 1; self-citations: 0; citations by others: 1)
Balfour James Dally William Black-Schaffer David Parikh Vishal Park JongSoo 《Computer Architecture Letters》2008,7(1):29-32
We present an efficient programmable architecture for compute-intensive embedded applications. The processor architecture uses instruction registers to reduce the cost of delivering instructions, and a hierarchical and distributed data register organization to deliver data. Instruction registers capture instruction reuse and locality in inexpensive storage structures that are located near to the functional units. The data register organization captures reuse and locality in different levels of the hierarchy to reduce the cost of delivering data. Exposed communication resources eliminate pipeline registers and control logic, and allow the compiler to schedule efficient instruction and data movement. The architecture keeps a significant fraction of instruction and data bandwidth local to the functional units, which reduces the cost of supplying instructions and data to large numbers of functional units. This architecture achieves an energy efficiency that is 23× greater than an embedded RISC processor.
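As new network services continue to emerge, network processors are being used ever more widely. Using a SystemC-based system design methodology, this article builds a platform for modeling network processor architectures. The platform consists of an extensible library of heterogeneous resources and an architecture constructor. The designer only needs to submit a configuration table, and the architecture constructor automatically generates a model instance; the resulting model can easily be refined and evaluated for performance. This approach makes it convenient to model the architectures of various network processors and facilitates their optimized design.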
章承科 《单片机与嵌入式系统应用》2006,(1):44-47
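This article implements a JPEG decoding algorithm on a multicore processor architecture. By parallelizing the JPEG algorithm, processing it in parallel on multiple processor cores, and optimizing aspects such as memory reads for the multicore architecture, the decoding speed of the JPEG decoding algorithm can be greatly increased. Measurements show that on a processor integrating four cores, the average decoding period for a JPEG image is about 28% of that on a single-core processor.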
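The reported 28% figure can be converted into the more familiar speedup and parallel-efficiency numbers. The short sketch below assumes the 28% is a ratio of decoding times (four-core time divided by single-core time); the function name is hypothetical.

```python
def speedup_and_efficiency(time_fraction, cores):
    """Convert a relative run time into speedup and parallel efficiency.

    time_fraction: parallel time divided by single-core time (0.28 here)
    cores:         number of cores used (4 here)
    """
    speedup = 1.0 / time_fraction
    efficiency = speedup / cores
    return speedup, efficiency


if __name__ == "__main__":
    s, e = speedup_and_efficiency(0.28, 4)
    print(f"speedup ~ {s:.2f}x, efficiency ~ {e:.0%}")
```

With these inputs the speedup is roughly 3.6× on four cores, or close to 90% parallel efficiency, under the stated assumption about what the 28% measures.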
This paper presents a VLSI implementation of the one-dimensional direct discrete wavelet transform (1-D DWT). The DWT can be viewed as a multi-resolution decomposition of a signal. This means that it decomposes a signal into its components in different frequency bands (octave bands). We propose a new architecture using parallel filters. We consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. It consists of three basic units: one register bank, four filters, and a control unit. The filters are of different lengths and use new coefficients derived from Daubechies filter coefficients. The designed processor architecture requires no interface circuitry for interconnection to a standard communication bus. The architecture can compute the DWT at a data rate of 12×10^6 samples/s, corresponding to a typical clock speed of 12 MHz. The architecture is simulated at the gate level in VLSI.
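The multi-resolution decomposition described in the abstract above can be illustrated with a three-level 1-D DWT in software. This is a sketch only: it substitutes the two-tap Haar filter pair for the paper's Daubechies-derived filters, whose coefficients are not given in the abstract, and it uses the conventional cascaded form rather than the parallel-filter architecture the paper proposes. The function names are hypothetical.

```python
import math


def haar_step(signal):
    """One analysis level: split into low-pass (approximation) and
    high-pass (detail) halves using the orthonormal Haar filters."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_1d(signal, levels=3):
    """Cascade haar_step on the approximation band `levels` times.

    Returns (final_approximation, [detail_level_1, ..., detail_level_n]);
    each detail list covers one octave band.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    approx, details = dwt_1d([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
    print("approximation:", approx)
    for level, band in enumerate(details, start=1):
        print(f"detail level {level}:", band)
```

Each level halves the number of samples in the approximation band, so an 8-sample input yields detail bands of 4, 2, and 1 coefficients plus a single approximation coefficient. Because the Haar pair is orthonormal, the total energy of the coefficients equals that of the input.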