Similar literature
1.
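This project was a collaboration among Open-Silicon, GLOBALFOUNDRIES, and Amkor. Two 28 nm ARM processor chips were integrated through a 2.5D silicon interposer. High-performance chip integration has usually been achieved by advancing the transistor process node, and SiP based on 2.5D technology is becoming an effective alternative to traditional chip-level system integration. Open-Silicon was responsible for the design of the chips and the silicon interposer, with a focus on performance optimization and cost reduction. GLOBALFOUNDRIES manufactured the processor chips in a 28 nm super-low-power process and the 2.5D silicon interposer in 65 nm technology. Concepts including power optimization and effective management of functional interfaces were validated. High-density routing on the silicon substrate provides a large number of parallel I/Os, enabling high-performance memory while keeping power consumption low. The EDA design reference flow that was developed can be used to optimize 2.5D designs. This paper shows how a large die can be redesigned as several smaller dies and integrated into a SiP system through a 2.5D silicon interposer, in order to reduce cost, improve yield, increase design flexibility and reusability, and reduce development risk.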

2.
A high-throughput matching memory (MM) for a data-driven microprocessor is discussed. An MM can be constructed using a hashing memory. However, one of the biggest problems with hashing memory is the necessity for selective processing whenever hashed address conflicts occur. To eliminate this problem, the MM incorporates a small amount of associative memory (32 words×50 b) as well as the hashing memory (512 words×42 b). The matching operation is subdivided into three pipeline stages, all controlled by the elastic pipeline scheme. With this structure, an MM with a high throughput of 100 mega-accesses/s can be realized.
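
The combination of a hashing memory with a small associative overflow store can be illustrated with a minimal software sketch. Only the two sizes (512 hashed words, 32 associative words) come from the abstract; the modulo hash, the `MatchingMemory` class, its `access` method, and the overflow policy are assumptions made for illustration and are not the circuit described in the paper.

```python
from typing import Dict, List, Optional, Tuple

HASH_SLOTS = 512  # hashing-memory words, from the abstract
CAM_WORDS = 32    # associative-memory words, from the abstract


class MatchingMemory:
    """Toy matching memory: hashed store plus a small associative overflow."""

    def __init__(self) -> None:
        # Each hash slot holds at most one waiting (tag, operand) pair.
        self.slots: List[Optional[Tuple[int, int]]] = [None] * HASH_SLOTS
        # Overflow entries are looked up by tag, like an associative memory.
        self.cam: Dict[int, int] = {}

    def access(self, tag: int, operand: int) -> Optional[Tuple[int, int]]:
        """Return (waiting_operand, operand) on a match, else store the token."""
        index = tag % HASH_SLOTS  # hypothetical hash function
        entry = self.slots[index]
        if entry is not None and entry[0] == tag:
            self.slots[index] = None  # partner found in the hashing memory
            return (entry[1], operand)
        if tag in self.cam:
            return (self.cam.pop(tag), operand)  # partner found in overflow
        if entry is None:
            self.slots[index] = (tag, operand)  # free slot: wait here
            return None
        # Hashed address conflict: a different tag already waits in this slot.
        if len(self.cam) >= CAM_WORDS:
            raise OverflowError("associative overflow memory is full")
        self.cam[tag] = operand
        return None
```

In this sketch a conflicting token is parked in the associative store instead of triggering a separate conflict-resolution pass, which is one plausible reading of how the associative memory removes the need for selective processing.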

3.
Software and hardware have been developed to create a powerful, inexpensive, compact digital signal processing system that extracts a low-bit-rate linear predictive coding (LPC) speech system model in real time. The model parameters derived include accurate spectral envelope, formant, pitch, and amplitude information. The system is based on the Texas Instruments TMS320 family, and the most compact realization requires only three chips (TMS320E17, A/D-D/A, op-amp), consuming a total of less than 0.5 W. The processor is part of a programmable cochlear implant system under development by a multiuniversity Canadian team, but also has other applications in aids for the hearing handicapped.

4.
This paper describes a 51.2-GOPS video recognition processor, which achieves real-time multiple processing of in-vehicle video recognition applications in software, while at the same time satisfying the power efficiency requirements of an in-vehicle device. The chip integrates 128 RISC microprocessors, each operating at 100 MHz. Hardware configurations of the chip are enhanced to support efficient execution of extended C language code for algorithms based on four basic parallel methods. The results of a benchmark test using a weather-robust lane mark and vehicle detection application show that the processor achieves four times better performance while consuming less than 1/20 of the peak power of a 2.4-GHz general-purpose processor.
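
The peak figure can be related to the core count and clock rate with a short calculation. The three input numbers come from the abstract; the reading that the quotient is the number of operations each core completes per cycle is an inference, and the function name is hypothetical.

```python
def ops_per_cycle_per_core(peak_ops_per_s: float, cores: int, clock_hz: float) -> float:
    """Operations each core must complete per clock cycle to reach the peak rate."""
    return peak_ops_per_s / (cores * clock_hz)


# Figures quoted in the abstract: 51.2 GOPS, 128 cores, 100 MHz.
ratio = ops_per_cycle_per_core(51.2e9, 128, 100e6)
print(f"{ratio:.1f} operations per cycle per core")
```

A quotient of four is consistent with each 100-MHz core issuing several operations per cycle, although the abstract does not say how those operations are counted.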

5.
A video DSP with a macroblock-level pipeline and a SIMD-type vector-pipeline architecture (VDSP2) has been developed using 0.5 μm triple-layer-metal CMOS technology. This 17.00 mm×15.00 mm chip consists of 2.5 M transistors and operates at 100 MHz. The real-time encoder and decoder specified in the MPEG2 main profile at the main level can be realized with two VDSP2s and a motion estimation (ME) unit, and with one VDSP2, respectively, at an 80 MHz clock rate, with a total power dissipation of 4.2 W at 3.3 V.

6.
The authors have constructed and tested an oscillator using a grid amplifier with external feedback from a twist reflector. The twist reflector serves two functions: it changes the output polarization to match the input, and its position sets the feedback phase. This permits a wider tuning range than has been possible with previous grid oscillators. The source could be continuously tuned from 8.2 GHz to 11.0 GHz by moving the twist reflector. By moving the polarizer and mirror in the twist reflector independently, a 1.8-to-1 frequency range from 6.5 GHz to 11.5 GHz was achieved. The peak effective radiated power was 6.3 W at 9.9 GHz.

7.
A 300-MOPS image digital signal processor (IDSP) including four pipelined data processing units and three parallel input-output (I/O) ports has been developed using a 0.8-μm BiCMOS technology. The IDSP integrates 910,000 transistors in a 15.2-mm×15.2-mm area using a macrocell-oriented building-block design environment. The power dissipation was reduced to 1.0 W per 25-MHz instruction cycle, and a TTL-compatible I/O interface was retained by implementing two power supplies, one providing 3 V and the other 5 V. With this performance, a single-board 64/128-kb/s video codec was implemented with four IDSPs.

8.
We have developed a 0.25-μm, 200-MHz embedded RISC processor for multimedia applications. This processor has a dual-issue superscalar datapath that consists of a 32-bit integer unit and a 64-bit single-instruction multiple-data (SIMD) function unit that together have a total of five multiply-adders. An on-chip concurrent Rambus DRAM (C-RDRAM) controller uses interleaved transactions to increase the memory bandwidth of the Rambus channel to 533 Mb/s. The controller also reduces latency by using transaction interleaving and instruction prefetching. A 64-bit, 200-MHz internal bus transfers data among the CPU core, the C-RDRAM, and the peripherals. These high-data-rate channels improve CPU performance because they eliminate a bottleneck in the data supply. The datapath part of this chip was designed using a functional macrocell library that included placement information for leaf cells, which resulted in a SIMD function unit with a density of 68,000 transistors per square millimeter.

9.
It is already known that a trellis code T, which is constructed by using the encoder of a convolutional code C with short constraint length followed by a delay processor and a signal mapper, is equivalent to a trellis code with large constraint length. In this paper, we derive a new lower bound on the free distance of T, which, in some cases, is better than the previously derived bound. Moreover, instead of the decoding used in earlier publications, we apply iterative decoding on both tailbiting and zero-tail representations of T to take advantage of the new lower bound and, at the same time, to decrease the associated error coefficient caused by the decoding used in earlier publications. Comparisons among various designs of such a trellis code and some well-known coding methods are also provided.

10.
An integrated memory array processor (IMAP) ULSI with 64 processing elements and a 2-Mb SRAM has been developed for image processing. The chip attains a 3.84 GIPS peak performance through the use of SIMD parallel processing and a 1.28 GByte/s on-chip processor-memory bandwidth. The IMAP is capable of parallel indirect addressing, which widens the range of parallel algorithms it can run. Large power consumption with the wide memory bandwidth is avoided by reducing the number of active sense amplifiers and adopting dynamic power control. Fabricated with a 0.55-μm BiCMOS double-layer-metal process technology, the IMAP contains 11 million transistors in a 15.1×15.6 mm² die area.

11.
12.
A 2.5-V, 72-Mbit DRAM based on packet protocol has been developed using (1) a rotated hierarchical I/O architecture to reduce power noise and to minimize the chip-size penalty associated with an 8-bit prefetch architecture implemented with 16 internal banks and 144 I/O lines, (2) a delay-locked-loop circuit using a high-speed and small-swing differential clock to achieve the peak bandwidth of 2.0 GByte/s in a single chip with low noise sensitivity, and (3) a flexible column redundancy scheme to efficiently increase redundancy coverage using a shifted I/O line scheme for multibank architecture.

13.
A 1.2-million-transistor, 33-MHz, 20-b dictionary search processor (DISP) ULSI has been developed using a 0.8-μm triple-layer-Al CMOS fabrication technology. A 13.02×12.51-mm² chip contains a specially developed 160-kb content addressable memory (CAM) and cellular automaton processor (CAP). A single DISP chip can store a maximum of 2048 words and performs dictionary search in various search modes, including an approximate word search. The character input rate for the dictionary search operation is 33 million characters per second. The DISP typically consumes 800 mW at a supply voltage of 5 V. A high-speed, functional 50,000-word dictionary search system can be built with 25 DISP chips arranged in parallel, to play an important role in natural language processing.
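
The 25-chip figure follows from the per-chip capacity by a ceiling division, as the short check below shows. The two constants are taken from the abstract; the function name is hypothetical.

```python
WORDS_PER_CHIP = 2048      # maximum words stored by one DISP chip (abstract)
DICTIONARY_WORDS = 50_000  # target dictionary size (abstract)


def chips_needed(total_words: int, words_per_chip: int) -> int:
    """Smallest number of chips whose combined capacity covers total_words."""
    return -(-total_words // words_per_chip)  # integer ceiling division


chips = chips_needed(DICTIONARY_WORDS, WORDS_PER_CHIP)
capacity = chips * WORDS_PER_CHIP
print(chips, capacity)
```

Twenty-four chips would hold only 49,152 words, so 25 is the smallest count that covers the 50,000-word dictionary.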

14.
This paper describes the main features and functions of the Pentium® 4 processor microarchitecture. We present the front-end of the machine, including its new form of instruction cache called the trace cache, and describe the out-of-order execution engine, including a low-latency double-pumped arithmetic logic unit (ALU) that runs at 4 GHz. We also discuss the memory subsystem, including the low-latency Level 1 data cache that is accessed in two clock cycles. We then describe some of the key features that contribute to the Pentium® 4 processor's floating-point and multimedia performance. We provide some key performance numbers for this processor, comparing it to the Pentium® III processor.

15.
Digital radio transmission systems use complex modulation schemes that require powerful signal processing techniques to correct channel distortions and to minimize bit-error rates (BERs). Combined analog and digital processors are investigated for minimizing the mean square error (MSE) of the radio receiver. The analog filters are implemented using acousto-optic (AO) processing since rapidly adaptable, inverse channel filters can be produced for either minimum or nonminimum phase channels. A specific architecture is identified and a laboratory system is tested to verify the ability of the processor to track and correct time-varying channels. Computer simulations are used to show that hybrid analog and digital equalization allows an increase in the modulation capacity of radio, relative to all-digital equalization, while maintaining similar equipment signatures.

16.
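Hu Hao, Chen Xingbi 《Journal of Semiconductors》2012,33(3):034004-4
This paper proposes a novel fast turn-off insulated gate bipolar transistor (IGBT). During turn-off, the device uses a self-driven P-type transistor to short the emitter PN junction. The device achieves a low on-state voltage drop and fast turn-off without introducing side effects such as snapback in the current-voltage curve and without added process difficulty. Numerical simulations show that the turn-off time falls from 120 ns to 12 ns with no increase in the on-state voltage drop.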

17.
This paper presents a single-chip programmable platform that integrates most of the hardware blocks required in the design of embedded system chips. The platform includes a 32-bit multithreaded RISC processor (MT-RISC), configurable logic clusters (CLCs), programmable first-in-first-out (FIFO) memories, control circuitry, and on-chip memories. For rapid thread switching, a multithreaded processor equipped with a hardware thread scheduling unit is adopted, and configurable logic is grouped into clusters for IP-based design. By integrating both the multithreaded processor and the configurable logic on a single chip, high-level language-based designs can be easily accommodated by performing the complex and concurrent functions of a target chip on the multithreaded processor and implementing the external interface functions in the configurable logic clusters. A 64-mm² prototype chip integrating a four-threaded MT-RISC, three CLCs, programmable FIFOs, and 8-kB on-chip memories is fabricated in a 0.35-μm CMOS technology with four metal layers; it operates at a 100-MHz clock frequency and consumes 370 mW at a 3.3-V power supply.

18.
We present a hierarchical model for the analysis of proactive fault management in the presence of system resource leaks. At the low level of the model hierarchy is a degradation model in which we use a nonhomogeneous Markov chain to establish an explicit connection between resource leaks and the failure rate. With the degradation model, we prove that the failure rate is asymptotically constant in the absence of resource leaks, and that it increases as leaks occur and accumulate, which confirms resource leaks as an aging source. Proactive fault management (PFM) is modeled at the higher level as a semi-Markov process. The PFM model takes as input the degradation analysis from the low-level model, and allows us to determine optimal rejuvenation schedules with respect to various system measures.

19.
《Microelectronics Journal》2015,46(7):637-655
This paper proposes a new processor architecture called VVSHP for accelerating data-parallel applications, which are growing in importance and demanding increased performance from hardware. VVSHP merges VLIW and vector processing techniques for a simple, high-performance processor architecture. One key point of VVSHP is the execution of multiple scalar instructions within VLIW and vector instructions on unified parallel execution datapaths. Another key point is to reduce the complexity of VVSHP by designing a two-part register file: (1) a shared scalar–vector part of 64×32-bit registers (64 scalar or 16×4 vector registers) with eight read and four write ports for storing scalar/vector data, and (2) a vector part of 48 vector registers with two read ports and one write port, each storing 4×32-bit vector data. Moreover, processing vector data with lengths varying from 1 to 256 is a key point for reducing loop overheads. VVSHP can issue up to four scalar/vector operations in each cycle, processing a set of operands in parallel and producing up to four results to be written back into the VVSHP register file. However, it cannot issue more than one memory operation at a time, which loads/stores 128-bit scalar/vector data from/to data memory. The design of the proposed VVSHP processor is implemented in VHDL targeting the Xilinx Virtex-5 FPGA, and its performance is evaluated.
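
The shared part of the register file, readable either as 64 scalar registers or as 16 vector registers of four elements, can be pictured with a small model. The sizes come from the abstract; the class, its method names, and in particular the mapping of vector register v to scalar registers 4v to 4v+3 are assumptions chosen for illustration, since the abstract does not give the actual mapping.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # registers are 32 bits wide (abstract)


class SharedRegisterFile:
    """Toy model of a 64x32-bit file viewed as 64 scalars or 16 four-element vectors."""

    SCALARS = 64
    LANES = 4
    VECTORS = SCALARS // LANES  # 16 vector registers

    def __init__(self) -> None:
        self.cells: List[int] = [0] * self.SCALARS

    def write_scalar(self, index: int, value: int) -> None:
        self.cells[index] = value & MASK32

    def read_scalar(self, index: int) -> int:
        return self.cells[index]

    def write_vector(self, v: int, values: List[int]) -> None:
        if len(values) != self.LANES:
            raise ValueError("a vector register holds exactly four elements")
        base = v * self.LANES  # hypothetical mapping: vector v covers scalars 4v..4v+3
        self.cells[base:base + self.LANES] = [x & MASK32 for x in values]

    def read_vector(self, v: int) -> List[int]:
        base = v * self.LANES
        return self.cells[base:base + self.LANES]
```

Under this assumed mapping, data written through the scalar view is visible through the vector view without a copy, which is the kind of sharing a unified scalar/vector datapath would rely on.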

20.