
1.

PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core

Gangyong Jia Liang Shi Xi Li Dong Dai 《Journal of Signal Processing Systems》2016,84(1):139-150

In contemporary multi-core systems, memory is shared among a number of concurrent threads. Memory contention and interference are becoming increasingly severe, causing problems such as performance degradation, unfair resource sharing, and priority inversion. In this paper, we address the challenge of improving performance and fairness for concurrent threads while minimizing energy consumption in main memory. We propose PUMA, a novel solution that reduces memory contention and interference by judiciously partitioning threads among cores and allocating each core exclusive memory banks and bandwidth based on each thread’s characteristics. Our results demonstrate that PUMA improves both performance and fairness while significantly reducing energy consumption compared to existing memory management approaches.

2.

A 32 kbyte integrated cache memory

Sawada K. Sakurai T. Nogami K. Shirotori T. Takayanagi T. Iizuka T. Maeda T. Matsunaga J. Fuji H. Maeguchi K. Kobayashi K. Ando T. Hayakashi Y. Miyoshi A. Sato K. 《Solid-State Circuits, IEEE Journal of》1989,24(4):881-888

The system, circuit, layout and device levels of an integrated cache memory (ICM), which includes 32 kbyte DATA memory with typical address to HIT delay of 18 ns and address to DATA delay of 23 ns, are described. The ICM offers the largest memory size and the fastest speed ever reported in a cache memory. The device integrates a 32 kbyte DATA/INSTRUCTION memory, a 34 kbit TAG memory, an 8 kbit VALID flag, a 2 kbit least recently used (LRU) flag, comparators, and CPU interface logic circuits on a chip. The inclusion of the DATA memory is crucial in improving system cycle time. The device uses several novel circuit design technologies, including a double-word-line scheme, low-noise flush clear, a low-power comparator, noise immunity, and directly testable memory design. Its newly proposed way-slice architecture increases both flexibility and expandability.

3.

Design of a sense circuit for low-voltage flash memories (cited by: 1; self-citations: 0; citations by others: 1)

Tanzawa T. Takano Y. Taura T. Atsumi S. 《Solid-State Circuits, IEEE Journal of》2000,35(10):1415-1421

A new sense circuit directly sensing the bitline voltage is proposed for low-voltage flash memories. A simple reference voltage generation method and a dataline switching method with matching of the stray capacitance between the dataline pairs are also proposed. A design method for the bitline clamp load transistors is described, taking bitline charging speed and process margins into account. The sense circuit was implemented in a 32-Mb flash memory fabricated with a 0.25-μm flash memory process and successfully operated at a low voltage of 1.5 V.

4.

A new circuit configuration for a single-transistor cell using Al-gate technology with reduced dimensions

《Solid-State Circuits, IEEE Journal of》1977,12(3):253-257

A single-transistor memory cell in Al-gate technology with 2.5 μm line width and a new circuit configuration is introduced. In this cell, the ground line of one cell and the word line of the cell opposite the bit line share the same line. This circuit configuration leads to memory cells with a bit density of 5720 bit/mm², even though it uses single-layer metallization. The voltage conditions in this cell differ from those in conventional storage cells, but do not reduce the operating range of the new cell. As design and circuit studies have shown, a 32 kbit memory can be realized on a chip area of about 15.4 mm², with an access time of 200 ns and a power dissipation of 500 mW.

5.

A 250-Mb/s, 700-mW, 32-highway×8-b S/P converter LSI with cross-access memory

Ohtomo Y. Suzuki M. 《Solid-State Circuits, IEEE Journal of》1992,27(4):530-538

A multihighway serial/parallel (S/P) converter LSI chip suitable for the broadband Integrated Services Digital Network (B-ISDN) node interface is presented. The chip, fabricated with 0.8-μm BiCMOS technology, handles 32-highway×8 b of S/P and P/S conversion at up to 250 Mb/s and has a power dissipation of 700 mW. The chip features cross-access memory and a current-cut-type CMOS/ECL interface circuit. Each of these features is described and evaluated. A newly developed BiNMOS-type D-flip-flop (D-FF) is used to speed up the cross-access memory and is compared to a CMOS D-FF.

6.

Design of an Embedded Smart Home Control System

黄红霞《山西电子技术》2014,(1):38-40

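This paper presents a smart home control system and its practical significance. The core control circuit is built around the S3C2410, an ARM9 embedded processor, and the design covers the main control chip circuit, the memory circuit, and the power supply circuit. Design schemes are also given for the LCD interface, the USB interface, and the wireless communication module, which are indispensable parts of the system. In addition, a dedicated alarm module for gas leaks was developed.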

7.

Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems

Mohamed El-Hadedy Xinfei Guo Martin Margala Mircea R. Stan Kevin Skadron 《Journal of Signal Processing Systems》2017,88(2):167-184

This paper presents a novel type of high-speed and area-efficient register-based transpose memory architecture enabled by reporting on both edges of the clock. By using double-edge-triggered registers, the proposed architecture doubles the throughput and increases the maximum frequency by avoiding some of the combinational circuitry used in prior work. The proposed design is evaluated with both FPGA and ASIC flows in 28/32 nm technology. The experimental results show that the proposed memory achieves almost 4× improvement in throughput while consuming 46% less area in the FPGA implementations compared to prior work. For ASIC implementations, it achieves more than 60% area reduction and at least 2× performance improvement while consuming 60% less power compared to other register-based designs implemented with the same flow. As an example, a proposed 8×8 transpose memory with 12-bit input/output resolution achieves a throughput of 107.83 Gbps at 647 MHz using only 140 slices on a Xilinx Virtex-7 FPGA platform, and a throughput of 88.2 Gbps at 529 MHz using 0.024 mm² of silicon area as an ASIC. The proposed transpose memory is integrated in both 2D-DCT and 2D-IDCT blocks for signal processing applications on the same FPGA platform. The new architecture allows a 3.5× speed-up for the 2D-DCT algorithm compared to the previous work while consuming 28% less area, and the 2D-IDCT achieves a 3× speed-up while consuming 20% less area.

8.

A 7.5-ns 32 K×8 CMOS SRAM

Okuyama H. Nakano T. Nishida S. Aono E. Satoh H. Arita S. 《Solid-State Circuits, IEEE Journal of》1988,23(5):1054-1059

A 256 K (32 K×8) CMOS static RAM (SRAM) which achieves an access time of 7.5 ns and 50-mA active current at 50-MHz operation is described. A 32-block architecture is used to achieve high-speed access and low power dissipation. To achieve faster access time, a double-activated-pulse circuit which generates the word-line-enable pulse and the sense-amplifier-enable pulse has been developed. The data-output reset circuit reduces the transition time and the noise generated by the output buffer. A self-aligned contact technology reduces the diffused region capacitance. This RAM has been fabricated in a twin-tub CMOS 0.8-μm technology with double-level polysilicon (the first level is polycide) and double-level metal. The memory cell size is 6.0×11.0 μm² and the chip size is 4.38×9.47 mm².
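
The cell and chip dimensions quoted above allow a rough estimate of how much of the die is occupied by the memory array. The following is a minimal sketch of that arithmetic only; the function name and the idea of an "array fraction" are illustrative assumptions, not figures reported in the abstract.

```python
# Figures quoted in the abstract.
WORDS = 32 * 1024                    # 32 K words
BITS_PER_WORD = 8
CELL_W_UM, CELL_H_UM = 6.0, 11.0     # memory cell size in micrometres
CHIP_W_MM, CHIP_H_MM = 4.38, 9.47    # chip size in millimetres


def array_stats():
    """Return (bits, total cell area in mm^2, chip area in mm^2, cell/chip ratio).

    Hypothetical helper: it only multiplies out the quoted dimensions and
    ignores decoders, sense amplifiers, and other peripheral circuits.
    """
    bits = WORDS * BITS_PER_WORD
    cell_area_mm2 = bits * CELL_W_UM * CELL_H_UM / 1e6   # um^2 -> mm^2
    chip_area_mm2 = CHIP_W_MM * CHIP_H_MM
    return bits, cell_area_mm2, chip_area_mm2, cell_area_mm2 / chip_area_mm2


if __name__ == "__main__":
    bits, cells, chip, ratio = array_stats()
    print(f"{bits} bits, cells {cells:.2f} mm^2, chip {chip:.2f} mm^2, ratio {ratio:.1%}")
```

Under these assumptions the cells alone account for somewhat less than half of the chip area, with the rest taken by peripheral circuitry, routing, and pads.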

9.

An 8 ns 4 Mb serial access memory

Kuriyama H. Hirose T. Murakami S. Wada T. Fujita K. Nishimura Y. Anami K. 《Solid-State Circuits, IEEE Journal of》1991,26(4):502-506

A new architecture for serial access memory is described that enables a static random access memory (SRAM) to operate in a serial access mode. The design target is to access all memory addresses serially from any starting address with an access time of less than 10 ns. This is achieved with an initializing procedure and three new circuit techniques. The initializing procedure is introduced to start the serial operation at an arbitrary memory address. The three circuit techniques eliminate the extra delay caused by internal addressing of column lines, sense amplifiers, word lines, and memory cell blocks. This architecture was successfully implemented in a 4-Mb CMOS SRAM using a 0.6 μm CMOS process technology. The measured serial access time was 8 ns at a single power supply voltage of 3.3 V.

10.

A 3.6 pJ/Access 480 MHz, 128 kb On-Chip SRAM With 850 MHz Boost Mode in 90 nm CMOS With Tunable Sense Amplifiers

《Solid-State Circuits, IEEE Journal of》2009,44(7):2065-2077

An extremely low energy per operation, single cycle 32 bit/word, 128 kb SRAM is fabricated in 90 nm CMOS. In the 850 MHz boost mode, total energy consumption is 8.4 pJ/access. This reduces to 3.6 pJ/access in the normal 480 MHz mode and bottoms out at a very aggressive 2.7 pJ/access in the 240 MHz low power mode. Several techniques were combined to obtain these performance numbers. Short buffered local bit lines reduce the impact of the cell read current on memory delay. Extended global bitlines improve delay and energy consumption and reduce the number of sense amplifiers in the memory to 32. Cell stability and speed issues are avoided by applying selective voltage scaling. Novel, digitally tunable sense amplifiers and a tunable timing circuit cope gracefully with the stochastic variations in the periphery.
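
The energy-per-access figures above can be turned into an average power estimate by multiplying by the access rate. The sketch below assumes one access per clock cycle in every mode, which is a simplifying assumption suggested by the "single cycle" description and not a measured power figure from the abstract; the names `MODES` and `average_power_mw` are hypothetical.

```python
# (clock frequency in Hz, energy per access in joules), as quoted in the abstract.
MODES = {
    "boost": (850e6, 8.4e-12),
    "normal": (480e6, 3.6e-12),
    "low_power": (240e6, 2.7e-12),
}
BITS_PER_WORD = 32


def average_power_mw(mode: str) -> float:
    """Power in mW, assuming one access on every clock cycle (an assumption)."""
    freq_hz, energy_j = MODES[mode]
    return freq_hz * energy_j * 1e3


def energy_per_bit_pj(mode: str) -> float:
    """Energy per accessed bit in pJ for a 32-bit word."""
    _, energy_j = MODES[mode]
    return energy_j * 1e12 / BITS_PER_WORD


if __name__ == "__main__":
    for name in MODES:
        print(f"{name}: {average_power_mw(name):.3f} mW, "
              f"{energy_per_bit_pj(name):.4f} pJ/bit")
```

Under this assumption the boost mode costs roughly four times the power of the normal mode, because both the clock rate and the energy per access rise.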

11.

A sub-0.5-V operating embedded SRAM featuring a multi-bit-error-immune hidden-ECC scheme

Suzuki T. Yamagami Y. Hatanaka I. Shibayama A. Akamatsu H. Yamauchi H. 《Solid-State Circuits, IEEE Journal of》2006,41(1):152-160

Mobile multimedia applications require lower operating voltages for embedded SRAMs. For low-voltage operation, two issues should be considered: an ECC circuit implementation that counters the increasing soft-error rate, and access timing control that tracks access delay fluctuation in the memory core. A hidden error-check-and-correction (HECC) scheme compensates for the access time penalty caused by the ECC logic on the output critical path. A multi-column ECC word assignment (MCE) increases the multi-bit-error immunity while using only 1-bit-correctable ECC, which minimizes the area penalty. A source-level-adjusted direct sense amplifier (SLAD) and a write-replica circuit with an asymmetrical replica memory cell (WRAM) were also designed for device-fluctuation-tolerant access control. A 130-nm CMOS 32-Kbit SRAM macro was fabricated with these circuit techniques, which demonstrated: 1) 0.3-V operation at 6.8 MHz; 2) 30-MHz operation, which is feasible for mobile use, even at 0.4 V, while maintaining 960 MHz at 1.5 V; and 3) a reduction by a factor of 3.6×10⁵ in soft-error rate compared with that of conventional ECC.
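
The abstract does not spell out how the multi-column ECC word assignment works, so the following is only a generic sketch of column interleaving, a common way to let a 1-bit-correctable code survive multi-bit upsets. The mapping and the function names are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def column_to_word(column: int, interleave: int) -> tuple[int, int]:
    """Map a physical column to (ecc_word, bit_in_word).

    Adjacent columns are assigned to different ECC words in round-robin
    order; `interleave` is the number of ECC words sharing one row.
    """
    return column % interleave, column // interleave


def max_errors_per_word(burst_start: int, burst_len: int, interleave: int) -> int:
    """Worst number of flipped bits landing in a single ECC word when
    `burst_len` physically adjacent columns are upset at once."""
    hits = Counter(
        column_to_word(col, interleave)[0]
        for col in range(burst_start, burst_start + burst_len)
    )
    return max(hits.values())


if __name__ == "__main__":
    # A 4-column burst with 4-way interleaving: each word sees one error,
    # which a 1-bit-correctable code can repair.
    print(max_errors_per_word(5, 4, 4))
    # Without interleaving the same burst puts all 4 errors in one word.
    print(max_errors_per_word(5, 4, 1))
```

In this sketch a burst is fully correctable whenever its length does not exceed the interleaving degree, which is the general reason such assignments improve multi-bit-error immunity without a stronger code.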

12.

Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip

Nawathe U.G. Hassan M. Yen K.C. Kumar A. Ramachandran A. Greenhill D. 《Solid-State Circuits, IEEE Journal of》2008,43(1):6-20

The second in the Niagara series of processors (Niagara2) from Sun Microsystems is based on the power-efficient chip multi-threading (CMT) architecture optimized for Space, Watts (Power), and Performance (SWaP) [SWaP Rating = Performance/(Space × Power)]. It doubles the throughput performance and performance/watt, and provides >10× improvement in floating point throughput performance as compared to UltraSPARC T1 (Niagara1). There are two 10 Gb Ethernet ports on chip. Niagara2 has eight SPARC cores, each supporting concurrent execution of eight threads for 64 threads total. Each SPARC core has a floating point and graphics unit and an advanced cryptographic unit which provides high enough bandwidth to run the two 10 Gb Ethernet ports encrypted at wire speeds. There is a 4 MB Level2 cache on chip. Each of the four on-chip memory controllers controls two FBDIMM channels. Niagara2 has 503 million transistors on a 342 mm² die packaged in a flip-chip glass ceramic package with 1831 pins. The chip is built in Texas Instruments' 65 nm 11LM triple-Vt CMOS process. It operates at 1.4 GHz at 1.1 V and consumes 84 W.

13.

AC characteristics of Cr/p⁺a-Si:H/V analog switching devices

Hu J. Hajto J. Snell A.J. Rose M.J. 《Electron Devices, IEEE Transactions on》2000,47(9):1751-1757

Experimental results on the ac characteristics of electro-formed Cr/p⁺ hydrogenated amorphous silicon (a-Si:H)/V thin film memory devices are presented. The impedance spectrum of the memory switching device has been measured over a wide frequency range from 1 Hz to 32 MHz while keeping the ac voltage amplitude below 0.02 V. Simulation of the measured impedance spectrum using an equivalent circuit indicates that the capacitance associated with a conducting filament tends to increase as the memory resistance decreases. This is explained on the basis of an activated tunnelling mechanism. Charge transport is dominated by electron tunnelling via metallic particles in the filament, and hence small changes in interparticle spacing influence the tunnelling process considerably, leading to changes in both memory resistance and effective dielectric constant.

14.

VLSI Design of a High-Speed Tier-1 Encoder for JPEG2000

梅魁志 郑南宁 吴奇 曾强 袁泽剑 《固体电子学研究与进展》2006,26(3):404-409

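A parallel pipelined architecture for the high-speed Tier-1 encoder in a JPEG2000 encoding chip is proposed and implemented. The encoder uses parallel coding of two bit planes, pipelined control of pass scanning, a circuit that generates state variables on the fly, and parallel context generation within a column, yielding a multi-parallel pipelined bit-plane coder that needs no state memory. A parallel, synchronously pipelined arithmetic coder with multi-symbol input, together with a multi-input synchronous read circuit that handles variable arithmetic-coding cycles, brings the average arithmetic coding speed to 1.3 context-symbol pairs per clock. Storage of the compressed bitstream produced by the arithmetic coder is organized as an efficient macro-pipeline. With a 100 MHz operating clock, the encoder reaches a peak coding speed of 85 M wavelet coefficients/s. Synthesized with the SMIC 0.25 μm process library, it requires 63,000 gates and 26 kb of on-chip memory (code block size 32×32), with a critical path of 5.2 ns.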

15.

A PWM analog memory programming circuit for floating-gate MOSFETs with 75-μs programming time and 11-bit updating resolution

Kinoshita S. Morie T. Nagata M. Iwata A. 《Solid-State Circuits, IEEE Journal of》2001,36(8):1286-1290

This paper describes a programming circuit for analog memory using pulsewidth modulation (PWM) signals and the circuit performance obtained from measurements using a floating-gate EEPROM device. This programming circuit attains both high programming speed and high precision. We fabricated the programming circuit using standard 0.6-μm CMOS technology and constructed an analog memory using the programming circuit and a floating-gate MOSFET. The measurement results indicate that the analog memory attains a programming time of 75 μs, an updating resolution of 11 bit, and a memory setting precision of 6.5 bit. This programming circuit can be used for intelligent information processing hardware such as self-learning VLSI neural networks as well as multilevel flash memory.

16.

Design of a Cache Memory for a 32-bit DSP

杨向峰 陶建中 《电子与封装》2008,8(2):20-22,46

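For a DSP instruction cache, the cache memory was designed with a full-custom methodology using a 0.25 μm CMOS library. The decoder circuit was optimized using the concepts of logical effort and branching effort, which preserves decoder speed while reducing system power consumption. A differential sense amplifier based on the positive-feedback principle was also designed, effectively reducing the power consumption of the memory. The circuit operates at a clock frequency of 100 MHz, and the average dynamic power over a read/write cycle is 25 mW.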

17.

Exploiting Thread‐Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

Jaegeun Oh Seok Joong Hwang Huong Giang Nguyen Areum Kim Seon Wook Kim Chulwoo Kim Jong‐Kook Kim 《ETRI Journal》2008,30(4):576-586

In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler‐hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write‐back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single‐instruction multiple‐data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32‐bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2‐way MLEP and 33.7% faster with a 4‐way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler.

18.

A 700-MHz switched-capacitor analog waveform sampling circuit

Haller G.M. Wooley B.A. 《Solid-State Circuits, IEEE Journal of》1994,29(4):500-508

Analog switched-capacitor memory circuits are suitable for use in a wide range of applications where analog waveforms must be captured or delayed, such as the recording of pulse echo events and pulse shapes. Analog sampling systems based on switched-capacitor techniques offer performance superior to that of flash A/D converters and charge-coupled devices with respect to cost, density, dynamic range, sampling speed, and power consumption. This paper proposes an architecture with which sampling frequencies of several hundred megahertz can be achieved using conventional CMOS technology. Issues concerning the design and implementation of an analog memory circuit based on the proposed architecture are presented. An experimental two-channel memory with 32 sampling cells in each channel has been integrated in a 2-μm CMOS technology with poly-to-poly capacitors. The measured nonlinearity of this prototype is 0.03% for a 2.5 V input range, and the memory cell gain matching is 0.01% rms. The dynamic range of the memory exceeds 12 b for a sampling frequency of 700 MHz. The power dissipation for one channel operated from a single +5 V supply is 2 mW.
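
Two quantities follow directly from the figures above: how long a waveform one channel can capture, and what a 12 b dynamic range corresponds to in decibels. This is a minimal sketch of that arithmetic; the conversion 20·log10(2^N) is the plain amplitude-ratio formula and is an assumption about how the bit figure is defined, and the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 700e6       # sampling frequency quoted in the abstract
CELLS_PER_CHANNEL = 32       # sampling cells per channel
DYNAMIC_RANGE_BITS = 12      # lower bound quoted in the abstract


def capture_window_ns(cells: int = CELLS_PER_CHANNEL,
                      rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Length of waveform one channel can hold, in nanoseconds."""
    return cells / rate_hz * 1e9


def bits_to_db(bits: float) -> float:
    """Amplitude ratio of 2**bits expressed in dB (assumed definition)."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    print(f"capture window: {capture_window_ns():.1f} ns")
    print(f"dynamic range: {bits_to_db(DYNAMIC_RANGE_BITS):.1f} dB")
```

With only 32 cells per channel the prototype records a window of a few tens of nanoseconds, which fits the short pulse-shape recording use case named in the abstract.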

19.

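Adiabatic Ratioless Dynamic Flip-Flops and Synthesis of Synchronous Sequential Circuits (cited by: 1; self-citations: 0; citations by others: 1)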

刘莹 方振贤 汪鹏君 《电子与信息学报》2002,24(12):1967-1972

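Starting from the three-essential-element theory of circuits, this paper studies low-power circuits and gives a quantitative description of adiabatic ratioless dynamic memory circuits. Adiabatic ratioless dynamic flip-flops use capacitors to receive and hold information, avoiding the loss of already-acquired information on capacitors that occurs in current adiabatic circuits. The adiabatic D and T' flip-flops use only 6 transistors, and the adiabatic D flip-flop with AND-OR-INVERT input uses only 9 transistors. On this theoretical basis, the paper proposes a synthesis method for adiabatic ratioless dynamic synchronous sequential circuits. A decimal counter in 5421 BCD code designed with this method uses only 32 transistors, and its total power consumption is lower than that of a PAL-2N 4-bit binary counter. Computer simulation verifies the correctness of the method.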
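
For readers unfamiliar with the 5421 BCD code used by the counter in this abstract, the sketch below lists its count sequence. It assumes the common convention in which the most significant bit (weight 5) is set for digits 5 to 9; the abstract does not state which variant the authors use, and the function names are hypothetical. It models only the code, not the adiabatic circuit.

```python
def encode_5421(digit: int) -> int:
    """Return the 4-bit 5421 BCD code for a decimal digit.

    Bit weights from MSB to LSB are 5, 4, 2, 1. The MSB is set for
    digits 5-9 (assumed convention); the low three bits hold digit mod 5.
    """
    if not 0 <= digit <= 9:
        raise ValueError("digit must be in 0..9")
    high, low = divmod(digit, 5)
    return (high << 3) | low


def decode_5421(code: int) -> int:
    """Weighted sum of the four bits with weights 5, 4, 2, 1."""
    return (5 * ((code >> 3) & 1) + 4 * ((code >> 2) & 1)
            + 2 * ((code >> 1) & 1) + (code & 1))


def count_sequence() -> list[str]:
    """State sequence of a decade counter in 5421 code, as bit strings."""
    return [format(encode_5421(d), "04b") for d in range(10)]


if __name__ == "__main__":
    print(count_sequence())
```

In this convention the low three bits repeat the pattern 000 to 100 twice per decade while the top bit toggles once, so the top bit is a divide-by-ten output with a 50% duty cycle.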

20.

Design of a JTAG ICE Debug Circuit with a Pipeline Tracker (cited by: 1; self-citations: 1; citations by others: 0)

沈沙 沈泊 章倩苓 《微电子学与计算机》2004,21(7):139-142

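A JTAG debug circuit (In-Circuit Emulator) was designed for the 32-bit RISC CPU developed independently at Fudan University. To solve the problem of false breakpoint stops caused by the 5-stage pipeline of this RISC CPU, a novel circuit structure with branch prediction capability, the "pipeline tracker", is proposed. The JTAG debug circuit is compatible with the IEEE 1149.1 standard and supports setting breakpoints, single-stepping, viewing or modifying CPU registers and memory space, in-circuit FLASH programming, and other functions.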