Found 20 similar documents; search took 15 ms.
1.
2.
A novel high-speed, low-power architecture for a Dual Bit Content Addressable Memory (DB-CAM) is reported in this article. A low-leakage, low-power, high-speed memory has been developed using the DB-CAM architecture, which can store 2 bits in a single CAM block, together with Static Random Access Memory (SRAM). CAM cells perform the data search operation, and SRAM cells are used for data storage. The output of the SRAM cells depends on the search result of the CAM cells. To make the search operation more precise, a priority detector circuit is proposed. The new DB-CAM block architecture substantially reduces power consumption, transistor count and on-chip area. The functionality of these circuits was verified, and performance parameters such as propagation delay and dynamic power consumption were calculated with SPICE/Spectre (Cadence) simulations using a standard 90 nm CMOS technology.
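The split between CAM search cells and SRAM storage cells described in this abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `DualStoreCam`, its methods, and the lowest-index-wins priority rule are hypothetical and are not taken from the paper, which does not specify its priority policy here.

```python
from typing import List, Optional


class DualStoreCam:
    """Behavioral model: CAM cells hold search keys, SRAM cells hold data."""

    def __init__(self) -> None:
        self.cam_keys: List[int] = []   # compared in parallel in hardware
        self.sram_data: List[str] = []  # read out only for the selected row

    def write(self, key: int, data: str) -> None:
        # Row i of the CAM array is paired with row i of the SRAM array.
        self.cam_keys.append(key)
        self.sram_data.append(data)

    def match_lines(self, key: int) -> List[bool]:
        # One match line per row; a loop stands in for the parallel compare.
        return [stored == key for stored in self.cam_keys]

    def search(self, key: int) -> Optional[str]:
        # Assumed priority rule: if several rows match, the lowest index wins.
        for row, hit in enumerate(self.match_lines(key)):
            if hit:
                return self.sram_data[row]
        return None


cam = DualStoreCam()
cam.write(0b10, "A")
cam.write(0b01, "B")
cam.write(0b10, "C")
print(cam.match_lines(0b10))  # two rows match the same key
print(cam.search(0b10))       # the priority rule selects a single row
```

The SRAM read depends on the CAM result, which mirrors the statement that the SRAM output depends on the search outcome.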
3.
Described is a design for high-speed low-power-consumption fully parallel content-addressable memory (CAM) macros for CMOS ASIC applications. The design supports configurations ranging from 64 words by 8 bits to 2048 words by 64 bits and achieves around 7.5-ns search access times in CAM macros on a 0.35-μm 3.3-V standard CMOS ASIC technology. A new CAM cell with a pMOS match-line driver reduces search rush current and power consumption, allowing a NOR-type match-line structure suitable for high-speed search operations. It is also shown that the CAM cell has other advantages that lead to a simple high-speed current-saving architecture. A small signal on the match line is detected by a single-ended sense amplifier which has both high-speed and low-power characteristics and a latch function. The same type of sense amplifier is used for a fast read operation, realizing 5-ns access time under typical conditions. For further current savings in search operations, the precharging of the match line is controlled based on the valid bit status. Also, a dual bit switch with optimized size and control reduces the current. CAM macros of 256×54 configuration on test chips showed 7.3-ns search access time with a power-performance metric of 131 fJ/bit/search under typical conditions.
4.
Takashima D. Takeuchi Y. Miyakawa T. Itoh Y. Ogiwara R. Kamoshida M. Hoya K. Doumae S.M. Ozaki T. Kanaya H. Yamakawa K. Kunishima I. Oowaki Y. 《Solid-State Circuits, IEEE Journal of》2001,36(11):1713-1720
This paper demonstrates the first 8-Mb chain ferroelectric RAM (chain FeRAM) with 0.25-μm 2-metal CMOS technology. A small die of 76 mm² and a high average cell/chip area efficiency of 57.4% have been realized by introducing not only chain architecture but also four new techniques: 1) a one-pitch shift cell realizes a small cell size of 5.2 μm²; 2) a new hierarchical wordline architecture reduces row-decoder and plate-driver areas without an extra metal layer; 3) a small-area dummy cell scheme reduces dummy capacitor size to 1/3 of the conventional one; and 4) a new array activation scheme reduces dataline and second amplifier areas. As a result, the chain architecture with these new techniques reduces die size to 65% of that of the conventional FeRAM. Moreover, a ferroelectric capacitor overdrive scheme enables sufficient polarization switching without overbiasing the memory cell array. This scheme lowers the minimum operation voltage by 0.23 V and enables 2.5-V Vdd operation. Thanks to the fast cell plateline drive of the chain architecture, the 8-Mb chain FeRAM has achieved the fastest random access time, 40 ns, and read/write cycle time, 70 ns, at 3.0 V so far reported.
5.
This paper presents a novel VLSI architecture for a fully parallel precomputation-based content-addressable memory (PB-CAM) with low-power, low-cost, and low-voltage features. This design is based on a precomputation approach that not only saves power in the CAM system but also reduces the transistor count and operating voltage of the CAM cell. In addition, the proposed PB-CAM word structure adopts the static pseudo-nMOS circuit design to improve system performance. The whole design was fabricated with the TSMC 0.35-μm single-poly quadruple-metal CMOS process. With a CAM size of 128 words by 30 bits, the measurement results indicate that the proposed circuit works up to 100 MHz with power consumption of 33 mW at 3.3-V supply voltage and works up to 30 MHz under 1.5-V supply voltage.
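The precomputation idea can be sketched in software: a small parameter is stored alongside each word, and only rows whose parameter equals that of the search key go through a full comparison. The abstract does not say which parameter is precomputed; the ones count used below is an assumption for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple


def ones_count(word: int) -> int:
    # Assumed precomputed parameter: number of 1 bits in the word.
    return bin(word).count("1")


def pb_cam_search(stored: List[int], key: int) -> Tuple[List[int], int]:
    """Return (matching row indices, number of full-word comparisons)."""
    key_param = ones_count(key)
    # In hardware the parameters would be computed once, at write time.
    params = [ones_count(word) for word in stored]
    matches: List[int] = []
    full_compares = 0
    for row, (word, param) in enumerate(zip(stored, params)):
        if param != key_param:
            continue  # row filtered by its parameter; full compare skipped
        full_compares += 1
        if word == key:
            matches.append(row)
    return matches, full_compares


stored_words = [0b1010, 0b0110, 0b1111, 0b1010]
rows, compares = pb_cam_search(stored_words, 0b1010)
print(rows, compares)
```

Every skipped full comparison corresponds to a word whose match circuitry need not be exercised, which is where the power saving would come from in this simplified picture.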
6.
Yeonbae Chung 《Electronics letters》2003,39(24):1706-1708
A new FRAM (ferroelectric RAM) design method utilising a bit-plate parallel cell architecture is presented. This method is effective in reducing circuit and layout overhead caused by the on-pitch plate control circuitry. It also reduces the power consumption in the memory array. Implementation results for a 512 kb FRAM prototype in a 0.13 μm CMOS technology show that the memory block area in the proposed architecture is 15.6% less than that of a conventional structure.
7.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(9):1297-1303
8.
Tsang K. Wei B.W.Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(3):360-364
Image compression applications use vector quantization (VQ) for its high compression ratio and image quality. The current VQ hardware employs static instead of dynamic code book generation as the latter demands intensive computation and corresponding expensive hardware even though it offers better image quality. This paper describes a VLSI architecture for a real-time dynamic code book generator and encoder of 512×512 images at 30 frames/s. The four-chip 0.8 μm CMOS design implements a tree of Kohonen self-organizing maps, and consists of two VQ processors and two image buffer memory chips. The pipelined VQ processor contains a computational core for both code book generation and encoding, and is scalable to processing larger frames.
9.
Chanho Lee 《ETRI Journal》2004,26(1):21-26
This paper proposes a new architecture for a Viterbi decoder with an efficient memory management scheme. The trace‐back operation is eliminated in the architecture and the memory storing intermediate decision information can be removed. The elimination of the trace‐back operation also reduces the number of operation cycles needed to determine decision bits. The memory size of the proposed scheme is reduced to 1/(5×constraint length) of that of the register exchange scheme, and the throughput is increased up to twice that of the trace‐back scheme. A Viterbi decoder complying with the IS‐95 reverse link specification is designed to verify the proposed architecture. The decoder has a code rate of 1/3, a constraint length of 9, and a trace‐forward depth of 45.
10.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(12):1596-1608
11.
Mukherjee A. Ranganathan N. Flieder J. Acharya T. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(2):203-214
Describes the architecture and design of a CMOS VLSI chip for data compression and decompression using tree-based codes. The chip, called MARVLE, implements a memory-based architecture for variable length encoding and decoding based on tree-based codes. The architecture is based on an efficient scheme of mapping the tree representing any binary code onto a memory device. A prototype 2-μm CMOS VLSI chip has been designed, verified, and fabricated by the MOSIS facility. The chip has a 512×12 static RAM with an access time of 4 ns and logic circuitry for compression as well as decompression. The chip occupies a silicon area of 6.8 mm×6.9 mm and consists of 49695 transistors. The prototype chip yields a compression rate of 95.2 Mb/s and a decompression rate of 60.6 Mb/s with a clock rate of 83.3 MHz. The VLSI hardware can be used to implement the JPEG baseline compression scheme.
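Mapping a code tree onto memory, as this abstract describes, can be sketched as follows: each memory word is either an internal node holding two child addresses or a leaf holding a symbol, and decoding follows one address per input bit. The memory layout, the function names and the sample codebook are illustrative assumptions, not the MARVLE chip's actual word format.

```python
from typing import Dict, List, Union

# Internal node: [address if bit is 0, address if bit is 1]; leaf: symbol.
Word = Union[List[int], str]


def build_tree_memory(codebook: Dict[str, str]) -> List[Word]:
    """Lay out a prefix-free binary code as a flat memory; address 0 is the root."""
    memory: List[Word] = [[-1, -1]]
    for symbol, code in codebook.items():
        addr = 0
        for i, bit in enumerate(code):
            node = memory[addr]
            b = int(bit)
            if node[b] == -1:
                is_leaf = i == len(code) - 1
                memory.append(symbol if is_leaf else [-1, -1])
                node[b] = len(memory) - 1
            addr = node[b]
    return memory


def decode(memory: List[Word], bits: str) -> str:
    out = []
    addr = 0
    for bit in bits:
        addr = memory[addr][int(bit)]  # one memory read per input bit
        if isinstance(memory[addr], str):
            out.append(memory[addr])   # reached a leaf: emit, restart at root
            addr = 0
    return "".join(out)


codebook = {"a": "0", "b": "10", "c": "11"}  # hypothetical prefix-free code
memory = build_tree_memory(codebook)
encoded = "".join(codebook[s] for s in "abca")
print(memory)
print(decode(memory, encoded))
```

Because the tree lives in ordinary memory, changing the code only requires rewriting memory contents, not the decoding logic.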
12.
Based on an analysis of typical hybrid-type content addressable memory (CAM) structures, a hybrid-type CAM architecture with lower power consumption and higher stability is proposed. The design changes the connection of an N-type metal-oxide-semiconductor (NMOS) transistor in the control circuit, which greatly reduces the power consumption during comparison by making the match line discharge only to the NMOS threshold voltage. The conventional and the proposed hybrid-type CAM architectures were compared using Semiconductor Manufacturing International Corporation (SMIC) 65 nm complementary metal-oxide-semiconductor (CMOS) technology. Simulation shows that the power consumption of the proposed structure is reduced by 23%. Furthermore, the proposed design also adjusts the match line (ML) discharge path. When the NAND-type block is matched and the NOR-type block is mismatched, the jitter voltage on the match line is greatly reduced.
13.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(12):1696-1707
14.
Design of Spin-Torque Transfer Magnetoresistive RAM and CAM/TCAM with High Sensing and Search Speed (total citations: 1; self-citations: 0; citations by others: 1)
15.
Mansour M.M. Shanbhag N.R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):976-996
A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design, namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current-day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple check-to-bit message update bottleneck of the current algorithm. A new merged-schedule message-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low- to moderate-throughput decoders. Moreover, a parallel soft-input-soft-output (SISO) message update mechanism is proposed that implements the recursions of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over the state of the art, with a reduction of 60.5% in interconnect length.
16.
Arimoto K. Asakura M. Hidaka H. Matsuda Y. Fujishama K. 《Solid-State Circuits, IEEE Journal of》1991,26(4):560-565
An intelligent cache based on a distributed architecture that consists of a hierarchy of three memory sections, namely DRAM (dynamic RAM), SRAM (static RAM), and CAM (content addressable memory) as an on-chip tag, is reported. The test device of the memory core is fabricated in a 0.6 μm double-metal CMOS standard DRAM process, and the CAM matrix and control logic are embedded in the array. The array architecture can be applied to 16-Mb DRAM with less than 12% of the chip overhead. In addition to the tag, the array-embedded CAM matrix supports a write-back function that provides a short read/write cycle time. The cache DRAM also has pin compatibility with address-nonmultiplexed memories. By achieving a reasonable hit ratio (90%), this cache DRAM provides a high-performance intelligent main memory with a 12 ns (hit)/34 ns (average) cycle time and 55 mA (at 25 MHz) operating current.
17.
Subash Chandar Mahesh Mehendale R. Govindarajan 《The Journal of VLSI Signal Processing》2006,44(3):245-267
In embedded control applications, system cost and power/energy consumption are key considerations. In such applications, program memory forms a significant part of the chip area. Hence reducing code size reduces the system cost significantly. A significant part of the total power is consumed in fetching instructions from the program memory. Hence reducing instruction fetch power has been a key target for reducing power consumption. To reduce the cost and power consumption, embedded systems in these applications use application specific processors that are fine tuned to provide better solutions in terms of code density, and power consumption. Further fine tuning to suit each particular application in the targeted class can be achieved through reconfigurable architectures. In this paper, we propose a reconfiguration mechanism, called Instruction Re-map Table, to re-map the instructions to shorter length code words. Using this mechanism, frequently used set of instructions can be compressed. This reduces code size and hence the cost. Secondly, we use the same mechanism to target power reduction by encoding frequently used instruction sequences to Gray codes. Such encodings, along with instruction compression, reduce the instruction fetch power. We enhance Texas Instruments DSP core TMS320C27x to incorporate this mechanism and evaluate the improvements on code size and instruction fetch energy using real life embedded control application programs as benchmarks. Our scheme reduces the code size by over 10% and the energy consumed by over 40%.
*A preliminary version of this paper has appeared in the International Conference on Computer Aided Design (ICCAD-2001), San Jose, CA, November 2001.
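Why Gray codes help instruction fetch power can be seen by counting bit toggles between consecutively fetched code words, since consecutive Gray codes differ in exactly one bit. The sketch below is illustrative only: `build_remap_table` is a hypothetical helper, the instruction mnemonics are made up, and the paper's actual Instruction Re-map Table format is not described in the abstract.

```python
from typing import Dict, List


def gray(n: int) -> int:
    # Standard binary-reflected Gray code.
    return n ^ (n >> 1)


def toggles(words: List[int]) -> int:
    # Total number of bit flips between consecutively fetched words.
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def build_remap_table(sequence: List[str]) -> Dict[str, int]:
    # Hypothetical table: the i-th instruction of a frequent sequence
    # (assumed to contain distinct instructions) gets the i-th Gray code.
    return {instr: gray(i) for i, instr in enumerate(sequence)}


binary_codes = list(range(8))
gray_codes = [gray(i) for i in range(8)]
print(toggles(binary_codes), toggles(gray_codes))

table = build_remap_table(["LD", "ADD", "ST", "BR"])  # made-up mnemonics
print(table)
```

Fewer toggles on the instruction bus per fetch is the mechanism by which such an encoding could lower fetch energy, while the short code words address code size.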
18.
A novel static random access memory (SRAM)-based hardware architecture for a longest prefix match (LPM) search scheme is proposed in this paper. The key concept of this architecture is to store the IP prefixes virtually in the forwarding table. The architecture reduces memory consumption by using a two-tier hierarchical SRAM-based memory structure for maintaining the next hop port information. Next hop addresses are kept in a shared global memory called next hop global memory (NHGM), and links to them are maintained in another memory, called next hop link memory (NHLM). This reduces memory consumption by approximately 50–62.5% compared to existing SRAM-based schemes. The proposed architecture uses a single memory write cycle to store an IP prefix and a single memory read cycle for an LPM search. However, finding next hop information incurs two memory read cycles due to the hierarchical next hop memory structure. The proposed scheme performs an LPM lookup operation in 1.05–1.31 ns in IPv4 and in 1.05–1.6 ns in IPv6. This results in an LPM search throughput of 760 to 950 million lookups per second (MLPS) in IPv4 and 620 to 950 MLPS in IPv6. The average search throughput achievable from this architecture is roughly 850 MLPS in IPv4 and 780 MLPS in IPv6. The numerical results show that this architecture significantly reduces memory requirement, power consumption, and transistor-count/bit requirement.
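The two-tier next hop structure can be sketched in software: each prefix points to an entry in a link memory (NHLM), which in turn points to a shared next hop entry (NHGM), so a lookup needs two reads to resolve the next hop. This is a plain software stand-in under stated assumptions: the class `LpmTable`, the dictionary-based prefix store and the linear scan over prefix lengths are hypothetical and do not reproduce the paper's single-cycle hardware search. One table instance per address family is assumed.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


class LpmTable:
    def __init__(self) -> None:
        self.nhgm: List[str] = []  # next hop global memory: unique next hops
        self.nhlm: List[int] = []  # next hop link memory: indices into nhgm
        # (network bits, prefix length) -> index into nhlm
        self.prefixes: Dict[Tuple[int, int], int] = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        if next_hop not in self.nhgm:
            self.nhgm.append(next_hop)  # each next hop is stored only once
        self.nhlm.append(self.nhgm.index(next_hop))
        key = (int(net.network_address), net.prefixlen)
        self.prefixes[key] = len(self.nhlm) - 1

    def lookup(self, address: str) -> Optional[str]:
        addr = ipaddress.ip_address(address)
        width = addr.max_prefixlen
        value = int(addr)
        # Try the longest prefix first; hardware would do this in parallel.
        for length in range(width, -1, -1):
            mask = ((1 << length) - 1) << (width - length)
            link = self.prefixes.get((value & mask, length))
            if link is not None:
                # Two reads: link memory first, then global next hop memory.
                return self.nhgm[self.nhlm[link]]
        return None


table = LpmTable()
table.insert("10.0.0.0/8", "port-A")
table.insert("10.1.0.0/16", "port-B")
table.insert("192.168.0.0/16", "port-A")
print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"))
print(table.nhgm, table.nhlm)
```

Sharing next hop entries through the link memory is what keeps the global memory small when many prefixes forward to the same port.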
19.
《Signal Processing: Image Communication》2014,29(1):96-106
Chain codes are the most size-efficient representations of rasterised binary shapes and contours. This paper considers a new lossless chain code compression method based on move-to-front transform and an adaptive run-length encoding. The former reduces the information entropy of the chain code, whilst the latter compresses the entropy-reduced chain code by coding the repetitions of chain code symbols and their combinations using a variable-length model. In comparison to other state-of-the-art compression methods, the entropy-reduction is highly efficient, and the newly proposed method yields, on average, better compression.
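The two stages named in this abstract can be sketched on a toy chain code. The move-to-front transform below is the standard one; the run-length stage is a plain (non-adaptive, fixed-format) encoder swapped in for the paper's adaptive variable-length model, and the sample chain code and 8-symbol alphabet are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def move_to_front(symbols: Iterable[int], alphabet: Iterable[int]) -> List[int]:
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)                  # emit the symbol's current position
        table.insert(0, table.pop(i))  # then move the symbol to the front
    return out


def inverse_move_to_front(indices: Iterable[int], alphabet: Iterable[int]) -> List[int]:
    table = list(alphabet)
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out


def run_length(values: Iterable[int]) -> List[Tuple[int, int]]:
    # Plain run-length encoding: (value, repeat count) pairs.
    runs: List[List[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


chain = [0, 0, 0, 1, 1, 1, 1, 0, 0]  # toy 8-direction chain code
mtf = move_to_front(chain, range(8))
print(mtf)
print(run_length(mtf))
print(inverse_move_to_front(mtf, range(8)) == chain)  # lossless round trip
```

Repeated directions turn into runs of zeros after the transform, which is what makes the subsequent run-length stage effective.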
20.
A new high-density multiple-valued content-addressable memory (CAM) is proposed to perform highly parallel search operations in a limited chip area. The number of cells in the CAM is reduced by the use of multiple-valued data representation. Moreover, multiple-valued stored data correspond to the threshold voltage of a floating-gate MOS transistor, so that the cell circuit can be designed using only a single transistor. As a result, the cell area of the proposed four-valued CAM is reduced to 14% of that of a conventional dynamic binary CAM, and its performance is about 5.4-times higher than that of the corresponding binary one under a 0.8-μm standard EEPROM technology.