Similar literature
 20 similar documents found (search time: 31 ms)
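1.
With the development of integrated-circuit technology, the potential of devices and circuits has already been exploited fairly fully, and it is becoming increasingly difficult to improve VLSI performance by tapping device potential or by developing new processes. High-performance VLSI design should therefore draw more on the still underused potential of architectures and computational modes.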

2.
This paper presents a real-time object detection system using a Histogram of Oriented Gradients (HOG) feature extraction accelerator VLSI. The VLSI [1, 2] enables the system to achieve real-time performance and scalability for multiple object detection under limited power conditions. The VLSI employs three techniques: a VLSI-oriented HOG algorithm with early classification in Support Vector Machine (SVM) classification, a dual-core architecture for parallel feature extraction, and a detection-window-size scalable architecture with a reconfigurable MAC array for processing objects of different shapes. The test chip was fabricated using 65 nm CMOS technology. The measurement results show that the VLSI consumes 43 mW at 42.9 MHz and 1.1 V to process HDTV (1920 × 1080 pixels) at 30 frames per second (fps). A multiple object detection system and a multiple scale object detection system are presented to demonstrate the flexibility and scalability realized by the VLSI and its applicability to versatile object detection applications. On the multiple object detection system, real-time object detection for HDTV resolution video is achieved with 84 mW of power consumption on a task to detect 2 types of targets while keeping detection accuracy comparable to that of a software-based system. On the multiple scale object detection system, a task to detect 5 scales of a target is accomplished using a single VLSI. The power consumption of the VLSI is estimated at 102 mW on this task.

3.
The major obstacle that has limited the use of Vector Quantization (VQ) for real-time speech coding is the computationally demanding codebook-search algorithm. The essential task of this algorithm, pattern matching, has several properties that make it amenable to VLSI realization using a highly concurrent processor architecture. A VLSI pattern-matching chip provides the essential building block for a special-purpose codebook-search processor (CSP). The CSP can serve as a generic architecture for a variety of VQ-based speech coding applications. This paper reports on a working VQ processor for speech coding based on a first-generation VLSI chip that efficiently performs the essential pattern-matching operation needed for the codebook-search process. Furthermore, the CSP architecture, using this chip, has been successfully incorporated into a compact single-board Vector PCM implementation which operates at rates between 7 and 18 kbits/s. A real-time Adaptive Vector Predictive Coder system using the CSP and augmented by a TMS-32010 programmable signal processor has been designed and recently implemented. We describe the structure of these two VQ coders and present experimental results obtained using the single-board Vector PCM coder.
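
The codebook search mentioned in this abstract can be illustrated with a minimal software sketch. It assumes a full (exhaustive) search with squared Euclidean distortion; the abstract does not state the distortion measure or the search order used by the chip, and the names `squared_distance` and `codebook_search` are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def codebook_search(vector, codebook):
    """Exhaustive search: return (index, distortion) of the closest codeword.

    Assumption: squared Euclidean distortion; ties go to the lowest index.
    """
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = squared_distance(vector, codeword)  # one pattern match
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


# Toy 2-dimensional codebook (illustrative values only).
codebook = [(0, 0), (1, 1), (4, 0), (0, 4)]
index, dist = codebook_search((1, 2), codebook)
print(index, dist)
```

Every codeword comparison is independent of the others, which is the property that makes the search a natural fit for a concurrent pattern-matching array.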

4.
This paper presents an enhanced multi-level filter algorithm and its Very Large Scale Integration (VLSI) architecture for infrared image processing. The modified multi-level filter algorithm resolves the target-splitting problem using Gaussian pyramid processing. With three filtering paths, the proposed VLSI architecture of the filter can simultaneously enhance small targets of different sizes in infrared images. Design techniques for implementing hardwired multiplication, subsampling and an asynchronous FIFO are presented. This VLSI architecture has been implemented using Semiconductor Manufacturing International Corporation (SMIC) 0.35 µm 4-layer CMOS technology. The simulation results show that, compared with other small-target detection methods, it not only effectively suppresses background, eliminates noise and enhances small targets in an infrared image, but also meets infrared image real-time processing requirements (5 M to 10 M pixels/s). The implemented filter chip consists of 60,284 gates and 8 K of Static Random Access Memory (SRAM), and operates at 50 MHz.

5.
Chip design technology has accelerated the advance of communication technology over the past decades, because a chip with larger computing capacity can support a communication system of higher transmission bandwidth. Since communication transceivers are now in the multi-gigabit/second range, the computing bandwidth requirement for a transceiver has grown into the range of several hundred giga-FLOPs per second. To support such large computing tasks on a chip, nanometer technology and pure baud-rate computing without pipelining and oversampling overheads will become much more important. Moreover, baud-rate computing does not require extra digital control for the digital signal processing functions. This can greatly reduce the power consumption and chip area of a VLSI system. Yet there are several design issues, such as the output signal-to-noise ratio, algorithmic mapping for the computing model, and the critical path for the datapath design of the VLSI computing function, which need to be resolved under small silicon area requirements. A novel baud-rate channel equalization architecture based on training coefficient relaxation techniques is presented in this paper to resolve these issues in nanometer technologies such as 130 nm and 90 nm. This design paradigm clearly demonstrates its advantage in enabling multiport transceiver system-on-a-chip designs in nanometer technology. Trends for baud-rate computing in smaller geometries are also explained.

6.
This paper presents a new multi-level filter algorithm and its corresponding VLSI architecture for infrared image processing. The algorithm eliminates target splitting by inserting Gaussian pyramid processing. With three filter paths, the proposed filter VLSI architecture can enhance small targets of different sizes in infrared images. This architecture has been implemented using SMIC 0.35 μm 4-layer CMOS technology. The test results show that the filter chip not only effectively suppresses background, eliminates noise and enhances small targets in an infrared image, but also meets infrared image real-time processing requirements (5M to 10M pixels/s). The implemented filter chip consists of 60,284 gates and 8K of SRAM, and operates at 50 MHz.

7.
Many VLSI architectures for computing the discrete wavelet transform (DWT) have been presented, but a parallel input data sequence and programmability of the 2-D DWT have rarely been addressed. In this paper, we present a parallel-processing VLSI architecture to compute the programmable 2-D DWT, including various wavelet filter lengths and various wavelet transform levels. The proposed architecture is very regular and easy to extend. To eliminate high-frequency components, the pixel values outside the boundary of the image are mirror-extended as in the symmetric wavelet transform (SWT), and the mirror extension is realized via the routing network. Owing to the parallel processing, we adopt the row-based recursive pyramid algorithm (RPA), similar to the 1-D RPA, for data scheduling. This design has been implemented and fabricated in a 0.35 μm 1P4M CMOS technology, and the working frequency is 50 MHz. The chip size is about 5200 μm × 2500 μm. For a 256 × 256 image, the chip can process 30 frames per second with the filter length varying from 2 to 20 and with various levels. The proposed architecture is suitable for real-time applications such as JPEG 2000.

8.
Describes the architecture and design of a CMOS VLSI chip for data compression and decompression using tree-based codes. The chip, called MARVLE, implements a memory-based architecture for variable length encoding and decoding based on tree-based codes. The architecture is based on an efficient scheme of mapping the tree representing any binary code onto a memory device. A prototype 2-μm CMOS VLSI chip has been designed, verified, and fabricated by the MOSIS facility. The chip has a 512×12 static RAM with an access time of 4 ns and logic circuitry for compression as well as decompression. The chip occupies a silicon area of 6.8 mm × 6.9 mm and consists of 49,695 transistors. The prototype chip yields a compression rate of 95.2 Mb/s and a decompression rate of 60.6 Mb/s with a clock rate of 83.3 MHz. The VLSI hardware can be used to implement the JPEG baseline compression scheme.
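
The idea of mapping a binary code tree onto a memory can be sketched in software as follows. This is only an illustration under stated assumptions: the code is prefix-free, each memory word holds one entry per input bit, and an entry is either the address of the next word or a decoded symbol. The word layout, the function names `build_tree_memory` and `decode`, and the toy code table are hypothetical and are not taken from the MARVLE design.

```python
def build_tree_memory(code_table):
    """Map a prefix-code tree onto a flat list of memory words.

    Each word is a pair [entry_for_bit_0, entry_for_bit_1]; an entry is
    ("node", address) for an internal branch or ("leaf", symbol).
    Assumption: code_table is prefix-free.
    """
    memory = [[None, None]]  # word 0 is the root of the tree
    for symbol, bits in code_table.items():
        address = 0
        for bit in bits[:-1]:
            b = int(bit)
            if memory[address][b] is None:
                memory.append([None, None])  # allocate a new word
                memory[address][b] = ("node", len(memory) - 1)
            address = memory[address][b][1]
        memory[address][int(bits[-1])] = ("leaf", symbol)
    return memory


def decode(bitstring, memory):
    """Decode by one memory lookup per input bit, restarting at the root."""
    symbols, address = [], 0
    for bit in bitstring:
        kind, value = memory[address][int(bit)]
        if kind == "leaf":
            symbols.append(value)
            address = 0
        else:
            address = value
    return symbols


# Toy variable-length code (illustrative only).
code_table = {"a": "0", "b": "10", "c": "110", "d": "111"}
memory = build_tree_memory(code_table)
print(decode("0101100111", memory))
```

Because decoding is a sequence of table lookups, changing the code only requires reloading the memory contents, not changing the decoding logic.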

9.
In this paper, we describe a fully pipelined single-chip VLSI architecture for implementing the JPEG baseline image compression standard. The architecture exploits the principles of pipelining and parallelism to the maximum extent in order to obtain high speed and throughput. The architectures for the discrete cosine transform and the entropy encoder are based on efficient algorithms designed for high-speed VLSI implementation. The entire architecture can be implemented on a single VLSI chip to yield a clock rate of about 100 MHz, which would allow an input rate of 30 frames per second for 1024×1024 color images.

10.
A VLSI architecture for the generalized bit-flipping decoding algorithm for non-binary low-density parity-check codes is proposed in this paper. The tentative decoding steps of the algorithm have been modified to avoid computing and storing a matrix of dimension N × 2^q for an (N, K) code over GF(2^q), reducing its complexity with a minimal performance penalty of less than 0.05 dB compared with the original algorithm. The architecture was synthesized using a 90 nm standard cell library for the (837, 723) non-binary code over GF(2^5), requiring 590,220 XOR gates and achieving a throughput of 89 Mbps. Additionally, it was implemented in a Virtex-VI FPGA device at a cost of 4070 slices and with a throughput of 44.6 Mbps.

11.
This article presents the VLSI design of a configurable RSA public-key cryptosystem supporting 512-bit, 1024-bit and 2048-bit operation based on the Montgomery algorithm, achieving clock-cycle counts comparable to those of current relevant works but with a smaller die size. We use the binary method for modular exponentiation and adopt the Montgomery algorithm for modular multiplication to reduce computational complexity, which, together with the systolic-array concept for circuit design, effectively lowers the die size. The main architecture of the chip consists of four functional blocks, namely the input/output modules, register module, arithmetic module and control module. We applied the systolic-array concept to design the RSA encryption/decryption chip using the VHDL hardware description language and verified it using the TSMC/CIC 0.35 μm 1P4M technology. The die area of the 2048-bit RSA chip without DFT is 3.9 × 3.9 mm² (4.58 × 4.58 mm² with DFT). Its average baud rate can reach 10.84 kbps under a 100 MHz clock.
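
The combination named in this abstract, the binary method for modular exponentiation on top of Montgomery modular multiplication, can be sketched in software as follows. This is a word-level illustration only, not the bit-level systolic-array datapath of the chip; it assumes an odd modulus smaller than R = 2^bits, uses the left-to-right variant of the binary method (the abstract does not say which variant is used), and the function names are hypothetical. It needs Python 3.8 or later for `pow(x, -1, m)`.

```python
def montgomery_setup(modulus, bits):
    """Precompute constants for R = 2**bits (modulus must be odd and < R)."""
    r = 1 << bits
    n_prime = (-pow(modulus, -1, r)) % r  # modulus * n_prime == -1 (mod R)
    r2 = (r * r) % modulus                # used to enter Montgomery form
    return n_prime, r2


def mont_mul(a, b, modulus, bits, n_prime):
    """Return a * b * R^-1 mod modulus, with no division by the modulus."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * modulus) >> bits         # exact: low `bits` bits are zero
    return u - modulus if u >= modulus else u


def mont_pow(base, exponent, modulus, bits):
    """Left-to-right binary exponentiation using Montgomery products."""
    n_prime, r2 = montgomery_setup(modulus, bits)
    x = mont_mul(base % modulus, r2, modulus, bits, n_prime)  # base * R
    acc = mont_mul(1, r2, modulus, bits, n_prime)             # 1 * R
    for bit in bin(exponent)[2:]:
        acc = mont_mul(acc, acc, modulus, bits, n_prime)      # square
        if bit == "1":
            acc = mont_mul(acc, x, modulus, bits, n_prime)    # multiply
    return mont_mul(acc, 1, modulus, bits, n_prime)           # leave form


# Toy parameters (far too small for real RSA), checked against built-in pow.
print(mont_pow(65, 17, 3233, 12) == pow(65, 17, 3233))
```

The reduction step uses only masking and shifting by `bits`, which is why the Montgomery form maps well onto regular hardware arrays.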

12.
An operator correlation-based algorithm and its VLSI architecture for computing the 2D discrete wavelet transform are presented. The proposed discrete wavelet transform architecture was simulated in Verilog and synthesised with the FPGA compiler. The implementation of the 2D discrete wavelet transform in an FPGA-based design style is described.

13.
Real-time implementation of an order-statistic filter (OSF), or ranked-order filter, requires the computation of the order statistic (ranked order) of the samples in a window that is periodically updated with the arrival of one or more new samples. The authors give an algorithm for the computation of the running order statistic. A highly parallel architecture suitable for VLSI implementation is presented. The architecture is very versatile, with programmable window size and rank order. An expansion algorithm and its VLSI architecture, which permit the use of two r-bit OSFs to implement an (r+1)-bit OSF, where r is the resolution of the input signal samples, are given. In the special case where an error of at most one LSB is acceptable, the hardware complexity of the proposed architecture can be reduced by almost one half. It is further shown how a VLSI chip incorporating the proposed architecture can be used as the basic building block in the real-time implementation of other forms of nonlinear filters.
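
What a running order statistic computes can be shown with a short sequential sketch. It is not the authors' parallel algorithm: it simply keeps the current window in sorted order and reads out one rank per new sample. The function name `running_order_statistic` and the convention that `rank` is 0-based (0 is the minimum) are assumptions made for illustration.

```python
from bisect import bisect_left, insort
from collections import deque


def running_order_statistic(samples, window_size, rank):
    """Return the rank-th smallest value (0-based) of each full window."""
    window = deque()   # samples in arrival order
    ordered = []       # the same samples, kept sorted
    output = []
    for sample in samples:
        window.append(sample)
        insort(ordered, sample)                       # insert new sample
        if len(window) > window_size:
            oldest = window.popleft()
            del ordered[bisect_left(ordered, oldest)]  # drop oldest sample
        if len(window) == window_size:
            output.append(ordered[rank])
    return output


# rank=1 with window_size=3 gives a running median.
print(running_order_statistic([5, 1, 9, 3, 7, 2], window_size=3, rank=1))
```

Setting `rank` to 0 or to `window_size - 1` gives running minimum and maximum filters, which illustrates the programmable rank mentioned in the abstract.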

14.
A high-speed analog VLSI image acquisition and low-level image processing system is presented. The architecture of the chip is based on a dynamically reconfigurable SIMD processor array. The chip features a massively parallel architecture enabling the computation of programmable mask-based image processing in each pixel. Each pixel includes a photodiode, an amplifier, two storage capacitors, and an analog arithmetic unit based on a four-quadrant multiplier architecture. A 64 × 64 pixel proof-of-concept chip was fabricated in a 0.35 μm standard CMOS process, with a pixel size of 35 μm × 35 μm. The chip can capture raw images at up to 10,000 fps and runs low-level image processing at a frame rate of 2,000–5,000 fps.

15.
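Based on the requirements of the 802.11i AES encryption/decryption algorithm and a given system clock frequency, this paper proposes an area-efficient and highly regular implementation of the AES computation circuit. By analyzing the requirements on system clock and system data throughput, it gives a reasonable HT (High Throughput)-oriented system architecture for the 802.11i CCMP AES algorithm, analyzes and compares implementation methods for the AES computation unit within it, and arrives at a smaller-area implementation of the AES computation unit. Synthesis with Design Compiler shows that the optimized area is at least 31% smaller than that of existing methods, which effectively reduces IC cost.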

16.
A K-best Schnorr-Euchner (KSE) decoding algorithm is proposed in this paper to approach near-maximum-likelihood (ML) performance for multiple-input-multiple-output (MIMO) detection. As a low-complexity MIMO decoding algorithm, the KSE is shown to be suitable for very large scale integration (VLSI) implementations and to be capable of supporting soft outputs. A modified KSE (MKSE) decoding algorithm is further proposed to improve the performance of the soft-output KSE with minor modifications. Moreover, a VLSI architecture is proposed for both algorithms. Several low-complexity and low-power features are incorporated in the proposed algorithms and the VLSI architecture. The proposed hard-output KSE decoder and the soft-output MKSE decoder are implemented for 4×4 16-quadrature amplitude modulation (QAM) MIMO detection in a 0.35-μm and a 0.13-μm CMOS technology, respectively. The implemented hard-output KSE chip core is 5.76 mm² with 91 K gates. The KSE decoding throughput is up to 53.3 Mb/s with a core power consumption of 626 mW at a 100 MHz clock frequency and 2.8 V supply. The implemented soft-output MKSE chip can achieve a decoding throughput of more than 100 Mb/s with a 0.56 mm² core area and 97 K gates. The implementation results show that it is feasible to achieve near-ML performance and high detection throughput for a 4×4 16-QAM MIMO system using the proposed algorithms and the VLSI architecture with reasonable complexity.

17.
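Research on the system-level architecture of a single-chip cryptographic data processor   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper proposes a system architecture design for a single-chip cryptographic data processor. The architecture covers the microprocessor architecture, the data interfaces, the user identification interface, dedicated units for cryptographic algorithms, IP modules implementing the RSA and CHES cryptographic algorithms [1, 2], and a pseudo-random number generator, all of which are required in a single-chip cryptographic data processor system. The architecture differs from that of other systems: its structure itself provides a degree of confidentiality, and it includes dedicated cryptographic units and dedicated cryptographic instructions to accelerate cryptographic data processing. It therefore has many cryptography-specific features and is an effective SoC-based system design for information security equipment.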

18.
Fractional Motion Estimation (FME) in high-definition H.264 presents a significant design challenge in terms of memory bandwidth, latency and area cost, as there are various modes and a complex mode decision flow, which account for over 45% of the computational complexity of the H.264 encoding process. In this paper, a new high-performance VLSI architecture for FME in H.264/AVC based on the full-search algorithm is presented. This architecture is made up of three different pipeline processors to establish a trade-off between processing time and hardware utilization. The computing scheme, based on a 4-pixel interpolation unit with a 10-pixel input bandwidth, is capable of processing a macroblock (MB) in 870 clock cycles. The final VLSI implementation requires only 11.4 k gates and 4.4 kBytes of RAM in a standard 180 nm CMOS technology operating at 290 MHz. Our design generates the residual image and the best MVs and mode in a high-throughput, low-area-cost architecture while achieving enough processing capacity for 1080HD (1920 × 1088@30fps) real-time video streams.

19.
Many radar sensor systems demand high-performance front-end signal processing. The high processing throughput is driven by the fast analog-to-digital conversion sampling rate, the large number of sensor channels, and stringent requirements on the filter design leading to a large number of filter taps. The computational demands range from tens to hundreds of billion operations per second (GOPS). Fortunately, this processing is very regular, highly parallel, and well suited to VLSI hardware. We recently fielded a 100 GOPS system designed using custom VLSI chips. The system can adapt to different filter coefficients as a function of changes in the transmitted radar pulse. Although the computation is performed on custom VLSI chips, there are important reasons to attempt to solve this problem using adaptive computing devices. As feature sizes shrink and field programmable gate arrays become more capable, the same filtering operation will be feasible using reconfigurable electronics. In this paper we describe the hardware architecture of this high-performance radar signal processor and technology trends in reconfigurable computing, and present an alternate implementation using emerging reconfigurable technologies. We investigate the suitability of a Xilinx Virtex chip (XCV1000) for this application. Results of simulating and implementing the application on the Xilinx chip are also discussed.

20.
Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. A parallel VLSI architecture for low-latency turbo decoding, comprising multiple single-input single-output (SISO) elements, operating jointly on one turbo-coded block, is presented and compared to sequential architectures. A parallel interleaver is essential to process multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented, achieving the same error correction performance as the standard architecture. Latency is reduced up to 20 times and throughput for large blocks is increased up to six-fold relative to sequential decoders, using the same silicon area, and achieving a very high coding gain. The parallel architecture scales favorably: latency and throughput are improved with increased block size and chip area.
