Similar literature
Found 20 similar documents (search time: 15 ms)
1.
The use of D-type flip-flops with a generalised arithmetic array increases its generality, speeds up processes involving division, and enables the array to be used more efficiently. The calculation of ab/c is used as an example, but the array is most useful for large calculations involving a succession of such operations.

2.
This paper describes an LSI adaptive array processor (AAP) for two-dimensional data processing. The AAP contains a large number of one-bit processing elements (PEs) arranged in a square array. The high degree of parallelism and the control registers in each PE allow fast and flexible operation. High transfer capability is also obtained through a simple inter-PE connection network with hierarchical bypasses. The AAP's applicability to a variety of data-processing tasks is indicated by a matrix multiplication example that uses an algorithm similar to a systolic one. An AAP LSI composed of 8×8 PEs with powerful functions has been implemented on a 96.0 mm² chip using 2 μm Si-gate p-well CMOS technology. A cycle time of 55 ns, a power dissipation of 1.1 W, and a packing density of 1170 transistors/mm² have been achieved through skilful manual design. Although the LSI contains as many as 111900 transistors, the design effort required only one man-year because of the regularity of the cellular array. This LSI is expected to enable a compact, high-performance AAP.

3.
An approximate expression for the resolution of the maximum entropy array processor is derived and compared with the resolution expression for the conventional linear array processor (beamformer).

4.
The authors consider the problem of fault tolerant inversion of triangular matrices based on a linear checksum approach. The iterative Shultz method adapted for parallel implementation on triangular processor arrays was used.
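The idea in the abstract above can be illustrated with a small sequential sketch. It assumes the iteration meant is the one usually written as the Schulz (Newton-Schulz) iteration, X ← X(2I − AX), and it assumes a generic column-checksum test on each matrix product; the abstract does not give the authors' actual checksum scheme or their mapping onto a triangular processor array. The function names are hypothetical.

```python
def matmul(P, Q):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def checked_matmul(P, Q, tol=1e-9):
    """Matrix product guarded by a linear (column-sum) checksum.

    Assumed scheme: since (e^T P) Q = e^T (P Q), the column sums of the
    result must match the checksum row of P multiplied by Q.
    """
    C = matmul(P, Q)
    checksum_row = [sum(row[k] for row in P) for k in range(len(P[0]))]
    expected = matmul([checksum_row], Q)[0]
    actual = [sum(row[j] for row in C) for j in range(len(C[0]))]
    for e, a in zip(expected, actual):
        if abs(e - a) > tol * max(1.0, abs(e)):
            raise ArithmeticError("checksum mismatch: possible fault")
    return C


def schulz_inverse_lower(A):
    """Invert a lower-triangular matrix with nonzero diagonal.

    Start from X0 = diag(1/a_ii). The residual E = I - A X is then strictly
    triangular (nilpotent) and squares at every step, so about log2(n)
    iterations suffice in exact arithmetic.
    """
    n = len(A)
    X = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    steps = 0
    while (1 << steps) < n:
        AX = checked_matmul(A, X)
        R = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
             for i in range(n)]
        X = checked_matmul(X, R)  # X <- X (2I - A X)
        steps += 1
    return X
```

Every step is a matrix product, which is what makes the method suited to array hardware and lets the same checksum test guard each step.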

5.
6.
The general-purpose, highly parallel cellular array processor (CAP) we developed features multiple-instruction stream, multiple-data stream (MIMD) processing and image display. The number of processor elements can reach several hundred; the present system uses 256 processors. Each processor element consists of a general-purpose microprocessor, memory, and a special VLSI chip that performs parallel-processing-specific functions such as processor communication and synchronization. The VLSI chip has two independent 2-Mbyte/s common bus interfaces for data broadcasting and six 15-Mbit/s serial communication ports for local data communication. The chip can also process image data in real time for multiple processors. The communication interfaces allow a variety of processor networks to be configured. One CAP application has been computer graphics, in which ray tracing is used to generate high-quality images.

7.
We present a single-chip integration of a CMOS image sensor with an embedded flexible processing array and dedicated analog-to-digital converter. The processor array is designed to perform convolution and transformation algorithms with arbitrary kernels. It has been designed to carry out the multiplication of analog image data with given digital kernel coefficients and to add up the results. The processor array is an analog implementation of a highly parallel architecture which is scalable to any desired sensor resolution while preserving video-rate operation. A prototype implementation has been realized in a 0.6-μm CMOS technology. Switched current technique has been applied to obtain compact and robust circuits. The prototype's sensor resolution is 64 × 128 pixels. The processor array occupies a small chip area and consumes only a small percentage of the power (250 μW) of the whole image sensor.

8.
Simpson, P.; Roberts, J.B.G. Electronics Letters, 1983, 19(24): 1018-1020
A highly parallel single-instruction multiple-data (SIMD) array signal processor is advocated as efficient for a wide range of real-time problems. We examine its performance for digital speech recognition and show that impressive throughput rates for realistic vocabulary sizes can be achieved for 'time-warping' dynamic programming algorithms which currently form the basis of several commercial and research speech recognisers.

9.
Dynamic programming (DP) applies to many signal and image processing applications, including boundary following, the Viterbi algorithm, and dynamic time warping. This paper presents an array processor implementation of generic dynamic programming. Our architecture is a SIMD array attached to a host computer. The processing element of the architecture is based on an ASIC design aimed at maximum speed-up. By adopting a torus interconnection network, a dual buffer structure, and a multilevel pipeline, the performance of the DP chip is expected to reach the order of several GOPS. The paper discusses both the dedicated hardware design and the data flow control of the DP chip and the total array. This work was supported in part by the NATO, Scientific and Environmental Affairs Division, Collaborative Research Grant SA.5-2-05(CRG.960201)424/96/JARC-501.
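Dynamic time warping, named in the two abstracts above, is a compact instance of the dynamic-programming recurrence that these array processors accelerate. The following is a minimal sequential sketch, assuming an absolute-difference local cost and the common three-neighbour step pattern; neither abstract specifies these choices, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    D[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    Each cell depends only on its left, upper, and upper-left neighbours.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # assumed local distance
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] repeats a match
                                 D[i][j - 1],      # b[j-1] repeats a match
                                 D[i - 1][j - 1])  # both sequences advance
    return D[n][m]
```

Because each cell reads only three neighbours, all cells on one anti-diagonal are mutually independent, which is the property a SIMD array can exploit.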

10.
A mixed-mode cellular array processor is presented in which the processing units (PUs) are coupled with programmable polynomial (linear, quadratic, and cubic) first neighborhood feedback terms. It combines analog and digital processing so that the couplings and the polynomial terms are implemented with analog blocks whereas the integrator is digital, and analog-to-digital and digital-to-analog converters are used to interface between them. A 10-mm², 1.027 million transistor cellular array processor with 2×72 PUs and 36 layers of memory in each was manufactured using a 0.25-μm digital CMOS process. The array processor can perform gray scale Heun's integration of spatial convolutions with linear, quadratic, and cubic activation functions for 72×72 data while keeping all input-output operations during processing local. One complete Heun's iteration round takes 166.4 μs and the power consumption during processing is 192 mW. Experimental results of statistical variations in the multipliers and polynomial circuits are shown.

11.
The design of a fault-tolerant rectangular array of processing elements (PEs) is presented in which the reconfiguration is done by means of on-chip distributed logic, without the help of any external host. Spare PEs are included in every column of the array, and faulty PEs are bypassed within a column to facilitate reconfiguration in the presence of faults. Scan paths are used to enhance the testability of the array. PEs are tested locally using near-neighbor comparisons without the need of an external host. Because the interconnections between logical neighbors are short, the speed penalty for reconfiguration is very small. Any amount of redundancy can be incorporated in the array without changing the topology of the scheme or the design of the reconfiguration switches. The scheme is well suited for very large-area, high-density chips and wafer-scale integration. In order to demonstrate the capabilities of the scheme and evaluate its performance, an experimental chip consisting of a 6×4 array was designed, fabricated, and tested. Details of the design and the implementation of the chip are presented. The scheme is also analyzed for yield and area utilization for a range of array sizes and PE survival probabilities.

12.
A VLSI array processor for 16-point FFT (cited 1 time in total: 0 self-citations, 1 by others)
An implementation of a two-dimensional array processor for the fast Fourier transform (FFT) using a 2-μm CMOS technology is presented. The array processor, which is dedicated to 16-point FFT, implements a 4×4 mesh array of 16 processing elements (PEs) working in parallel. Design considerations at both the chip level and the PE level are examined. A layout design methodology based on bit-slice units (BSUs) results in a very simple design, easy debugging, and a regular interconnection scheme through abutment. The chip contains about 48,000 transistors on an area of 53.52 mm², excluding the 83-pad area, and operates on a 15-MHz clock. The array processor performs 24.6 million complex multiplications per second, and computes a 16-point FFT in 3 μs.
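For reference, the transform that the chip above computes can be sketched in software. This is a textbook recursive radix-2 decimation-in-time FFT, checked against a direct DFT; it is only an illustration of the 16-point transform and does not reflect how the authors map the computation onto the 4×4 mesh, which the abstract does not describe. The function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # One butterfly: a single complex multiplication by the twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used only as a cross-check."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


samples = [complex(i % 5, -i) for i in range(16)]  # arbitrary 16-point input
spectrum = fft(samples)
```

A usage check is to compare `spectrum` with `dft(samples)`; the two should agree to within floating-point rounding.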

13.
Many early vision tasks require only 6 to 8 b of precision. For these applications, a special-purpose analog circuit is often a smaller, faster, and lower power solution than a general-purpose digital processor, but the analog chips lack the programmability of digital image processors. This paper presents a programmable mixed-signal array processor which combines the programmability of a digital processor with the small area and low power of an analog circuit. Each processor cell in the array utilizes a digitally programmable analog arithmetic unit with an accuracy of 1.3%. The analog arithmetic unit utilizes a unique circuit that combines a cyclic switched-capacitor analog-to-digital converter (ADC) and digital-to-analog converter (DAC) to perform addition, subtraction, multiplication, and division. Each processor cell, fabricated in a 0.8-μm triple-metal CMOS process, operates at a speed of 0.8 MIPS, consumes 1.8 mW of power at 5 V, and uses 700 μm by 270 μm of silicon area. An array of these processor cells performed an edge detection algorithm and a subpixel resolution algorithm.

14.
The architectures, implementation and applications of two smart sensors, LAPP and PASIC, are described. The basic idea of these two designs is to integrate an image sensor array with a digital processor array on a single chip. The integrated camera-and-processor eliminates the bottleneck of sequential image read-out that characterizes conventional systems. The sensors provide fast, compact and economical solutions for tasks such as industrial inspection, optical character recognition and robot vision.

15.
High-rate Viterbi processor: a systolic array solution (cited 3 times in total: 0 self-citations, 3 by others)
The main part of the Viterbi algorithm (VA) is a nonlinear feedback loop, the ACS recursion (add-compare-select recursion), which presents a bottleneck for high-speed implementations and cannot be circumvented by standard means. Because the two operations of the loop form an algebraic structure called a semiring, the ACS recursion of the Viterbi algorithm can be written as a linear vector recursion. This allows the authors to employ the powerful techniques of parallel processing and pipelining, known for conventional linear systems, to achieve high throughput rates. Since the VA can be written as a linear vector recursion, it can be implemented by systolic arrays. For the class of shuffle exchange codes to be decoded by the Viterbi algorithm, hardware-efficient code-optimized arrays are presented. It is shown that carry-save arithmetic can be used for the operations of the ACS recursion, allowing each word-level operation to be pipelined and carried out by an efficient bit-level systolic array.
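The semiring view in the abstract above can be made concrete with a small sketch. It assumes the min-plus convention, in which "addition" is `min` and "multiplication" is `+`; the abstract does not say whether the authors minimise or maximise path metrics. The helper names and the two-state branch-metric matrices are hypothetical, and survivor-path bookkeeping is omitted.

```python
INF = float("inf")  # marks a transition that does not exist in the trellis


def minplus_matvec(M, g):
    """One ACS step: new_metric[j] = min over i of (M[j][i] + g[i])."""
    return [min(M[j][i] + g[i] for i in range(len(g))) for j in range(len(M))]


def minplus_matmul(A, B):
    """Min-plus matrix product: combines two trellis steps into one."""
    return [[min(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Hypothetical branch metrics for two consecutive steps of a 2-state trellis;
# M[j][i] is the metric of the branch from state i to state j.
M1 = [[1, INF], [2, 0]]
M2 = [[0, 3], [INF, 1]]
g0 = [0, 5]  # initial path metrics

# Sequential recursion: two dependent ACS steps.
sequential = minplus_matvec(M2, minplus_matvec(M1, g0))

# Look-ahead form: the matrices are combined first, independently of g0,
# and then applied in a single step.
lookahead = minplus_matvec(minplus_matmul(M2, M1), g0)
```

Both orders give the same path metrics because the min-plus product is associative. That associativity is what allows linear-system techniques such as look-ahead and pipelining to be applied to the ACS loop.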

16.
This paper describes schemes for introducing fault-tolerance into a two-dimensional orthogonal array of cells with nearest neighbour communication paths. The schemes are designed to tolerate a large number of faults and are therefore applicable to the yield-enhancement of large-area VLSI circuits. Simulation results are presented which show the superiority of the schemes over previous proposals and indicate that the nearest neighbour interconnections need not be a barrier to the desirable goal of integrating an array computer onto a whole-wafer circuit.

17.
An architecture based on a systolic array for real-time image template matching is presented. The architecture consists mainly of four elements: a digitizer, a two-dimensional systolic array combined with variable-length shift register arrays, an adder tree, and a comparator. Together the elements form a four-stage pipeline. The image data enter the pipeline sequentially, in the same order as the TV raster scan, but the matching computation is performed in parallel. Analyses of time complexity and hardware complexity show that real-time performance is achieved, and that the processing speed is higher and the hardware simpler than in the architecture presented by Chou and Chen.

18.
A new VLSI processor (DIP chip) for image compression is presented which combines principles of multipipeline and array processing. The device is not specific to any one image compression algorithm and can be regarded as a general purpose processor. The chip has been implemented using a CMOS 1.0-μm process on a 14.4×13.5-mm² die. An internal clock frequency of 40 MHz results in 1.2×10⁹ operations/s on 8-bit data. Solving the problems associated with the large bandwidth required, for both image data and instruction streams, is the main aim of the paper. The necessary problem of increasing the array clock frequency relative to the input/output clock frequency without the need for a large on-chip instruction cache or fast external clock speeds is also addressed.

19.
A new type of receiving array which adaptively minimizes output noise power while simultaneously satisfying certain robustness and/or bandwidth criteria is considered. The resulting array gains are shown to be robust against uncertainty in the assumed look direction, against wavefront distortions and against array geometry errors. The robustness property is incorporated directly into the adaptation algorithm via constraints. Extensive simulation has established very satisfactory performance of this new algorithm, both as a limited broad-band processor and as a robust narrow-band processor.

20.
This paper presents a new reconfiguration technique for VLSI/WSI processor arrays. The fault-tolerant capabilities of both interstitial redundancy and time redundancy are combined to provide optimal reconfiguration. Results obtained through Monte Carlo simulations show that the proposed reconfiguration technique achieves very high yield and chip area utilization. It is also shown that in harsh environments, where transient faults occur at a high rate, the proposed algorithm is more robust than existing approaches.
