Similar literature
 Found 20 similar documents; search took 906 ms
1.
The design of a unit that handles line scanning, digit forming, and digit pulsing fully independently of the main processor is presented. As implemented in conventional systems, these functions form a major portion of the real-time load on the processor. The hardware unit described in this paper takes care of all these functions with only nominal interaction with the processor, thereby considerably reducing the real-time load. In this unit, one very high speed logic assembly is shared by a large number of lines with the help of memory and timing circuits. The same hardware, with only minor modifications, can also be used for telex applications. A typical unit for 1000 lines requires about 300 ICs, including the processor interface.

2.
Many early vision tasks require only 6 to 8 b of precision. For these applications, a special-purpose analog circuit is often a smaller, faster, and lower power solution than a general-purpose digital processor, but the analog chips lack the programmability of digital image processors. This paper presents a programmable mixed-signal array processor which combines the programmability of a digital processor with the small area and low power of an analog circuit. Each processor cell in the array utilizes a digitally programmable analog arithmetic unit with an accuracy of 1.3%. The analog arithmetic unit utilizes a unique circuit that combines a cyclic switched-capacitor analog-to-digital converter (ADC) and digital-to-analog converter (DAC) to perform addition, subtraction, multiplication, and division. Each processor cell, fabricated in a 0.8-μm triple-metal CMOS process, operates at a speed of 0.8 MIPS, consumes 1.8 mW of power at 5 V, and uses 700 μm by 270 μm of silicon area. An array of these processor cells performed an edge detection algorithm and a subpixel resolution algorithm.

3.
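This paper presents the design of an asynchronous ACS (add-compare-select) unit for Viterbi decoders. Asynchronous handshake signals replace the global clock of a synchronous circuit. Circuits for an asynchronous adder unit, an asynchronous comparator unit, and an asynchronous selector unit are given for one asynchronous implementation structure. An asynchronous 4-bit ACS was designed using a full-custom methodology and verified by fabrication in a 0.6 μm CMOS process. In testing, the chip consumed 75.5 mW at a supply voltage of 5 V and an operating frequency of 20 MHz. Because of the asynchronous control, the chip consumes no dynamic power while idle in the "sleep" state. The average response time of the chip is 19.18 ns, only 82% of the worst-case response time of 23.37 ns. A comparison with simulated power and performance results for a synchronous 4-bit ACS in the same process shows that the asynchronous ACS has advantages over the synchronous one.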
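
As a reading aid, the add-compare-select operation named in the abstract above can be sketched in software. This is only a behavioural model of one generic ACS step, not the asynchronous circuit the paper describes; the function name `acs`, the argument names, and the saturating behaviour at the 4-bit limit are assumptions made for illustration.

```python
def acs(pm0, pm1, bm0, bm1, width=4):
    """One add-compare-select step for a single trellis state.

    pm0, pm1: path metrics of the two predecessor states.
    bm0, bm1: branch metrics of the two incoming branches.
    Returns (survivor_metric, decision_bit).
    """
    limit = (1 << width) - 1
    # Add: extend each candidate path by its branch metric.
    # Saturating at the word-width limit is an assumption of this sketch.
    cand0 = min(pm0 + bm0, limit)
    cand1 = min(pm1 + bm1, limit)
    # Compare and select: keep the smaller metric and record which branch won.
    if cand0 <= cand1:
        return cand0, 0
    return cand1, 1
```

The decision bit is what a Viterbi decoder would store for traceback; ties are resolved in favour of branch 0 here, which is an arbitrary choice.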

4.
A processor is any self-contained computer of at least personal-computer capability. The paper explores how much the processor mean time-to-failure can be improved by replacing it with an N-processor module, where each processor in the module consists of a copy of the original processor augmented with a communication protocol unit. The copy of the original processor is faulty with probability pc, and the protocol unit is faulty with probability p. The asynchronous N-processor module uses a Byzantine agreement (F-ID-P) algorithm to identify which of its processors disagreed with a module consensus. The identified processors are presumed faulty, and the module replaces them with duplicates from a set of standbys. The F-ID-P algorithm is a modification of Bracha's, which guarantees that in a module of 3t+1 processors, up to t faults can be identified by at least t+1 non-faulty processors. The module fails if faults in more than t of its processors prevent it from: 1) obtaining a correct consensus, or 2) executing the algorithm. The F-ID-P algorithm departs from Bracha's by using a random instead of an adversary scheduler of message delays. Simulation showed that the F-ID-P algorithm almost always correctly identified all of a module's faulty processors if more than half of them were non-faulty. Thus the F-ID-P algorithm was about 3/2 times as fault tolerant as guaranteed. Also, compared to a single processor's mean number of decisions to failure, the F-ID-P module was 841 times better when N=37, down to 5.1 times better when N=10.
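
The two fault bounds mentioned in the abstract above can be made concrete with a little arithmetic. The sketch below assumes only what the abstract states (the 3t+1 guarantee, and the simulated behaviour "more than half non-faulty"); the function names are hypothetical.

```python
def guaranteed_faults(n):
    """Largest t such that 3t + 1 <= n (the guaranteed bound)."""
    if n < 1:
        raise ValueError("module needs at least one processor")
    return (n - 1) // 3


def simulated_faults(n):
    """Largest fault count that still leaves more than half of n non-faulty."""
    if n < 1:
        raise ValueError("module needs at least one processor")
    return (n - 1) // 2
```

For the module sizes quoted in the abstract, `guaranteed_faults(37)` is 12 and `simulated_faults(37)` is 18, a ratio of 3/2, which is consistent with the "about 3/2" figure.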

5.
《IEE Review》1990,36(9):331-335
The Sun 4/200 workstation, launched in 1987, is powered by a new processor design developed by Sun and called the Sparc (scalable processor architecture). In Sparc the architecture is logically organised as three distinct units: an integer unit, a floating-point unit and an optional, implementation-defined coprocessor. Sparc is an example of a RISC (reduced instruction set computer) and the emphasis is on keeping instructions simple and maximising the rate at which they are executed.

6.
This paper describes a general-purpose digital signal processor constructed with 4-bit bipolar microprocessor slices. The signal processor is microprogrammable and contains special features which allow it to employ distributed arithmetic. Hence, the processor can achieve high sampling rates without using a hardware multiplier unit. The processor's architecture is presented and its micro-order structure is examined. The processor word length is 16 bits; its basic cycle time, 300 ns; its data memory size, 2K words; its control store size, 256 x 56 bits. It consumes 48 W of power and has special address processing hardware. Experimental results with a twelfth-order digital filter are demonstrated. The signal processor is also compared with several other signal processors of its class described in the literature.
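
To show why distributed arithmetic removes the need for a hardware multiplier, here is a minimal sketch of the general technique: a sum of products is evaluated bit-serially with a lookup table, shifts, and adds. It is not the microcode of the processor described above; the function names, the 16-bit default, and the two's-complement input format are assumptions of the sketch.

```python
def build_lut(coeffs):
    """Entry i holds the sum of coeffs[k] for every set bit k of i."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (i >> k) & 1)
            for i in range(1 << n)]


def da_dot(coeffs, samples, bits=16):
    """Dot product of integer coeffs and two's-complement samples
    using only table lookups, shifts, and additions."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement encoding
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table index.
        index = 0
        for k, w in enumerate(words):
            index |= ((w >> b) & 1) << k
        term = lut[index] << b
        if b == bits - 1:
            acc -= term  # the sign bit carries negative weight
        else:
            acc += term
    return acc
```

The table grows as 2^K for K coefficients, so practical designs usually split long filters into several smaller tables.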

7.
In this paper, the architecture of an embedded processor extended with a tightly-coupled coarse-grain reconfigurable functional unit (RFU) is proposed. The efficient integration of the RFU with the control unit and the datapath of the processor eliminates the communication overhead between them. To speed up execution, the RFU exploits instruction level parallelism (ILP) and spatial computation. Also, the proposed integration of the RFU efficiently exploits the pipeline structure of the processor, leading to further performance improvements. Furthermore, a development framework for the introduced architecture is presented. The framework is fully automated, hiding all reconfigurable hardware related issues from the user. The hardware model of the architecture was synthesized in a 0.13 µm process, and area and delay estimates are presented. A set of benchmarks is used to evaluate the architecture and the development framework. Experimental results prove performance improvements in addition to potential energy reduction.

8.
A 1-million transistor 64-b microprocessor has been fabricated using 0.8-μm double-metal CMOS technology. A 40-MIPS (million instructions per second) and 20-MFLOPS (million floating-point operations per second) peak performance at 40 MHz is realized by a self-clocked register file and two translation lookaside buffers (TLBs) with word-line transition detection circuits. The processor contains an integer unit based on the SPARC (scalable processor architecture) RISC (reduced instruction set computer) architecture, a floating-point unit (FPU) which executes IEEE-754 single- and double-precision floating-point operations, a 6-KB three-way set-associative physical instruction cache, a 2-KB two-way set-associative physical data cache, a memory management unit that has two TLBs, and a bus control unit with an ECC (error-correcting code) circuit.

9.
The CMOS imager presented integrates a 2D photoreceptor array with a nine-input analog processor on the same focal plane. The analog processor is fully programmable and performs multiply-accumulate operations. A VLSI implementation of spatial convolution operations performed on images is presented. A modified photoreceptor is presented that uses current mode for signal transmission, thus decreasing the effect of noise on the transmitted signal and increasing the sensitivity per decade. A novel decoding scheme is used to select the required set of photoreceptors to be presented to the analog processor. Thus only one processor unit is needed, whose inputs depend on the time state. A prototype system was fabricated that incorporates 15×15 pixels in a 2×2 mm² area using a 2 μm double-metal, single-poly process.

10.
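高广坦 (Gao Guangtan). 《电子工程师》 (Electronic Engineer), 2010, 36(11): 17-19
A radar signal processor is designed around ADI's high-performance floating-point DSP chip TS201 as the core processor, combined with Xilinx VIRTEX-IIPRO series FPGA chips, and consists of a data buffer board with 2 DSPs and a main processing board with 4 DSPs. In this signal processor, the DSP chips communicate point-to-point with one another using only their link ports; their data buses are not connected to one another, and their memory address spaces are independent of each other. The system offers compact hardware, easy program debugging, and high overall reliability.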

11.
This paper describes the design and implementation of a very long instruction word (VLIW) microprocessor. The VLIW integer processor (VIPER) contains four pipelined functional units and can achieve 0.25-cycle-per-instruction performance. The processor is capable of performing multiway branch operations, two load/store operations, or up to four ALU operations in each clock cycle, with full register file access to each functional unit. Designed in twelve months, the processor is integrated with an instruction cache controller and a data cache, requiring 450,000 transistors and a die size of 12.9 mm×9.1 mm in a 1.2-μm technology.

12.
A trellis code encoded by using the encoder of a convolutional code C with a short constraint length followed by an additional processing unit is equivalent to a trellis code with a large constraint length. In 1993, Hellstern proposed a trellis coding scheme for which the processing unit consists of a delay processor and a signal mapper. With Hellstern's scheme, trellis codes with large free distances can be constructed. In this paper, we propose two trellis coding schemes. For the first scheme, the processing unit is composed of multiple pairs of delay processors and signal mappers. For the second scheme, the processing unit is composed of a convolutional processor and a signal mapper, where a convolutional processor is a rate-1 convolutional code. The trellis code constructed from each of the proposed schemes can be suboptimally decoded by using the trellis of the convolutional code C with some feedback information. Either of the proposed schemes can produce a trellis code that has a larger bound on free distance and better error performance than the trellis code constructed from Hellstern's scheme based on the same convolutional code C.

13.
In this paper, we propose a new lifting processor architecture for JPEG2000 and implement it in both FPGA and ASIC. It includes a new cell structure that executes one unit of lifting calculation, to match the repetitive arithmetic of the lifting process. After analyzing the operational sequence of the lifting arithmetic in detail and imposing causality so that it can be implemented in hardware, the unit cell was optimized. A new, simple lifting kernel was organized by repeatedly arranging the unit cells, and a lifting processor for Motion JPEG2000 was realized with the kernel. The proposed processor can handle tiles of any size and supports both lossy and lossless operation, with the (9,7) filter and the (5,3) filter, respectively. It also has the same throughput rate as the input and can continuously output the wavelet coefficients of the four types (LL, LH, HL, HH) simultaneously. The lifting processor was implemented in a 0.35 μm CMOS fabrication process; the result occupied about 90 000 gates and operated stably at about 150 MHz.
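
For readers unfamiliar with lifting, the reversible (5,3) transform used for lossless JPEG2000 can be sketched in one dimension as a predict step followed by an update step. This is a software illustration of the standard integer lifting equations, not the unit-cell hardware of the paper above; the function names, the even-length restriction, and the mirror handling at the edges are simplifying assumptions.

```python
def lift53_forward(x):
    """Forward reversible 5/3 lifting on an even-length integer list.
    Returns (low-pass s, high-pass d)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("sketch handles even-length signals only")
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    d = []
    for n in range(half):
        # Predict: odd sample minus the floor-average of its even neighbours.
        right = even[n + 1] if n + 1 < half else even[n]  # mirror at the edge
        d.append(odd[n] - ((even[n] + right) >> 1))
    s = []
    for n in range(half):
        # Update: even sample plus a rounded quarter of neighbouring details.
        left = d[n - 1] if n > 0 else d[0]  # mirror at the edge
        s.append(even[n] + ((left + d[n] + 2) >> 2))
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by running the steps in reverse."""
    half = len(s)
    even = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - ((left + d[n] + 2) >> 2))
    x = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        x.extend([even[n], d[n] + ((even[n] + right) >> 1)])
    return x
```

Because every step only adds or subtracts an integer computed from the other channel, the inverse recovers the input bit for bit, which is what makes the (5,3) path lossless.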

14.
A low-power, high-performance 4-way 32-bit stream processor core is developed for handheld low-power 3-D graphics systems. It contains a floating-point unified matrix, vector, and elementary function unit. By exploiting logarithmic arithmetic and the proposed adaptive number conversion scheme, a 4-way arithmetic unit achieves single-cycle throughput for all these operations except matrix-vector multiplication, which takes 2 cycles per result, compared with 4 cycles in the conventional approach. The processor, which features this functional unit and several proposed architectural schemes, including embedded register index calculations, functional unit reconfiguration, and operand forwarding in the logarithmic domain, achieves a 19.1% cycle-count reduction for OpenGL transformation and lighting (TnL) operations compared with the latest work.

15.
A 250-MHz, 16-b, fixed-point, super-high-speed video signal processor (S-VSP) ULSI has been developed for constructing a video teleconferencing system. Two major technologies have been developed. One is a high-speed large-capacity on-chip memory architecture that achieves both 250-MHz internal signal processing and 13.5-MHz input and output buffering. The other is a circuit technology that achieves 250-MHz operations with a convolver/multiplier, an arithmetic logic unit (ALU), an accumulator, and various kinds of static RAMs (SRAMs). A phase-locked loop (PLL) is also integrated to generate a 250-MHz internal clock. The S-VSP ULSI, which was fabricated with 0.8-μm BiCMOS and triple-level-metallization technology, has a 15.5-mm×13.0-mm area and contains about 1.13 million transistors. It consumes 7 W at 250-MHz internal clock frequency with a single 5-V power supply.

16.
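To meet the needs of audio solutions based on embedded processors, a high-precision, multi-function fixed-point operation unit (FPU) for embedded processors is proposed. The FPU consists of three parts: shifting, rounding, and saturation. Implementation and verification of the FPU show that it significantly speeds up fixed-point operations on the embedded processor.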
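
The three stages named in the abstract above (shift, round, saturate) can be illustrated with a short software model. The abstract does not give the rounding mode or the word width, so the round-half-up rule, the 16-bit default, and the function name `to_fixed` are assumptions of this sketch.

```python
def to_fixed(value, shift, bits=16):
    """Scale an integer accumulator down by 2**shift, rounding half up,
    then saturate to a signed `bits`-bit result."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    if shift > 0:
        # Round: add half of the discarded range, then shift right.
        value = (value + (1 << (shift - 1))) >> shift
    # Saturate: clamp instead of wrapping around on overflow.
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))
```

Clamping rather than wrapping matters for audio, since a wrapped overflow turns a loud sample into one of opposite sign and produces an audible click.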

17.
This paper presents a low-power 16‐bit adiabatic reduced instruction set computer (RISC) microprocessor with efficient charge recovery logic (ECRL) registers. The processor consists of registers, a control block, a register file, a program counter, and an arithmetic and logical unit (ALU). Adiabatic circuits based on ECRL are designed using a 0.35 µm CMOS technology. An adiabatic latch based on ECRL is proposed for signal interfaces for the first time, and an efficient four‐phase supply clock generator is designed to provide power for the adiabatic processor. A static CMOS processor with the same architecture is designed to compare the energy consumption of adiabatic and non‐adiabatic microprocessors. Simulation results show that the power consumption of the adiabatic microprocessor is about one third of that of the static CMOS microprocessor.

18.
A design space exploration methodology for 1-D FFT processors is proposed to find the best hardware architecture in a quantitative way during early design. The methodology includes architecture candidate collection, coarse-grained architecture selection, and circuit-level design optimizations. We show how to select a better architecture from candidates including different architectures (SDF, SDC, MDF, MDC and memory-based) with different degrees of parallelism at different radices. The sub-level designs, including the rotator and the data scaling module, are introduced for further optimization. As a proof of concept, an FFT processor for 4G, WLAN and future 5G is designed, supporting 16-4096 and 12-2400 point FFTs. A memory-based architecture with a 16-datapath mixed-radix butterfly unit is selected to satisfy the demand for 1 GS/s (4096) throughput. The synthesis result based on 65 nm technology shows that the silicon cost and power consumption are 1.46 mm² and 68.64 mW, respectively. The proposed processor has better normalized throughput per area unit and normalized FFTs per energy unit than the available state-of-the-art designs.

19.
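王祯 (Wang Zhen), 韩泽耀 (Han Zeyao). 《信息技术》 (Information Technology), 2008, 32(3): 34-37
A new FFT processor based on the radix-2^3 algorithm with a single-path feedback pipeline structure is proposed. By dynamically adjusting the data path, it removes the restriction that the transform length must be a power of 8 and efficiently implements FFT/IFFT transforms of any 2^n points. A custom floating-point format is introduced into the pipeline and a preprocessing unit is added at the pipeline input, which effectively improves the accuracy of the FFT without introducing much extra logic, while reducing memory usage by 10%.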

20.
This article proposes the design and architecture of a dynamically scalable dual-core pipelined processor. The design methodology is core fusion: two independent cores can dynamically morph into a larger processing unit, or they can be used as distinct processing elements, to achieve both high sequential performance and high parallel performance. The processor provides two execution modes. Mode1 is a multiprogramming mode for executing instruction streams of lower data width, i.e., each core can perform 16-bit operations individually. Performance improves in this mode because instructions execute in parallel in both cores, at the cost of area. In mode2, both processing cores are coupled and behave like a single processing unit of higher data width, i.e., one that can perform 32-bit operations. Additional core-to-core communication is needed to realise this mode. The mode can be switched dynamically; therefore, this processor can provide multiple functions with a single design. Design and verification of the processor have been done successfully using Verilog on the Xilinx 14.1 platform. The processor is verified in both simulation and synthesis with the help of test programs. The design is intended to be implemented on a Xilinx Spartan 3E XC3S500E FPGA.
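
The two execution modes described in the abstract above can be illustrated with a small behavioural model of addition. The abstract does not say what the core-to-core communication consists of; this sketch assumes it is a carry passed from the low-half core to the high-half core, and the function names are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """One core's 16-bit adder: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def mode1_add(a0, b0, a1, b1):
    """Mode1: the two cores run independent 16-bit additions in parallel."""
    return add16(a0, b0)[0], add16(a1, b1)[0]


def mode2_add(a, b):
    """Mode2: the cores are fused into one 32-bit adder.
    The low core's carry out feeds the high core (assumed link)."""
    lo, carry = add16(a, b)
    hi, _ = add16(a >> 16, b >> 16, carry)
    return (hi << 16) | lo
```

In this model the only extra signal that mode2 needs is the carry between the halves, which gives a feel for why the abstract mentions additional core-to-core communication for the fused mode.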
