Similar Articles
1.
We propose a new VLSI architecture for an FFT processor. Our architecture uses few processing elements and can be laid out in a mesh-interconnected pattern. We show how to compute the discrete Fourier transform at n points with optimal speed-up, provided the memory is large enough. The control is shown to be simple and easily implementable in VLSI.
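
The abstract does not say how the n-point DFT is scheduled onto the processing elements. As a reference point only, the sketch below is a textbook iterative radix-2 FFT (an assumption, not the paper's dataflow), checked against the direct O(n²) DFT; its log2 n butterfly stages are the kind of work such an architecture distributes across PEs.

```python
import cmath


def fft(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    a = [complex(v) for v in x]
    # Reorder inputs into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages, doubling the sub-transform size each time.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a


def dft(x):
    """Direct O(n^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```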

2.
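An improved motion-estimator architecture based on an odd-even array computing structure is proposed. The motion estimator exploits two-dimensional data reuse and can implement the full-search algorithm. State-machine control logic is designed for the motion estimator; under its control, the processing elements reach 100% utilization. The motion estimator performs high-speed, parallel computation and can therefore be used in applications such as real-time post-processing of high-definition video.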
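
The abstract builds on the full-search method but gives no formulas. A minimal software sketch of full-search block matching with a sum-of-absolute-differences (SAD) criterion follows; the SAD metric, block size and search range are assumptions, and the odd-even array with its data reuse is not modeled.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside `ref`; return (best_sad, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```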

3.
This paper describes a novel reconfigurable architecture for digital signal processing (DSP). The architecture consists of a two-level array of cells and interconnections. At the upper level, fundamental DSP operations such as multiplication and addition are mapped onto blocks of 4-bit cells. At the lower level, each cell uses a 4 × 4 matrix of smaller “elements” to perform the necessary computations. Cells also contain pipeline latches for increased throughput. The architecture has a simple VLSI implementation that combines the flexibility of memory elements with the speed of DOMINO logic. Initial prototypes have been fabricated in a modest 0.5-μm CMOS technology. Circuit simulations of the cell in 0.25-μm technology indicate that the design achieves a clock frequency of 200 MHz.
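
The abstract maps multiplication onto blocks of 4-bit cells without detailing the decomposition. One plausible reading, assumed here, is the digit-by-digit scheme below: a wide unsigned product is the weighted sum of 4 × 4-bit partial products, which is what a block of 4-bit cells would compute and add. Pipelining and the lower-level element matrix are not modeled.

```python
def split_nibbles(value, count):
    """Split a non-negative integer into `count` 4-bit digits, least significant first."""
    return [(value >> (4 * i)) & 0xF for i in range(count)]


def nibble_multiply(a, b, width=16):
    """Unsigned width-bit multiply built only from 4 x 4-bit partial products,
    each shifted by its digit weight and accumulated (illustrative model)."""
    if width % 4 or a < 0 or b < 0 or a >> width or b >> width:
        raise ValueError("operands must be unsigned and fit in `width` bits (a multiple of 4)")
    digits = width // 4
    da, db = split_nibbles(a, digits), split_nibbles(b, digits)
    product = 0
    for i in range(digits):
        for j in range(digits):
            # One pair of 4-bit digits yields an 8-bit partial product of weight 16**(i + j).
            product += (da[i] * db[j]) << (4 * (i + j))
    return product
```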

4.
An energy-efficient adder design based on hybrid carry computation is proposed. Addition proceeds by propagating the carry forward from the LSB and backward from the MSB; the meeting of the two at a midpoint significantly accelerates the addition. This acceleration, together with the combination of low-cost ripple-carry and carry-chain circuits, yields better energy efficiency than other adder architectures. The optimal midpoint is formulated analytically and its closed-form expression is derived. To avoid quadratic RC delay growth in a long carry chain, repeaters are inserted optimally. The adder is further accelerated in a tree-like structure. 32-, 64- and 128-bit adders targeting 500 MHz and 1 GHz clock frequencies were designed in 65 nm technology. They consumed 11–18% less energy than adders generated by a state-of-the-art EDA synthesis tool.

5.
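The limited internal logic resources and fixed external pin packages of traditional ISA processors greatly restrict their range of application. Implementing the components of a traditional MCU with the abundant logic resources of an FPGA, using configurable pins at the lowest level to reduce hardware design complexity, and building the system around a Wishbone bus connecting the modules makes it possible to meet requirements that a traditional MCU cannot, which gives the approach good application prospects. Using a hardware description language, an 80C51 processing core was designed bottom-up and interconnected with several classes of general-purpose peripherals to form a system, which was verified at board level on a Virtex-II Pro series FPGA. Board-level verification shows that the intended goals were achieved: the system is compatible with the standard MCU and runs stably.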

6.
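A line-based VLSI architecture for the real-time two-dimensional lifting integer wavelet transform is proposed. The architecture comprises a row transformer, a column transformer, an intermediate buffer and an output control unit. The intermediate buffer temporarily stores the intermediate results of the row transform, and the output control unit outputs the wavelet coefficients of each level in order of priority, from high to low. Because the hardware uses a line-based lifting structure, the horizontal and vertical transforms can be processed in parallel. Compared with existing architectures, this architecture offers high parallelism and low memory requirements, and it can complete a multi-level wavelet transform of an entire image within the time taken to raster-scan that image line by line.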
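
The abstract does not name the wavelet filter. Assuming the reversible integer 5/3 (LeGall) lifting scheme familiar from lossless JPEG 2000, one level of the 1-D transform and its exact inverse look like this; a line-based 2-D transform would apply it along rows and then along columns. Symmetric boundary extension is also an assumption.

```python
def lift53_forward(x):
    """One level of the reversible integer 5/3 lifting transform on an even-length
    integer sequence with symmetric extension. Returns (lowpass, highpass)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("length must be even and at least 2")
    half = n // 2
    # Predict: detail = odd sample minus floor of the mean of its even neighbours.
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # mirror at right edge
        d.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    # Update: smooth = even sample plus rounded quarter of the neighbouring details.
    s = []
    for i in range(half):
        left_d = d[i - 1] if i > 0 else d[0]  # mirror at left edge
        s.append(x[2 * i] + ((left_d + d[i] + 2) >> 2))
    return s, d


def lift53_inverse(s, d):
    """Exact inverse: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left_d = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((left_d + d[i] + 2) >> 2)
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((x[2 * i] + right) >> 1)
    return x
```

Python's `>>` floors for negative integers, which is what keeps the integer transform exactly invertible.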

7.
A Low-Memory 2-D DWT VLSI Architecture Design Method   Total citations: 1, self-citations: 0, citations by others: 1
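To reduce on-chip memory requirements in lifting-based two-dimensional discrete wavelet transform (DWT) VLSI architectures, a novel scheduling method is adopted: row filtering is performed after reading only a small amount of data, and it runs in parallel with column filtering, which effectively reduces on-chip memory capacity. In addition, the internal structures of the row and column filters are pipelined, which speeds up computation, improves hardware resource utilization and reduces circuit size; moreover, this lifting-based 9/7 2-D DWT architecture is easily made compatible with the 5/3 filter. Verilog HDL simulation shows that with a 50 MHz system clock, using the 9/7 filter with three levels of decomposition, the design can process 21 frames per second of 1280×1024×8-bit grayscale images.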
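
For the 9/7 filter mentioned above, the standard floating-point four-step lifting factorization (Daubechies and Sweldens) is sketched below. The lifting constants are the commonly published CDF 9/7 values; the scaling constant K is chosen here so that the lowpass DC gain is 1, and other normalizations exist. The paper's fixed-point pipeline and scheduling are not modeled.

```python
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001  # scaling; normalization conventions differ between sources


def _at(x, i):
    """Whole-sample symmetric extension: x[-1] -> x[1], x[n] -> x[n - 2]."""
    n = len(x)
    if i < 0:
        return x[-i]
    if i >= n:
        return x[2 * (n - 1) - i]
    return x[i]


def _lift(x, parity, coeff):
    """In place: add coeff * (left + right neighbour) to samples of the given parity.
    Only the other parity is read, so updating in place is safe."""
    for i in range(parity, len(x), 2):
        x[i] += coeff * (_at(x, i - 1) + _at(x, i + 1))


def dwt97_forward(x):
    """One level of the 1-D CDF 9/7 lifting transform. Returns (lowpass, highpass)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("length must be even and at least 2")
    y = [float(v) for v in x]
    _lift(y, 1, ALPHA)  # predict 1 (odd samples)
    _lift(y, 0, BETA)   # update 1 (even samples)
    _lift(y, 1, GAMMA)  # predict 2
    _lift(y, 0, DELTA)  # update 2
    return [v / K for v in y[0::2]], [v * K for v in y[1::2]]


def dwt97_inverse(low, high):
    """Undo the scaling, then the four lifting steps in reverse order."""
    y = [0.0] * (2 * len(low))
    y[0::2] = [v * K for v in low]
    y[1::2] = [v / K for v in high]
    _lift(y, 0, -DELTA)
    _lift(y, 1, -GAMMA)
    _lift(y, 0, -BETA)
    _lift(y, 1, -ALPHA)
    return y
```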

8.
A High-Performance Architecture for the 2-D Integer Inverse Transform in AVS   Total citations: 1, self-citations: 0, citations by others: 1
Zhang Ding, Zhang Ming, Zheng Wei, Wang Kuang. Journal of Circuits and Systems (电路与系统学报), 2006, 11(5): 93-95, 110
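This paper proposes a high-performance hardware implementation of the integer inverse transform in the AVS video standard. The scheme uses two 1-D inverse transform cores and four 16×16 dual-port SRAMs. By controlling the SRAM read and write modes appropriately, data pre-processing and post-processing are avoided and the pipeline depth is reduced. During the column transform, the order of data operations is rearranged so that the four dual-port SRAMs do not limit computation speed. The architecture needs only 37 clock cycles to process an 8×8 data block and, compared with conventional implementations at the same computation speed, saves 28% of the area. Experiments show that the architecture is suitable for HDTV codecs based on the AVS standard.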
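
The two 1-D cores with SRAM buffering in between follow the usual row-column decomposition of a separable 2-D inverse transform, X = Tᵀ·C·T. The sketch below checks that two 1-D passes with a transpose buffer reproduce the direct matrix product. The 8×8 kernel `T8` is assumed to be the AVS integer transform matrix and should be verified against the standard; the standard's intermediate rounding shifts and clipping are omitted.

```python
# Assumed AVS 8x8 integer transform kernel (verify against the AVS specification).
T8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [10, 9, 6, 2, -2, -6, -9, -10],
    [10, 4, -4, -10, -10, -4, 4, 10],
    [9, -2, -10, -6, 6, 10, 2, -9],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -10, 2, 9, -9, -2, 10, -6],
    [4, -10, 10, -4, -4, 10, -10, 4],
    [2, -6, 9, -10, 10, -9, 6, -2],
]


def transpose(m):
    return [list(row) for row in zip(*m)]


def inverse_1d(coeffs):
    """One 1-D inverse pass: out[n] = sum_k T8[k][n] * coeffs[k]."""
    return [sum(T8[k][n] * coeffs[k] for k in range(8)) for n in range(8)]


def inverse_2d_row_column(coeff):
    """Column pass, transpose buffer (the SRAM's role), then row pass."""
    cols = [inverse_1d(col) for col in transpose(coeff)]  # cols[j][n] = (T^T C)[n][j]
    partial = transpose(cols)                             # partial = T^T C
    return [inverse_1d(row) for row in partial]           # (T^T C) T


def inverse_2d_direct(coeff):
    """Reference: X = T^T * C * T with explicit matrix products."""
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(8)) for j in range(8)] for i in range(8)]
    return matmul(matmul(transpose(T8), coeff), T8)
```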

9.
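In identifier/locator separation networks, the mapping between identity and location is very important. Building on an in-depth analysis of how this mapping is stored, and addressing the mismatch between logical and physical topology, this work treats topology matching as a traveling salesman problem (TSP) and uses a genetic algorithm to find a satisfactory solution. That solution is then used to construct the Chord ring, and the ring's neighbor tables are modified to optimize the number of routing hops. Analysis and simulation show that the method is simple to implement, requires few changes to the original Chord model, and has clear advantages in average routing hops and latency.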
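
The abstract casts topology matching as a TSP solved with a genetic algorithm but gives no operators or parameters. A minimal GA sketch follows, assuming permutation encoding, order crossover (OX1), swap mutation and elitist truncation selection; all parameter values are illustrative. How the resulting node order is mapped onto Chord identifiers is not described in the abstract and is not shown.

```python
import math
import random


def distance_matrix(points):
    """Euclidean distances between 2-D node positions (positions are illustrative)."""
    return [[math.dist(p, q) for q in points] for p in points]


def tour_length(tour, dist):
    """Length of the closed tour visiting nodes in the given order."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def order_crossover(p1, p2, rng):
    """OX1: copy a random slice of p1, fill remaining slots with p2's genes in order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(p1[a:b + 1])
    rest = iter(g for g in p2 if g not in taken)
    return [g if g is not None else next(rest) for g in child]


def swap_mutation(tour, rng, rate):
    """With probability `rate`, swap two random positions in place."""
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]


def ga_tsp(dist, pop_size=40, generations=200, mutation_rate=0.2, seed=0):
    """Return the shortest tour found; elitism means the best never gets worse."""
    rng = random.Random(seed)
    n = len(dist)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    elite_size = max(2, pop_size // 4)
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, dist))
        elite = population[:elite_size]  # best tours survive unchanged
        children = []
        while elite_size + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)  # parents drawn from the elite
            child = order_crossover(p1, p2, rng)
            swap_mutation(child, rng, mutation_rate)
            children.append(child)
        population = elite + children
    return min(population, key=lambda t: tour_length(t, dist))
```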

10.
An Efficient Matching Protocol   Total citations: 2, self-citations: 1, citations by others: 2
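Protocols that determine whether two secret integers are equal are called matching protocols. Existing protocols of this kind are either very inefficient or unable to resist dictionary attacks. This paper presents a new matching protocol that is semantically secure: no probabilistic polynomial-time algorithm can distinguish between guesses of the two inputs. The protocol is efficient and can test whether two large integers are equal; both its computational complexity and its communication complexity are O(1), and it can verify whether the participants are honest. The protocol can be used in the design of security protocols such as password authentication, electronic lotteries and verifiable encryption.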
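
The abstract does not describe the protocol itself. Purely to show what a two-party equality (matching) test can look like, the sketch below uses a generic commutative-exponentiation idea in the Diffie–Hellman style: each party raises a hash of its secret to its own private exponent, and because exponents commute, the two doubly blinded values coincide exactly when the inputs match (barring negligible collisions). This is not the paper's protocol, the modulus and hashing are toy choices, and no security claim is made.

```python
import hashlib
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1, chosen for convenience only;
# not a vetted group for real deployments.
P = (1 << 127) - 1


def hash_to_group(value: int) -> int:
    """Hash an integer and square it mod P (toy mapping into a subgroup)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)


def random_key() -> int:
    """Private exponent in [2, P - 2]."""
    return 2 + secrets.randbelow(P - 3)


def blind(element: int, key: int) -> int:
    return pow(element, key, P)


def equal_inputs(x: int, y: int, key_a: int, key_b: int) -> bool:
    """Simulate both parties locally: A holds x and key_a, B holds y and key_b."""
    msg_a = blind(hash_to_group(x), key_a)  # A -> B
    msg_b = blind(hash_to_group(y), key_b)  # B -> A
    # (H(y)^b)^a == (H(x)^a)^b exactly when H(x)^(ab) == H(y)^(ab).
    return blind(msg_b, key_a) == blind(msg_a, key_b)
```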

11.
A systematic, efficient fault diagnosis method for reconfigurable VLSI/WSI array architectures is presented. The basic idea is to exploit the independence of output data paths among a subset of processing elements (PEs), based on the topology of the array under test. The divide-and-conquer technique is applied to reduce the complexity of test application and to enhance the controllability and observability of a processor array. The array under test is divided into nonoverlapping diagnosis blocks, and PEs in the same diagnosis block can be diagnosed concurrently. The problem of finding diagnosis blocks is shown to be equivalent to a generalized Eight Queens problem. Three types of PEs and one type of switch, designed to be easily testable and reconfigurable, are used to show how to apply this approach. The main contributions of this paper are an efficient switch and link testing procedure and a novel PE fault diagnosis approach that speeds up testing by at least O(V^{1/2}) for the processor arrays considered, where V is the number of PEs. The significance of the approach is its ability to detect as well as locate multiple PE, switch, and link faults with little or no hardware overhead.
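
The abstract reduces finding diagnosis blocks to a generalized Eight Queens problem without defining the generalization. For orientation only, here is the classic N-queens backtracking search; how queen constraints correspond to data-path independence between PEs is an open assumption and is not modeled.

```python
def solve_queens(n):
    """Return every placement of n mutually non-attacking queens on an n x n board.
    Each solution lists the queen's column for rows 0..n-1."""
    cols, diag, anti = set(), set(), set()
    placement, solutions = [], []

    def place(row):
        if row == n:
            solutions.append(placement.copy())
            return
        for c in range(n):
            if c in cols or row - c in diag or row + c in anti:
                continue  # column or one of the two diagonals already occupied
            cols.add(c)
            diag.add(row - c)
            anti.add(row + c)
            placement.append(c)
            place(row + 1)
            placement.pop()
            cols.remove(c)
            diag.remove(row - c)
            anti.remove(row + c)

    place(0)
    return solutions
```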

12.
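This paper proposes a new low-power VLSI architecture for a hierarchical motion estimator that supports the advanced prediction modes of low-bit-rate video coders such as H.263 and MPEG-4. To reduce chip size and power consumption, the same basic search unit (BSU) is used at every search level. In addition, through efficient data-flow control in advanced prediction mode, the motion vectors of the four 8×8 sub-blocks of each macroblock are obtained at the same time as the macroblock's own motion vector. Experimental results show that the architecture uses fewer gates, effectively reduces power consumption, and achieves coding performance similar to the full-search block-matching algorithm (FSBMA); it can be widely used in low-power video encoders for wireless video communication.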
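
Obtaining the 8×8 and 16×16 motion vectors together rests on a simple identity: the SAD of a 16×16 macroblock is the sum of the SADs of its four 8×8 sub-blocks at the same displacement. The sketch below, assuming a SAD criterion and a single full-search level (the hierarchical layers and the BSU are not modeled), tracks both kinds of minimum in one pass.

```python
def block_sad(cur, ref, x0, y0, dx, dy, size):
    """SAD between the size x size block at (x0, y0) in `cur` and its (dx, dy)-shifted match in `ref`."""
    return sum(
        abs(cur[y0 + y][x0 + x] - ref[y0 + dy + y][x0 + dx + x])
        for y in range(size)
        for x in range(size)
    )


def mb_and_subblock_sads(cur, ref, mbx, mby, dx, dy):
    """Four 8x8 sub-block SADs of the 16x16 macroblock at (mbx, mby); the
    macroblock SAD is their sum, so it needs no extra pixel comparisons."""
    subs = [
        block_sad(cur, ref, mbx + 8 * (k % 2), mby + 8 * (k // 2), dx, dy, 8)
        for k in range(4)
    ]
    return sum(subs), subs


def search_macroblock(cur, ref, mbx, mby, p):
    """One full-search pass tracking the best 16x16 vector and the best vector of
    each 8x8 sub-block; every result is (sad, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best_mb, best_sub = None, [None] * 4
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= mby + dy and mby + dy + 16 <= h and 0 <= mbx + dx and mbx + dx + 16 <= w):
                continue
            total, subs = mb_and_subblock_sads(cur, ref, mbx, mby, dx, dy)
            if best_mb is None or total < best_mb[0]:
                best_mb = (total, (dx, dy))
            for k, s in enumerate(subs):
                if best_sub[k] is None or s < best_sub[k][0]:
                    best_sub[k] = (s, (dx, dy))
    return best_mb, best_sub
```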

13.
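Image feature-point matching is a bottleneck problem in computer vision. To improve matching accuracy, a feature registration algorithm based on orientation angles is proposed. The algorithm detects corners with the Harris operator, computes the orientation angle of each feature point from the image gradient, and converts the orientation-angle distance between each candidate pair of feature points into a weight applied to the cost function; the feature points are further analyzed to improve the cost function. Experiments show that the algorithm achieves higher matching accuracy than conventional normalized cross-correlation (NCC) matching and offers a degree of rotation invariance and noise robustness.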
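
The abstract mentions orientation angles from image gradients, an angle distance used as a weight, and NCC as the baseline. A minimal sketch of those pieces follows; Harris detection is omitted, and the way the angle term is combined with NCC in `weighted_cost` (an additive penalty scaled by a hypothetical weight `lam`) is an assumption, since the abstract does not give the formula.

```python
import math


def orientation(img, x, y, radius=2):
    """Dominant gradient direction (radians) around (x, y) from summed central differences."""
    gx = gy = 0.0
    for v in range(y - radius, y + radius + 1):
        for u in range(x - radius, x + radius + 1):
            gx += img[v][u + 1] - img[v][u - 1]
            gy += img[v + 1][u] - img[v - 1][u]
    return math.atan2(gy, gx)


def angle_distance(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def ncc(p, q):
    """Normalized cross-correlation of two equally sized flattened patches, in [-1, 1]."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return num / den if den else 0.0


def weighted_cost(p, q, theta_p, theta_q, lam=1.0):
    """Hypothetical cost: NCC dissimilarity plus a penalty for differing orientations."""
    return (1.0 - ncc(p, q)) + lam * angle_distance(theta_p, theta_q) / math.pi
```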

14.
Performance of relational database systems is a major impediment to their use in many applications. We have designed and implemented a customized RISC processor to accelerate associative search and aggregation operations for relational database systems. Since the processor is programmable and supports many queries concurrently, a system utilizing tens of such processors is capable of handling thousands of complex search requests simultaneously.

While designing a programmable VLSI processor is a complex process, research prototyping requires fast design turnaround. We took advantage of the logic programming paradigm and silicon compiler technology to explore and simulate architecture alternatives prior to the actual implementation. This prototyping process allowed us to complete the chip design in nine months. The resulting processor, fabricated in 2-μm CMOS technology, consists of 91,000 transistors, executes over 18 million predicate evaluations per second, and searches database contents at 74 megabytes per second.


15.
Design of a 6-Port CMOS Register File   Total citations: 2, self-citations: 2, citations by others: 0
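High-performance superscalar processors execute multiple instructions in parallel, which requires a register file with multiple ports and high-speed access. This paper presents a full-custom design of a 6-port register file with four read ports and two write ports in a 0.18-μm CMOS process. It uses an improved multi-port memory cell structure and a low-power NAND-based decoder, and includes an internal clock-generation unit to raise the operating frequency. The register file passed functional verification and performance testing; it operates at 450 MHz with a power consumption of 36 mW and an area of 0.06 mm². Compared with reference synthesis results, it offers high speed, low power and small area.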
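
To make the four-read, two-write port organization concrete, here is a behavioral model of one cycle of such a register file. The register count, width, read-before-write ordering and the rule rejecting two writes to the same register in one cycle are assumptions, not details from the abstract; the circuit-level techniques (cell structure, NAND decoder, clock generation) are not modeled.

```python
class RegisterFile:
    """Behavioural model of a 4-read / 2-write register file, one call per cycle.
    Reads return the values held before the cycle's writes (assumed ordering)."""

    READ_PORTS = 4
    WRITE_PORTS = 2

    def __init__(self, num_regs=32, width=32):
        self.regs = [0] * num_regs
        self.mask = (1 << width) - 1

    def cycle(self, read_addrs, writes):
        """read_addrs: up to 4 register indices; writes: up to 2 (index, value) pairs."""
        if len(read_addrs) > self.READ_PORTS or len(writes) > self.WRITE_PORTS:
            raise ValueError("more accesses than available ports")
        targets = [addr for addr, _ in writes]
        if len(set(targets)) != len(targets):
            raise ValueError("both write ports target the same register")
        values = [self.regs[addr] for addr in read_addrs]  # all read ports in parallel
        for addr, value in writes:
            self.regs[addr] = value & self.mask  # truncate to the register width
        return values
```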

16.
In this paper we show how to approximate the Voronoi diagram of a finite set of planar points in a chemical processor consisting of a thin agar-palladium layer with potassium iodide solution diffusing on it. The configuration of a given point set is represented by the spatial distribution of KI drops, and the bisectors of the required Voronoi diagram are computed by the reaction PdCl2 + 2KI = PdI2↓ + 2KCl.
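
The chemical processor computes bisectors where diffusion fronts from different drops meet. A discrete analogue, sketched under simplifying assumptions, grows a wavefront from every site simultaneously on a grid (multi-source breadth-first search) and marks cells where fronts with different labels touch. On a 4-neighbour grid this approximates the city-block rather than the Euclidean Voronoi diagram, and ties are broken by visit order.

```python
from collections import deque


def discrete_voronoi(width, height, sites):
    """Label each grid cell with the index of the site whose front reaches it first,
    and return (labels, bisector cells where neighbouring labels differ)."""
    label = [[-1] * width for _ in range(height)]
    queue = deque()
    for idx, (x, y) in enumerate(sites):
        label[y][x] = idx
        queue.append((x, y))
    # All fronts advance one cell per BFS layer, like simultaneous diffusion.
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and label[ny][nx] == -1:
                label[ny][nx] = label[y][x]
                queue.append((nx, ny))
    bisector = set()
    for y in range(height):
        for x in range(width):
            right_differs = x + 1 < width and label[y][x + 1] != label[y][x]
            below_differs = y + 1 < height and label[y + 1][x] != label[y][x]
            if right_differs or below_differs:
                bisector.add((x, y))
    return label, bisector
```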

17.
In this paper, we analyze the register complexity of direct-form and transpose-form FIR filter structures and explore the possibility of register reuse. We find that the direct-form structure involves significantly fewer registers than the transpose-form structure and allows register reuse in parallel implementations. We further analyze the LUT consumption and other resources of DA-based parallel FIR filter structures, and find that, when multiple filter outputs are computed in parallel, the input delay unit, coefficient storage unit and partial-product generation unit can be shared in addition to the LUT words. Based on these findings, we propose a design approach and use it to derive a DA-based architecture for a reconfigurable block-based FIR filter that scales to larger block sizes and longer filters. Notably, the number of registers in the proposed structure does not grow in proportion to the block size. This is a major advantage for area-delay-efficient and energy-efficient high-throughput implementations of reconfigurable FIR filters with larger block sizes. Theoretical comparison shows that the proposed structure for block size 8 and filter length 64 involves 60% more flip-flops, 6.2 times more adders and 3.5 times more AND-OR gates, and offers 8 times higher throughput. ASIC synthesis results show that the proposed structure for block size 8 and filter length 64 has a 1.8 times lower area-delay product (ADP) and energy per sample (EPS) than the existing design, and can support 8 times higher throughput. For block sizes 4 and 8, the proposed structure consumes, on average over different supply voltages, 38% and 50% less power, respectively, than the existing structure at the same throughput.
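
As background for the DA-based structures above, a bit-serial distributed-arithmetic (DA) FIR filter can be sketched as follows: a look-up table stores every partial sum of the coefficients, each input bit plane forms a LUT address, and the LUT outputs are shift-accumulated with a negative weight on the sign bit. Integer coefficients and B-bit two's-complement inputs are assumed; this is the textbook DA scheme, not the proposed reconfigurable block architecture. In a block version, the outputs computed in parallel would read the same LUT at different addresses, which is one way to picture the LUT sharing the abstract mentions.

```python
def build_lut(coeffs):
    """DA look-up table: entry m is the sum of the coefficients whose bit is set in m."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (m >> k) & 1) for m in range(1 << n)]


def da_inner_product(lut, samples, bits):
    """Bit-serial DA evaluation of sum(c[k] * samples[k]) for `bits`-bit
    two's-complement integer samples, using a precomputed LUT."""
    acc = 0
    for b in range(bits):
        # LUT address: bit b of every sample (Python's >> on negative ints
        # exposes two's-complement bits).
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b
        # The sign bit (b == bits - 1) carries weight -2**(bits - 1).
        acc += -term if b == bits - 1 else term
    return acc


def da_fir(coeffs, signal, bits):
    """FIR filter y[n] = sum_k coeffs[k] * x[n - k] with zero initial history."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= x <= hi for x in signal):
        raise ValueError("samples must fit in `bits`-bit two's complement")
    lut = build_lut(coeffs)
    out = []
    for i in range(len(signal)):
        window = [signal[i - k] if i - k >= 0 else 0 for k in range(len(coeffs))]
        out.append(da_inner_product(lut, window, bits))
    return out


def direct_fir(coeffs, signal):
    """Reference multiply-accumulate FIR for checking the DA result."""
    return [
        sum(c * signal[i - k] for k, c in enumerate(coeffs) if i - k >= 0)
        for i in range(len(signal))
    ]
```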

18.
We propose a programmable architecture for a single-instruction multiple-data (SIMD) image processor founded on the mathematical framework of simplicial cellular neural networks. We develop instruction primitives for basic image processing operations and show examples of processing binary and grayscale images. When the processor is fabricated in deep-submicron CMOS technologies, the complexity of the digital circuits and wiring in each cell is commensurate with pixel-level processing.

19.
A 300-MOPS image digital signal processor (IDSP) with four pipelined data processing units and three parallel input-output (I/O) ports has been developed in a 0.8-μm BiCMOS technology. The IDSP integrates 910,000 transistors in a 15.2-mm × 15.2-mm area, using a macrocell-oriented building-block design environment. Power dissipation was reduced to 1.0 W at a 25-MHz instruction cycle, and a TTL-compatible I/O interface was retained by using two power supplies, one at 3 V and the other at 5 V. With this performance, a single-board 64/128-kb/s video codec was implemented with four IDSPs.

20.
Adding a reconfigurable processing unit (RPU) to a general-purpose processor is a promising way to improve performance significantly. In this paper, a Reconfigurable Multi-Core (RMC) architecture combining a general-purpose multi-core with reconfigurable logic is proposed. The reconfigurable logic is logically partitioned into RPUs, which are coupled to the general-purpose cores as co-processors via a full crossbar switch. An RPU Manager (RPU-M) is also designed to manage the RPUs. To verify RMC, a simulation method based on Simics and a Virtex-5 FPGA is adopted, which simplifies simulation and ensures accurate evaluation of the hardware function cores. Five workloads are used to test RMC: 3-DES, AES, SHA2, IDCT and JPEG_ENC. Experimental results show an average speedup of 3.10 times over a software implementation on the original multi-core, and the data and control communication overhead on RMC is acceptable.


Copyright © Beijing Qinyun Technology Development Co., Ltd.   Beijing ICP Registration No. 09084417