
1.

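Hardware/Software Co-Design Flow for an Embedded Coarse-Grained Reconfigurable Processor  Total citations: 4, self-citations: 2, citations by others: 2

Yu Sudong, Liu Leibo, Yin Shouyi, Wei Shaojun 《Acta Electronica Sinica》2009,37(5):1136-1140

The reconfigurable processor architecture for multimedia applications consists of a host processor and a dynamically configured Reconfigurable Cell Array (RCA). The co-design flow builds on loop pipelining and pipelined configuration techniques: a heuristic algorithm performs hardware/software partitioning of the larger critical loops in an application, and a list scheduling algorithm maps tasks onto the RCA. In FPGA verification, the average execution speed of the core algorithms in the H.264 benchmark improved by a factor of 3.34 relative to PipeRench, MorphoSys, and the TI DSP TMS320C64X.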

2.

A General Reconfigurable Architecture for the BLAST Algorithm

Euripides Sotiriades Apostolos Dollas 《The Journal of VLSI Signal Processing》2007,48(3):189-208

The process of DNA sequence matching and database search is one of the major problems of the bioinformatics community. Major scientific efforts to address this problem have provided algorithms and software tools for molecular biologists since the early 1970s. At the algorithmic and software level BLAST is by far the most popular tool. It has been developed and continues to be maintained and distributed by the NCBI organization. The BLAST algorithm and software is computationally very intensive and as a result several computer vendors use it as a benchmark. On the other hand no systematic approach for hardware speedup of BLAST and its variants for different query and database size has been reported to date. In this paper we present our architecture that implements the BLAST algorithm for all of its major versions, and for any size of database and query. The system has been fully designed and partially implemented with reconfigurable logic. It consists of software and hardware parts and achieves a speedup of several times up to thousands of times vs general purpose computers.


3.

Reconfigurable Architecture for Network Flow Analysis

Yusuf S. Luk W. Sloman M. Dulay N. Lupu E.C. Brown G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(1):57-65

This paper describes a reconfigurable architecture based on field-programmable gate-array (FPGA) technology for monitoring and analyzing network traffic at increasingly high network data rates. Our approach maps the performance-critical tasks of packet classification and flow monitoring into reconfigurable hardware, such that multiple flows can be processed in parallel. We explore the scalability of our system, showing that it can support flows at multi-gigabit rate; this is faster than most software-based solutions where acceptable data rates are typically no more than 100 million bits per second.

4.

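Routing Design for a Hierarchical Network-on-Chip  Total citations: 1, self-citations: 1, citations by others: 0

Yao Fangwu, Zhai Xinhu 《Microelectronics & Computer》2009,26(11)

As the number of processors on a single chip keeps growing, hierarchical network structures are set to become a focus of network-on-chip (NoC) topology research. For a typical irregular hierarchical NoC topology, the paper designs a new deadlock-free hybrid routing algorithm and a new node addressing scheme. It also proposes a new switch node design concept and an effective congestion control strategy. Simulation results show that, as network traffic increases, the hierarchical network achieves lower transmission latency and higher throughput than a conventional two-dimensional network.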

5.

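A Configurable VLSI Architecture for 2D Spatial Filtering Operations

Yuan Yajing, Sang Hongshi, Zhang Tianxu 《Microelectronics & Computer》2012,29(12)

The paper proposes a configurable VLSI architecture for 2D spatial filtering operations that supports the different window operations used in infrared automatic target recognition; from an SoC perspective, it can better serve a range of image processing applications. Compared with previously reported architectures for this class of operations, the new architecture achieves a higher processing rate. Implemented in an SMIC 0.18 μm CMOS process, it runs at a clock frequency of 135 MHz, consumes 52 mW, occupies about 128.2K gates, and reaches a peak processing performance of 6.6 GOPS.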

6.

Reconfigurable Interpolation Architecture for Multistandard Video Decoding

Gwo Giun Lee Tzu-Chiang Tai Wei-Chiao Yang Chun-Fu Chen Chun-Hsi Huang 《Journal of Signal Processing Systems》2016,84(2):251-264


7.

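A Reconfigurable Heterogeneous Multi-Core Parallel Processing Architecture for Block Ciphers

Feng Xiao, Li Wei, Dai Zibin, Ma Chao, Li Gongli 《Acta Electronica Sinica》2017,45(6):1311-1320

In existing reconfigurable block cipher implementations, application-specific instruction processors have low throughput, while array structures suffer from low resource utilization and a complex algorithm mapping process. To address this, the paper designs RAMCA (Reconfigurable Asymmetrical Multi-Core Architecture), a reconfigurable heterogeneous multi-core parallel processing architecture for block ciphers, and analyzes how algorithms with typical SP (AES-128), Feistel (SMS4), L-M (IDEA), and MISTY (KASUMI) structures map onto RAMCA. Logic synthesis and functional simulation were completed in a 65 nm CMOS process. Experiments show that RAMCA reaches an operating frequency of 1 GHz with an area of about 1.13 mm²; after normalizing for process technology, its speed on each block cipher algorithm exceeds that of existing application-specific instruction processors and of array-based cipher processing systems such as Celator, RCPA, and BCORE.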

8.

Reconfigurable Filter Coprocessor Architecture for DSP Applications  Total citations: 1, self-citations: 0, citations by others: 1

S. Ramanathan S.K. Nandy V. Visvanathan 《The Journal of VLSI Signal Processing》2000,26(3):333-359

Digital Signal Processing (DSP) is widely used in high-performance media processing and communication systems. In the majority of these applications, critical DSP functions are realized as embedded cores to meet the low-power budget and high computational complexity. Usually these cores are ASICs that cannot be easily retargeted for other similar applications that share certain commonalities. This stretches the design cycle, which affects time-to-market constraints. In this paper, we present a reconfigurable high-performance low-power filter coprocessor architecture for DSP applications. The coprocessor architecture, apart from having the performance and power advantage of its ASIC counterpart, can be reconfigured to support a wide variety of filtering computations. Since filtering computations abound in DSP applications, the implementation of this coprocessor architecture can serve as an important embedded hardware IP.

9.

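A Loop Mapping Algorithm for Coarse-Grained Reconfigurable Architectures Based on Memory Partitioning and Path Reuse

Zhang Xingming, Yuan Kaijian, Gao Yanzhao 《Journal of Electronics & Information Technology》2018,40(6):1520-1524

Current research on loop mapping for coarse-grained reconfigurable architectures focuses mainly on operation placement and temporary data routing, and rarely considers data mapping. This paper proposes a modulo scheduling mapping flow based on memory partitioning and path reuse. It first performs fine-grained memory partitioning to find a suitable data mapping and increase the parallelism of data accesses, then uses modulo scheduling to find the operation placement and temporary data routing, and finally uses a purpose-built routing cost model to balance the use of memory routing and processing element routing, introducing a path reuse strategy to optimize routing resources. Experimental results show that the method performs well in terms of loop initiation interval, instructions per cycle, and execution latency.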
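To make the initiation interval and instructions-per-cycle metrics concrete, the sketch below computes the standard resource-constrained lower bound on the initiation interval used in modulo scheduling, and the steady-state IPC that follows from a given interval. It is an illustration only: the function names and the example numbers are hypothetical, and it does not reproduce the paper's mapping flow, memory partitioning, or routing cost model.

```python
def res_mii(num_ops, num_pes):
    """Resource-constrained minimum initiation interval: ceil(num_ops / num_pes).

    Assumes every processing element can issue one operation per cycle
    (a simplifying assumption, not taken from the abstract).
    """
    if num_ops <= 0 or num_pes <= 0:
        raise ValueError("num_ops and num_pes must be positive")
    return -(-num_ops // num_pes)  # integer ceiling division


def steady_state_ipc(num_ops, ii):
    """Operations issued per cycle once the software pipeline is full."""
    if ii <= 0:
        raise ValueError("ii must be positive")
    return num_ops / ii


# Hypothetical loop body of 20 operations on a 4x4 array of PEs.
ii = res_mii(20, 16)
ipc = steady_state_ipc(20, ii)
```

A mapper that cannot place and route the loop at this bound would retry with a larger interval, which lowers the IPC accordingly.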

10.

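Design of a Reconfigurable Array for Symmetric Cryptography

Zhu Min, Liu Leibo, Yin Shouyi, Chen Yingjie, Wei Shaojun 《Microelectronics》2012,42(6)

By studying the characteristics of cryptographic systems, the paper proposes a reconfigurable array architecture for symmetric cryptography. The array is broadly applicable to both block ciphers and stream ciphers and is highly flexible. By updating the configuration information, it can switch encryption functions dynamically in under 20 ns. The architecture contains several 16×16 bit arrays and 8×8 byte arrays. Among block ciphers, AES achieves an encryption rate of 640 Mb/s to 2.56 Gb/s, DES 1.6 Gb/s to 3.2 Gb/s, and SMS4 318 Mb/s to 1.6 Gb/s; the Geffe stream cipher achieves 400 Mb/s. Compared with references [1] to [3], SMS4 performance improves by nearly a factor of two.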

11.

A Security Architecture for Reconfigurable Networked Embedded Systems

Gianluca Dini Ida Maria Savino 《International Journal of Wireless Information Networks》2010,17(1-2):11-25

Nowadays, networked embedded systems (NESs) are required to be reconfigurable in order to be customizable to different operating environments and/or adaptable to changes in the operating environment. However, reconfigurability acts against security as it introduces new sources of vulnerability. In this paper, we propose a security architecture that integrates, enriches and extends a component-based middleware layer with abstractions and mechanisms for secure reconfiguration and secure communication. The architecture provides a secure communication service that enforces application-specific fine-grained security policy. Furthermore, in order to support secure reconfiguration at the middleware level, the architecture provides a basic mechanism for authenticated downloading from a remote source. Finally, the architecture provides a rekeying service that performs key distribution and revocation. The architecture provides the services as a collection of middleware components that an application developer can instantiate according to the application requirements and constraints. The security architecture extends the middleware by exploiting the decoupling and encapsulation capabilities provided by components. It follows that the architecture is itself reconfigurable and can span heterogeneous devices. The security architecture has been implemented for different platforms including low-end, resource-poor ones such as Tmote Sky sensor devices.

12.

Dynamic Context Compression for Low-Power Coarse-Grained Reconfigurable Architecture

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2010,18(1):15-28

Most coarse-grained reconfigurable architectures (CGRAs) are composed of reconfigurable ALU arrays and a configuration cache (or context memory) to achieve high performance and flexibility. In particular, the configuration cache is the main component of a CGRA that provides its distinctive feature of dynamic reconfiguration in every cycle. However, frequent memory-read operations for dynamic reconfiguration consume considerable power. Thus, reducing power in the configuration cache has become critical for CGRAs to be more competitive and reliable in embedded systems. In this paper, we propose a dynamically compressible context architecture for power saving in the configuration cache. This power-efficient design of the context architecture works without degrading the performance and flexibility of the CGRA. Experimental results show that the proposed approach saves up to 39.72% power in the configuration cache with negligible area overhead (2.16%).

13.

Low Power Reconfiguration Technique for Coarse-Grained Reconfigurable Architecture

Yoonjin Kim Mahapatra R.N. Ilhyun Park Kiyoung Choi 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(5):593-603

Coarse-grained reconfigurable architectures (CGRAs) require many processing elements (PEs) and a configuration memory unit (configuration cache) for reconfiguration of their PE array. Although this structure is meant for high performance and flexibility, it consumes significant power. In particular, the power consumed by the configuration cache is an explicit overhead compared to other types of intellectual property (IP) cores. Reducing power is crucial for a CGRA to be a more competitive and reliable processing core in embedded systems. In this paper, we propose a reusable context pipelining (RCP) architecture to reduce the power overhead caused by reconfiguration. It shows that the power reduction can be achieved by using the characteristics of loop pipelining, which is a multiple instruction stream, multiple data stream (MIMD)-style execution model. RCP efficiently reduces power consumption in the configuration cache without performance degradation. Experimental results show that the proposed approach saves much power even with reduced configuration cache size. The power reduction ratios in the configuration cache and in the entire architecture are up to 86.33% and 37.19%, respectively, compared to the base architecture.

14.

A Compositional Framework for Hardware/Software Co-Design

A. Cau R. Hale J. Dimitrov H. Zedan B. Moszkowski M. Manjunathaiah M. Spivey 《Design Automation for Embedded Systems》2002,6(4):367-399

We describe a compositional framework, together with its supporting toolset, for hardware/software co-design. Our framework is an integration of a formal approach within a traditional design flow. The formal approach is based on Interval Temporal Logic and its executable subset, Tempura. Refinement is the key element in our framework because it derives the software and hardware parts of the implementation from a single formal specification of the system, while preserving all properties of the system specification. During refinement, simulation is used to choose the appropriate refinement rules, which are applied automatically in the HOL system. The framework is illustrated with two case studies. The work presented is part of a UK collaborative research project between the Software Technology Research Laboratory at the De Montfort University and the Oxford University Computing Laboratory.

15.

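Research on a Novel Reconfigurable Spaceborne Computer Architecture

Gao Ji, Gao Lanzhi, Cui Yanpeng, Bai Jie 《Fire Control Radar Technology》2014,(3)

To meet the high-performance and high-integration requirements that future space missions place on spaceborne computers, this paper proposes an MPSoC-based reconfigurable spaceborne computer architecture composed of a standard processor, application processors, reconfigurable modules, dedicated ASICs, and I/O interfaces. The on-chip multiprocessor system can accommodate a satellite's different application modes and data processing requirements, improving the computer's parallelism and processing performance. The spaceborne computer also adopts a hardware/software co-design method that can improve system performance and reliability and shorten the development cycle.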

16.

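Design of an Ant Colony Routing System for Coarse-Grained Configurable Architecture Chips

Song Liguo, Jiang Yuxian 《Microelectronics & Computer》2007,24(4):15-17

Building on the Max-Min Ant System, the method gives the ants an added ability to discriminate by smell and applies it to the routing problem of coarse-grained configurable architecture chips. Taking the in-house coarse-grained reconfigurable chip CTaiJi as the target, a comparison over several test cases shows that the method is better at finding optimal solutions than the negotiated congestion algorithm in common use today.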

17.

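Design of an Asynchronous Reconfigurable Architecture for Cryptographic Algorithms

Xiong Hua, Shen Haibin, Ji Aiming, Pan Xuezeng 《Microelectronics & Computer》2005,22(3):170-173,177

To address the shortcomings of FPGAs and ASICs in implementing cryptographic algorithms, this paper presents an asynchronous reconfigurable architecture for cryptographic algorithms. Its computation is provided by an array of reconfigurable cells, its data paths are formed by the interconnections among the cells, and asynchronous communication is carried out with handshake signals. After analyzing how handshake signal propagation delay affects the reconfigurable architecture, the paper proposes a handshake control circuit for cell signal transfer suited to this architecture. Within the cell, improved DSDCVS logic is used to design the arithmetic circuits, which reduces cell area and increases cell operating speed. An application example shows that, when implementing cryptographic algorithms, the architecture outperforms an FPGA.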

18.

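A Reconfigurable Polynomial Multiplication Architecture for Lattice-Based Post-Quantum Cryptography Algorithms

Chen Tao, Li Huiqin, Li Wei, Nan Longmei, Du Yiran 《Journal of Electronics & Information Technology》2023,45(9):3380-3392

Polynomial multiplication in lattice-based cryptographic algorithms built on different hard problems uses differing parameters and lacks a unified implementation architecture. To address this, the paper proposes a reconfigurable architecture based on the preprocess-then-NTT (PtNTT) algorithm. First, it analyzes the computational characteristics of polynomial multiplication and summarizes how the polynomial parameters (number of terms, modulus, and modulus polynomial) affect the reconfigurable architecture. Second, for different numbers of terms and modulus polynomials, it designs a 4×4 processing element architecture switchable between serial and parallel operation, which supports a scalable design of radix-k number theoretic transforms of different bit widths. In particular, for different moduli it designs a reconfigurable unit that scalably implements 16-bit modular multiplication and 32-bit multiplication. In the data requirement analysis, it designs a multi-bank memory structure for the radix-k number theoretic transform by building an allocation mechanism centered on coefficient address generation, bank partitioning, and the logic that maps real to virtual addresses. Experimental results show that the design supports polynomial multiplication in four types of algorithms (Kyber, Saber, Dilithium, and NTRU) and, unlike other reconfigurable architectures, does so with a single unified architecture. On a Xilinx Artix-7 FPGA it completes one polynomial multiplication with 256 terms and modulus 3329 in 1.599 μs, taking 243 clock cycles.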
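As a plain software reference for the operation the architecture accelerates, the sketch below multiplies two polynomials in Z_q[x]/(x^n + 1) with the schoolbook method. This is deliberately not PtNTT or any NTT: it is the slow O(n²) definition that an NTT-based design must agree with. The modulus polynomial x^n + 1 is an assumption (it is the Kyber-style ring; the abstract gives only 256 terms and modulus 3329), and all names are hypothetical. The second helper only divides the two figures quoted in the abstract to show the clock rate they imply.

```python
def negacyclic_mul(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1).

    Coefficients are listed from the constant term upward; n = len(a).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of terms")
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                # x^n = -1 in this ring, so wrapped terms change sign.
                c[k - n] = (c[k - n] - ai * bj) % q
    return c


def implied_clock_mhz(cycles, time_us):
    """Clock frequency in MHz implied by a cycle count and a run time in microseconds."""
    return cycles / time_us


# Multiplying by the constant polynomial 1 leaves the other operand unchanged.
one = [1] + [0] * 255
sample = list(range(256))
product = negacyclic_mul(one, sample, 3329)

# 243 cycles in 1.599 microseconds corresponds to roughly 152 MHz.
clock = implied_clock_mhz(243, 1.599)
```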

19.

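A Fast Design Space Search Method for Reconfigurable Architectures  Total citations: 1, self-citations: 0, citations by others: 1

Ji Aiming, Shen Haibin, Yan Xiaolang 《Journal of Electronics & Information Technology》2006,28(9):1744-1747

Building on an evaluation model for reconfigurable architectures, the paper studies how to estimate the area, performance, and power of a reconfigurable architecture at the algorithm level. Using area, performance, and power, it searches the design space in two steps: first, for each architecture in the architecture domain, it finds the minimum cost of implementing all the algorithms; second, it searches the architecture design space for the optimal architecture. The method does not depend on any specific architecture, evaluates reconfigurable architectures comprehensively, and quickly obtains a globally optimal search result. An application example shows that the method can effectively guide design in the early stages of reconfigurable architecture design.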
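The two-step search can be sketched as follows, under loudly stated assumptions: the abstract does not give the cost function or how per-algorithm costs are combined, so this sketch takes a single scalar cost per candidate implementation and sums the per-algorithm minima. The function name, the data layout, and the example numbers are all hypothetical.

```python
def cheapest_architecture(costs):
    """Pick the architecture with the lowest total cost.

    costs[arch][algo] is a non-empty list of scalar costs, one per candidate
    way of implementing `algo` on `arch` (hypothetical layout; a scalar that
    already combines area, performance, and power is an assumption).
    """
    totals = {}
    for arch, per_algo in costs.items():
        # Step 1: cheapest implementation of every algorithm on this architecture.
        totals[arch] = sum(min(candidates) for candidates in per_algo.values())
    # Step 2: search across architectures for the lowest total.
    best = min(totals, key=totals.get)
    return best, totals


# Made-up costs for two architectures and two algorithms.
example = {
    "arch_a": {"aes": [5, 3], "des": [4]},
    "arch_b": {"aes": [2, 6], "des": [6, 7]},
}
best, totals = cheapest_architecture(example)
```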

20.

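Co-Design of an Adaptive Image Interpolation Algorithm and Its Acceleration Engine

Yan Xinkai, Ding Sheng 《Journal of Electronics & Information Technology》2023,45(9):3284-3294

To improve super-resolution reconstruction of high-definition color images, the paper proposes a new adaptive image interpolation algorithm based on edge contrast. It uses edge contrast detection and receptive fields of different scales to adaptively select the Lanczos interpolation coefficients; the adaptivity and the varied receptive fields further improve upscaling quality. Compared with bilinear interpolation, average peak signal-to-noise ratio (PSNR) improves by 1.1 dB, structural similarity (SSIM) by 0.025, and learned perceptual image patch similarity (LPIPS) by 0.051; compared with bicubic interpolation, average PSNR improves by 0.34 dB, SSIM by 0.01, and LPIPS by 0.033. To reduce hardware resources and improve memory efficiency, a highly parallel, energy-efficient interpolation acceleration engine is co-designed, which greatly raises the compute-to-memory-access ratio through two-level data reuse and a systolic coefficient mechanism. Synthesized with a 16 nm process library, the engine reaches a 2 GHz clock frequency; on a Xilinx Zynq UltraScale+ xczu15eg FPGA it runs at 200 MHz and achieves real-time performance at 60 frames per second (fps).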
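For readers unfamiliar with the Lanczos interpolation that the algorithm adapts, the sketch below evaluates the standard Lanczos kernel and the normalized 1D tap weights for a fractional sample position. It uses a fixed window size `a`, which is an assumption for illustration; the paper's edge-contrast detection and adaptive coefficient selection are not reproduced here, and the function names are hypothetical.

```python
import math


def lanczos_kernel(x, a=3):
    """Standard Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_weights(frac, a=3):
    """Normalized weights of the 2a source samples around a target position.

    `frac` in [0, 1) is the offset of the target from the source sample to its
    left; taps run from a - 1 samples to the left up to a samples to the right.
    """
    taps = range(-a + 1, a + 1)
    raw = [lanczos_kernel(frac - t, a) for t in taps]
    total = sum(raw)
    return [w / total for w in raw]


# Weights for a target pixel halfway between two source pixels.
half = lanczos_weights(0.5)
```

Applying the weights along rows and then columns gives separable 2D resampling, which is the usual way the kernel is used for image upscaling.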