首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Hardware/software (HW/SW) codesign and reconfigurable computing are commonly used methodologies for digital-systems design. However, no previous work has been carried out in order to define a HW/SW codesign methodology with dynamic scheduling for run-time reconfigurable architectures. In addition, all previous approaches to reconfigurable computing multicontext scheduling are based on static-scheduling techniques. In this paper, we present three main contributions: 1) a novel HW/SW codesign methodology with dynamic scheduling for discrete event systems using dynamically reconfigurable architectures; 2) a new dynamic approach to reconfigurable computing multicontext scheduling; and 3) a HW/SW partitioning algorithm for dynamically reconfigurable architectures. We have developed a whole codesign framework, where we have applied our methodology and algorithms to the case study of software acceleration. An exhaustive study has been carried out, and the obtained results demonstrate the benefits of our approach.  相似文献   

2.
Dynamically reconfigurable architectures are emerging as a viable design alternative to implement a wide range of computationally intensive applications. At the same time, an urgent necessity has arisen for support tool development to automate the design process and achieve optimal exploitation of the architectural features of the system. Task scheduling and context (configuration) management become very critical issues in achieving the high performance that digital signal processing (DSP) and multimedia applications demand. This article proposes a strategy to automate the design process which considers all possible optimizations that can be carried out at compilation time, regarding context and data transfers. This strategy is general in nature and could be applied to different reconfigurable systems. We also discuss the key aspects of the scheduling problem in a reconfigurable architecture such as MorphoSys. In particular, we focus on a task scheduling methodology for DSP and multimedia applications, as well as the context management and scheduling optimizations  相似文献   

3.
Domain specific coarse-grained reconfigurable architectures (CGRAs) have great promise for energy-efficient flexible designs for a suite of applications. Designing such a reconfigurable device for an application domain is very challenging because the needs of different applications must be carefully balanced to achieve the targeted design goals. It requires the evaluation of many potential architectural options to select an optimal solution. Exploring the design space manually would be very time consuming and may not even be feasible for very large designs. Even mapping one algorithm onto a customized architecture can require time ranging from minutes to hours. Running a full power simulation on a complete suite of benchmarks for various architectural options require several days. Finding the optimal point in a design space could require a very long time. We have designed a framework/tool that made such design space exploration (DSE) feasible. The resulting framework allows testing a family of algorithms and architectural options in minutes rather than days and can allow rapid selection of architectural choices. In this paper, we describe our DSE framework for domain specific reconfigurable computing where the needs of the application domain drive the construction of the device architecture. The framework has been developed to automate design space case studies, allowing application developers to explore architectural tradeoffs efficiently and reach solutions quickly. We selected some of the core signal processing benchmarks from the MediaBench benchmark suite and some edge-detection benchmarks from the image processing domain for our case studies. We describe two search algorithms: a stepped search algorithm motivated by our manual design studies and a more traditional gradient based optimization. Approximate energy models are developed in each case to guide the search toward a minimal energy solution. We validate our search results by comparing the architectural solutions selected by our tool to an architecture optimized manually and by performing sensitivity tests to evaluate the ability of our algorithms to find good quality minima in the design space. All selected fabric architectures were synthesized on 130 nm cell-based ASIC fabrication process from IBM. These architectures consume almost same amount of energy on average, but the gradient based approach is more general and promises to extend well to new problem domains. We expect these or similar heuristics and the overall design flow of the system to be useful for a wide range of architectures, including mesh based and other commonly used architectures for CGRAs.  相似文献   

4.
This paper aims to introduce a time partitioning algorithm which is an important step during the design process for fully reconfigurable systems. This algorithm is used to solve the time partitioning problem. It divides the input task graph model to an optimal number of partitions and puts each task in the appropriate partition so that the latency of the input task graph is optimal. Also a part of this paper is consecrated for implementation of some examples on a fully reconfigurable architecture following our approach.  相似文献   

5.
This paper reports on an innovative approach for solving satisfiability problems for propositional formulas in conjunctive normal form (SAT) by creating a logic circuit that is specialized to solve each problem instance on field programmable gate arrays (FPGAs). This approach has become feasible due to recent advances in reconfigurable computing and has opened up an exciting new research field in algorithm design. SAT is an important subclass of constraint satisfaction problems, which can formalize a wide range of application problems. We have developed a series of algorithms that are suitable for a logic circuit implementation, including an algorithm whose performance is equivalent to the Davis-Putnam procedure with powerful dynamic variable ordering. Simulation results show that this method can solve a hard random 3-SAT problem with 400 variables within 1.6 min at a clock rate of 10 MHz. Faster speeds can be obtained by increasing the clock rate. Furthermore, we have actually implemented a 128-variable 256-clause problem instance on FPGAs  相似文献   

6.
In this paper, we propose a configuration-aware data-partitioning approach for reconfigurable computing. We show how the reconfiguration overhead impacts the data-partitioning process. Moreover, we explore the system-level power-performance tradeoffs available when implementing streaming embedded applications on fine-grained reconfigurable architectures. For a certain group of streaming applications, we show that an efficient hardware/software partitioning algorithm is required when targeting low power. However, if the application objective is performance, then we propose the use of dynamically reconfigurable architectures. We propose a design methodology that adapts the architecture and algorithms to the application requirements. The methodology has been proven to work on a real research platform based on Xilinx devices. Finally, we have applied our methodology and algorithms to the case study of image sharpening, which is required nowadays in digital cameras and mobile phones.  相似文献   

7.
Multi-input multi-output systems with associated technologies such as smart antennas and adaptive coding and modulation techniques enhance channel capacity, diversity, and robustness of wireless communications as has been proven by many recent research results both in theory and experiments. This article focuses on the antenna aspect of MIMO systems. In particular, we emphasize the important role of the reconfigurable antenna and its links with space-time coding techniques that can be employed for further exploitation of the theoretical performance of MIMO wireless systems. The advantages of the reconfigurable antenna compared to the traditional smart antenna are discussed. Establishment of reconfigurable antennas requires novel radio frequency microelectromechanical systems technology, which has recently been developed in the authors' group. We briefly introduce this technology with emphasis on its distinct advantages over existing silicon-based MEMS technologies for reconfigurable antennas. A reconfigurable antenna design that can change its operating frequency and radiation/polarization characteristics is described. Finally, we present the experimental and theoretical results from impedance and radiation performance characterization for different antenna configurations.  相似文献   

8.
Most of the coarse-grained reconfigurable architectures (CGRAs) are composed of reconfigurable ALU arrays and configuration cache (or context memory) to achieve high performance and flexibility. Specially, configuration cache is the main component in CGRA that provides distinct feature for dynamic reconfiguration in every cycle. However, frequent memory-read operations for dynamic reconfiguration cause much power consumption. Thus, reducing power in configuration cache has become critical for CGRA to be more competitive and reliable for its use in embedded systems. In this paper, we propose dynamically compressible context architecture for power saving in configuration cache. This power-efficient design of context architecture works without degrading the performance and flexibility of CGRA. Experimental results show that the proposed approach saves up to 39.72% power in configuration cache with negligible area overhead (2.16%).   相似文献   

9.
张惠臻  谢维波  李蹊  洪欣 《电子学报》2015,43(2):299-304
在基于指令集动态可扩展技术的可重构指令集处理器研究中,如何有效使用系统的可重构资源,将很大程度上影响扩展得到的定制指令的功能实现,进而影响系统性能的优化效果.本文针对可重构资源的利用问题,首先设计了一种可重构资源模型,该模型弱化了可重构资源的功能和数量属性,主要提供其种类和位置属性,并能够以此计算资源使用的时间属性.基于此模型,本文将图论中的图着色问题进行扩展,引入多遍着色的思想,提出了一种针对粗粒度可重构资源的资源指派算法,该算法将可重构资源的指派等价为一个图多遍着色问题,通过模型提供的属性参数和限制条件完成指派过程.实验结果验证了算法的有效性,并揭示了资源使用中的规律性,对提高资源利用率和系统性能具有一定的指导意义.  相似文献   

10.
郭力  曹超 《信息技术》2011,(5):68-72
提出了一种可以利用计算时间覆盖配置时间和数据传输时间的可重构阵列结构,并且针对该可重构阵列结构提出了一种表调度算法进行任务调度.在SOCDesigner平台上进行了软硬件协同仿真,对于IDCT,FFT,4×4矩阵乘法新可重构阵列相比原来的可重构阵列有平均约10%的速度提升.  相似文献   

11.
The advances in the programmable hardware has lead to new architectures where the hardware can be dynamically adapted to the application to gain better performance. There are still many challenging problems to be solved before any practical general-purpose reconfigurable system is built. One fundamental problem is the placement of the modules on the reconfigurable functional unit (RFU). In reconfigurable systems, we are interested both in online placement, where arrival time of tasks is determined at runtime and is not known a priori, and offline in which the schedule is known at compile time. In the case of offline placement, we are willing to spend more time during compile time to find a compact floorplan for the RFU modules and utilize the RFU area more efficiently. In this paper we present offline placement algorithms based on simulated annealing and greedy methods and show the superiority of their placements over the ones generated by an online algorithm.  相似文献   

12.
With continuing improvements in spatial resolution of positron emission tomography (PET) scanners, small patient movements during PET imaging become a significant source of resolution degradation. This work develops and investigates a comprehensive formalism for accurate motion-compensated reconstruction which at the same time is very feasible in the context of high-resolution PET. In particular, this paper proposes an effective method to incorporate presence of scattered and random coincidences in the context of motion (which is similarly applicable to various other motion correction schemes). The overall reconstruction framework takes into consideration missing projection data which are not detected due to motion, and additionally, incorporates information from all detected events, including those which fall outside the field-of-view following motion correction. The proposed approach has been extensively validated using phantom experiments as well as realistic simulations of a new mathematical brain phantom developed in this work, and the results for a dynamic patient study are also presented.   相似文献   

13.
This paper presents a perfect dynamic optically reconfigurable gate array (DORGA) architecture emulation using a holographic memory and a conventional ORGA-VLSI. In ORGAs, although a large virtual gate count can be realized by exploiting the large-capacity storage capability of a holographic memory, the actual gate count, which is the gate count of a programmable gate array VLSI, is important to increase the instantaneous performance. Nevertheless, in previously proposed ORGA-VLSIs, the static configuration memory to store a single configuration context consumed a large implementation area of the ORGA-VLSIs and prevented the realization of large-gate-count ORGA-VLSIs. Therefore, a DORGA architecture has been proposed in order to increase the gate density. It uses the junction capacitance of photodiodes as dynamic memory, thereby obviating the static configuration memory. However, to date, demonstration of a perfect optically reconfigurable architecture for DORGA-VLSIs has never been presented. Therefore, in this study, the DORGA architecture was perfectly emulated, and the performance, particularly the reconfiguration context retention time, was measured experimentally. The advantages of this architecture are discussed in relation to the results.  相似文献   

14.
Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step to validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data-points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case-study confirms that blindly maximizing data-parallelism can result in schedules even worse than that generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very important, we demonstrate that our approach is well-suited for true on-demand computing with detailed execution time estimates on a typical embedded processor. Heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.  相似文献   

15.
In this paper, we present an algorithm for partitioning sequential circuits. This algorithm is based on an analysis of a circuit's primary input cones and fanout values (PIFAN), and it uses a directed acyclic graph to represent the circuit. An invasive approach is employed, which creates logical and physical partitions by automatically inserting reconfigurable test cells and multiplexers. The test cells are used to control and observe multiple partitioning points, while the multiplexers expand the controllability and observability provided by the test cells. The feasibility and efficiency of our algorithm are evaluated by partitioning numerous standard digital circuits, including some large benchmark circuits containing up to 5597 gates. Our algorithm is based upon pseudoexhaustive testing methods where fault simulation is not required for test-pattern generation and grading; hence, engineering design time and cost are further reduced  相似文献   

16.
杜怡然  李伟  戴紫彬 《电子学报》2020,48(4):781-789
针对密码算法的高效能实现问题,该文提出了一种基于数据流的粗粒度可重构密码逻辑阵列结构PVHArray.通过研究密码算法运算及控制结构特征,基于可重构阵列结构设计方法,提出了以流水可伸缩的粗粒度可重构运算单元、层次化互连网络和面向周期级的分布式控制网络为主体的粗粒度可重构密码逻辑阵列结构及其参数化模型.为了提升可重构密码逻辑阵列的算法实现效能,该文结合密码算法映射结果,确定模型参数,构建了规模为4×4的高效能PVHArray结构.基于55nm CMOS工艺进行流片验证,芯片面积为12.25mm2,同时,针对该阵列芯片进行密码算法映射.实验结果表明,该文提出高效能PVHArray结构能够有效支持分组、序列以及杂凑密码算法的映射,在密文分组链接(CBC)模式下,相较于可重构密码逻辑阵列REMUS_LPP结构,其单位面积性能提升了约12.9%,单位功耗性能提升了约13.9%.  相似文献   

17.
粗粒度可重构密码逻辑阵列智能映射算法研究   总被引:1,自引:0,他引:1       下载免费PDF全文
针对粗粒度可重构密码逻辑阵列密码算法映射周期长且性能不高的问题,该文通过构建粗粒度可重构密码逻辑阵列参数化模型,以密码算法映射时间及实现性能为目标,结合本文构建的粗粒度可重构密码逻辑阵列结构特征,提出了一种算法数据流图划分算法.通过将密码算法数据流图中节点聚集成簇并以簇为最小映射粒度进行映射,降低算法映射复杂度;该文借鉴机器学习过程,构建了具备学习能力的智慧蚁群模型,提出了智慧蚁群优化算法,通过对训练样本的映射学习,持续优化初始化信息素浓度矩阵,提升算法映射收敛速度,以已知算法映射指导未知算法映射,实现密码算法映射的智能化.实验结果表明,本文提出的映射方法能够平均降低编译时间37.9%并实现密码算法映射性能最大,同时,以算法数据流图作为映射输入,自动化的生成密码算法映射流,提升了密码算法映射的直观性与便捷性.  相似文献   

18.
Dynamically reconfigurable system-on-a-chip (RSoC) technology features embedded microprocessors that are dispersed on the same die with significant amounts of programmable logic fabric. In this paper, we present a strategy to solve the recently emerging problem of how to utilize the flexible but still limited RSoC resources in an effective manner for a multi-task application. The major contribution of this paper is the development of a dynamic task scheduling algorithm that can be implemented in fixed or reconfigurable hardware that will perform the online scheduling of task systems onto the RSoC type architecture. The results from extensive simulations demonstrate the benefits of the proposed dynamic scheduling approach as compared with that of other static scheduling techniques taken from the technical literature.   相似文献   

19.
Embedded digital signal processing (DSP) systems are usually associated with real time constraints and/or high data rates such that fully software implementations are often not satisfactory. In that case, mixed hardware/software implementations are to be investigated. This paper presents the design of a HW/SW G.729 voice decoder dedicated to embedded systems. The decoder has been built around, on the one hand a reconfigurable digital circuit (FPGA) to achieve the so called IP hardware part—the autocorrelation computation—using a linear systolic array, and on the other hand a digital signal processor (DSP) for the remainder of the algorithm. Apart such an implementation is typically driven by the use of reusable component (IP) it is of great interest for new G729-based applications such as Voice over IP (VoIP) for example. It results in an overall reduction of the execution time per frame. Another interesting point is the design of a parameterizable autocorrelation block which can be useful for a wide range of applications such as GSM 13 Kbit/s, APC 9.6 Kbit/s and G723 6.3 Kbit/s and 5.3 Kbit/s. In the G729 context and using a V50 Virtex FPGA, the execution time of this function is 10 times faster than a TMS320C6201 DSP implementation.  相似文献   

20.
Reversible logic has gained interest of researchers worldwide for its ultra-low power and high speed computing abilities in the future quantum information processing. Testing of these circuits is important for ensuring high reliability of their operation. In this work, we propose an ATPG algorithm for reversible circuits using an exact approach to generate CTS (Complete Test Set) which can detect single stuck-at faults, multiple stuck-at faults, repeated gate fault, partial and complete missing gate faults which are very useful logical fault models for reversible logic to model any physical defect. Proposed algorithm can be used to test a reversible circuit designed with k-CNOT, Peres and Fredkin gates. Through extensive experiments, we have validated our proposed algorithm for several benchmark circuits and other circuits with family of reversible gates. This algorithm produces a minimal and complete test set while reducing test generation time as compared to existing state-of-the-art algorithms. A testing tool is developed satisfying the purpose of generating all possible CTS’s indicating the simulation time, number of levels and gates in the circuit. This paper also contributes to the detection and removal of redundant faults for optimal test set generation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号