Similar Documents
20 similar documents found.
1.
FPGA-based configurable computing machines are evolving rapidly for large signal processing applications because of their flexibility and high performance. In this paper, given a reconfigurable processing unit (RPU) with a logic capacity of A_RPU and a computational task represented by a data flow graph G = (V, E, W), we propose a network flow-based multiway task partitioning algorithm that minimizes communication cost for temporal partitioning. The proposed algorithm obtains an optimal solution, a cut set, with minimum interconnection under the area constraint. Two techniques are applied in our approach: in the initial partition, the proposed network flow-based algorithm produces feasible min-cuts, yielding a set of feasible solutions; a scheduling technique then selects a globally optimal solution from among them.
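The abstract does not spell out the network construction, but its core step, extracting a minimum cut from a flow network derived from the data flow graph, can be illustrated with a short max-flow/min-cut routine. This is a minimal sketch only: the Edmonds-Karp implementation, the toy graph, and the use of edge weights as communication costs are assumptions, not the paper's actual construction.

```python
from collections import deque

def edmonds_karp(capacity, source, sink):
    """Max flow via BFS augmenting paths; returns (flow value, residual graph)."""
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u in list(capacity):
        for v in capacity[u]:
            residual.setdefault(v, {})
            residual[v].setdefault(u, 0)          # reverse edge for residual updates
    flow = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:       # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, residual
        v, bottleneck = sink, float("inf")        # find the bottleneck capacity
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink                                  # push flow along the path
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def min_cut_edges(capacity, source, sink):
    """Edges crossing the min cut: the cheapest set of interconnections to cut."""
    flow, residual = edmonds_karp(capacity, source, sink)
    reachable, stack = {source}, [source]
    while stack:                                  # nodes still reachable in the residual graph
        u = stack.pop()
        for v, cap in residual[u].items():
            if cap > 0 and v not in reachable:
                reachable.add(v)
                stack.append(v)
    cut = [(u, v) for u in reachable for v in capacity.get(u, {}) if v not in reachable]
    return flow, cut

# Hypothetical DFG with edge weights modelling communication cost between operations.
dfg = {"s": {"a": 3, "b": 2}, "a": {"t": 2, "b": 1}, "b": {"t": 3}, "t": {}}
print(min_cut_edges(dfg, "s", "t"))               # cut value equals the max flow (5 here)
```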

2.
Research on an intelligent mapping algorithm for coarse-grained reconfigurable cryptographic logic arrays
To address the long mapping time and limited performance of cipher algorithm mapping on coarse-grained reconfigurable cryptographic logic arrays, this paper builds a parameterized model of the array and, taking mapping time and implementation performance as objectives and exploiting the structural features of the array, proposes a partitioning algorithm for the algorithm's data flow graph. By clustering the nodes of the cipher algorithm's data flow graph and mapping with clusters as the smallest mapping granularity, the mapping complexity is reduced. Drawing on the machine-learning process, the paper constructs a smart ant colony model with learning capability and proposes a smart ant colony optimization algorithm: by learning from the mappings of training samples, it continuously refines the initial pheromone concentration matrix, speeds up mapping convergence, and uses known algorithm mappings to guide the mapping of unknown algorithms, making cipher algorithm mapping intelligent. Experimental results show that the proposed mapping method reduces compilation time by 37.9% on average while achieving the highest mapping performance for the cipher algorithms; moreover, taking the algorithm data flow graph as the mapping input, it automatically generates the cipher algorithm mapping flow, improving the intuitiveness and convenience of cipher algorithm mapping.
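As a rough illustration of the "smart ant colony" idea, the sketch below assigns data-flow-graph clusters to array cells with an ant colony search whose pheromone matrix can be seeded from earlier (training) mappings instead of a uniform value. The cost model, the parameters, and the function names are assumptions for illustration; they do not reproduce the paper's actual model or update rules.

```python
import random

def aco_map(n_clusters, n_cells, cost, seed_pheromone=None,
            ants=10, iters=50, rho=0.1, alpha=1.0, beta=2.0):
    """Assign each cluster to a cell, minimizing a user-supplied cost(cluster, cell).
    seed_pheromone: optional matrix learned from previous (training) mappings,
    used instead of a uniform initialization to speed up convergence."""
    tau = ([[1.0] * n_cells for _ in range(n_clusters)] if seed_pheromone is None
           else [row[:] for row in seed_pheromone])
    best, best_cost = None, float("inf")
    for _ in range(iters):
        for _ in range(ants):
            assignment = []
            for i in range(n_clusters):           # roulette-wheel choice of a cell
                weights = [(tau[i][j] ** alpha) * ((1.0 / (1e-9 + cost(i, j))) ** beta)
                           for j in range(n_cells)]
                r, acc, choice = random.random() * sum(weights), 0.0, n_cells - 1
                for j, w in enumerate(weights):
                    acc += w
                    if r <= acc:
                        choice = j
                        break
                assignment.append(choice)
            c = sum(cost(i, j) for i, j in enumerate(assignment))
            if c < best_cost:
                best, best_cost = assignment, c
        tau = [[(1 - rho) * t for t in row] for row in tau]   # evaporation
        for i, j in enumerate(best):                          # reinforce best mapping so far
            tau[i][j] += 1.0 / (1.0 + best_cost)
    return best, best_cost

# Toy usage: 4 clusters onto 4 cells; the cost simply favours cell j == cluster i.
print(aco_map(4, 4, cost=lambda i, j: abs(i - j)))
```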

3.
In this paper, we present an algorithm for partitioning sequential circuits. This algorithm is based on an analysis of a circuit's primary input cones and fanout values (PIFAN), and it uses a directed acyclic graph to represent the circuit. An invasive approach is employed, which creates logical and physical partitions by automatically inserting reconfigurable test cells and multiplexers. The test cells are used to control and observe multiple partitioning points, while the multiplexers expand the controllability and observability provided by the test cells. The feasibility and efficiency of our algorithm are evaluated by partitioning numerous standard digital circuits, including some large benchmark circuits containing up to 5597 gates. Our algorithm is based upon pseudoexhaustive testing methods where fault simulation is not required for test-pattern generation and grading; hence, engineering design time and cost are further reduced.
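The per-node quantities that drive the PIFAN analysis, the set of primary inputs in a node's input cone and its fanout count, are straightforward to compute on the DAG representation of the circuit. The sketch below is a generic illustration of that computation only; the netlist encoding and names are assumptions rather than the published algorithm.

```python
from functools import lru_cache

def pifan_metrics(fanin):
    """fanin: gate -> list of driver gates; primary inputs have an empty driver list.
    Returns gate -> (input-cone size, fanout count) over the circuit DAG."""
    fanout = {g: 0 for g in fanin}
    for g, drivers in fanin.items():
        for d in drivers:
            fanout[d] = fanout.get(d, 0) + 1

    @lru_cache(maxsize=None)
    def cone(g):
        """Set of primary inputs feeding gate g (its primary input cone)."""
        drivers = fanin.get(g, [])
        if not drivers:                       # primary input: its cone is itself
            return frozenset([g])
        s = frozenset()
        for d in drivers:
            s |= cone(d)
        return s

    return {g: (len(cone(g)), fanout.get(g, 0)) for g in fanin}

# Hypothetical netlist: a, b, c are primary inputs driving three gates.
netlist = {"a": [], "b": [], "c": [],
           "g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
print(pifan_metrics(netlist))
```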

4.
Reconfigurable single-chip emulation systems were proposed as an alternative to multichip emulation systems. Because they cannot be emulated on a single chip at once, large designs are sliced into partitions that are downloaded and executed sequentially on the same reconfigurable emulation chip. In this paper, we address the problem of partitioning a design on a reconfigurable single-chip emulator under resource constraints. First, we extract an acyclic flow graph of the design to be emulated. Then, we model the problem as an integer linear programming (IP) problem based on the acyclic flow graph of the design, where the structure of the assignment and precedence constraints produces a tight formulation. To partition a design, our algorithm uses two distinct steps with different objectives. In the first step, we minimize the number of cycles needed to schedule every look-up table (LUT) in the circuit. Then flip-flops (FFs) are inserted into the appropriate cycles of the schedule in the second step. Experiments are conducted on small- and medium-size circuits from the MCNC Partitioning93 benchmark suite. The obtained results show that our algorithm produces optimal partitioning schedules.
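The first step of the formulation above amounts to assignment and precedence constraints over LUT-to-cycle variables, with the number of cycles as the objective. A minimal version of such an IP is sketched below with the PuLP modeling library; the variable names, the per-cycle capacity model, and the toy instance are assumptions, not the paper's exact formulation (in particular, the FF-insertion second step is omitted).

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpInteger, value

def min_cycle_schedule(luts, edges, capacity, max_cycles):
    """Assign every LUT to one reconfiguration cycle so that precedence and a per-cycle
    capacity (a stand-in for the chip's resource constraint) hold, minimizing cycles used."""
    P = range(max_cycles)
    prob = LpProblem("temporal_partitioning", LpMinimize)
    x = {(v, p): LpVariable(f"x_{v}_{p}", cat=LpBinary) for v in luts for p in P}
    last = LpVariable("last_cycle", lowBound=0, upBound=max_cycles - 1, cat=LpInteger)
    prob += last                                           # objective: fewest cycles
    for v in luts:
        prob += lpSum(x[v, p] for p in P) == 1             # each LUT in exactly one cycle
        prob += lpSum(p * x[v, p] for p in P) <= last      # track the latest cycle used
    for p in P:
        prob += lpSum(x[v, p] for v in luts) <= capacity   # capacity of one configuration
    for u, v in edges:                                     # producer u no later than consumer v
        prob += lpSum(p * x[u, p] for p in P) <= lpSum(p * x[v, p] for p in P)
    prob.solve()
    cycles = {v: int(round(sum(p * value(x[v, p]) for p in P))) for v in luts}
    return cycles, int(round(value(last))) + 1

# Toy instance: five LUTs, simple precedence, at most two LUTs per cycle.
sched, n_cycles = min_cycle_schedule(
    luts=["l1", "l2", "l3", "l4", "l5"],
    edges=[("l1", "l3"), ("l2", "l3"), ("l3", "l4"), ("l3", "l5")],
    capacity=2, max_cycles=5)
print(sched, n_cycles)
```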

5.
The reconfiguration capability of modern FPGA devices can be utilized to execute an application by partitioning it into multiple segments such that each segment is executed one after the other on the device. This division of an application into multiple reconfigurable segments is called temporal partitioning. We present an automated temporal partitioning technique for acyclic behavior-level task graphs. To be effective, any behavior-level partitioning method should ensure that each temporal partition meets the underlying resource constraints. For this, the implementation cost of each task on the hardware should be known. Since multiple implementations of a task that differ in area and delay are possible, we perform design-space exploration to choose the best implementation of a task from among the available implementations. To overcome the high reconfiguration overhead of current-day FPGA devices, we propose integration of the temporal partitioning and design-space exploration methodology with block-processing. Block-processing is used to process multiple blocks of data on each temporal partition so as to amortize the reconfiguration time. We focus on applications that can be represented as task graphs that have to be executed many times over a large set of input data. We have integrated block-processing in the temporal partitioning framework so that it also influences the design point selection for each task. However, this does not exclude usage of our system for designs for which block-processing is not possible. For both block-processing and non-block-processing designs, our algorithm selects the best possible design point to minimize the execution time of the design. We present an ILP-based methodology for the integrated temporal partitioning, design space exploration and block-processing technique that is solved to optimality for small-sized design problems and with an iterative constraint-satisfaction approach for large-sized design problems. We demonstrate the validity of our approach with extensive experimental results for the Discrete Cosine Transform (DCT) and random graphs.
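The benefit of block-processing claimed above comes from paying each reconfiguration once per temporal partition rather than once per data block. A back-of-the-envelope cost model (all numbers purely illustrative, not taken from the paper) makes the amortization explicit:

```python
def with_block_processing(parts, t_cfg, t_exec, blocks):
    # Each partition is configured once; every data block streams through before switching.
    return parts * t_cfg + parts * blocks * t_exec

def without_block_processing(parts, t_cfg, t_exec, blocks):
    # Each data block pays the full reconfiguration sequence again.
    return blocks * parts * (t_cfg + t_exec)

# Illustrative numbers: 4 temporal partitions, 10 ms per reconfiguration, 0.2 ms per block.
for blocks in (1, 10, 100):
    print(blocks, "blocks:",
          with_block_processing(4, 10.0, 0.2, blocks), "ms vs",
          without_block_processing(4, 10.0, 0.2, blocks), "ms")
```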

6.
This brief introduces a partitioning algorithm, which facilitates pseudoexhaustive testing, to detect and locate faults in digital VLSI circuits. The algorithm is based on an analysis of a circuit's primary input cones and fanout (PIFAN) values. An invasive approach is employed, which creates logical and physical partitions by automatically inserting reconfigurable test cells and multiplexers. The test cells are used to control and observe multiple partitioning points, while the multiplexers expand the controllability and observability provided by the test cells. The feasibility and efficiency of our algorithm are evaluated by partitioning numerous ISCAS 1985 and 1989 benchmark circuits containing up to 5597 gates. Our results show that the PIFAN algorithm offers significant reductions in overhead and test time compared to previous partitioning algorithms.

7.
A redundancy-free bypass-node addition algorithm for cross-layer data transfer in two-dimensional RCAs
To address cross-layer data transfer for hardware tasks on two-dimensional reconfigurable cell arrays (RCAs), a preorder-traversal backtracking algorithm for adding bypass nodes is proposed. For the two relevant types of data flow graphs, cross-layer input trees and cross-layer output trees, the algorithm preserves the logical relations among the original operation nodes and adds bypass nodes without redundancy. A quantitative evaluation metric system and a pipelining model for partitioning and mapping in dynamically reconfigurable systems are given, along with the critical condition for mapping with added bypass nodes. Experimental results show that, under the same system architecture and partitioning/mapping algorithm and when the critical condition is satisfied, mapping with bypass nodes improves on mapping without them in the number of partitioned modules, the number of non-primary inputs/outputs, configuration time, total execution cycles, and power consumption. Compared with existing state-of-the-art algorithms, the proposed algorithm reduces average total execution cycles by 23.3% (RCA 5×5) and 30.5% (RCA 8×8) and average power consumption by 15.7% (RCA 5×5) and 18.6% (RCA 8×8), confirming the soundness and effectiveness of the proposed method.

8.
张惠臻, 谢维波, 李蹊, 洪欣. 《电子学报》, 2015, 43(2): 299-304
In research on reconfigurable instruction set processors based on dynamically extensible instruction sets, how effectively the system's reconfigurable resources are used largely determines how well the functionality of the extended custom instructions can be realized, and hence how much the system performance is improved. Addressing this resource utilization problem, this paper first designs a reconfigurable resource model that de-emphasizes the functional and quantitative attributes of the resources, instead exposing their type and location attributes and allowing the timing attributes of resource usage to be derived from them. Based on this model, the paper extends the graph coloring problem with the idea of multi-pass coloring and proposes a resource assignment algorithm for coarse-grained reconfigurable resources: the assignment of reconfigurable resources is cast as a multi-pass graph coloring problem and completed using the attribute parameters and constraints provided by the model. Experimental results verify the effectiveness of the algorithm and reveal regularities in resource usage, offering guidance for improving resource utilization and system performance.
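The abstract does not give pseudocode for the multi-pass coloring, so the sketch below only illustrates one plausible reading of the idea: color an interference graph with the resource instances available in each pass, and defer nodes that cannot be colored to a later pass. The graph, the pass limit, and this interpretation itself are assumptions.

```python
def multi_pass_coloring(conflicts, colors_per_pass):
    """conflicts: node -> set of nodes that cannot share a resource instance.
    Returns node -> (pass, color): nodes that cannot be colored in one pass are
    deferred to a later pass, i.e. to a later reuse of the same resources."""
    assignment, remaining, pass_idx = {}, set(conflicts), 0
    while remaining:
        for node in sorted(remaining, key=lambda n: -len(conflicts[n])):   # hardest first
            used = {assignment[n][1] for n in conflicts[node]
                    if n in assignment and assignment[n][0] == pass_idx}
            for color in range(colors_per_pass):
                if color not in used:
                    assignment[node] = (pass_idx, color)
                    break
        remaining -= set(assignment)
        pass_idx += 1
    return assignment

# Hypothetical conflict graph for five operations and two resource instances per pass.
g = {"op1": {"op2", "op3"}, "op2": {"op1", "op3"}, "op3": {"op1", "op2"},
     "op4": {"op5"}, "op5": {"op4"}}
print(multi_pass_coloring(g, colors_per_pass=2))
```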

9.
Hardware/software (HW/SW) partitioning and scheduling are crucial steps in HW/SW co-design, and both have been shown to be classical combinatorial optimization problems. Because tasks may execute sequentially or concurrently, HW/SW partitioning and scheduling are difficult to solve optimally. In this paper, more efficient heuristic algorithms are proposed for HW/SW partitioning and scheduling. The proposed partitioning algorithm partitions a task graph by iteratively moving, with higher priority, the task with the highest benefit-to-area ratio; the ratio is updated in each iteration step to account for task concurrency. The proposed scheduling algorithm executes tasks lying on the hardware-only critical path with higher priority to improve the schedule forecast. A large body of experimental results conclusively shows that the proposed partitioning heuristic is superior to the latest efficient combinatorial algorithm (Tabu search) cited in this paper. Moreover, the Tabu search for partitioning has been further improved by utilizing the proposed heuristic solution as its initial solution. In addition, the proposed scheduling algorithm improves on the most widely used approaches by up to 10% without a large increase in running time. This work was presented in part at the 2006 IEEE International Conference on Field Programmable Technology (ICFPT).
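A stripped-down version of the partitioning rule described above (repeatedly move the task with the highest benefit-to-area ratio to hardware while area remains) is sketched below. The task data and the benefit definition are assumptions, and the paper's concurrency-aware ratio update is reduced to a plain recomputation.

```python
def greedy_hw_sw_partition(tasks, hw_area_budget):
    """tasks: name -> {'sw_time': t, 'hw_time': t, 'area': a}.
    Repeatedly moves the task with the best benefit-to-area ratio to hardware."""
    hw, area_used = set(), 0
    while True:
        best, best_ratio = None, 0.0
        for name, t in tasks.items():
            if name in hw or area_used + t["area"] > hw_area_budget:
                continue
            benefit = t["sw_time"] - t["hw_time"]          # time saved by moving to HW
            ratio = benefit / t["area"]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:                                   # no profitable, feasible move left
            return hw
        hw.add(best)
        area_used += tasks[best]["area"]

# Hypothetical task set and a hardware area budget of 100 units.
tasks = {"fft":  {"sw_time": 50, "hw_time": 5, "area": 60},
         "fir":  {"sw_time": 20, "hw_time": 4, "area": 30},
         "ctrl": {"sw_time": 5,  "hw_time": 4, "area": 40}}
print(greedy_hw_sw_partition(tasks, hw_area_budget=100))   # -> {'fft', 'fir'}
```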

10.
To minimize the number of configurations (partitioned blocks) in a reconfigurable computer system, a hardware task partitioning algorithm combining area estimation and multi-objective optimization is proposed. The algorithm estimates the hardware resource area at every partitioning step and constructs a new probing function, prior_assigned(), that jointly considers reconfigurable resource usage, the total execution delay of all partitioned blocks of a data flow graph, and the number of edges between partitioned modules. The function computes a priority value for each ready node, and the new algorithm uses this value to dynamically reorder the scheduling of task nodes on the ready list. Experimental results show that, compared with five existing partitioning algorithms (level-based, cluster-based, enhanced static list, multi-objective temporal partitioning, and cluster-level-sensitive), the proposed algorithm obtains the fewest modules and, as the area of the reconfigurable processing unit increases, also the smallest mean execution delay among all algorithms except the level-based one.
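The sketch below mimics the structure of such a priority-driven ready-list partitioner with a toy priority function standing in for the paper's prior_assigned() probe; the weights, the priority formula, and the example DFG are assumptions made only for illustration.

```python
def partition_dfg(delay, succs, area, rpu_area, w_area=1.0, w_delay=1.0, w_edges=1.0):
    """Greedy temporal partitioning driven by a priority value per ready node.
    delay: node -> delay; succs: node -> successor list; area: node -> area.
    priority() below only stands in for the paper's prior_assigned() probe."""
    preds = {n: set() for n in delay}
    for u, vs in succs.items():
        for v in vs:
            preds[v].add(u)
    done, partitions = set(), []
    while len(done) < len(delay):
        block, used = [], 0
        while True:
            # ready nodes: all predecessors placed, and the node still fits this block
            ready = [n for n in delay if n not in done and preds[n] <= done
                     and used + area[n] <= rpu_area]
            if not ready:
                break
            def priority(n):
                cross = sum(1 for p in preds[n] if p not in block)   # edges from other blocks
                return (w_area * (rpu_area - used - area[n])         # leftover area if added
                        - w_delay * delay[n]                         # prefer short delays
                        - w_edges * cross)                           # prefer few cross edges
            n = max(ready, key=priority)
            block.append(n)
            used += area[n]
            done.add(n)
        if not block:
            raise ValueError("a single node exceeds the RPU area")
        partitions.append(block)
    return partitions

# Hypothetical DFG with per-node delay and area; RPU area budget of 10.
node_delay = {"a": 2, "b": 3, "c": 1, "d": 2}
node_succs = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
node_area  = {"a": 4, "b": 5, "c": 3, "d": 6}
print(partition_dfg(node_delay, node_succs, node_area, rpu_area=10))
```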

11.
The paper proposes a novel heuristic technique for integrated hardware-software partitioning, hardware design space exploration and scheduling. The technique maps an application specified as a task graph onto a heterogeneous architecture with the objective of minimizing the latency of the task graph subject to the area constraint on the hardware coprocessor. The technique uses an iterative approach where the partitioner decides the processor mapping and HW design points of some tasks. The scheduler then simultaneously decides the processor mapping, HW design point and schedule time of the remaining tasks. There exists a tight coupling between the two design stages, allowing them to produce superior-quality designs in fewer iterations. The technique accounts for the time overheads due to inter-processor/intra-processor communication and shared-memory access conflicts. It can therefore be used for both communication-intensive and computation-intensive applications. The technique also considers the dynamic reconfiguration capability of the hardware coprocessor. It performs tradeoff analysis and maps hardware tasks to mutually exclusive temporal segments if this results in lower latency. The effectiveness of the technique is demonstrated by a case study of the JPEG image compression algorithm, comparison with an optimal ILP-based approach, and experimentation with synthetic graphs.

12.
One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated for a chosen objective function. The partitioning approach works fully automatically and supports multi-processor systems, interfacing and hardware sharing. In contrast to other approaches, where special estimators are used, we use compilation and synthesis tools for cost estimation. The increased time for calculating values for the cost metrics is compensated for by the improved quality of the values; therefore, fewer iteration steps for partitioning are needed. The paper presents an algorithm using integer programming for solving the hardware/software partitioning problem, leading to promising results.

13.
陈乃金, 江建慧, 陈昕, 周洲, 徐莹. 《电子学报》, 2012, 40(5): 1055-1066
This paper proposes an improved level-based partitioning algorithm. The algorithm fully considers minimizing the execution delay of each partitioned block and utilizing the reconfigurable resources as fully as possible; it tracks and adjusts the node-assignment process of level-based partitioning, eliminating the drawback of the classical level-based algorithm, which cannot dynamically update the ready-node list when selecting nodes for partitioning. Experimental results show that, compared with level-based partitioning, the improved algorithm achieves gains in all three respects: number of modules, execution delay, and number of I/O edges crossing modules. Compared with four existing partitioning algorithms (cluster-based, enhanced static list, multi-objective temporal partitioning, and cluster-level-sensitive), the new algorithm obtains the lowest execution delay and, as the area of the reconfigurable processing unit increases, also the smallest mean number of modules.

14.
For a multiprocessor System-on-Chip (MPSoC) to achieve high performance via parallelism, we must consider how to partition a given application into different components and map the components onto multiple processors. In this paper, we propose a software pipeline-based partitioning method with cyclic dependent task management and communication optimization. During task partitioning, simultaneously considering computation load balance and communication optimization can cause interference, which leads to performance loss. To address this issue, we formulate their constraints and apply an integer linear programming approach to find an optimal partitioning result, one that trades off these two factors. Experimental results on a reconfigurable MPSoC platform demonstrate the effectiveness of the proposed method, with 20% to 40% performance improvements compared to a traditional software pipeline-based partitioning method.

15.
The paper considers real-time implementation of recurrent digital signal processing algorithms on an application-specific multiprocessor system. The objective is to devise a periodic, fully static task assignment for a DSP algorithm under the constraint of the data sampling period, assuming interprocessor communication delay is negligible. Toward this goal, the authors propose a novel algorithm unfolding technique called the generalized perfect rate graph (GPRG). They prove that a recurrent algorithm admits a fully static multiprocessor implementation for a given initiation interval if and only if the corresponding iterative computational dependence graph of the algorithm is a GPRG. Compared with previous results, GPRG often leads to a smaller unfolding factor α_GPRG.

16.
We present an efficient framework for dynamic reconfiguration of application-specific custom instructions. A key component of this framework is an iterative algorithm for temporal and spatial partitioning of the loop kernels. Our algorithm maximizes the performance gain of an application while taking into consideration the dynamic reconfiguration cost. It selects the appropriate custom instructions for the loops and clubs them into one or more configurations. We model the temporal partitioning problem as a k-way graph partitioning problem. A dynamic programming based solution is used for the spatial partitioning. Comprehensive experimental results indicate that our iterative partitioning algorithm is highly scalable while producing optimal or near-optimal (99% of the optimal) performance gain.
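Read as a resource-constrained selection problem, the per-configuration choice of custom instructions is a knapsack-style dynamic program. The generic 0/1-knapsack sketch below follows that reading; the candidate list is invented and the paper's actual dynamic program over spatial partitions is certainly richer than this.

```python
def select_custom_instructions(candidates, area_budget):
    """candidates: list of (name, gain, area). 0/1-knapsack DP that picks the subset
    with the largest total performance gain fitting into one configuration's area."""
    dp = [(0, [])] * (area_budget + 1)          # dp[a] = (best gain within area a, chosen names)
    for name, gain, area in candidates:
        new_dp = dp[:]
        for a in range(area, area_budget + 1):
            prev_gain, prev_names = dp[a - area]
            if prev_gain + gain > new_dp[a][0]:
                new_dp[a] = (prev_gain + gain, prev_names + [name])
        dp = new_dp
    return dp[area_budget]

# Hypothetical custom-instruction candidates: (name, cycles saved, area cost).
cands = [("mac4", 120, 50), ("sad8", 90, 40), ("crc", 30, 15), ("clip", 20, 10)]
print(select_custom_instructions(cands, area_budget=80))   # -> (170, ['mac4', 'crc', 'clip'])
```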

17.
This paper presents an efficient technique for extracting closed contours from range images' edge points. Edge points are assumed to be given as input to the algorithm (i.e., previously computed by an edge-based range image segmentation technique). The proposed approach consists of three steps. Initially, a partially connected graph is generated from those input points. Then, the minimum spanning tree of that graph is computed. Finally, a postprocessing technique generates a single path through the regions' boundaries by removing noisy links and closing open contours. The novelty of the proposed approach lies in the fact that, by representing edge points as nodes of a partially connected graph, it reduces the contour closure problem to a minimum spanning tree partitioning problem plus a cost function minimization stage to generate closed contours. Experimental results with synthetic and real range images, together with comparisons with a previous technique, are presented.
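The first two steps (building a partially connected graph over the edge points and computing its minimum spanning tree) can be sketched compactly with Kruskal's algorithm; the distance threshold used to decide which points get connected and the sample points are assumptions, and the contour-closure post-processing of the paper is not reproduced.

```python
import math
from itertools import combinations

def mst_over_edge_points(points, max_dist):
    """Connect edge points closer than max_dist, then return Kruskal's MST edges."""
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= max_dist)
    parent = list(range(len(points)))
    def find(x):                               # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:                      # cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                           # keep the edge only if it joins two trees
            parent[ri] = rj
            mst.append((i, j, round(w, 2)))
    return mst

# Hypothetical edge points lying on a small region boundary.
pts = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
print(mst_over_edge_points(pts, max_dist=1.5))
```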

18.
Over the last few years, graph partitioning has been recognized as a suitable technique for optimizing cellular network structure. For example, in a recent paper, the authors proposed a classical graph partitioning algorithm to optimize the assignment of cells to Packet Control Units (PCUs) in the GSM-EDGE Radio Access Network. Based on this approach, the quality of packet data services in a live environment was increased by reducing the number of cell re-selections between different PCUs. To learn more about the potential of graph partitioning in cellular networks, in this paper, a more sophisticated, yet computationally efficient, partitioning algorithm is proposed for the same problem. The new method combines multi-level refinement and adaptive multi-start techniques with algorithms to ensure the connectivity between cells under the same PCU. Performance assessment is based on an extensive set of graphs constructed with data taken from a live network. During the tests, the new method is compared with classical graph partitioning approaches. Results show that the proposed method outperforms classical approaches in terms of solution quality at the expense of a slight increase in computing time, while providing solutions that are easier to check by the network operator.

19.
Digital microfluidic technology is now being extensively used for implementing a lab-on-a-chip. Microfluidic biochips are often used for safety-critical applications, clinical diagnosis, and genome analysis. Thus, devising effective and faster testing methodologies to warrant correct operation of these devices after manufacture and during bioassay operations is very much needed. In this paper, we propose an Euler tour based technique to obtain the route plan of a test droplet for the purpose of structural testing of biochips. The method is applicable to various digital microfluidic biochip architectures, e.g., fully reconfigurable arrays, application-specific biochips, pin-constrained irregular-geometry biochips, and defect-tolerant biochips. We show that, in general, the optimal Eulerization and subsequent determination of an Euler tour in the graph model of a biochip can be abstracted in terms of the classical Chinese postman problem. The Euler tour can be identified by running the classical Hierholzer's algorithm, which relies on a simple cycle decomposition and splicing method. This improved Eulerization technique leads to an efficient test plan for the chip. It can also be used in phase-based test planning, which yields savings in testing time. The method provides a unified approach towards structural testing and can be easily adopted to design a droplet routing procedure for functional testing of digital microfluidic biochips.
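Hierholzer's algorithm itself is standard; a compact iterative version for an undirected graph that is already Eulerian (i.e., after the Chinese-postman-style edge duplication mentioned above) is shown below. The small Eulerized graph is illustrative only and does not correspond to any particular biochip layout.

```python
from collections import defaultdict

def hierholzer(edge_list, start):
    """Euler tour (list of nodes) of an undirected Eulerian graph.
    edge_list: (u, v) pairs; parallel edges from Eulerization are allowed."""
    adj = defaultdict(list)                      # node -> list of (neighbour, edge id)
    for eid, (u, v) in enumerate(edge_list):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    used = [False] * len(edge_list)
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:    # discard incident edges already traversed
            adj[u].pop()
        if adj[u]:
            v, eid = adj[u].pop()
            used[eid] = True
            stack.append(v)                      # extend the current trail
        else:
            tour.append(stack.pop())             # dead end: splice this vertex into the tour
    return tour[::-1]

# Square A-B-C-D plus diagonal A-C; the diagonal is duplicated to Eulerize the graph.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C"), ("A", "C")]
print(hierholzer(edges, start="A"))              # visits every edge exactly once
```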

20.
This paper deals with the problem of one-to-one mapping of 2n task modules of a parallel program to an n-dimensional hypercube multicomputer so as to minimize the total communication cost during the execution of the task. The problem of finding an optimal mapping has been proven to be NP-complete. First we show that the mapping problem in a hypercube multicomputer can be transformed into the problem of finding a set of maximum cutsets on a given task graph using a graph modification technique. Then we propose a repeated mapping scheme, using an existing graph bipartitioning algorithm, for the effective mapping of task modules onto the processors of a hypercube multicomputer. The repeated mapping scheme is shown to be highly effective on a number of test task graphs; it increasingly outperforms the greedy and recursive mapping algorithms as the number of processors increases. Our repeated mapping scheme is shown to be very effective for regular graphs, such as hypercube-isomorphic or 'almost' isomorphic graphs and meshes; it finds optimal mappings on almost all the regular task graphs considered.
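The repeated-mapping idea, applying a graph bipartitioning algorithm once per hypercube dimension and fixing one address bit per bisection, can be sketched as follows. The bipartitioner here is only a naive BFS-ordering split standing in for the "existing graph bipartitioning algorithm" of the paper; all names and the example graph are assumptions.

```python
from collections import deque

def bfs_order(graph, nodes):
    """Order the given nodes by BFS so neighbours tend to stay adjacent in the list."""
    node_set, order, seen = set(nodes), [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(graph[u]):
                if v in node_set and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order

def hypercube_map(graph, dim):
    """Recursively bisect the task graph; each bisection fixes one address bit.
    Returns task -> n-bit hypercube node address (as an integer)."""
    addr = {v: 0 for v in graph}
    def bisect(nodes, level):
        if level == dim or len(nodes) <= 1:
            return
        order = bfs_order(graph, nodes)
        half = len(order) // 2
        left, right = order[:half], order[half:]
        for v in right:
            addr[v] |= 1 << level               # this bisection decides one address bit
        bisect(left, level + 1)
        bisect(right, level + 1)
    bisect(sorted(graph), 0)
    return addr

# Hypothetical task graph with 8 modules mapped onto a 3-dimensional hypercube.
tasks = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4},
         4: {3, 5, 6}, 5: {4, 7}, 6: {4, 7}, 7: {5, 6}}
print(hypercube_map(tasks, dim=3))
```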
