首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
《Microelectronics Journal》2014,45(2):217-225
Regular fabrics have been introduced as an approach to bridge the gap between ASICs and FPGAs in terms of cost and performance. Indeed, compared to an ASIC, by predefining most of the manufacturing masks, they highly reduce time-to-market, non-recoverable engineering costs and lithography hazards. Also, thanks to hardwired configuration and interconnections their performance is closer to those of ASICs than those of FPGAs. They are therefore well suited to many applications requiring low to medium volume applications or higher performance than those provided by FPGAs.In this paper, we evaluate the interest of using a regular fabric to reduce time and design cost significantly in applications involving specific transistor level design (radiative/spacial conditions, side-channel attacks, NMR environment, etc.). With this aim in view, after a broad state of the art overview with an emphasis on architectures and design flows, we develop our approach of a regular fabric designed to limit layout level design, ad-hoc tools and technological migration cost. Then, we evaluate its performance in a 65 nm process versus FPGA and standard cell based ASIC implementations. For sequential designs, our proposed solution is on average 2.5×slower and 2.3×bigger than a standard cell implantation, but also on average 13×faster than a FPGA.  相似文献   

2.
In H.264/AVC, a deblocking filter improves visual quality by reducing the presence of blocking artifacts in decoded video frames. The deblocking filter accounts for one third of the computational complexity of the decoder. This paper exploits the scalability on the hardware and the algorithmic level to synergize the performance and to reduce the computational complexity. First, we propose a modular deblocking filter architecture which can be scaled to adapt to the required computing capability for various bit-rates, resolutions, and frame rate of video sequences. The scalable architecture is based on FPGA using dynamic partial reconfiguration. This desirable feature of FPGAs makes it possible for different hardware configurations to be implemented during run-time. The proposed design can be scaled to filter up to four different edges simultaneously, resulting in significant reduction of total processing time. Secondly, our experiments show that significant reduction in computational complexity can be achieved by the increased presence of skipped macroblocks at lower bit-rates, thus, avoiding redundant filtering operations. The implemented architecture is evaluated using the Xilinx Virtex-4 ML410 FPGA board. The design operates at a maximum frequency of 103 MHz. The reconfiguration is done through Internal Configuration Access Port (ICAP) to achieve maximum performance needed by real time applications.  相似文献   

3.
Asynchronous serial transceivers have been recently used for data serializing in large on-chip systems to alleviate the routing congestion and improve the routability. FPGAs have considerable potential for using the asynchronous serial transmission but they have serious challenges to use this technology. In this paper, we present a new FPGA architecture corresponding with a new routing algorithm to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (18.81% and 48.73%, respectively) by using the asynchronous data serializing without any performance degradation in cost of reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs.  相似文献   

4.
5.
This paper presents a novel hardware-friendly motion estimation for real-time applications such as robotics or autonomous navigation. Our approach is based on the well-known Lucas & Kanade local algorithm, whose main problem is the unreliability of its estimations for large-range displacements. This disadvantage is solved in the literature by adding the sequential multiscale-with-warping extension, although it dramatically increases the computational cost. Our choice is the implementation of a multiresolution scheme that avoids the warping computation and allows the estimation of large-range motion. This alternative allows the parallel computation of the scale-by-scale motion estimation which makes the whole computation lighter and significantly reduces the processing time compared with the multiscale-with-warping approach. Furthermore, this last fact also means reducing the hardware resource cost for its potential implementation in digital hardware devices such as GPUs, ASICs, or FPGAs. In the discussion, we analyze the speedup of the multiresolution approach compared to the multiscale-with-warping scheme. For an FPGA implementation, we obtain a reduction of latency between 40% and 50% and a resource reduction of 30%. The final solution copes with large-range motion estimations with a simplified architecture very well-suited for customized digital hardware datapath implementations as well as current multicore architectures.  相似文献   

6.
Modern FPGAs have a great market share in hardware prototyping, massive parallel systems and reconfigurable architectures. Although the field-programmability of FPGAs is an effective feature in the growth and diversity of their applications; it has caused security concerns for IPs/Designs on FPGAs. Recent researches show that a reliable mechanism is required to protect the IPs/applications on FPGAs against malicious manipulations during all stages of design lifecycle, especially when they are operating in the field. In this paper, we propose a new tamper-resistant design methodology (Security Path methodology) and a revised security-aware FPGA architecture. This methodology protects the configured design against tampering attacks in parallel with the normal operation of the circuit. When the attack is discovered, the normal data flow is obfuscated and the circuit is blocked. Experimental results show that this methodology provides near full coverage in tampering detection with overhead of 12.32 % in power, 12 % in delay and 38 % in area.  相似文献   

7.
Non-volatile memory-based FPGAs (NV-FPGAs) are expected to replace traditional SRAM-based FPGAs to achieve higher scalability and lower power consumption. Yet the slow write performance of NVMs not only challenges FPGA reconfiguration speed and overhead but also constrains the programming cycles of FPGAs. To efficiently configure switch boxes, the majority component of an FPGA, this paper presents a routing path reuse technique. The reconfiguration cost of routing resources is first modeled mathematically and then minimized through a reuse-aware routing algorithm, which is incorporated into the standard VTR CAD tool. Experiments on standard MCNC and Titan benchmarks show that the proposed scheme is able to achieve as much as 58% path reuse rate and reduce as much as 45% configuration cost for routing resources.  相似文献   

8.
Previous studies show that interconnects occupy a large portion of the timing budget and area in FPGAs.In this work,we propose a time-multiplexing technique on FPGA interconnects.In order to fully exploit this interconnect architecture,we propose a time-multiplexed routing algorithm that can actively identify qualified nets and schedule them to multiplexable wires.We validate the algorithm by using the router to implement 20 benchmark circuits to time-multiplexed FPGAs.We achieve a 38%smaller minimum channel width and 3.8%smaller circuit critical path delay compared with the state-of-the-art architecture router when a wire can be time-multiplexed six times in a cycle.  相似文献   

9.
We herein propose a heuristic redundancy selection algorithm that combines resubmission, replication, and checkpointing redundancies to reduce the resiliency overhead in fault‐tolerant workflow scheduling. The appropriate combination of these redundancies for workflow tasks is obtained in two consecutive phases. First, to compute the replication vector (number of task replicas), we apportion the set of provisioned resources among concurrently executing tasks according to their needs. Subsequently, we obtain the optimal checkpointing interval for each task as a function of the number of replicas and characteristics of tasks and computational environment. We formulate the problem of obtaining the optimal checkpointing interval for replicated tasks in situations where checkpoint files can be exchanged among computational resources. The results of our simulation experiments, on both randomly generated workflow graphs and real‐world applications, demonstrated that both the proposed replication vector computation algorithm and the proposed checkpointing scheme reduced the resiliency overhead.  相似文献   

10.
Optimization of Pattern Matching Circuits for Regular Expression on FPGA   总被引:1,自引:0,他引:1  
Regular expressions are widely used in the network intrusion detection system (NIDS) to represent attack patterns. Previously, many hardware architectures have been proposed to accelerate regular expression matching using field-programmable gate array (FPGA) because FPGAs allow updating of new attack patterns. Because of the increasing number of attacks, we need to accommodate a large number of regular expressions on FPGAs. Although the minimization of logic equations has been studied intensively in the area of computer-aided design (CAD), the minimization of multiple regular expressions has been largely neglected. This paper presents a novel sharing architecture allowing our algorithm to extract and share common subregular expressions. Experimental results show that our sharing scheme significantly reduces the area of pattern matching circuits for regular expression.  相似文献   

11.
Memory and communication architecture have a significant impact on the performance, cost, and power of complex multiprocessor system-on-chip designs. In this paper, we present an automated bus matrix synthesis flow for efficient transaction-level design space exploration of communication architecture in a reconfigurable multimedia system-on-chip platform. Specifically, we consider hardware interface selection problem, which has significant effect on the overall cost of area and power. We propose a method to solve such hardware interface selection problem through static analysis of communication behavior. We experiment with JPEG encoder and H.264 encoder examples and the results show the reduction of area by 56.91% and power by 48.61% of bus matrix with 0.58% performance overhead on average compared to the case of maximum performance. According to our HW interface selection algorithm, we also experiment MPEG4 video decoder example. And the result is evaluated on the FPGA prototyping board.  相似文献   

12.
Field-programmable gate arrays (FPGAs) are an important implementation medium for digital logic. Unfortunately, they currently suffer from poor silicon area utilization due to routing constraints. In this paper we present Triptych, an FPGA architecture designed to achieve improved logic density with competitive performance. This is done by allowing a per-mapping tradeoff between logic and routing resources, and with a routing scheme designed to match the structure of typical circuits. We show that, using manual placement, this architecture yields a logic density improvement of up to a factor of 3.5 over commercial FPGAs, with comparable performance. We also describe Montage, the first FPGA architecture to fully support asynchronous and synchronous interface circuits  相似文献   

13.
In this paper, a fast Fourier transform (FFT) hardware architecture optimized for field-programmable gate-arrays (FPGAs) is proposed. We refer to this as the single-stream FPGA-optimized feedforward (SFF) architecture. By using a stage that trades adders for shift registers as compared with the single-path delay feedback (SDF) architecture the efficient implementation of short shift registers in Xilinx FPGAs can be exploited. Moreover, this stage can be combined with ordinary or optimized SDF stages such that adders are only traded for shift registers when beneficial. The resulting structures are well-suited for FPGA implementation, especially when efficient implementation of short shift registers is available. This holds for at least contemporary Xilinx FPGAs. The results show that the proposed architectures improve on the current state of the art.  相似文献   

14.
Field-programmable gate arrays (FPGAs) are becoming an increasingly important implementation medium for digital logic. One of the most important keys to using FPGAs effectively is a complete, automated software system for mapping onto the FPGA architecture. Unfortunately, many of the tools necessary require different techniques than traditional circuit implementation options, and these techniques are often developed specifically for only a single FPGA architecture. In this paper we describe automatic mapping tools for Triptych, an FPGA architecture with improved logic density and performance over commercial FPGAs. These tools include a simulated-annealing placement algorithm that handles the routability issues of fine-grained FPGAs, and an architecture-adaptive routing algorithm that can easily be retargeted to other FPGAs. We also describe extensions to these algorithms for mapping asynchronous circuits to Montage, the first FPGA architecture to completely support asynchronous and synchronous interface applications  相似文献   

15.
In this paper, we propose a compact threshold-based resampling algorithm and architecture for efficient hardware implementation of particle filters (PFs). By using a simple threshold-based scheme, this resampling algorithm can reduce the complexity of hardware implementation and power consumption. Simulation results indicate that this algorithm has approximately equal performance with the traditional systematic resampling (SR) algorithm when the root-mean-square error (RMSE) and lost track are considered. Experimental comparison of the proposed hardware architecture with those based on the SR and the residual systematic resampling (RSR) algorithms was conducted on a Xilinx Virtex-II Pro field programmable gate array (FPGA) platform in the bearings-only tracking context, and the results establish the superiority of the proposed architecture in terms of high memory efficiency, low power consumption, and low latency.  相似文献   

16.
Application of Reconfigurable CORDIC Architectures   总被引:1,自引:0,他引:1  
Reconfiguration enables the adaption of Coordinate Rotation DIgital Computer (CORDIC) units to the specific needs of sets of applications, hence creating application specific CORDIC-style implementations. Reconfiguration can be implemented at a high level, taking the entire CORDIC unit as a basic cell (CORDIC-cells) implemented in VLSI, or at a low level such as Field-Programmable Gate Arrays (FPGAs). We suggest a design methodology and analyze area/time results for coarse (VLSI) and fine-grain (FPGA) reconfigurable CORDIC units. For FPGAs we implement CORDIC units in Verilog HDL and our object-oriented design environment, PAM-Blox. For CORDIC-cells, multiple reconfigurable CORDIC modules are synthesized with state-of-the-art CAD tools. At the algorithm level we present a case study combining multiple CORDICs based on a geometrical interpretation of a normalized ladder algorithm for adaptive filtering to reduce latency and area of a fully pipelined CORDIC implementation. Ultimately, the goal is to create automatic tools to map applications directly to reconfigurable high-level arithmetic units such as CORDICs.  相似文献   

17.
Field Programmable Gate Arrays (FPGAs) offer high capability in implementing of com- plex systems, and currently are an attractive solution for space system electronics. However, FPGAs are susceptible to radiation induced Single-Event Upsets (SEUs). To insure reliable operation of FPGA based systems in a harsh radiation environment, various SEU mitigation techniques have been provided In this paper we propose a system based on dynamic partial reconfiguration capability of the modern devices to evaluate the SEU fault effect in FPGA. The proposed approach combines the fault injection controller with the host FPGA, and therefore the hardware complexity is minimized. All of the SEU injection and evaluation requirements are performed by a soft-core which realized inside the host FPGA Experimental results on some standard benchmark circuits reveal that the proposed system is able to speed up the fault injection campaign 50 times in compared to conventional method.  相似文献   

18.
A new FPGA architecture suitable for digital signal processing applications is presented.DSP modules can be inserted into FPGA conveniently with the proposed architecture,which is much faster when used in the field of digital signal processing compared with traditional FPGAs.An advanced 2-level MUX(multiplexer) is also proposed.With the added SLEEP MODE PASS to traditional 2-level MUX,static leakage is reduced.Furthermore, buffers are inserted at early returns of long lines.With this kind of buffer,the delay of the long line is improved by 9.8%while the area increases by 4.37%.The layout of this architecture has been taped out in standard 0.13μm CMOS technology successfully.The die size is 6.3×4.5 mm~2 with the QFP208 package.Test results show that performances of presented classical DSP cases are improved by 28.6%-302%compared with traditional FPGAs.  相似文献   

19.
Recently, a number of classification techniques have been introduced. However, processing large dataset in a reasonable time has become a major challenge. This made classification task more complex and expensive in calculation. Thus, the need for solutions to overcome these constraints such as field programmable gate arrays (FPGAs). In this paper, we give an overview of the various classification techniques. Then, we present the existing FPGA based implementation of these classification methods. After that, we investigate the confronted challenges and the optimizations strategies. Finally, we highlight the hardware accelerator architectures and tools for hardware design suggested to improve the FPGA implementation of classification methods.  相似文献   

20.
Dependability evaluation of embedded systems due to the integration of hardware and software parts is difficult to analyze. In this paper, we have proposed an experimental method to determine sensitivity to soft errors in an embedded system exploiting Altera SRAM-based FPGAs. The evaluation is performed using both the hardware and software parts of the embedded system in a single framework. To do this, the HDL hardware model of the target system as well as the C-written software codes of the target system, are required. Both permanent and transient faults are injected into the partially- or fully-synthesizable hardware of the target system and this can be performed during the design cycle of the system. The fault injection is composed of injecting SEUs into user design memory, and used configuration memory of the exploited FPGA. Using the experimental results, the sensitivity of Altera FPGAs to SEU faults are analyzed and derived. The analytical results reveal that the configuration memory is more significant than design memory to the SEUs due to the relative number of SRAM bits. Moreover, in this framework, in the case of injecting SEUs into user memory, the fault injection experiments are accelerated by the cooperation between a simulator and the FPGA.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号