1.
Exploiting Thread‐Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline
Jaegeun Oh Seok Joong Hwang Huong Giang Nguyen Areum Kim Seon Wook Kim Chulwoo Kim Jong‐Kook Kim 《ETRI Journal》2008,30(4):576-586
In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler‐hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write‐back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single‐instruction multiple‐data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32‐bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2‐way MLEP and 33.7% faster with a 4‐way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler.
2.
Hutchings B.L. Nelson B.E. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):201-205
Field programmable gate array (FPGA)-based systems provide advantages over conventional hardware including: (1) availability of the hardware during design and debug; (2) programmability; and (3) visibility. These three advantages can greatly shorten the design and verification cycle. This paper discusses a design environment that exploits these three FPGA-specific advantages to create a unified simulation/execution debug environment implemented in the JHDL design system. The described system provides a hardware debugging environment with the functionality of a simulator but up to 10000× faster. In addition, testbenches and other typical verification software used in simulators can be used to verify running hardware.
3.
《Integration, the VLSI Journal》2007,40(3):285-304
A typical verification intellectual property (VIP) of a bus protocol such as ARM advanced micro-controller bus architecture (AMBA) or PCI consists of a set of assertions and associated verification aids such as test-benches, design-ware models and coverage metrics. While several languages have been formalized for specifying assertions (examples include Open-Vera Assertions, Sugar, ForSpec, System Verilog Assertions, etc.), it is widely accepted that the tasks of writing protocol-compliant models and test-benches that produce protocol compliant stimuli are also tasks of equal importance. In this paper, we present a platform for high-level specification of a bus protocol in a hierarchical manner and an automated methodology for generating a variety of verification aids that supplement the set of assertions in a VIP. We also show that the verification aids can be efficiently used to determine the completeness of the set of assertions in a simulation-based verification environment.
4.
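FPGA-based verification is an effective approach to SoC functional verification, and building an FPGA-based prototype verification system has become an important SoC verification method. ARCA3 is a high-performance, low-power embedded microprocessor developed in China. IP cores such as a memory controller, together with peripherals, are integrated around ARCA3 and the AMBA architecture to build an embedded SoC, and a prototype verification system and hardware/software co-verification environment for the SoC are implemented on an FPGA. A bootloader and an operating system are run on the FPGA prototype to verify the operability of the system hardware and the interaction between hardware and software. The FPGA-based prototype verification system makes it possible to rapidly verify ARCA3-based IP cores at various levels of abstraction and to develop ARCA3-based software applications.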
5.
6.
Kiran Kumar Anumandla Rangababu Peesapati Samrat L. Sabat Siba K. Udgata 《Design Automation for Embedded Systems》2012,16(4):221-240
This paper presents a floating-point design and implementation of a System on Chip (SoC) based Differential Evolution (DE) algorithm using a Xilinx Virtex-5 Field Programmable Gate Array (FPGA). The hardware implementation is carried out to enhance the execution speed of embedded applications. An Intellectual Property (IP) core for the DE algorithm is developed and interfaced with the 32-bit PowerPC 440 processor using the processor local bus (PLB) of the Xilinx Virtex-5 FPGA. In the proposed architecture the algorithmic parameters of DE are scalable. The software and hardware implementations of the DE algorithm are carried out in the PowerPC embedded processor and the hardware IP, respectively. The optimization of numerical benchmark functions and system identification in control systems are implemented to verify the proposed hardware SoC platform. The performance of the IP is measured in terms of the acceleration gain of the DE algorithm. The optimization problems are solved using floating-point arithmetic in both the embedded processor and the hardware. The experimental results show that the hardware DE IP accelerates execution by approximately 200 times compared to the equivalent software implementation of the DE algorithm on the PowerPC 440 processor. Further, as a case study, an Infinite Impulse Response (IIR) based system identification task is implemented on the SoC using the developed hardware accelerator.
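For readers unfamiliar with Differential Evolution, the following is a minimal software sketch of the common DE/rand/1/bin variant, minimizing a sphere benchmark function. It is not the paper's hardware IP or its PowerPC code; the abstract does not state which DE variant or parameter values were used, so the function name `differential_evolution`, the parameters `f`, `cr`, `pop_size`, and `generations`, and the `sphere` test function are all illustrative assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` with DE/rand/1/bin (hypothetical parameter names)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the search bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clamp to the search bounds
                else:
                    v = pop[i][j]  # binomial crossover keeps the target's gene
                trial.append(v)
            trial_cost = objective(trial)
            # Greedy selection: the trial replaces the target only if not worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost

    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def sphere(x):
    """Numerical benchmark function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(best_x, best_cost)
```

The inner loop over individuals consists of independent mutation, crossover, and evaluation steps, which is the kind of regular arithmetic a hardware implementation can pipeline or parallelize.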
7.
8.
9.
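With SOPC Builder, a Nios II CPU, the Avalon bus, peripherals, the on-chip debug module, and other components can be integrated in a short time to generate the Nios II processor the system requires. Quartus II is then used to combine the Nios II processor with the other external device interfaces, compile the design, and download it to the FPGA, which completes the hardware design of the system. The software is typically written in C/C++, compiled with the Nios II IDE, and downloaded to the FPGA to realize an SOPC system.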
10.
Sutherland Hunt A. Bose Bimal K. Somuah Clement B. 《Industrial Electronics, IEEE Transactions on》1983,(4):318-322
The application of a state language to the real-time control of a hybrid electric vehicle is explained. The state language has been developed both as a specification aid to the system designer and as a means for the programmer to produce microcomputer software. A translator program, which was developed on a VAX minicomputer, preprocesses the state language into a software module to be compiled by the standard Intel PL/M 86 compiler.
11.
12.
Tomasz Patyk Perttu Salmela Teemu Pitkänen Pekka Jääskeläinen Jarmo Takala 《Journal of Signal Processing Systems》2011,65(2):245-259
Field programmable gate array (FPGA) is a flexible solution for offloading part of the computations from a processor. In particular, it can be used to accelerate the execution of a computationally heavy part of a software application, e.g., in DSP, where small kernels are repeated often. Since application code for a processor is software, a design methodology is needed to convert the code into a hardware implementation applicable to the FPGA. In this paper, we propose a design method that uses the Transport Triggered Architecture (TTA) processor template and the TTA-based Co-design Environment toolset to automate the design process. With software as a starting point, we generate an RTL implementation of an application-specific TTA processor together with the hardware/software interfaces required to offload computations from the system's main processor. To exemplify what the integration of the customized TTA with a new platform could look like, we describe the process of developing the required interfaces from scratch. Finally, we present how to take advantage of the scalability of the TTA processor to target platform- and application-specific requirements.
13.
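Hardware/Software Co-verification Methodology for SoC Design  Cited by: 3 (self-citations: 3, citations by others: 0)
This article introduces a hardware/software co-verification methodology and its verification flow. On the software side, a complete toolchain for compiling, debugging, and simulating software is used, comprising a virtual simulation prototype of the processor and a basic assembler, linker, and debugger. On the hardware side, the application debugged in software undergoes RTL simulation and synthesis, and is finally implemented and verified on the hardware image accelerator (FPGA) of the SoC design.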
14.
Gschwind M. Salapura V. Maurer D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(2):241-250
Application-specific processors offer an attractive option in the design of embedded systems by providing high performance for a specific application domain. In this work, we describe the use of a reconfigurable processor core based on a RISC architecture as a starting point for application-specific processor design. By using a common base instruction set, development cost can be reduced and design space exploration is focused on the application-specific aspects of performance. An important aspect of deploying any new architecture is verification, which usually requires lengthy software simulation of a design model. We show how hardware emulation based on programmable logic can be integrated into the hardware/software codesign flow. While previously hardware emulation required massive investment in design effort and special purpose emulators, an emulation approach based on high-density field-programmable gate array (FPGA) devices now makes hardware emulation practical and cost effective for embedded processor designs. To reduce development cost and avoid duplication of design effort, FPGA prototypes and ASIC implementations are derived from a common source: We show how to perform targeted optimizations to fully exploit the capabilities of the target technology while maintaining a common source base.
15.
16.
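The application background of multi-DSP hardware modules is briefly introduced. The paper mainly describes the implementation of an 8-DSP hardware module based on the TMS320VC5416 DSP chip from Texas Instruments (TI). The module consists mainly of submodules for the multiple DSPs, FLASH program loading, JTAG hardware emulation, and the FPGA. The connections between the DSPs and the FPGA, the connections between the FLASH memory and the DSPs and FPGA, and the JTAG daisy chain used for hardware emulation are discussed in detail. Verification shows that the hardware module operates correctly.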
17.
Xinming Huang Cao Liang Jing Ma 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(2):188-197
Multiple-input-multiple-output (MIMO) systems use multiple antennas in both transmitter and receiver ends for higher spectrum efficiency. The hardware implementation of MIMO detection becomes a challenging task as the computational complexity increases. This paper presents the architectures and implementations of two typical sphere decoding algorithms, including the Viterbo-Boutros (VB) algorithm and the Schnorr-Euchner (SE) algorithm. Hardware/software codesign technique is applied to partition the decoding algorithm on a single field-programmable gate array (FPGA) device. Three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellation, and the concurrent execution of multiple steps during the closest lattice point search. The decoders for a 4×4 MIMO system with 16-QAM modulation are prototyped on a Xilinx XC2VP30 FPGA device with a MicroBlaze soft core processor. The hardware prototypes of the SE and VB algorithms show that they support up to 81.5 and 36.1 Mb/s data rates at 20 dB signal-to-noise ratio, which are about 22 and 97 times faster than their respective implementations in a digital signal processor.
18.
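FPGAs such as the Virtex-II Pro developed by Xilinx embed PowerPC hard processor cores in combination with system-on-a-programmable-chip (SOPC) technology. Combining the advantages of the Linux operating system with the embedded PowerPC hard core, this paper studies and implements the porting of Linux to the PowerPC 405 processor on a Virtex-II Pro development platform, including customizing the hardware platform, setting up the cross-compilation environment, configuring the kernel, and building the root file system. The stability and reliability of the system are then verified with a concrete application. The processor, operating system, and FPGA are integrated to carry out the given signal-processing tasks, which provides operating-system benefits such as multitasking and real-time behavior while fully exploiting the strengths of the FPGA, and has good application prospects.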
19.
Arora D. Ravi S. Raghunathan A. Jha N. K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(12):1295-1308
Embedded system security is often compromised when "trusted" software is subverted to result in unintended behavior, such as leakage of sensitive data or execution of malicious code. Several countermeasures have been proposed in the literature to counteract these intrusions. A common underlying theme in most of them is to define security policies at the system level in an application-independent manner and check for security violations either statically or at run time. In this paper, we present a methodology that addresses this issue from a different perspective. It defines correct execution as synonymous with the way the program was intended to run and employs a dedicated hardware monitor to detect and prevent unintended program behavior. Specifically, we extract properties of an embedded program through static program analysis and use them as the bases for enforcing permissible program behavior at run time. The processor architecture is augmented with a hardware monitor that observes the program's dynamic execution trace, checks whether it falls within the allowed program behavior, and flags any deviations from expected behavior to trigger appropriate response mechanisms. We present properties that capture permissible program behavior at different levels of granularity, namely inter-procedural control flow, intra-procedural control flow, and instruction-stream integrity. We outline a systematic methodology to design application-specific hardware monitors for any given embedded program. Hardware implementations using a commercial design flow and cycle-accurate performance simulations indicate that the proposed technique can thwart several common software and physical attacks, facilitating secure program execution with minimal overheads.
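The control-flow part of the monitoring idea can be illustrated with a very small software model. This is only a sketch under stated assumptions, not the authors' monitor design: it assumes static analysis has produced a set of permitted transitions between basic blocks, and that the run-time trace is available as a list of block labels. The names `find_violations`, `allowed_edges`, and the block labels are hypothetical.

```python
def find_violations(allowed_edges, trace):
    """Return the trace positions reached through a transition not in allowed_edges.

    allowed_edges: set of (source_block, destination_block) pairs, assumed to
                   come from static analysis of the program (hypothetical input).
    trace:         list of basic-block labels in the order they executed.
    """
    violations = []
    for i in range(1, len(trace)):
        # A transition is permitted only if static analysis listed it.
        if (trace[i - 1], trace[i]) not in allowed_edges:
            violations.append(i)
    return violations


if __name__ == "__main__":
    # Hypothetical control-flow graph: an access check must precede a grant.
    allowed = {("entry", "check"), ("check", "grant"), ("check", "deny")}
    print(find_violations(allowed, ["entry", "check", "grant"]))
    print(find_violations(allowed, ["entry", "grant"]))
```

In this model a trace that jumps from `entry` straight to `grant` is flagged, because that edge was never produced by the assumed static analysis; a hardware monitor would perform the equivalent lookup on the processor's execution trace and trigger a response instead of returning a list.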