1.

Efficient design space exploration for application specific systems-on-a-chip

《Journal of Systems Architecture》2007,53(10):733-750

A reduction in the time-to-market has led to widespread use of pre-designed parametric architectural solutions known as system-on-a-chip (SoC) platforms. A system designer has to configure the platform so as to optimize it for the execution of a specific application. Very frequently, however, the space of possible configurations that can be mapped onto a SoC platform is huge, and the computational effort needed to evaluate a single system configuration can be very high. In this paper we propose an approach that tackles the problem of design space exploration (DSE) on two fronts: reducing the number of system configurations to be simulated and reducing the time required to evaluate (i.e., simulate) a system configuration. More precisely, we propose the use of Multi-objective Evolutionary Algorithms as the optimization technique and Fuzzy Systems for the estimation of the performance indexes to be optimized. The proposed approach is applied to a highly parameterized SoC platform based on a parameterized VLIW processor and a parameterized memory hierarchy for the optimization of performance and power dissipation. The approach is evaluated in terms of both accuracy and efficiency and compared with several established DSE approaches. The results obtained for a set of multimedia applications show an improvement in both accuracy and exploration time.
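Multi-objective exploration of the kind described in this abstract rests on Pareto dominance between candidate configurations. The following is a minimal sketch of that one idea only, not the authors' algorithm: the objective pair (execution time, power), the helper names `dominates` and `pareto_front`, and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical objective vector: (execution_time, power), both to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up estimates for five platform configurations (illustrative values only).
configs = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (11.0, 6.0)]
front = pareto_front(configs)
```

In this toy data set, (9.0, 9.0) is dominated by (8.0, 7.0) and (11.0, 6.0) by (10.0, 5.0), so three configurations remain as trade-off points. An evolutionary algorithm would apply such a filter repeatedly to populations of candidates rather than to a fixed list.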

2.

Hybrid functional- and instruction-level power modeling for embedded and heterogeneous processor architectures

《Journal of Systems Architecture》2007,53(10):689-702

In this contribution the concept of functional-level power analysis (FLPA) for power estimation of programmable processors is extended in order to model embedded as well as heterogeneous processor architectures featuring different embedded processor cores. The basic FLPA approach is based on the separation of the processor architecture into functional blocks such as the processing unit, clock network, and internal memory. The power consumption of these blocks is described by parameterized arithmetic models. By applying a parser-based automated analysis of assembler code, the input parameters of the arithmetic functions, such as the achieved degree of parallelism or the kind and number of memory accesses, can be computed. For modeling an embedded general purpose processor (here, an ARM940T), the basic FLPA modeling concept had to be extended to a so-called hybrid functional-level and instruction-level (FLPA/ILPA) model in order to achieve good modeling accuracy. To show the applicability of this approach, a heterogeneous processor architecture (OMAP5912) featuring an ARM926EJ-S core and a C55x DSP core has also been modeled using the hybrid FLPA/ILPA technique. The approach is demonstrated and evaluated on a variety of digital signal processing tasks ranging from basic filters to complete audio decoders and classical benchmark suites. Estimated power figures for the inspected tasks are compared to physically measured values for both inspected processor architectures. A maximum estimation error of 9% for the ARM940T and less than 4% for the OMAP5912 is achieved.

3.

A coarse-grained reconfigurable SoC design method based on architecture templates

沈剑良 李思昆 王观武 吕平 刘磊 刘勤让 《计算机工程与科学》2016,38(6):1071-1077

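To address the large architecture exploration space and high design complexity of traditional application-domain-oriented multi-core SoC architecture design methods, this paper proposes a coarse-grained reconfigurable SoC system architecture design method based on architecture templates. The method is centered on architecture design; the architecture templates are reusable and their parameters are configurable, which narrows the architecture design exploration space, improves architecture design efficiency, and reduces the complexity of developing application compilers. Finally, taking the cryptographic processing domain as an example, the template parameters are instantiated to build a multi-core reconfigurable instruction set processor SoC system (Multi-RISP SoC) for cryptographic processing. Experimental results show that the Multi-RISP SoC system is comparable in performance to several typical reconfigurable platforms, while the system can be built more quickly and efficiently.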

4.

The QC-2 parallel Queue processor architecture

Ben A. Abderazek Arquimedes Canedo Tsutomu Yoshinaga Masahiro Sowa 《Journal of Parallel and Distributed Computing》2008

Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.

5.

An FPGA-based architecture for embedded systems performance acceleration applied to Optimum-Path Forest classifier

《Microprocessors and Microsystems》2017

The development of classification techniques is a foundation for the evolution of machine learning, which has become a major part of mainstream Artificial Intelligence research. However, the computational cost associated with these techniques limits their use in resource-constrained embedded platforms. As the classification task is often combined with other computationally expensive functions, efficient performance of the main modules is a fundamental requirement for achieving hard real-time speed for the whole system. Graph-based machine learning techniques offer a powerful framework for building classifiers. Optimum-Path Forest (OPF) is a graph-based classifier with the interesting ability to provide nonlinear class separation surfaces. This work proposes a SoC/FPGA-based design and implementation of an architecture for embedded applications, presenting a hardware-converted algorithm for an OPF classifier. Comparison of the achieved results with an embedded processor software implementation shows accelerations of the OPF classification from 2.18 to 9 times, which suggests that real-time performance is achievable for embedded applications.

6.

Automated architecture synthesis for parallel programs on FPGA multiprocessor systems

Harold Ishebabi Christophe Bobda 《Microprocessors and Microsystems》2009,33(1):63-71

This paper presents a concept for automated architecture synthesis for adaptive multiprocessors on chip, in particular for Field-Programmable Gate-Array (FPGA) devices. Given a parallel program, the intent is to simultaneously allocate processor resources and the corresponding communication network and, at the same time, to map the parallel application so as to obtain an optimum application-specific architecture. This approach builds on a previously proposed design platform that automates system integration and FPGA synthesis for such architectures. As a result, the overall concept offers an automated design approach from application mapping to system and FPGA configuration. The automated synthesis is based on combinatorial optimization. Automation is possible because a solvable Integer Linear Programming (ILP) model that captures all necessary design trade-off parameters of such systems has been found. Experimental results on the feasibility of the automated synthesis indicate that problems of the sizes encountered in the embedded domain can be readily solved. The results obtained underscore the need for automated synthesis for design space exploration.

7.

Hardware/software co-design for particle swarm optimization algorithm (total citations: 1; self-citations: 0; citations by others: 1)

Shih-An Li Ching-Chang Wong 《Information Sciences》2011,181(20):4582-4596

This paper presents a hardware/software (HW/SW) co-design approach using the SOPC technique and a pipeline design method to improve the design flexibility and execution performance of particle swarm optimization (PSO) for embedded applications. Based on a modular design architecture, two modules are designed to work closely together to carry out the evolution process at different design stages: a Particle Updating Accelerator module, implemented in hardware, for updating the velocity and position of particles, and a Fitness Evaluation module, implemented either on a soft-core processor or on a Field Programmable Gate Array (FPGA), for evaluating the objective functions. Thanks to this design flexibility, the proposed approach can tackle various optimization problems of embedded applications without the need for hardware redesign. To further improve the execution performance of the PSO, a hardware random number generator (RNG) is also designed, in addition to a particle re-initialization scheme that promotes exploration during the optimization process. Experimental results demonstrate that the proposed HW/SW co-design approach for PSO algorithms is efficient at obtaining high-quality solutions for embedded applications.
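The split between particle updating and fitness evaluation described in this abstract can be illustrated in software. The sketch below uses the common textbook PSO update with an inertia weight; it is not the paper's hardware design, and the function names, the parameter values (`w`, `c1`, `c2`), and the sphere test function are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy fitness function (to be minimized); stands in for the fitness evaluation step."""
    return sum(v * v for v in x)


def pso(fitness, dim=2, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook PSO with inertia weight w and acceleration coefficients c1, c2."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best fitness seen so far, recorded once per iteration

    for _ in range(iters):
        for i in range(n_particles):
            # Particle update: new velocity, then new position, per dimension.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            # Fitness evaluation and bookkeeping of personal and global bests.
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


best_pos, best_val, history = pso(sphere)
```

The inner loop over dimensions is the part a hardware accelerator would take over, while `fitness` is the part that changes from problem to problem, which is why keeping it separate allows new problems without redesigning the update logic.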

8.

An architecture framework for an adaptive extensible processor

Hamid Noori Farhad Mehdipour Kazuaki Murakami Koji Inoue Morteza Saheb Zamani 《The Journal of supercomputing》2008,45(3):313-340

To improve the performance of embedded processors, an effective technique is collapsing critical computation subgraphs as application-specific instruction set extensions and executing them on custom functional units. The problem with this approach is the immense cost and the long times required to design a new processor for each application. As a solution to this issue, we propose an adaptive extensible processor in which custom instructions (CIs) are generated and added after chip-fabrication. To support this feature, custom functional units are replaced by a reconfigurable matrix of functional units (FUs). A systematic quantitative approach is used for determining the appropriate structure of the reconfigurable functional unit (RFU). We also introduce an integrated framework for generating mappable CIs on the RFU. Using this architecture, performance is improved by up to 1.33, with an average improvement of 1.16, compared to a 4-issue in-order RISC processor. By partitioning the configuration memory, detecting similar/subset CIs and merging small CIs, the size of the configuration memory is reduced by 40%.

9.

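Optimizing test access architecture design with a genetic algorithm (total citations: 1; self-citations: 0; citations by others: 1)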

王英翔 黄维康 《计算机辅助设计与图形学学报》2004,16(3):348-354

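Test access for embedded cores is an important problem in system-on-a-chip design. Because embedded cores have no direct path to the chip's input/output pins, a dedicated test access architecture must be designed to test them, so as to reduce test time and test cost. This paper proposes an optimization algorithm based on a genetic algorithm for designing the test access architecture and takes two hypothetical, fairly complex systems-on-a-chip as examples. Experimental results show that the proposed algorithm is better than existing integer linear programming methods at finding the global optimum (or a near-global optimum).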

10.

Performance evaluation of efficient multi-objective evolutionary algorithms for design space exploration of embedded computer systems

Giuseppe Ascia Vincenzo Catania Alessandro G. Di Nuovo Maurizio Palesi Davide Patti 《Applied Soft Computing》2011,11(1):382-398

Multi-objective evolutionary algorithms (MOEAs) have received increasing interest in industry because they have proved to be powerful optimizers. Despite the great success achieved, however, MOEAs have also encountered many challenges in real-world applications. One of the main difficulties in applying MOEAs is the large number of fitness evaluations (objective calculations) that are often needed before an acceptable solution can be found. There are, in fact, several industrial situations in which fitness evaluations are computationally expensive and the time available is very short. In these applications efficient strategies to approximate the fitness function have to be adopted, looking for a trade-off between optimization performance and efficiency. This is the case in designing a complex embedded system, where it is necessary to define an optimal architecture in relation to certain performance indexes while respecting strict time-to-market constraints. This activity, known as design space exploration (DSE), is still a great challenge for the EDA (electronic design automation) community. One of the most important bottlenecks in the overall design flow of an embedded system is due to simulation. Simulation occurs at every phase of the design flow and is used to evaluate a system which is a candidate for implementation. In this paper we focus on system level design, proposing an extensive comparison of the state-of-the-art of MOEA approaches with an approach based on fuzzy approximation to speed up the evaluation of a candidate system configuration. The comparison is performed in a real case study: optimization of the performance and power dissipation of embedded architectures based on a Very Long Instruction Word (VLIW) microprocessor in a mobile multimedia application domain. 
The results of the comparison demonstrate that the fuzzy approach outperforms state-of-the-art MOEA strategies applied to DSE of a parameterized embedded system in terms of both performance and efficiency.

11.

Performance evaluation of network processor architectures: combining simulation with analytical estimation

《Computer Networks》2003,41(5):641-665

The designs of most systems-on-a-chip (SoC) architectures rely on simulation as a means for performance estimation. Such designs usually start with a parameterizable template architecture, and the design space exploration is restricted to identifying the suitable parameters for all the architectural components. However, in the case of heterogeneous SoC architectures such as network processors, the design space exploration also involves a combinatorial aspect (which architectural components are to be chosen, how they should be interconnected, and task mapping decisions), thereby increasing the design space. Moreover, in the case of network processor architectures there is also an associated uncertainty in terms of the application scenario and the traffic it will be required to process. As a result, simulation is no longer a feasible option for evaluating such architectures in any automated or semi-automated design space exploration process due to the high simulation times involved. To address this problem, in this paper we hypothesize that the design space exploration for network processors should be separated into multiple stages, each having a different level of abstraction. Further, it would be appropriate to use analytical evaluation frameworks during the initial stages and resort to simulation techniques only when a relatively small set of potential architectures is identified. None of the known performance evaluation methods for network processors have been positioned from this perspective. We show that there are already suitable analytical models for network processor performance evaluation which may be used to support our hypothesis. To this end, we choose a reference system-level model of a network processor architecture and compare its performance evaluation results derived using a known analytical model [Thiele et al., Design space exploration of network processor architectures, in: Proc. 1st Workshop on Network Processors, Cambridge, MA, February 2002; Thiele et al., A framework for evaluating design tradeoffs in packet processing architectures, in: Proc. 39th Design Automation Conference (DAC), New Orleans, USA, ACM Press, 2002] with the results derived by detailed simulation. Based on this comparison, we propose a scheme for the design space exploration of network processor architectures where both analytical performance evaluation techniques and simulation techniques have unique roles to play.

12.

BSP design techniques for an embedded operating system based on the Loongson 3A processor

殷杰波《测控技术》2014,33(7):121-123

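Taking the domestically developed Loongson 3A processor as the target platform and analyzing the architecture and hardware characteristics of the multi-core system, this paper presents the basic approach and key techniques for designing a BSP for the JARI-works operating system on the Loongson 3A processor. The BSP package has already been used in projects and can serve as a reference for porting other embedded operating systems to Loongson-series processors.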

13.

Multi-objective efficient design space exploration and architectural synthesis of an application specific processor (ASP) (total citations: 1; self-citations: 0; citations by others: 1)

Anirban Sengupta Reza Sedaghat Zhipeng Zeng 《Microprocessors and Microsystems》2011,35(4):392-404

As system complexity grows rapidly, the gap between the Electronic System Level (ESL) and the Register Transfer Level (RTL) must be filled. Currently, Very Large Scale Integration (VLSI) and System-on-Chip (SoC) designs are multi-objective in nature, requiring simultaneous fulfillment of multiple parameters. Extensive research has been done on Design Space Exploration (DSE) problems and on the synthesis of application specific processor (ASP) designs, but no prior work has focused explicitly on integrating a fast multi-objective architecture exploration mechanism with the architectural synthesis stages to formalize the design methodology of an application specific processor in the case of multiple objectives. This paper proposes a design methodology for a multi-objective application specific processor by integrating an efficient multi-objective (area occupied, execution time and power consumption) exploration approach with the architecture synthesis process, useful for portable devices and many high-end applications. The formalized steps of the design methodology for the ASP guarantee the designer an error-free approach to designing the system under strict limitations on compound operational constraints. The results of implementing the designed ASP in FPGA and ASIC using the proposed design methodology are also shown.

14.

Design of an in-vehicle multimedia computer based on the Au1200 processor

王建国 马然 《计算机工程》2009,35(23):243-245

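This paper proposes a design for an in-vehicle multimedia computer based on the Au1200 processor. It uses SoC technology and the MIPS architecture and embeds the Windows CE operating system to implement a range of functions. The hardware platform design and the operating system development flow are introduced, and the design process of the embedded operating system is described. Simulation results show that the scheme is effective and feasible.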

15.

Variable Length Instruction Compression on Transport Triggered Architectures

Timo Viitanen Janne Helkala Heikki Kultala Pekka Jääskeläinen Jarmo Takala Tommi Zetterman Heikki Berg 《International journal of parallel programming》2018,46(6):1283-1303

The memories used for embedded microprocessor devices consume a large portion of the system's power. The power dissipation of the instruction memory can be reduced by using code compression methods, which may require the use of variable length instruction formats in the processor. The power-efficient design of variable length instruction fetch and decode is challenging for static multiple-issue processors, which aim for low power consumption on embedded platforms. The memory-side power savings from compression are easily lost to an inefficient fetch unit design. We propose an implementation for instruction template-based compression and two instruction fetch alternatives for variable length instruction encoding on transport triggered architecture, a static multiple-issue exposed data path architecture. With applications from the CHStone benchmark suite, the compression approach reaches an average compression ratio of 44% at best. We show that the variable length fetch designs reduce the number of memory accesses and often allow the use of a smaller memory component. The proposed compression scheme reduced the energy consumption of synthesized benchmark processors by 15% and area by 33% on average.

16.

FPGA-based design of a component controller

刘利民《微计算机信息》2007,23(8):222-224

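SOC (System on a Chip) is currently a hot topic in international embedded systems research. To make SOC suitable for practical applications, various corresponding component controllers need to be designed. This paper describes a method for designing component controllers with programmable devices and, using LCD control as an example, uses an FPGA to design, simulate, and implement single-instruction driving of an LCD by the SOC. Simulation results show that the design is successful and makes the operation and control of the SOC's external components more efficient.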

17.

Automatic instruction-set architecture synthesis for VLIW processor cores in the ASAM project

《Microprocessors and Microsystems》2017

18.

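Research on FPGA-based design techniques for aero-engine electronic controllers (total citations: 1; self-citations: 0; citations by others: 1)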

刘冬冬 张天宏 黄向华 陈建 《测控技术》2012,31(1):57-61

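Based on the parallel operation, reconfigurability, and hardware/software co-design characteristics of FPGAs, this paper proposes an FPGA-based on-chip distributed design method for aero-engine electronic controllers. It focuses on three key technical issues: selection of the FPGA-embedded processor, the hardware coprocessor, and the design of the synchronous data bus. On this basis, a controller prototype was designed on an Altera EP2C35 FPGA and its hardware performance was tested. The results show that the controller design method is feasible under current technical conditions. The proposed method helps overcome shortcomings of current centralized electronic controller design, such as highly customized software, poor reusability, difficult development of parallel real-time tasks, and low development efficiency.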

19.

Design of write merging and read prefetching buffer in DRAM controller for embedded processor

Chen Zhao Kuizhi Mei Nanning Zheng 《Microprocessors and Microsystems》2014

Write merging and read prefetching are effective methods for improving processor performance, and they are mainly used in desktop and server processors. As embedded systems require more powerful microprocessors, improving the performance of embedded processors deserves attention. This paper presents the architecture of a write merging and read prefetching buffer in a DRAM controller for an embedded processor. An evaluation model is constructed, and the results demonstrate that the proposed method can reduce the cache miss penalty dramatically. Additionally, the design of the DRAM controller with the write merging and read prefetching buffer is implemented and verified on an FPGA platform, where it reduces CPI by 19.6% on average. Moreover, the RTL module of the presented design is synthesized with Design Compiler, and the synthesis results show that the hardware cost of the proposed architecture is relatively small compared to the performance improvement.
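The idea of write merging can be shown with a small behavioral model: writes that fall into the same memory line are coalesced in a buffer and sent to DRAM as one transaction. This is only a sketch under assumed parameters, not the controller from the paper; the class name `WriteMergeBuffer`, the 32-byte line size, the four-entry capacity, and the oldest-first eviction policy are all hypothetical.

```python
class WriteMergeBuffer:
    """Behavioral model of a write-merging buffer (assumed parameters, not from the paper)."""

    def __init__(self, line_bytes=32, capacity=4):
        self.line_bytes = line_bytes
        self.capacity = capacity
        # line base address -> {byte offset: value}; dicts keep insertion order
        self.entries = {}
        # transactions issued to DRAM, as (line base address, {offset: value})
        self.dram_writes = []

    def write(self, addr, data: bytes):
        """Buffer a write; bytes landing in an already buffered line are merged into it."""
        for i, value in enumerate(data):
            byte_addr = addr + i
            base = byte_addr - byte_addr % self.line_bytes
            if base not in self.entries:
                if len(self.entries) == self.capacity:
                    self._evict_oldest()
                self.entries[base] = {}
            self.entries[base][byte_addr - base] = value

    def _evict_oldest(self):
        """Send the oldest buffered line to DRAM as a single transaction."""
        base = next(iter(self.entries))
        self.dram_writes.append((base, self.entries.pop(base)))

    def flush(self):
        """Drain every buffered line to DRAM."""
        while self.entries:
            self._evict_oldest()


buf = WriteMergeBuffer()
for word_addr in (0x100, 0x104, 0x108, 0x10C):  # four word writes to the same line
    buf.write(word_addr, b"\xAA\xBB\xCC\xDD")
buf.write(0x200, b"\x01")                       # one byte in a different line
buf.flush()
```

Five processor writes here turn into two DRAM transactions, which is the kind of traffic reduction the abstract attributes to write merging. A real controller would also need to handle read-after-write hazards against buffered data, which this model omits.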

20.

Embedded processor validation environment using a cycle-accurate retargetable instruction-set simulator

Hoonmo Yang Moonkey Lee 《The Journal of supercomputing》2005,33(1-2):19-32