
1.

Embedded software in real-time signal processing systems: application and architecture trends

Paulin P.G. Liem C. Cornero M. Nacabal F. Goossens G. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1997,85(3):419-435


2.

Configuration and Extension of Embedded Processors to Optimize IPSec Protocol Execution

Potlapally N.R. Ravi S. Raghunathan A. Lee R.B. Jha N.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(5):605-609

Security protocols, such as IPSec and SSL, are being increasingly deployed in the context of networked embedded systems. The resource-constrained nature of embedded systems and, in particular, the modest capabilities of embedded processors make it challenging to achieve satisfactory performance while executing security protocols. A promising approach for improving performance in embedded systems is to use application-specific instruction set processors that are designed based on configurable and extensible processors. In this paper, we perform a comprehensive performance analysis of the IPSec protocol on a state-of-the-art configurable and extensible embedded processor (Xtensa from Tensilica Inc.). We present performance profiles of a lightweight embedded IPSec implementation running on the Xtensa processor, and examine in detail the various factors that contribute to the processing latencies, including cryptographic and protocol processing. In order to improve the efficiency of IPSec processing on embedded devices, we then study the impact of customizing an embedded processor by synergistically 1) configuring architectural parameters, such as instruction and data cache sizes, processor-memory interface width, write buffers, etc., and 2) extending the base instruction set of the processor using custom instructions for both cryptographic and protocol processing. Our experimental results demonstrate that up to 3.2× speedup in IPSec processing is possible over a popular embedded IPSec software implementation.

3.

Design and Application of an On-Chip Heterogeneous Dual-PowerPC Radar Controller

施海锋 柏玉娴 《现代雷达》2014,(6):35-38

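Exploiting the two embedded PowerPC 440 processor cores available in Virtex-5 FXT series FPGAs, this paper proposes a design method for an embedded radar controller built on a "master-slave" heterogeneous control model architecture. The method uses high-speed serial transmission technologies such as FC and sRIO to increase the controller's interface bandwidth and, through advance task planning, makes full use of both PowerPC processors, at a significantly lower design cost than existing solutions. In application, the controller's overall performance improved markedly, and it can meet the microsecond-level response and gigabit-level transmission requirements of modern phased-array radars.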

4.

Integration of medium-throughput signal processing algorithms on flexible instruction-set architectures

Gert Goossens Dirk Lanneer Marc Pauwels Francis Depuydt Koen Schoofs Augusli Kifli Marco Cornero Paolo Petroni Francky Catthoor Hugo De Man 《Journal of Signal Processing Systems》1995,9(1-2):49-65

Integrated circuits in telecommunications and consumer electronics are rapidly evolving towards single chip solutions. New IC architectures are emerging, which combine instruction-set processor cores with customised hardware. This paper describes a high-level synthesis system for integration of real-time signal processing systems on such processor cores. The compiler supports a flexible architectural model. It can handle certain types of incompletely specified architectures, and offers capabilities for retargetable compilation and architectural exploration. Results for a realistic application from the domain of audio processing indicate the feasibility and power of the presented approach.

5.

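Design of a DSP-Based Signal Synchronization Module for Wireless Broadband Communication Receivers (total citations: 1; self-citations: 0; citations by others: 1)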

赵娟《电子设计工程》2014,(4):173-175,178

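Wireless communication is developing rapidly, and the DSP, one of the most popular embedded processors today, is widely used in the field. To promote the application of embedded technology in wireless communication, this paper uses an embedded DSP and a combined hardware/software approach, together with software design and verification experiments, to arrive at a design for the signal synchronization module of a wireless broadband communication receiver. The paper focuses on two aspects, the principle of signal synchronization and its implementation, in studying how a DSP can realize the signal synchronization module of a wireless broadband communication system.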

6.

Instruction-Based Self-Testing of Delay Faults in Pipelined Processors

Singh V. Inoue M. Saluja K. K. Fujiwara H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(11):1203-1215

Aggressive processor design methodology using high-speed clocks and deep submicrometer technology is necessitating the use of at-speed delay fault testing. Although nearly all modern processors use a pipelined architecture, no method has been proposed in the literature to model these for the purpose of test generation. This paper proposes a graph theoretic model of pipelined processors and develops a systematic approach to path delay fault testing of such processor cores using the processor instruction set. The proposed methodology generates test vectors under the extracted architectural constraints. These test vectors can be applied in functional mode of operation; hence, self-test becomes possible. Self-test in a functional mode can also be used for online periodic testing. Our approach uses a graph model for architectural constraint extraction and path classification. Test vectors are generated using constrained automatic test pattern generation (ATPG) under the extracted constraints. Finally, a test program consisting of an instruction sequence is generated for the application of generated test vectors. We applied our method to two example processors, namely a 16-bit 5-stage VPRO pipelined processor and a 32-bit pipelined DLX processor, to demonstrate the effectiveness of our methodology.

7.

Hardware/Software Communication and System Integration for Embedded Architectures

Steven Vercauteren Bill Lin 《Design Automation for Embedded Systems》1997,2(3-4):359-382

Embedded system architectures comprising software programmable components (e.g. DSP, ASIP, and micro-controller cores) and customized hardware co-processors, integrated into a single cost-efficient VLSI chip, are emerging as a key solution to today's microelectronics design problems. This trend is being driven by new emerging applications in the areas of wireless communication, high-speed optical networking, and multimedia computing, fueled by increasing levels of integration. These applications are often subject to stringent requirements in terms of processing performance, power dissipation, and flexibility. A key problem confronted by embedded system designers today is the rapid prototyping of an application-specific embedded system architecture where different combinations of programmable processor components, library hardware components, and customized hardware components must be integrated together, while ensuring that the hardware and software parts communicate correctly. Designers often spend an enormous amount of time on this highly error-prone task. In this paper, we present a solution to this embedded architecture co-synthesis and system integration problem based on an orchestrated combination of architectural strategies, parameterized libraries, and software CAD tools.

8.

Predictive system shutdown and other architectural techniques for energy efficient programmable computation

Srivastava M.B. Chandrakasan A.P. Brodersen R.W. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1996,4(1):42-55

With the popularity of portable devices such as personal digital assistants and personal communicators, as well as with increasing awareness of the economic and environmental costs of power consumption by desktop computers, energy efficiency has emerged as an important issue in the design of electronic systems. While power efficient ASIC's with dedicated architectures have addressed the energy efficiency issue for niche applications such as DSP, much of the computation continues to be implemented as software running on programmable processors such as microprocessors, microcontrollers, and programmable DSP's. Not only is this true for general purpose computation on personal computers and workstations, but also for portable devices, application-specific systems, etc. In fact, firmware and embedded software executing on RISC and DSP processor cores that are embedded in ASIC's have emerged as a leading implementation methodology for speech coding, modem functionality, video compression, communication protocol processing, etc. This paper describes architectural techniques for energy efficient implementation of programmable computation, particularly focussing on the computation needed in portable devices where event-driven user interfaces, communication protocols, and signal processing play a dominant role. Two key approaches described here are predictive system shutdown and extended voltage scaling. Results indicate that a large reduction in power consumption can be achieved over current-day solutions with little or no loss in system performance.

9.

A Survey of Automatic Custom Instruction Identification for Extensible Processors


肖成龙 王珊珊 王心霖 林军 王晶玥 《电子学报》2020,48(8):1655-1664

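In recent years, extensible processors have been used more and more in embedded systems. Augmenting an extensible processor with custom instructions preserves a degree of flexibility while also meeting the high-performance and low-power demands of embedded applications. Automatic identification of custom instructions is one of the key problems in extensible processor design. With the application domains and development trends of extensible processors in view, this paper reviews recent research progress in automatic custom instruction identification. On this basis, it summarizes each of the key steps involved in custom instruction identification (intermediate representation generation, custom instruction enumeration, custom instruction selection, and code transformation) and analyzes the strengths and difficulties of the different methods. It then summarizes and analyzes applications of extensible processors by application domain. Finally, it looks ahead to future trends and research directions in automatic custom instruction identification.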

10.

Recent trends in embedded system software performance estimation

Rajendra Patel Arvind Rajawat 《Design Automation for Embedded Systems》2013,17(1):193-213

It is observed that due to the availability of fast and highly efficient processors, many embedded system developers are inclined to implement the majority of the system components in software rather than hardware. Software implementation offers a great level of flexibility and scalability of the design. At the same time, a wide choice exists between generic processors, DSP processors, network processors, etc. This enlarges manyfold the design space that must be explored to select an appropriate processor or processor version for a specific application or application component. In this review, recent prominent directions for embedded software performance estimation are discussed and their salient features are summarized.

11.

Retargetable Code Generation Based on Structural Processor Description

Rainer Leupers Peter Marwedel 《Design Automation for Embedded Systems》1998,3(1):75-108

Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focussed on code quality issues, in this contribution we emphasize the importance of retargetability, and we describe an approach to achieve retargetability. We propose the use of uniform, external target processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which proves the feasibility of our approach with respect to retargetability. We discuss capabilities and limitations of MSSQ, and identify possible areas of improvement.

12.

Single Source Design Environment for Embedded Systems Based on SystemC

H. Posadas F. Herrera V. Fernández P. Sánchez E. Villar F. Blasco 《Design Automation for Embedded Systems》2004,9(4):293-312


13.

The Advanced Onboard Signal Processor (AOSP)

Daniel J. Dechant 《The Journal of VLSI Signal Processing》1990,2(2):69-78

The Advanced Onboard Signal Processor (AOSP) addresses the onboard processing requirements of future space systems. In this context, a unique multiprocessor architectural concept has evolved, envisioning a distributed array of loosely coupled programmable processors interconnected by a packet-based high-speed bus local area network. Sophisticated fault recovery and reconfiguration capabilities achieve the desired degrees of autonomy and system availability. Processing tasks are partitioned and mapped onto the resources of the AOSP in the form of a generalized series-parallel pipeline, with processor interactions synchronized by data flow. The AOSP architectural concept is extremely flexible. Without change to the fundamental operating system, AOSP can be configured to perform a wide range of processing applications. Moreover, the design is not restricted to any particular topology or number of processors. Novel at its conception [1], [2], AOSP's major design features are becoming increasingly popular, and the literature on loosely coupled and data-flow-synchronized multiprocessors is expanding rapidly [3]–[8]. In this article we describe the architecture of the AOSP, its embedded survivable local area network (LAN), its operating system software, and its fault tolerance features; discuss the issues associated with partitioning and programming of applications; and, finally, summarize the program phases.

14.

High-Density Integration and Single-Chip Multi-Core Systems and Their Research Progress

李东生 高明伦 《半导体技术》2012,37(2):89-95

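Under strict constraints on volume, weight, and power consumption, system miniaturization faces a range of technical challenges. To meet the requirements of high-density computing and miniaturization, high-density system integration and single-chip multi-core processors are essential. This paper discusses high-density integration and single-chip multi-core processor technologies and their research progress, covering chip multiprocessors (CMP), networks-on-chip (NoC), 3D integrated circuits, and high-density packaging. It identifies two characteristics of CMP development: large numbers of small cores, and hierarchical cluster structures. It points out that high-density integration design and high-density packaging design are gradually converging, which provides the technical basis for the physical realization of single-chip multi-core systems and a hardware solution for ultimately achieving high-density computing and miniaturized systems.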

15.

A circuit-driven design methodology for video signal-processing datapath elements

Dutta S. Wolf W. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(2):229-240

The programmable video signal processor (VSP) is an important category of processors for multimedia systems. Programmable video processors combine the flexibility of programmability with special architectural features that improve performance on video processing applications. VSPs are typically multiple processors with several processing elements (PEs) and a parallel memory system. This paper focuses on the architectural design of the PEs in a video processor and shows how technology and circuit parameters influence the structure of the datapath and, hence, the overall architecture of a programmable VSP. We emphasize the need to consider technological and circuit-level issues during the design of a system architecture and present a method whereby the conceptual organization of the PEs (the number of PEs, pipelining of the datapath, size of the register file, and number of register ports) can be evaluated in terms of a target set of applications before a detailed design is undertaken. We use motion-estimation and discrete cosine transform as example applications to illustrate how various technology parameters affect the architectural design choices. We show that the design of the register file and the datapath-pipeline depth can drastically affect PE utilization and, therefore, the number of PEs required for different applications. Our results demonstrate that pursuing the fastest cycle time can greatly increase the silicon area which must be devoted to PEs, due to both increased pipeline latency and reduced register file bandwidth.

16.

System-level power consumption modeling and tradeoff analysis techniques for superscalar processor design

Conte T.M. Menezes K.N. Sathaye S.W. Toburen M.C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(2):129-137

This paper presents systematic techniques to find low-power high-performance superscalar processors tailored to specific user applications. The model of power is novel because it separates power into architectural and technology components. The architectural component is found via trace-driven simulation, which also produces performance estimates. An example technology model is presented that estimates the technology component, along with critical delay time and real estate usage. This model is based on case studies of actual designs. It is used to solve an important problem: decreasing power consumption in a superscalar processor without greatly impacting performance. Results are presented from runs using simulated annealing to reduce power consumption subject to performance reduction bounds. The major contributions of this paper are the separation of architectural and technology components of dynamic power, the use of trace-driven simulation for architectural power measurement, and the use of a near-optimal search to tailor a processor design to a benchmark.

17.

Design of a system software based on a Java SoC processor

Zhirui Chen Chunyou Lin Hongzhou Tan 《电子科学学刊(英文版)》2010,27(6):853-859

Java technology has spread rapidly all over the world in recent years. It is a popular application development language because of its strong encapsulation, platform independence, and high security. There are large numbers of Java games and other gadgets on mobile platforms, as well as on set-top-box systems. As Java applications become more sophisticated, Java Virtual Machine (JVM) middleware in embedded systems is no longer satisfactory, and Java-specific chips are spreading in the market. All existing Java-based system software and Operating Systems (OSs) run on a JVM and cannot be used on Java processors. It is important to develop pure Java system software or a pure Java OS so that embedded systems using Java processors can deliver high performance in Java applications. This paper presents a set of system software designed for a Java-specific processor, the VP6K, which is also a System-on-Chip (SoC). This system software includes real-time multitask dispatching, file management, device management, hardware drivers, and infrastructural Application Programming Interfaces (APIs). According to experimental results, the system software provides interfaces for Java programs to fully handle CPU resources, so that all applications can be executed properly and efficiently. The VP6K embedded platform shows good performance for Java applications when the system software is implemented.

18.

Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors (total citations: 1; self-citations: 0; citations by others: 1)

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(9):986-997

Many commercially available embedded processors are capable of extending their base instruction set for a specific domain of applications. While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck. In this paper, we first present a quantitative analysis of the data bandwidth limitation in configurable processors, and then propose a novel low-cost architectural extension and associated compilation techniques to address the problem. Specifically, we embed a single control bit in the instruction op-codes to selectively copy the execution results to a set of hash-mapped shadow registers in the write-back stage. This can efficiently reduce the communication overhead due to data transfers between the core processor and the custom logic. We also present a novel simultaneous global shadow register binding with a hash function generation algorithm to take full advantage of the extension. The application of our approach leads to a nearly optimal performance speedup.
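
As a rough illustration of the write-back mechanism this abstract describes, the sketch below models a register file in which a per-instruction control bit decides whether a result is also copied into a hash-mapped shadow register. Everything concrete here is an assumption made for illustration and is not taken from the paper: the register counts, the modulo hash in `shadow_index`, hashing on the destination register number, and the names `RegisterFiles` and `write_back`.

```python
from dataclasses import dataclass, field

NUM_CORE_REGS = 32    # assumed size of the core register file
NUM_SHADOW_REGS = 4   # assumed number of shadow registers


def shadow_index(dest_reg: int) -> int:
    """Hypothetical hash: map a core register number to a shadow slot."""
    return dest_reg % NUM_SHADOW_REGS


@dataclass
class RegisterFiles:
    core: list = field(default_factory=lambda: [0] * NUM_CORE_REGS)
    shadow: list = field(default_factory=lambda: [0] * NUM_SHADOW_REGS)

    def write_back(self, dest_reg: int, value: int, copy_bit: bool) -> None:
        """Write-back stage: always update the core register, and also copy
        the result to the hash-mapped shadow register if the bit is set."""
        self.core[dest_reg] = value
        if copy_bit:
            self.shadow[shadow_index(dest_reg)] = value


regs = RegisterFiles()
regs.write_back(dest_reg=5, value=42, copy_bit=True)   # also lands in shadow slot 5 % 4
regs.write_back(dest_reg=6, value=7, copy_bit=False)   # core register file only
```

In this toy model the custom logic would read its operands from `regs.shadow` instead of competing for core register file ports, which is the bandwidth saving the abstract refers to. Two destination registers that hash to the same slot would overwrite each other here; the abstract attributes the avoidance of such conflicts to the shadow register binding and hash function generation algorithm, which this sketch does not attempt.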

19.

An architectural co-synthesis algorithm for distributed, embedded computing systems

Wolf W.H. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1997,5(2):218-229

Many embedded computers are distributed systems, composed of several heterogeneous processors and communication links of varying speeds and topologies. This paper describes a new, heuristic algorithm which simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost. The hardware architecture of the synthesized system consists of a network of processors of multiple types and arbitrary communication topology; the software architecture consists of an allocation of processes to processors and a schedule for the processes. Most previous work in co-synthesis targets an architectural template, whereas this algorithm can synthesize a distributed system of arbitrary topology. The algorithm works from a technology database which describes the available processors, communication links, I/O devices, and implementations of processes on processors. Previous work had proposed solving this problem by integer linear programming (ILP); our algorithm is much faster than ILP and produces high-quality results.

20.

IMAPCAR: A 100 GOPS In-Vehicle Vision Processor Based on 128 Ring Connected Four-Way VLIW Processing Elements

Shorin Kyo Shin’ichiro Okazaki 《Journal of Signal Processing Systems》2011,62(1):5-16

This paper presents IMAPCAR, a 100 GOPS programmable highly parallel vision processor LSI consuming less than 2 W of power for in-vehicle vision tasks of driver assistance systems. First, requirements of vision processors for driver assistance systems as well as the characteristics of vision tasks for safety are summarized. Next, features in the design of IMAPCAR are described in detail; compared with a previous design, these improve performance on major vision tasks by a factor of 2.5 while reducing power by 50%. Design choices taken by other in-vehicle vision processors are also compared and analyzed. Finally, technology perspectives of future in-vehicle vision processors are discussed.