20 similar documents were found.
1.
Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focused on code quality issues, in this contribution we emphasize the importance of retargetability and describe an approach to achieving it. We propose the use of uniform, external target-processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe the input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which proves the feasibility of our approach with respect to retargetability. We discuss the capabilities and limitations of MSSQ and identify possible areas of improvement.
2.
Based on the embedded-system compiler LCC and a 32-bit MIPS processor, LCC was ported to the target MIPS processor. To generate the code generator quickly and effectively, the machine description file was rewritten according to the characteristics of the new target machine, splitting the original macro assembly instructions and converting between instructions, so that the generated target code uses a smaller instruction set and a more compact structure. The opcode set of the target code is reduced by about 50%, and the translation from C code to assembly code was implemented successfully and verified with the MIPS simulator PCSPIM, while performance also improved substantially. The corresponding machine code was then generated with an assembler and its correctness verified with Isim (ISE Simulator), the simulation tool bundled with Xilinx ISE, completing the successful port of LCC to the MIPS processor.
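As a generic illustration of the kind of instruction splitting mentioned above (the standard textbook MIPS pseudo-instruction expansions, not the paper's actual machine-description rules), a minimal Python sketch:

```python
# Hypothetical sketch of MIPS pseudo-instruction expansion ("instruction splitting").
# These are the standard textbook expansions, not the rules used in the paper's
# machine description file.

def expand(op, *args):
    """Expand one pseudo-instruction into a list of real MIPS instructions."""
    if op == "li":                        # li $rd, imm32
        rd, imm = args
        imm &= 0xFFFFFFFF
        hi, lo = imm >> 16, imm & 0xFFFF
        if hi == 0:
            return [f"ori {rd}, $zero, {lo}"]
        return [f"lui {rd}, {hi}", f"ori {rd}, {rd}, {lo}"]
    if op == "move":                      # move $rd, $rs
        rd, rs = args
        return [f"addu {rd}, $zero, {rs}"]
    if op == "blt":                       # blt $rs, $rt, label
        rs, rt, label = args
        return [f"slt $at, {rs}, {rt}", f"bne $at, $zero, {label}"]
    raise ValueError(f"unknown pseudo-instruction {op}")

if __name__ == "__main__":
    for line in (expand("li", "$t0", 0x12345678) +
                 expand("blt", "$a0", "$a1", "loop")):
        print(line)
```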
3.
A single integer linear programming model for optimally scheduling partitioned regular algorithms is presented. The methodology presented here differs from existing methods in the following capabilities: 1) not only constraints on the number of available processors and on communication capabilities are taken into account, but also local memories and constraints on the size of the available memories; 2) different types of processors can be handled; 3) the size of the optimization model (number of integer variables) is independent of the size of the tiles to be executed; hence, 4) the number of integer variables in the optimization model is greatly reduced, so that problems of relevant size can be solved in practical execution time.
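As a rough, hypothetical illustration of the flavor of such an integer linear program (assignment and local-memory constraints only, with made-up data, omitting ordering, communication, and processor types; uses the PuLP package):

```python
# Toy tile-to-processor assignment ILP with memory constraints, in the spirit of
# (but much simpler than) the model in the abstract above. Requires: pip install pulp
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

tiles = range(6)                                 # assumed toy instance
procs = range(2)
work  = {0: 4, 1: 3, 2: 5, 3: 2, 4: 6, 5: 1}     # execution time per tile
mem   = {0: 2, 1: 1, 2: 3, 3: 1, 4: 2, 5: 1}     # local-memory demand per tile
cap   = {0: 6, 1: 5}                             # local-memory capacity per processor

prob = LpProblem("tile_assignment", LpMinimize)
x = {(t, p): LpVariable(f"x_{t}_{p}", cat=LpBinary) for t in tiles for p in procs}
makespan = LpVariable("makespan", lowBound=0)

prob += makespan                                          # objective: minimize makespan
for t in tiles:                                           # every tile runs exactly once
    prob += lpSum(x[t, p] for p in procs) == 1
for p in procs:
    prob += lpSum(mem[t] * x[t, p] for t in tiles) <= cap[p]      # memory capacity
    prob += lpSum(work[t] * x[t, p] for t in tiles) <= makespan   # load bound

prob.solve()
print("makespan =", makespan.value())
```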
4.
This paper presents a detailed study of the resonant characteristics of compact frequency selective surface (FSS) arrays. The size of this new type of resonant element is greatly reduced and its structure is compact, making it possible to realize frequency selective surface arrays in the lower frequency bands. The computed simulation results are in good agreement with the measurements.
5.
In this paper, we present a solution to the problem of jointly tiling and scheduling a given loop nest with uniform data dependencies symbolically. This challenge arises when the size and number of processors available for parallel loop execution are not known at compile time. Still, in order to avoid any overhead of dynamic (run-time) recompilation, a schedule of loop iterations shall be computed and optimized statically. It is shown that parameterized latency-optimal schedules can be derived statically by a two-step approach: first, the iteration space of a loop program is tiled symbolically into orthotopes of parameterized extent. Subsequently, the resulting tiled program is also scheduled symbolically, yielding a set of latency-optimal parameterized schedule candidates. At run time, once the size of the processor array becomes known, simple comparisons of latency-determining expressions steer which of these schedules is dynamically selected and which program configuration is executed on the resulting processor array, so as to avoid any further run-time optimization or expensive recompilation. Our theory of symbolic loop parallelization is applied to a number of loop programs from the domains of signal processing and linear algebra. Finally, as a proof of concept, we demonstrate the proposed methodology for a massively parallel processor array architecture called tightly coupled processor array (TCPA), on which applications may dynamically claim regions of processors in the context of invasive computing.
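A minimal sketch of the underlying idea of parametric tiling, with tile sizes left free until run time (a toy matrix-vector example, not the paper's symbolic scheduling method):

```python
# Minimal sketch (not the paper's method) of parametric tiling: the tile sizes t1, t2
# are left symbolic at "compile time" and only bound at run time, when the number of
# available processors becomes known.
def tiled_matvec_accum(A, x, y, t1, t2):
    """y += A @ x over an N1 x N2 iteration space, tiled into t1 x t2 orthotopes."""
    N1, N2 = len(A), len(A[0])
    for i0 in range(0, N1, t1):          # loop over tile origins; each tile could be
        for j0 in range(0, N2, t2):      # assigned to one processor of the array
            for i in range(i0, min(i0 + t1, N1)):
                for j in range(j0, min(j0 + t2, N2)):
                    y[i] += A[i][j] * x[j]
    return y

# At run time, once the processor-array size is known, concrete tile sizes are picked:
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(tiled_matvec_accum(A, [1, 1, 1, 1], [0, 0, 0], t1=2, t2=2))   # [10, 26, 42]
```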
6.
1 Introduction — With rapid advances in network technology and distributed computing, grid computing has emerged from its initial scientific background [1] and made first attempts at commercial usage, with access for non-grid-related professionals and end-users [2-5]. Significant work has been …
7.
Presented here are techniques for generating desired Gray codes with properties different from those of the standard binary reflected Gray code. For communications using phase-shift-keyed (PSK) signals with different significances for different phases, it is desirable to have such codes with different bit error probabilities. For some applications, it is important to have codes with arbitrary crossover Hamming distances. These techniques are suitable for generating codes with either or both of these properties. The codes generated may be far longer than those generated by current computer search techniques.
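For reference, the standard binary reflected Gray code mentioned above can be generated in one line; the paper's techniques target properties this construction does not provide:

```python
# The standard binary reflected Gray code, shown only as a baseline; the paper's methods
# produce codes with non-uniform bit error probabilities or chosen crossover distances.
def reflected_gray(n_bits):
    """Return the 2**n_bits codewords of the binary reflected Gray code."""
    return [i ^ (i >> 1) for i in range(1 << n_bits)]

codes = reflected_gray(3)
print([format(c, "03b") for c in codes])
# Successive codewords differ in exactly one bit:
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))
```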
8.
This paper presents results of using a Coarse Grain Reconfigurable Architecture called DRRA (Dynamically Reconfigurable Resource Array) for FFT implementations varying in order and degree of parallelism, using radix-2 decimation in time (DIT). The DRRA fabric is extended with a memory architecture to handle data sets much larger than what can be accommodated in the register files of DRRA. The proposed implementation scheme is generic in terms of the number of FFT points, the size of memory, and the size of the register file in DRRA. Two implementations (DRRA-1 and DRRA-2) have been synthesized in 65 nm technology, and energy/delay numbers were measured with post-layout annotated gate-level simulations. The results are compared to other Coarse Grain Reconfigurable Architectures (CGRAs) and to dedicated FFT processors for 1024- and 2048-point FFTs. For the 1024-point FFT, in terms of FFT operations per unit energy, DRRA-1 and DRRA-2 outperform all CGRAs by at least 2× and are worse than an ASIC by 3.45×. However, in terms of energy-delay product, DRRA-2 outperforms CGRAs by at least 1.66× and dedicated FFT processors by at least 10.9×. For the 2048-point FFT, DRRA-1 and DRRA-2 are 10× better in energy efficiency and 94.84× better in energy-delay product. However, the radix-2 implementation is worse by 9.64× and 255× in terms of energy efficiency and energy-delay product, respectively, when compared against a radix-2⁴ implementation.
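To illustrate the butterfly and twiddle-factor structure that such architectures map onto hardware (unrelated to the DRRA implementation itself), a minimal radix-2 DIT FFT in plain Python:

```python
# A minimal radix-2 decimation-in-time FFT, shown only to illustrate the bit-reversal,
# butterfly, and twiddle-factor structure; not the DRRA mapping described above.
import cmath

def fft_radix2_dit(x):
    """In-place iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    # bit-reversal permutation of the input
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # butterfly stages
    size = 2
    while size <= n:
        step = cmath.exp(-2j * cmath.pi / size)   # twiddle increment for this stage
        for start in range(0, n, size):
            w = 1.0
            for k in range(start, start + size // 2):
                t = w * x[k + size // 2]
                x[k + size // 2] = x[k] - t
                x[k] = x[k] + t
                w *= step
        size *= 2
    return x

print(fft_radix2_dit([1, 1, 1, 1, 0, 0, 0, 0]))
```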
9.
In this article, we discuss the design of a smart, physics-based processor for microcantilever sensor arrays. The processor is coupled to a microelectromechanical sensor and estimates the presence of critical materials or chemicals in solution. We first briefly present microcantilever sensors and then discuss the microcantilever sensor array design, which consists of the cantilever physics propagation model, the cantilever array measurement model, the model-based parameter estimator design, and the model-based processor (MBP) design. Finally, we end with experimental results and conclusions.
10.
Fast-changing, increasingly complex, and diverse computing platforms pose a central problem in scientific computing: how to achieve, with reasonable effort, portable optimal performance? We present SPIRAL, which considers this problem for the performance-critical domain of linear digital signal processing (DSP) transforms. For a specified transform, SPIRAL automatically generates high-performance code that is tuned to the given platform. SPIRAL formulates the tuning as an optimization problem and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, SPIRAL "intelligently" generates and explores algorithmic and implementation choices to find the best match to the computer's microarchitecture. The "intelligence" is provided by search and learning techniques that exploit the structure of the algorithm and implementation space to guide the exploration and optimization. SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by SPIRAL competes with, and sometimes outperforms, the best available human-tuned transform library code.
12.
Simulation and design of microfluidic systems require models at various levels: high-fidelity models (usually 3D) for the design and optimization of particular elements and devices, as well as system-level models allowing VLSI-scale simulation of such systems. For the latter purpose, reduced or compact models are necessary to make such system simulations computationally feasible. In this paper, we present a design methodology and practical approach for the generation of compact models of microfluidic elements. In this procedure, we use high-fidelity 3D simulations of the microfluidic devices to extract their characteristics for compact models and, subsequently, to validate the compact model behavior in various regimes of operation. The compact models are generated automatically in formats that can be used directly in SPICE or SABER. As an example of a nonlinear fluidic device, the generation of a compact model for the Tesla valve is described in detail. The Tesla valve is one of the no-moving-parts (NMP) valves used in micropumps in MEMS. Its principle of operation is based on rectification of the fluid flow, so it may be considered a fluidic diode.
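As a hypothetical illustration of what a behavioral compact model of a fluidic diode might look like (a toy quadratic-loss characteristic with made-up coefficients, not the model extracted in the paper):

```python
# Toy behavioral model of a Tesla-valve-like fluidic diode: the pressure-drop/flow
# characteristic is direction dependent, captured here by two different loss
# coefficients that would, in practice, be fitted from high-fidelity simulation data.
def tesla_valve_flow(dp, k_forward=1.0e8, k_reverse=4.0e8):
    """Volumetric flow rate Q for a pressure drop dp [Pa], assuming a quadratic
    loss dp = k * Q**2 with a larger (assumed) coefficient in the reverse direction."""
    k = k_forward if dp >= 0 else k_reverse
    q = (abs(dp) / k) ** 0.5
    return q if dp >= 0 else -q

for dp in (1000.0, -1000.0):
    print(dp, "Pa ->", tesla_valve_flow(dp), "m^3/s")
```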
13.
The initial part of this paper reviews the early challenges (c. 1980) in achieving real-time silicon implementations of DSP computations. In particular, it discusses research on application-specific architectures, including bit-level systolic circuits, that led to important advances in achieving the DSP performance levels then required. These were many orders of magnitude greater than those achievable using programmable (including early DSP) processors, and were demonstrated through the design of commercial digital correlator and digital filter chips. As is discussed, an important challenge was the application of these concepts to recursive computations, as occur, for example, in Infinite Impulse Response (IIR) filters. An important breakthrough was to show how fine-grained pipelining can be used if arithmetic is performed most significant bit (msb) first. This can be achieved using redundant number systems, including carry-save arithmetic. This research and its practical benefits were again demonstrated through a number of novel IIR filter chip designs which, at the time, exhibited performance much greater than previous solutions. The architectural insights gained, coupled with the regular nature of many DSP and video processing computations, also provided the foundation for new methods for the rapid design and synthesis of complex DSP System-on-Chip (SoC) Intellectual Property (IP) cores. This included the creation of a wide portfolio of commercial SoC video compression cores (MPEG2, MPEG4, H.264) for very high performance applications ranging from cell phones to High Definition TV (HDTV). The work provided the foundation for systematic methodologies, tools, and design flows, including high-level design optimizations based on "algorithmic engineering", and also led to the creation of the Abhainn tool environment for the design of complex heterogeneous DSP platforms comprising processors and multiple FPGAs. The paper concludes with a discussion of the problems faced by designers in developing complex DSP systems using current SoC technology.
15.
As microelectronics technology continuously advances to deep submicron scales, the occurrence of Multiple Cell Upsets (MCUs) induced by radiation in memory devices becomes more likely. The implementation of a robust Error Correction Code (ECC) is a suitable solution. However, the more complex an ECC, the greater the delay, area usage, and energy consumption. An ECC with an appropriate balance between error coverage and computational cost is essential for applications where fault tolerance is heavily needed and energy resources are scarce. This paper describes the conception, implementation, and evaluation of the Column-Line-Code (CLC), a novel algorithm for the detection and correction of MCUs in memory devices, which combines extended Hamming codes and parity bits. In addition, this paper evaluates variations of the 2D CLC scheme and proposes an additional operation, called extended mode, to correct more MCU patterns. We compared the implementation cost, reliability level, detection/correction rate, and mean time to failure among the CLC versions and other correction codes, showing that the CLCs achieve high MCU correction efficacy with reduced area, power, and delay costs.
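To illustrate the extended-Hamming-plus-parity building block mentioned above (a textbook SEC-DED code for a 4-bit word, not the CLC arrangement itself):

```python
# Minimal extended Hamming (SEC-DED) encoder/decoder for a 4-bit data word; the CLC
# described above arranges codes of this kind over the columns and lines of a memory word.
def encode(d):                                     # d: list of 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p4 = d[1] ^ d[2] ^ d[3]
    cw = [p1, p2, d[0], p4, d[1], d[2], d[3]]      # Hamming(7,4), positions 1..7
    overall = 0
    for b in cw:
        overall ^= b
    return cw + [overall]                          # extended parity bit at the end

def decode(cw):
    """Return (data bits, status) where status is 'ok', 'corrected', or 'double'."""
    s = 0
    for i, b in enumerate(cw[:7], start=1):        # Hamming syndrome over positions 1..7
        if b:
            s ^= i
    overall = 0
    for b in cw:
        overall ^= b                               # parity over all 8 bits
    if s == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                             # single error: correctable
        if s:                                      # s == 0 means the extended bit flipped
            cw = cw[:]
            cw[s - 1] ^= 1
        status = "corrected"
    else:
        status = "double"                          # double error: detected, not corrected
    return [cw[2], cw[4], cw[5], cw[6]], status

cw = encode([1, 0, 1, 1])
cw[4] ^= 1                                         # inject a single bit error
print(decode(cw))                                  # -> ([1, 0, 1, 1], 'corrected')
```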
17.
Functional validation of pipelined microprocessors is a challenging task, as the behavior of a pipeline is determined by the sequence of instructions and by the interaction between their operands. This paper describes an approach to automatic test-program generation based on an evolutionary algorithm. The proposed methodology is able to tackle complex pipelined designs. Human intervention is limited to providing a formalized listing of the instruction set, and the internal parameters of the test-program generator are auto-adapted. A prototype was built and used to generate test programs for the DLX/pII, a pipelined microprocessor. For the purpose of these experiments, test programs were devised to maximize the RT-level statement coverage; however, the method can be used to generate test programs for different target metrics. Results show the feasibility and effectiveness of the method.
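A skeleton of an evolutionary test-program generator of this kind (hypothetical toy instruction list and a stand-in fitness function; in the paper, the fitness would be the RT-level statement coverage reported by a simulator):

```python
# Skeleton of an evolutionary test-program generator (not the paper's tool): individuals
# are instruction sequences; the stand-in fitness simply rewards instruction diversity.
import random

ISA = ["add", "sub", "and", "or", "lw", "sw", "beq", "mult"]   # assumed toy instruction list

def random_program(length=16):
    return [random.choice(ISA) for _ in range(length)]

def fitness(program):
    return len(set(program))            # placeholder for simulator-reported coverage

def mutate(program, rate=0.1):
    return [random.choice(ISA) if random.random() < rate else ins for ins in program]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [random_program() for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                       # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

best = max(population, key=fitness)
print("best coverage proxy:", fitness(best), best)
```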
18.
We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a low-pass filtered copy of the image from the image itself. The result is a net data compression, since the difference, or error, image has low variance and entropy, and the low-pass filtered image may be represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.
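A compact sketch of the pyramid encoding described above, using block averaging as a stand-in for the low-pass filter and omitting the quantization step:

```python
# Sketch of Laplacian-pyramid encoding: subtract a low-pass (here, 2x2 block-averaged)
# copy from the image, keep the low-variance difference, and recurse on the reduced
# low-pass image. A real coder would quantize the difference images and use a
# Gaussian-like filter rather than block averaging.
import numpy as np

def pyr_reduce(img):
    """2x downsample by block averaging (stand-in for Gaussian low-pass + subsample)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyr_expand(img):
    """2x upsample by pixel replication (stand-in for interpolation)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramid(img, levels):
    pyramid, current = [], img.astype(float)
    for _ in range(levels):
        low = pyr_reduce(current)
        pyramid.append(current - pyr_expand(low))   # difference ("Laplacian") image
        current = low
    pyramid.append(current)                         # coarsest low-pass image
    return pyramid

def reconstruct(pyramid):
    current = pyramid[-1]
    for diff in reversed(pyramid[:-1]):
        current = pyr_expand(current) + diff
    return current

img = np.random.rand(64, 64)
pyr = build_pyramid(img, levels=3)
print(np.allclose(reconstruct(pyr), img))           # lossless without quantization -> True
```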
20.
The address generator is a main component of an FFT processor. Fast address generation and the number of twiddle-factor reads are two of its important metrics, but few algorithms manage to unify them. This paper adopts a new operand-address generation order and constructs a new representation of FFT loop stages. Based on bit-reversed operand addressing, a variable-length address generation method is proposed that combines simple, fast address generation with avoidance of repeated twiddle-factor reads. It resolves the previous conflict between generation speed and repeated twiddle-factor reads during address generation, achieving both high speed and reduced system power consumption.
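A minimal illustration of bit-reversed operand addressing for a radix-2 FFT (a generic sketch, not the variable-length scheme proposed above):

```python
# Generic bit-reversed operand addressing for an N-point radix-2 FFT; shown only to
# illustrate the addressing pattern, not the paper's variable-length generator.
def bit_reverse(i, n_bits):
    """Reverse the lowest n_bits bits of index i."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def bit_reversed_addresses(n):
    n_bits = n.bit_length() - 1          # n must be a power of two
    return [bit_reverse(i, n_bits) for i in range(n)]

print(bit_reversed_addresses(8))         # [0, 4, 2, 6, 1, 5, 3, 7]
```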