Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
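Hardware/software partitioning is one of the key problems in hardware/software co-design. It involves system modeling, partitioning algorithms, and the evaluation of partitioning schemes, of which the design of the partitioning algorithm is the crux. With the goal of improving system timing performance, a system model is constructed from a task flow graph, a priority-based evaluation function is implemented on it, and a hardware/software partitioning algorithm is proposed that combines search space smoothing with discrete particle swarm optimization. The work resolves how to integrate the two techniques and adapts the algorithm parameters dynamically according to system information. Experimental results show that the algorithm's time overhead is stable and its solution quality is high.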

2.
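An NSGA-II-based hardware/software partitioning method for embedded systems   Cited by: 2 (self-citations: 0, citations by others: 2)
Hardware/software partitioning is a key problem in hardware/software co-design. For single-processor embedded systems, the paper proposes applying NSGA-II to hardware/software partitioning. A single run of the algorithm yields multiple Pareto-optimal solutions, which provides an effective tool for trade-off analysis among the objective functions and improves design efficiency. The results show that, while meeting system performance requirements, the partitioning method can provide globally optimized solutions for multiple design objectives in complex embedded systems.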
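The notion of Pareto-optimal partitioning solutions mentioned in this abstract can be illustrated with a minimal sketch. This is not NSGA-II itself and not the authors' implementation; it only shows the dominance test that such multi-objective methods rely on. The two objectives (execution time and hardware area, both minimized) and the candidate values are hypothetical.

```python
from typing import List, Tuple

# Each candidate partition is summarized by its objective values,
# e.g. (execution_time, hardware_area). Both are assumed to be minimized.
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (time, area) values for five candidate partitions.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (8.0, 6.0)]
front = pareto_front(candidates)
print(front)
```

A designer would then choose among the surviving points according to which trade-off between the objectives suits the system.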

3.
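A hardware/software partitioning method for hierarchical-platform-based SoC design   Cited by: 1 (self-citations: 1, citations by others: 0)
Hardware/software partitioning is a key problem in SoC design; a sound partitioning result strongly affects the cost, performance, and scalability of the final chip. The paper proposes using a genetic algorithm for hardware/software partitioning in hierarchical-platform-based SoC design and verifies its feasibility in SoC design through experiments.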

4.
In this paper, a TPP (Task-based Parallelization and Pipelining) scheme is proposed to implement the AVS (Audio Video coding Standard) video decoding algorithm on REMUS (REconfigurable MUltimedia System), a coarse-grained reconfigurable multimedia system. An AVS decoder has been implemented with HW/SW optimized partitioning taken into consideration. Several parallel techniques, such as MB (Macro-Block)-based and block-based parallelism, and several pipeline techniques, such as MB-level and block-level pipelining, are adopted in the hardware implementation to improve the performance of the AVS decoder. In addition, most computation-intensive tasks in the AVS video standard, such as MC (Motion Compensation), IP (Intra Prediction), IDCT (Inverse Discrete Cosine Transform), REC (REConstruct), and DF (Deblocking Filter), are performed in the two RPUs (Reconfigurable Processing Units), which are the major computing engines of REMUS. Owing to the proposed scheme, the decoder can support AVS JP (Jizhun Profile) 1920×1088@39fps streams at a working frequency of 200 MHz.

5.
The BOAR emulation system is targeted at hardware/software (HW/SW) codevelopment of advanced embedded DSP and telecom systems. The challenge of the BOAR system is efficient customization of programmable hardware and a dedicated partitioning routine for the target applications and structures, which allows quite high overall system performance. The system allows multiple configurations for communication between processors and field programmable gate arrays (FPGAs), making the BOAR system an efficient tool for real-time HW/SW coverification. The reprogrammable hardware of the emulation tool is based on four Xilinx 4000-series devices, two Texas TMS320C50 signal processors, and one Motorola MC68302 microcontroller. With current devices the BOAR hardware provides approximately 40–70 kgates of logic capacity in DSP applications. The emulation capacity can be expanded by connecting several similar boards in a chain. The system also has a versatile internal reprogrammable test environment for test bench development, performance evaluation, and design debugging. The logic development environment is based on the Synopsys synthesis tools and automatic design management software, which performs resource mapping and performance-driven design partitioning between FPGAs. The emulation hardware is currently connected to logic and software development environments via an RS-232C bus. The BOAR emulation system has been found to be a very efficient platform for real-life prototyping of different types of DSP algorithms and systems, and for validating the correct functionality of a VHDL macro library.

6.
In recent technology nodes, reliability is increasingly considered a part of the standard design flow to be taken into account at all levels of embedded systems design. While traditional fault simulation techniques based on low-level models at gate- and register-transfer level offer high accuracy, they are too inefficient to properly cope with the complexity of modern embedded systems. Moreover, they do not allow for early exploration of design alternatives when a detailed model of the whole system is not yet ava...

7.
We present P-Ware, a framework for joint software and hardware modelling and synthesis of multiprocessor embedded systems. The framework consists of (1) component-based annotated transaction-level models for joint modelling of parallel software and multiprocessor hardware, and (2) an exploration-driven methodology for joint software and hardware synthesis. The methodology has the advantage of combining the real-time requirements of software with efficient optimization of hardware performance. We describe the methodology and apply it to synthesize a scheduler of an H264 video encoder on the Cake multiprocessor. Moreover, experiments show that the framework is scalable while achieving rapid and efficient designs.

8.
9.
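Design of a hardware/software co-verification platform for SoC   Cited by: 1 (self-citations: 1, citations by others: 0)
鲍华  洪一  郭二辉 《计算机工程》 (Computer Engineering) 2009,35(8):271-273
To meet the practical needs of SoC design verification, the paper introduces a hardware/software co-verification platform for SoC design. The hardware and software models on the platform run in separate environments and exchange information over a network. The hardware is modeled in a hardware description language at the transaction level and at RTL; the software is written in a high-level programming language, and an instruction set simulator is used to carry out the simulation of the hardware. The simulation runs as separate parallel processes, and inter-process communication is used to exchange information between the simulators.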

10.
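Application of FPGA-based real-time image preprocessing in automotive night vision systems   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on the characteristics of infrared images, the paper proposes an image-enhancement preprocessing scheme for automotive night vision systems. It presents the principles, implementation methods, and simulation results of FPGA-based video format conversion, fast median filtering, and adaptive plateau histogram bidirectional equalization. The simulation results show that the scheme satisfies the requirements on both image processing quality and processing speed well.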

11.
Image processing requires high computational power, plus the ability to experiment with algorithms. Recently, reconfigurable hardware devices in the form of field programmable gate arrays (FPGAs) have been proposed as a way of obtaining high performance at an economical price. At present, however, users must program FPGAs at a very low level and have a detailed knowledge of the architecture of the device being used. FPGAs therefore do not facilitate easy development of, or experimentation with, image processing algorithms. To try to reconcile the dual requirements of high performance and ease of development, this paper reports on the design and realisation of an FPGA-based image processing machine and its associated high-level programming model. This abstract programming model allows an application developer to concentrate on the image processing algorithm in hand rather than on its hardware implementation. The abstract machine is based on a PC host system with a PCI-bus add-on card containing Xilinx XC6200 series FPGA(s). The machine's high-level instruction set is based on the operators of image algebra. XC6200 series FPGA configurations have been developed to implement each high-level instruction.

12.
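In system-on-chip (SoC) design, hardware/software partitioning is a critical problem. It is a multi-objective optimization problem that directly affects system performance, and finding its optimal solution requires addressing many issues. One very important issue is how to construct a cost function that guides the partitioning process toward the system's requirements and constraints. This paper focuses on methods and principles for constructing the cost function, along with some useful explorations by the authors in this area.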
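As a rough illustration of what a cost function for guiding partitioning might look like, the sketch below uses a weighted sum of execution time and hardware area plus a penalty for exceeding an area budget. This is only one common style of construction and is not taken from the paper; the `Task` fields, the weights, the penalty value, and the assumption that tasks execute serially are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    sw_time: float  # execution time if implemented in software (hypothetical units)
    hw_time: float  # execution time if implemented in hardware
    hw_area: float  # hardware area consumed if implemented in hardware


def partition_cost(
    tasks: Sequence[Task],
    in_hw: Sequence[bool],
    area_limit: float,
    w_time: float = 1.0,
    w_area: float = 0.125,
    penalty: float = 100.0,
) -> float:
    """Weighted-sum cost of a partition; lower is better.

    Assumes serial execution, so the total time is the sum of task times.
    Area above area_limit is charged an extra penalty per unit.
    """
    if len(tasks) != len(in_hw):
        raise ValueError("tasks and in_hw must have the same length")
    total_time = sum(t.hw_time if hw else t.sw_time for t, hw in zip(tasks, in_hw))
    total_area = sum(t.hw_area for t, hw in zip(tasks, in_hw) if hw)
    overflow = max(0.0, total_area - area_limit)
    return w_time * total_time + w_area * total_area + penalty * overflow


# Hypothetical three-task system with an area budget of 80 units.
tasks = [Task(10.0, 2.0, 30.0), Task(4.0, 3.0, 50.0), Task(8.0, 1.0, 40.0)]
all_sw = partition_cost(tasks, [False, False, False], area_limit=80.0)
mixed = partition_cost(tasks, [True, False, True], area_limit=80.0)
all_hw = partition_cost(tasks, [True, True, True], area_limit=80.0)
print(all_sw, mixed, all_hw)
```

With these illustrative numbers, the mixed partition scores best: it gains most of the hardware speed-up while staying inside the area budget, whereas the all-hardware partition is dominated by the constraint penalty.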

13.
K.  L.  B.  I. 《Computers & Electrical Engineering》2007,33(5-6):324-332
It is a challenge to implement large word length public-key algorithms on embedded systems such as smartcards, RF-ID tags, and mobile terminals. This paper presents a HW/SW co-design solution for RSA and Elliptic Curve Cryptography (ECC) over GF(p) on a 12 MHz 8-bit 8051 micro-controller. The hardware coprocessor has a Modular Arithmetic Logic Unit (MALU) whose digit size (d) is variable. It can be adapted to the speed and bandwidth of the micro-controller to which it is connected. The HW/SW co-design space exploration is based on the GEZEL system-level design environment. It allows the designer to find the best performance-area combination for the digit size. As a case study in FPGA prototyping, 160-bit ECC over GF(p) (ECC-160p) was implemented on a Xilinx Virtex-II PRO (XC2VP30). The results show that one point multiplication takes only 130 ms, including all communications between the 8051 and the coprocessor. This is 40 times faster than the most optimized SW implementation on a small CPU in the literature. It is achieved by using the HW/SW co-design exploration to find the optimized digit size of the MALU. At the same time, the design of ECC-160p maintains a high level of flexibility by using coprocessor instructions. Our proposed architecture shows that HW/SW co-design provides high performance close to ASIC solutions with the flexibility of SW, even on a small CPU.

14.
The paper presents an FPGA-based image and data processing core for future-generation wireless capsule endoscopy (WCE). The main part of the presented core is an image compressor, for which a hardware implementation architecture requiring only two clock cycles to process a single image pixel is proposed. Apart from the image compressor, the presented core includes a camera interface, a FIFO queue storing the compressed image bitstream, a forward error correction encoder protecting transmitted data against random and burst transmission errors, and a system controller supervising internal WCE operations. The presented core has been implemented in a single ultra-low-power, 65 nm FPGA chip. Power consumption of the designed FPGA core was determined to be comparable to that of other ASIC-based WCE systems.

15.
In this paper, we report a hardware/software (HW/SW) co-designed K-means clustering algorithm with high flexibility and high performance for machine learning, pattern recognition, and multimedia applications. The contributions of this work fall into two aspects. The first is the hardware architecture for nearest neighbor searching, which is used to overcome the main computational cost of a K-means clustering algorithm. The second is the high flexibility for different applications, which comes not only from the software but also from the hardware. The lack of flexibility with respect to the number of training data samples, the dimensionality of each sample vector, the number of clusters, and the target application is one of the major shortcomings of dedicated hardware implementations of the K-means algorithm. In particular, the HW/SW K-means algorithm is extendable to embedded systems and mobile devices. We benchmark our multi-purpose K-means system on handwritten digit recognition, face recognition, and image segmentation to demonstrate its excellent performance, high flexibility, fast clustering speed, short recognition time, good recognition rate, and versatile functionality.

16.
Implementation issues of neuro-fuzzy hardware: going toward HW/SW codesign   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents an annotated overview of existing hardware implementations of artificial neural and fuzzy systems and points out the limitations, advantages, and drawbacks of analog, digital, pulse stream (spiking), and other implementation techniques. We analyze hardware performance parameters and tradeoffs, and the bottlenecks that are intrinsic to several implementation methodologies. The constraints imposed by hardware technologies on algorithms and performance are also described. The results of these analyses lead to the use of hardware/software codesign as a means of exploiting the best of both hardware and software techniques. Hardware/software codesign appears, at present, to be the most promising research area concerning the implementation of neuro-fuzzy systems (not including bioinspired systems, which are out of the scope of this work), as it allows the fast design of complex systems with the highest performance/cost ratio.

17.
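A heterogeneous system-on-chip (SoC) integrates multiple types of processors on a single chip and offers considerable advantages in processing capability, size, weight, and power consumption, so it has been applied in many fields. The SoC with dynamic partial reconfigurability (Dynamic Partial Reconfigurability SoC, DPR-SoC) is an important type of heterogeneous SoC that combines the flexibility of software with the efficiency of hardware. Designing such systems usually involves hardware/software co-design problems, in which partitioning the application between hardware and software is the key technique for guaranteeing real-time behavior. The hardware/software partitioning problem in a DPR-SoC can be classified as a combinatorial optimization problem whose objective is to obtain the schedule with the shortest schedule length, covering task mapping, ordering, and timing. Mixed integer linear programming (MILP) is an effective method for solving combinatorial optimization problems; however, modeling the specific problem as an MILP model is a key step, and different modeling approaches significantly affect the solving time. Existing MILP models for the DPR-SoC hardware/software partitioning problem contain large numbers of variables and constraint equations, which lengthens the solving time; in addition, they make too many assumptions, so the results do not match practical applications. To address these issues, a novel MILP model is proposed that greatly reduces model complexity and brings the results closer to practical applications. The application is modeled as a DAG, and an integer linear programming solver is used to solve the problem. Extensive results show that the new model effectively reduces model complexity and shortens the solving time, and that its advantage in solving time becomes more pronounced as the problem size grows.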

18.
This work describes the development of a hardware/software co-design system, named the IEEE 1451 platform, to be used in process automation. The platform is intended to ease the implementation of IEEE standards 1451.0, 1451.1, 1451.2, and 1451.5. The hardware was built using NIOS II processor resources on Altera's Cyclone II FPGA. The software was written using Java technology, with C/C++ for programming the processors. This HW/SW system implements IEEE 1451 based on a control module and supervisory software for industrial automation.

19.
The Journal of Supercomputing - Two-dimensional convolution plays a fundamental role in different image processing applications. Image convolving with different kernel sizes enriches the overall...

20.
With the increased performance capabilities of desktop computers, networked computing has become a popular vehicle for using parallelism to solve a variety of computationally intense problems. However, node heterogeneity and high communication costs may limit performance unless the problem space is carefully partitioned across the network in a way that considers both the capabilities of the machines and the high network communication costs. We describe an advisory system that is designed to help the programmer, compiler or run-time environment choose the best decomposition strategy for partitioning specific data-parallel applications across a given collection of machines. The system includes provisions for assessing the capabilities of the participating machines and the network in light of the current workload. Given information about the problem space, the machine speeds and the network, the system provides a ranking of three standard partitioning methods. We test the validity of our system by comparing the observed relative performance with predicted relative performance of different data decompositions on a program with a variable number of floating point operations and a 5-point stencil communication pattern.
