1.
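Hardware/software partitioning is one of the central problems in hardware/software co-design. It involves system modeling, the partitioning algorithm, and the evaluation of partitioning schemes, with the design of the partitioning algorithm being the key point. With the goal of improving system timing performance, the system is modeled with a task flow graph, on which a priority-based evaluation function is implemented. A hardware/software partitioning algorithm is proposed that combines search space smoothing with discrete particle swarm optimization; the problem of integrating the two techniques is solved, and the algorithm's parameters are adjusted dynamically according to system information. Experimental results show that the algorithm has a stable time overhead and produces high-quality solutions.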
2.
3.
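Hardware/software partitioning is a key problem in SoC design; a sound partitioning result has an important influence on the cost, performance, and scalability of the resulting chip. This paper proposes a method that uses a genetic algorithm for hardware/software partitioning in hierarchical-platform-based SoC design, and verifies its feasibility in SoC design through experiments.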
4.
In this paper, a TPP (Task-based Parallelization and Pipelining) scheme is proposed to implement the AVS (Audio Video coding Standard) video decoding algorithm on REMUS (REconfigurable MUltimedia System), a coarse-grained reconfigurable multimedia system. An AVS decoder has been implemented with consideration of optimized HW/SW partitioning. Several parallel techniques, such as MB (Macro-Block)-based and block-based parallelism, and several pipeline techniques, such as MB-level and block-level pipelining, are adopted in the hardware implementation to improve the performance of the AVS decoder. In addition, most computation-intensive tasks in the AVS video standard, such as MC (Motion Compensation), IP (Intra Prediction), IDCT (Inverse Discrete Cosine Transform), REC (REConstruct), and DF (Deblocking Filter), are performed in the two RPUs (Reconfigurable Processing Units), which are the major computing engines of REMUS. Owing to the proposed scheme, the decoder can support AVS JP (Jizhun Profile) 1920×1088@39fps streams at a 200 MHz working frequency.
5.
Jouni Isoaho Vesa Köppä Jarkko Oksala Pasi Ojala 《Microprocessors and Microsystems》1997,20(10):2330-615
The BOAR emulation system is targeted at hardware/software (HW/SW) codevelopment of advanced embedded DSP and telecom systems. The challenge of the BOAR system is efficient customization of programmable hardware, and a dedicated partitioning routine for target applications and structures, which allows quite high overall system performance. The system allows multiple configurations for communication between processors and field programmable gate arrays (FPGAs), making the BOAR system an efficient tool for real-time HW/SW coverification. The reprogrammable hardware of the emulation tool is based on four Xilinx 4000-series devices, two Texas TMS320C50 signal processors and one Motorola MC68302 microcontroller. With current devices the BOAR hardware provides approximately 40–70 kgates of logic capacity in DSP applications. The emulation capacity can be expanded by connecting several similar boards in a chain. The system also has a versatile internal reprogrammable test environment for test bench development, performance evaluation and design debugging. The logic development environment is based on the Synopsys synthesis tools and automatic design management software, which performs resource mapping and performance-driven design partitioning between FPGAs. The emulation hardware is currently connected to logic and software development environments via an RS-232C bus. The BOAR emulation system has been found to be a very efficient platform for real-life prototyping of different types of DSP algorithms and systems, and for validating the correct functionality of a VHDL macro library.
6.
BARANOWSKI Rafal DI CARLO Stefano HATAMI Nadereh IMHOF Michael E. KOCHTE Michael A. PRINETTO Paolo WUNDERLICH Hans-Joachim ZOELLIN Christian G. 《Science China Information Sciences》2011,(9):1784-1796
In recent technology nodes, reliability is increasingly considered a part of the standard design flow, to be taken into account at all levels of embedded systems design. While traditional fault simulation techniques based on low-level models at gate and register-transfer level offer high accuracy, they are too inefficient to properly cope with the complexity of modern embedded systems. Moreover, they do not allow for early exploration of design alternatives when a detailed model of the whole system is not yet ava...
7.
We present P-Ware, a framework for joint software and hardware modelling and synthesis of multiprocessor embedded systems. The framework consists of (1) component-based annotated transaction-level models for joint modelling of parallel software and multiprocessor hardware, and (2) an exploration-driven methodology for joint software and hardware synthesis. The methodology has the advantage of combining the real-time requirements of software with efficient optimization of hardware performance. We describe the methodology and apply it to synthesize a scheduler of an H264 video encoder on the Cake multiprocessor. Moreover, experiments show that the framework is scalable while achieving rapid and efficient designs.
8.
9.
10.
11.
《Journal of Systems Architecture》1999,45(10):809-824
Image processing requires high computational power, plus the ability to experiment with algorithms. Recently, reconfigurable hardware devices in the form of field programmable gate arrays (FPGAs) have been proposed as a way of obtaining high performance at an economical price. At present, however, users must program FPGAs at a very low level and have a detailed knowledge of the architecture of the device being used. FPGAs therefore do not facilitate easy development of, or experimentation with, image processing algorithms. To try to reconcile the dual requirements of high performance and ease of development, this paper reports on the design and realisation of an FPGA-based image processing machine and its associated high-level programming model. This abstract programming model allows an application developer to concentrate on the image processing algorithm in hand rather than on its hardware implementation. The abstract machine is based on a PC host system with a PCI-bus add-on card containing Xilinx XC6200 series FPGA(s). The machine's high-level instruction set is based on the operators of image algebra. XC6200 series FPGA configurations have been developed to implement each high-level instruction.
12.
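In system-on-chip (SoC) design, hardware/software partitioning is a critical problem. It is a multi-objective optimization problem that directly affects system performance, and finding its optimal solution requires addressing many issues. One very important issue is how to construct a cost function that guides the partitioning process toward meeting the system's requirements and constraints. This paper focuses on methods for constructing the cost function and the principles behind them, and on some useful explorations we have made in this area.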
13.
It is a challenge to implement large word length public-key algorithms on embedded systems. Examples are smartcards, RF-ID tags and mobile terminals. This paper presents a HW/SW co-design solution for RSA and Elliptic Curve Cryptography (ECC) over GF(p) on a 12 MHz 8-bit 8051 micro-controller. The hardware coprocessor has a Modular Arithmetic Logic Unit (MALU) whose digit size (d) is variable. It can be adapted to the speed and bandwidth of the micro-controller to which it is connected. The HW/SW co-design space exploration is based on the GEZEL system-level design environment. It allows the designer to find the best performance-area combination for the digit size. As an FPGA prototyping case study, 160-bit ECC over GF(p) (ECC-160p) was implemented on a Xilinx Virtex-II PRO (XC2VP30). The results show that one point multiplication takes only 130 ms including all communications between the 8051 and the coprocessor. The performance is 40 times faster than the most optimized SW implementation on a small CPU in the literature. This is achieved by the HW/SW co-design exploration used to find the optimized digit size of the MALU. At the same time, the design of ECC-160p maintains a high level of flexibility by using coprocessor instructions. Our proposed architecture shows that HW/SW co-design provides performance close to that of ASIC solutions together with the flexibility of SW, even on a small CPU.
14.
Pawel Turcza Mariusz Duplaga 《Sensors and actuators. A, Physical》2011,172(2):552-560
The paper presents an FPGA-based image and data processing core for future generation wireless capsule endoscopy (WCE). The main part of the presented core is an image compressor, for which a hardware implementation architecture requiring only two clock cycles to process a single image pixel is proposed. Apart from the image compressor, the presented core includes a camera interface, a FIFO queue storing the compressed image bitstream, a forward error correction encoder protecting transmitted data against random and burst transmission errors, and a system controller supervising internal WCE operations. The presented core has been implemented in a single ultra-low-power 65 nm FPGA chip. The power consumption of the designed FPGA core was determined to be comparable to that of ASIC-based WCE systems.
15.
《Journal of Systems Architecture》2013,59(3):155-164
In this paper, we report a hardware/software (HW/SW) co-designed K-means clustering algorithm with high flexibility and high performance for machine learning, pattern recognition and multimedia applications. The contributions of this work fall into two aspects. The first is the hardware architecture for nearest neighbor searching, which is used to overcome the main computational cost of a K-means clustering algorithm. The second is the high flexibility for different applications, which comes from not only the software but also the hardware. Limited flexibility with respect to the number of training data samples, the dimensionality of each sample vector, the number of clusters, and the target application is one of the major shortcomings of dedicated hardware implementations of the K-means algorithm. In particular, the HW/SW K-means algorithm is extendable to embedded systems and mobile devices. We benchmark our multi-purpose K-means system on handwritten digit recognition, face recognition and image segmentation to demonstrate its excellent performance, high flexibility, fast clustering speed, short recognition time, good recognition rate and versatile functionality.
16.
This paper presents an annotated overview of existing hardware implementations of artificial neural and fuzzy systems and points out the limitations, advantages, and drawbacks of analog, digital, pulse stream (spiking), and other implementation techniques. We analyze hardware performance parameters and tradeoffs, and the bottlenecks which are intrinsic to several implementation methodologies. The constraints imposed by hardware technologies on algorithms and performance are also described. The results of the analyses point to the use of hardware/software codesign as a means of exploiting the best of both hardware and software techniques. Hardware/software codesign appears, at present, to be the most promising research area concerning the implementation of neuro-fuzzy systems (not including bioinspired systems, which are outside the scope of this work), as it allows the fast design of complex systems with the highest performance/cost ratio.
17.
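Heterogeneous systems-on-chip (SoCs) integrate multiple types of processors on a single chip and offer considerable advantages in processing capability, size, weight, and power consumption, so they are used in many fields. The SoC with dynamic partial reconfigurability (Dynamic Partial Reconfigurability SoC, DPR-SoC) is an important type of heterogeneous SoC that combines the flexibility of software with the efficiency of hardware. Designing such systems usually involves hardware/software co-design, in which hardware/software partitioning of the application is the key technique for guaranteeing real-time behavior. Hardware/software partitioning in a DPR-SoC can be classified as a combinatorial optimization problem whose objective is the schedule with the shortest schedule length, covering task mapping, ordering, and timing. Mixed Integer Linear Programming (MILP) is an effective method for solving combinatorial optimization problems; however, formulating a concrete problem as an MILP model is a key step, and different formulations have a significant effect on solving time. Existing MILP models for DPR-SoC hardware/software partitioning contain large numbers of variables and constraint equations, which adversely affects solving time; they also rely on too many assumptions, so the solutions do not match practical applications. To address these problems, a novel MILP model is proposed that greatly reduces model complexity and brings the solutions closer to practical applications. Applications are modeled as DAGs, and the problem is solved with an integer linear programming solver. Extensive results show that the new model effectively reduces model complexity and shortens solving time, and that its advantage in solving time becomes more pronounced as the problem size grows.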
18.
E.A. Batista L. Gonda A.C.R. da Silva 《Computer Standards & Interfaces》2012,34(1):1-13
This work describes the development of a hardware/software co-design system, named the IEEE 1451 platform, to be used in process automation. The platform is intended to ease the implementation of IEEE standards 1451.0, 1451.1, 1451.2 and 1451.5. The hardware was built using NIOS II processor resources on Altera's Cyclone II FPGA. The software was written using Java technology, with C/C++ for programming the processor. This HW/SW system implements IEEE 1451 based on a control module and supervisory software for industrial automation.
19.
Dehghani Abbas Kavari Ali Kalbasi Mahdi RahimiZadeh Keyvan 《The Journal of supercomputing》2022,78(2):2597-2615
Two-dimensional convolution plays a fundamental role in different image processing applications. Image convolving with different kernel sizes enriches the overall...
20.
With the increased performance capabilities of desktop computers, networked computing has become a popular vehicle for using parallelism to solve a variety of computationally intensive problems. However, node heterogeneity and high communication costs may limit performance unless the problem space is carefully partitioned across the network in a way that considers both the capabilities of the machines and the high network communication costs. We describe an advisory system that is designed to help the programmer, compiler or run-time environment choose the best decomposition strategy for partitioning specific data-parallel applications across a given collection of machines. The system includes provisions for assessing the capabilities of the participating machines and the network in light of the current workload. Given information about the problem space, the machine speeds and the network, the system provides a ranking of three standard partitioning methods. We test the validity of our system by comparing the observed relative performance with the predicted relative performance of different data decompositions on a program with a variable number of floating point operations and a 5-point stencil communication pattern.