Similar Articles
20 similar articles found; search time 31 ms.
1.
C. Wolinski, M. Gokhale, K. McCave. IEEE Micro, 2002, 22(5): 56-68
We propose a polymorphous computing fabric-based system (FBS) well suited to digital signal processing (DSP) and image processing applications. We have implemented our design on a system on a programmable chip (SoPC). The fabric's highly parameterized cellular architecture enables customized synthesis of fabric instances to achieve high performance for different classes of applications. The system's innovative global memory gives a host control processor random access to all the variables and instructions on the fabric. The fabric supports several computing models, including multiple instruction, multiple data (MIMD); single program, multiple data (SPMD); and systolic flow, and it permits dynamic reconfiguration of communication patterns. To illustrate the capabilities of our approach, we present two fabric instances with implementations of representative applications, including a k-means clustering algorithm, a bank of finite impulse response (FIR) filters, an N-tap FIR filter (N is the number of taps of the filter), and a vector-by-matrix multiplication. Each fabric instance holds 52 cells on the Altera Excalibur ARM embedded processor system.
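The N-tap FIR filter listed among the benchmark applications computes each output as a weighted sum of the last N inputs, y[n] = sum over k = 0..N-1 of h[k] * x[n-k]. A minimal direct-form sketch of that arithmetic follows; the function name `fir_filter` and the zero-initial-state convention are assumptions for illustration and say nothing about how the fabric maps the filter onto its cells.

```python
def fir_filter(x, h):
    """Direct-form N-tap FIR filter: y[n] = sum_{k=0}^{N-1} h[k] * x[n-k].

    Samples before x[0] are treated as zero (zero initial state).
    """
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0  # multiply-accumulate register
        for k in range(n_taps):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# 4-tap moving average applied to a step input
print(fir_filter([1, 1, 1, 1, 1, 1], [0.25, 0.25, 0.25, 0.25]))
# [0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
```

On a systolic fabric, each multiply-accumulate in the inner loop would typically be assigned to its own cell so that all taps work concurrently; the sketch shows only the serial arithmetic.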

2.
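Two system architectures for designing embedded test systems with the Nios II soft-core processor are introduced, and the hardware and software design methods for a Nios II-based embedded test system are described in detail. Finally, using an EP2C8Q-208C8 FPGA, the timing logic of the A/D chip is described in Verilog, and a serial-port processing unit is built on the Nios II soft-core processor to send the data acquired by the A/D converter to a computer for display. Practice shows that designing embedded test systems with the Nios II soft-core processor offers a short development cycle, high system integration, and flexible functionality. Compared with traditional microcontroller-based embedded test systems, it also provides a higher clock frequency, faster execution, and easier debugging, making it a design method worth promoting.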

3.
Motion estimation in videos is a computationally intensive process. A popular strategy for dealing with such a high processing load is to accelerate algorithms with dedicated hardware such as graphics processing units (GPU), field programmable gate arrays (FPGA), and digital signal processors (DSP). Previous approaches addressed the problem using accelerators together with a general-purpose processor, such as Acorn RISC Machines (ARM). In this work, we present a co-processing architecture using an FPGA and a DSP. A portable platform for motion estimation based on sparse feature point detection and tracking is developed for real-time embedded systems and smart video sensor applications. A Harris corner detection IP core is designed with a customized fine-grain pipeline on a Virtex-4 FPGA. The detected feature points are then tracked using the Lucas-Kanade algorithm in a DSP that acts as a co-processor for the FPGA. The hybrid system offers a throughput of 160 frames per second (fps) at VGA resolution. We have also compared our proposed solution (FPGA + DSP) with two other traditional architectures and co-processing strategies: hybrid ARM + DSP and DSP only. The proposed FPGA + DSP system offers speedups of about 20 times and 3 times over the ARM + DSP and DSP-only configurations, respectively. A comparison of Harris feature detection performance across different embedded processors (DSP, ARM, and FPGA) reveals that the DSP offers the best performance when scaling up from QVGA to VGA resolution.
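The Harris detector that the FPGA IP core accelerates scores each pixel with R = det(M) - k * trace(M)^2, where M is the window sum of the gradient products Ix^2, Iy^2 and Ix*Iy. The pure-Python sketch below shows only that arithmetic. The 3x3 box window, central-difference gradients and k = 0.04 are common textbook choices assumed here, not parameters taken from the paper, and the fine-grain pipeline is not modeled.

```python
def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)**2 for each pixel.

    img is a 2-D list of grey levels. Gradients use central differences and
    M is summed over a 3x3 box window. Pixels too close to the border for a
    full window keep R = 0.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0

    resp = [[0.0] * w for _ in range(h)]
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det_m = sxx * syy - sxy * sxy
            trace_m = sxx + syy
            resp[r][c] = det_m - k * trace_m ** 2
    return resp


# Synthetic frame: a bright 8x8 square on a dark 16x16 background.
img = [[1.0 if 4 <= r <= 11 and 4 <= c <= 11 else 0.0 for c in range(16)]
       for r in range(16)]
R = harris_response(img)
score, row, col = max((R[r][c], r, c) for r in range(16) for c in range(16))
print(row, col)  # lands on one of the square's four corners
```

Corners give a large positive R, straight edges a negative R and flat regions zero, which is why a simple threshold on R selects the feature points that are then handed to the tracker.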

4.
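A hardware/software co-simulation environment was built for a WiMAX terminal SoC based on an ARM9-family processor core. By connecting a simulation model of the ARM926EJ-S processor core to the RTL model of the SoC, and using the simulation model's support for the ARM instruction set to run the MAC-layer firmware of the WiMAX terminal SoC, synchronized debugging of the SoC hardware and software was achieved. This effectively improved the efficiency of system integration and verification and shortened system development time.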

5.
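A design scheme is introduced for a SCSI application system based on system-on-a-programmable-chip (SoPC) and soft-core processor technology. A microprocessor based on the Nios soft core is chosen as the control core of the application system, and the external host processor of the SCSI control unit, the DMA data channel control, and the data buffer control logic are all integrated into a single FPGA. This makes full use of the logic device resources while making the design more compact, flexible, fast, and reliable.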

6.
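Based on an analysis of the structure of SCSI application systems and of conventional design methods, a design scheme is proposed for a SCSI application system based on system-on-a-programmable-chip (SoPC) and soft-core processor technology. A microprocessor based on the Nios soft core is chosen as the control core of the application system, and the external host processor of the SCSI control unit, the DMA data channel control, and the data buffer control logic are all integrated into a single FPGA. This makes full use of the logic device resources while making the design more compact, flexible, fast, and reliable.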

7.
Han Xu, Yu Xiaoyi. Microcomputer & Its Applications (微型机与应用), 2012, 31(6): 57-59, 65
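Based on an analysis of the principles of infrared transmitters and receivers, the hardware and software of an infrared communication measurement and control system were designed around a PSoC (programmable system-on-chip) device, using the firmware components built into Creator, the PSoC integrated development environment. PSoC is a hybrid processor composed of two main functional blocks: an ARM processor and CPLD logic. In Creator, firmware components resemble the controls of object-oriented programming, turning hardware design into a software activity, and the hardware-related source code is generated automatically by the compiler. Infrared transmit and receive circuits designed with PSoC benefit from simple hardware design, graphical software design, and full use of the firmware components that PSoC provides. PSoC is well suited to communication and measurement-and-control applications.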

8.
Min Li, Chao Yang, Qiao Sun, Wen-Jing Ma, Wen-Long Cao, Yu-Long Ao. Journal of Computer Science and Technology, 2019, 34(1): 77-93

With the advent of the big data era, the amount of sampled data and the dimensionality of data features are growing rapidly, and fast, efficient clustering of unlabeled samples based on feature similarity is highly desirable. As a fundamental primitive for data clustering, the k-means operation is receiving increasing attention. To achieve high-performance k-means computation on modern multi-core/many-core systems, we propose a matrix-based fused framework that achieves high performance by performing computations on a distance matrix and, at the same time, improves memory reuse by fusing the distance-matrix computation with the nearest-centroid reduction. We implement and optimize the parallel k-means algorithm on the SW26010 many-core processor, the main computing engine of Sunway TaihuLight. In particular, we design a task mapping strategy for load-balanced task distribution, a data sharing scheme to reduce the memory footprint, and a register blocking strategy to increase data locality. Optimization techniques such as instruction reordering and double buffering are further applied to improve sustained performance. Block-size tuning and performance modeling are also discussed. Experiments on both randomly generated and real-world datasets show that our parallel k-means implementation on SW26010 sustains a double-precision performance of over 348.1 Gflops, which is 46.9% of the peak performance and 84% of the theoretical performance upper bound on a single core group, and achieves nearly ideal scalability to the whole SW26010 processor of four core groups. Performance comparisons with the previous state of the art on both CPU and GPU are also provided to show the superiority of our optimized k-means kernel.
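The fusion described above rests on expanding the squared Euclidean distance, ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2: the cross term turns the distance computation into a matrix product, and because ||x||^2 is the same for every centroid, each row of the distance matrix can be reduced to its argmin as soon as it is produced. The serial pure-Python sketch below shows one Lloyd iteration under that formulation. It assumes nothing about the SW26010 code: task mapping, register blocking and double buffering are omitted, and the function names are hypothetical.

```python
def assign_fused(points, centroids):
    """Assign each point to its nearest centroid without storing the n x k matrix.

    Only ||c||^2 - 2 x.c is needed for the argmin, since ||x||^2 is constant
    across centroids. Each row is reduced immediately ("fused" reduction).
    """
    c_norms = [sum(v * v for v in c) for c in centroids]  # computed once
    labels = []
    for x in points:
        best_j, best_d = -1, float("inf")
        for j, c in enumerate(centroids):
            dot = sum(a * b for a, b in zip(x, c))
            d = c_norms[j] - 2.0 * dot  # squared distance minus ||x||^2
            if d < best_d:
                best_j, best_d = j, d
        labels.append(best_j)
    return labels


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its points (unchanged if it has none)."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for x, j in zip(points, labels):
        counts[j] += 1
        for i in range(dim):
            sums[j][i] += x[i]
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(len(centroids))]


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels = assign_fused(points, centroids)
centroids = update_centroids(points, labels, centroids)
print(labels, centroids)  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```

In a blocked implementation the inner dot products over a tile of points and centroids become a small matrix multiply, which is where the compute efficiency reported in the abstract comes from.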


9.
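A design for an intelligent handling and sorting robot on an embedded ARM-Linux platform is presented. The hardware platform uses the ARM9-based S3C2440A processor as the system control core; the kernel, root file system, and application programs were modified and compiled to suit the available hardware resources, building a Linux software platform for the robot. An improved SIFT algorithm is used for goods recognition. Thanks to strong hardware performance, an open software system, and an efficient matching algorithm, the robot offers stable motion, good extensibility, and high recognition accuracy, and it can operate in a variety of working environments.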
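The abstract does not say how its SIFT variant was improved. For reference, standard SIFT descriptor matching commonly uses Lowe's ratio test: a match is accepted only when the nearest reference descriptor is clearly closer than the second nearest. The sketch below assumes that standard test with a 0.8 threshold; the function name is hypothetical, and short lists stand in for 128-dimensional SIFT descriptors.

```python
import math


def ratio_test_matches(query, reference, ratio=0.8):
    """Return (query_index, reference_index) pairs that pass Lowe's ratio test.

    A query descriptor is matched to its nearest reference descriptor only if
    dist(nearest) < ratio * dist(second nearest). Needs >= 2 reference descriptors.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, r), ri) for ri, r in enumerate(reference))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


reference = [[0.0, 0.0], [5.0, 5.0], [5.0, 5.2]]
query = [[0.1, 0.0], [5.0, 5.1]]
print(ratio_test_matches(query, reference))
# [(0, 0)]: the second query is equally close to two references and is rejected
```

Rejecting ambiguous matches matters for goods recognition because repeated packaging patterns produce many near-identical descriptors.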

10.
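To meet the requirements of active vibration control for smart structures with distributed embedded piezoelectric elements, a new multi-channel vibration response controller based on an embedded architecture is proposed. The system uses an embedded ARM processor and a digital signal processor (DSP) as its two processor cores. The ARM processor runs the μC/OS-II real-time operating system and provides the human-machine interface and communication functions, while the DSP is mainly responsible for data acquisition, running the control algorithm, and outputting the results. The system thus combines the strong interrupt handling capability of the ARM processor with the fast, efficient data processing of the DSP. The paper describes in detail the overall design concept, the hardware and software design, the system composition and core components, the functional specifications and development process, and the experimental test setup and validation of results. Design, development, and test analysis show that the controller performs well, is feature-rich, and meets the needs of practical research work.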

11.
This paper presents a novel pipelined architecture for competitive learning (CL). The architecture is implemented on a field programmable gate array (FPGA) and used as a hardware accelerator in a system on programmable chip (SOPC) to reduce computation time. In the architecture, a novel codeword swapping scheme is adopted so that neuron competitions for different training vectors can be carried out concurrently. The neuron updating process is based on a hardware divider with simple table lookup operations. The divider performs finite-precision calculations to reduce area cost at the expense of a slight degradation in training performance. The CPU time of the NIOS processor executing CL training with the proposed architecture as an accelerator is measured. Experimental results show that the NIOS processor with the proposed accelerator achieves a speedup of up to 254 over its software counterpart running on a general-purpose Pentium IV processor without hardware support.
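Competitive learning trains a set of codewords by letting them compete for each training vector: the nearest codeword wins and moves toward the vector. The abstract mentions a hardware divider in the update step but does not give the update rule, so the sketch below assumes a learning rate of 1/n, where n counts the winner's wins; this makes each codeword the running mean of the vectors it has won and requires one division per update. The function name is hypothetical, and the codeword swapping pipeline and finite-precision divider are not modeled.

```python
def competitive_learning(vectors, codewords, epochs=1):
    """Winner-take-all competitive learning (illustrative sketch).

    For each training vector the nearest codeword wins and is updated as
        w <- w + (x - w) / n
    where n is that codeword's win count, counting its initial value as one.
    Returns new lists; the input codewords are not modified.
    """
    code = [list(w) for w in codewords]
    wins = [1] * len(code)
    for _ in range(epochs):
        for x in vectors:
            # Competition: squared Euclidean distance to every codeword.
            j = min(range(len(code)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, code[i])))
            wins[j] += 1
            # Update: the division here is what a hardware divider would perform.
            code[j] = [w + (xi - w) / wins[j] for w, xi in zip(code[j], x)]
    return code


print(competitive_learning([[2.0, 0.0], [8.0, 10.0]], [[0.0, 0.0], [10.0, 10.0]]))
# [[1.0, 0.0], [9.0, 10.0]]
```

Because the competitions for successive vectors only interact through the winning codeword, a pipelined design can overlap them as long as it resolves conflicts when two vectors pick the same winner.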

12.
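The cross-correlator is the core component of a correlation-based flow measurement system for two-phase flow. This paper introduces the working principle of the cross-correlation flow measurement system and the cross-correlation algorithm, and presents the design of a cross-correlator based on the Nios II embedded soft-core processor, describing its overall structure and its hardware and software design in detail. MATLAB simulation experiments verify that the correlator can compute the cross-correlation of random noise signals. Compared with traditional cross-correlators, it offers good real-time performance, a simple hardware structure, and high reliability, and it meets the requirements of online cross-correlation flow measurement.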
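In a correlation flowmeter, two sensors a known distance L apart pick up the same random flow noise, with the downstream signal lagging by the transit time tau; tau is read off the peak of the cross-correlation function R_xy(m), and the flow velocity follows as v = L / tau. The pure-Python sketch below illustrates that principle on simulated noise. The sensor spacing, sampling rate and delay are illustrative values, not figures from the paper, and a plain direct-form correlation stands in for the Nios II implementation.

```python
import random


def cross_correlation_delay(x, y, max_lag):
    """Lag m (in samples) maximizing R_xy(m) = mean over n of x[n] * y[n + m].

    x is the upstream signal and y the downstream one; lags 0..max_lag are searched.
    """
    best_lag, best_r = 0, float("-inf")
    n = len(x)
    for m in range(max_lag + 1):
        r = sum(x[i] * y[i + m] for i in range(n - m)) / (n - m)
        if r > best_r:
            best_lag, best_r = m, r
    return best_lag


def flow_velocity(x, y, sensor_spacing_m, sample_rate_hz, max_lag):
    """Transit-time velocity v = L / tau, with tau taken from the correlation peak."""
    lag = cross_correlation_delay(x, y, max_lag)
    if lag == 0:
        raise ValueError("no positive transit delay found")
    tau_s = lag / sample_rate_hz
    return sensor_spacing_m / tau_s


# Simulated random flow noise: the downstream sensor sees it 25 samples later.
random.seed(1)
upstream = [random.gauss(0.0, 1.0) for _ in range(2000)]
delay = 25
downstream = [0.0] * delay + upstream[:-delay]
# 0.1 m spacing at 1 kHz sampling: tau = 25 ms, so v should be about 4 m/s.
print(flow_velocity(upstream, downstream, 0.1, 1000.0, 100))
```

The velocity resolution is set by the sampling period: one sample of timing error on a 25-sample delay is a 4 % velocity error, which is why real-time correlators favour high sampling rates.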

13.
We address the design of a dynamically reconfigurable hardware-software architecture that allows networking functions to be partitioned on a SoC (System on Chip) platform. We formulate this as the problem of partitioning network protocol functions into dynamically reconfigurable hardware and software modules. Such a partitioning technique can improve the productivity of hardware/software co-design. In practice, the proposed partitioning technique, called the ITC (Inter-Task Communication) technique and incorporating the RT-IJC2 (Real-Time Inter-Job Communication Channel), makes it possible to partition networking functions into hardware and software modules on the SoC platform. In addition, the proposed technique supports the modularity and reuse of complex network protocol functions, enabling future network protocol specifications to be mapped onto the SoC platform at a higher level of abstraction. In particular, the RT-IJC2 allows more complex data transfers between hardware and software tasks while providing real-time data processing for given application-specific real-time requirements. We conduct a variety of experiments to illustrate the application and efficiency of the proposed technique after implementing it on a commercial SoC platform based on Altera's Excalibur, which includes an ARM922T core and up to 1 million gates of programmable logic.

14.
Pan Qingsong, Zhang Yi, Yang Zongming, Qin Jianxiu. Computer Science (计算机科学), 2017, 44(Z11): 530-533, 556
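The whole system is designed and implemented on a Zynq chip using hardware/software co-design. The Zynq chip uses a heterogeneous ARM+FPGA architecture that combines the flexibility of an ARM processor with the parallel processing capability of an FPGA. The design takes full advantage of this: in the hardware/software partitioning, image acquisition is handled by the ARM processor, while image corner and edge detection are performed in the FPGA, so that hardware acceleration improves overall system performance. The ARM processor and the FPGA exchange data over the AXI4 bus, yielding a system on chip on the Zynq that integrates image acquisition, image feature extraction, and image display. Final system tests show that the hardware-accelerated feature extraction algorithms run 6 to 8 times faster than their software implementations on the ARM processor.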
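The abstract says edge detection runs in the FPGA but does not name the operator. The Sobel operator is a common choice for FPGA edge detection because it needs only a 3x3 window and integer additions, with the factor of 2 implementable as a shift; the pure-Python sketch below assumes it, using |Gx| + |Gy| as a common hardware-friendly approximation of the gradient magnitude. The function name is hypothetical.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels.

    img is a 2-D list of integers; border pixels are set to 0. Only integer
    additions and multiplications by 1 or 2 are needed.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        up, mid, down = img[r - 1], img[r], img[r + 1]  # the three line buffers
        for c in range(1, w - 1):
            gx = ((up[c + 1] + 2 * mid[c + 1] + down[c + 1])
                  - (up[c - 1] + 2 * mid[c - 1] + down[c - 1]))
            gy = ((down[c - 1] + 2 * down[c] + down[c + 1])
                  - (up[c - 1] + 2 * up[c] + up[c + 1]))
            out[r][c] = abs(gx) + abs(gy)
    return out


# Vertical step edge between columns 2 and 3
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
print(sobel_magnitude(step)[2])  # [0, 0, 40, 40, 0, 0]
```

In hardware, the three rows `up`, `mid` and `down` correspond to line buffers, so one output pixel can be produced per clock once the buffers are full.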

15.
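Because current network monitoring centers cannot monitor the status of fiber-optic transceivers remotely, in real time, and effectively, a design is proposed for a new intelligent fiber-optic transceiver with in-band network management, and its hardware composition and software design are described in detail. The transceiver is built around the ARM-based LPC2210 embedded processor and the IP113S optical/electrical media converter chip, and it implements remote network management by porting the embedded multitasking operating system μCLinux. Test results show that the transceiver is stable, has strong network management functions, meets real-time requirements, and is suitable for carrier-grade service applications.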

16.
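To address power quality problems in rural power grids, an embedded system built around a DSP and an ARM processor is designed to monitor the power quality of rural grids. On an embedded Linux software platform, a user-friendly human-machine interface is implemented with Qt/Embedded. The hardware core consists of a TMS320VC5416 DSP and Samsung's S3C2410, SPI provides serial communication between the DSP and the ARM processor, and the Hilbert-Huang transform (HHT) algorithm is used to detect power quality. The system greatly improves the ability to monitor power quality in rural grids and also provides an important basis for rebuilding and upgrading them.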
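HHT first splits a signal into intrinsic mode functions by empirical mode decomposition (EMD) and then applies the Hilbert transform to each one to obtain its instantaneous amplitude and frequency. The sketch below covers only that second step: it builds the analytic signal of a single narrow-band 50 Hz waveform with a DFT and reads off the envelope and instantaneous frequency. EMD is omitted, and the 2 kHz sampling rate and the 5 Hz amplitude fluctuation (a flicker-like disturbance) are illustrative assumptions, not values from the paper.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal z = x + j*H{x} via a naive O(N^2) DFT (len(x) must be even).

    DC and Nyquist bins are kept, positive-frequency bins are doubled and
    negative-frequency bins are zeroed; the inverse DFT then gives z.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return [sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def instantaneous_amplitude_frequency(x, fs):
    """Hilbert step of HHT for one mode: envelope |z| and instantaneous frequency in Hz."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase advance between consecutive samples, converted to Hz.
    frequency = [cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2 * math.pi)
                 for i in range(len(z) - 1)]
    return amplitude, frequency


fs, n = 2000.0, 400  # 0.2 s window: 10 cycles of 50 Hz, one cycle of 5 Hz
x = [(1 + 0.5 * math.cos(2 * math.pi * 5 * i / fs)) * math.cos(2 * math.pi * 50 * i / fs)
     for i in range(n)]
amp, freq = instantaneous_amplitude_frequency(x, fs)
print(round(max(amp), 3), round(min(amp), 3), round(freq[0], 3))  # 1.5 0.5 50.0
```

The envelope recovers the 0.5 to 1.5 amplitude swing while the instantaneous frequency stays at 50 Hz, which is the kind of time-localized information that makes HHT attractive for detecting fluctuations, sags and frequency deviations.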

17.
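Research on SoPC Embedded Development Based on System-Level FPGA/CPLD (total citations: 1; self-citations: 0; citations by others: 1)
Based on the characteristics of SoPC embedded design on system-level FPGAs/CPLDs, this paper introduces a development method that uses the SoPC Builder design tool to selectively integrate the IP components required by the system design, such as the processor, memory, and I/O, into a PLD device, and optionally to integrate custom user logic into the PLD as well, in order to build an efficient SoC. The paper analyzes the features of the Nios embedded soft-core processor and gives the hardware/software development flow for a Nios-based SoPC and the hardware/software design process for custom user logic.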

18.
Warp processors are a novel architecture capable of autonomously optimizing an executing application by dynamically re-implementing critical software kernels as custom hardware circuits in an on-chip FPGA. Previous research on warp processing focused on low-power embedded systems, incorporating a low-end ARM processor as the main software execution resource. We provide a thorough analysis of the scalability of warp processing by evaluating several possible warp processor implementations, from low-power to high-performance, and by evaluating the potential for parallel execution of the partitioned software and hardware. We further demonstrate that, even with a high-performance 1 GHz embedded processor, warp processing provides performance equivalent to a 2.4 GHz processor. By further enabling parallel execution between the processor and the FPGA, parallel warp processor execution provides performance equivalent to a 3.2 GHz processor.

19.
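Current ARM- and DSP-based embedded image processing systems suffer from slow front-end acquisition and from image processing algorithms that are hard to accelerate. To address this, a full-HD (1920×1080) real-time video acquisition and image processing system based on an HDMI interface is designed. A 5-megapixel CMOS camera serves as the front-end data source, and the main chip uses a heterogeneous ARM+FPGA architecture that combines the parallel processing capability of the FPGA with the task scheduling of the ARM processor. A custom IP core for data storage and transfer is designed on the AXI protocol to maximize processing speed and bandwidth, and an HLS tool is used to quickly package image preprocessing algorithms into IP cores, so that the image algorithms are hardware-accelerated in the FPGA; a prototype of the image processing platform was completed. Compared with traditional machine vision platforms built from a PC and a camera, the system's average running time is under 10 ms and real-time detection is satisfactory. The design effectively resolves the conflict between low power consumption on the one hand and high data bandwidth and processing speed on the other, and provides good support for back-end result analysis and edge acceleration.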

20.
The article demonstrates the usefulness of heterogeneous System on Chip (SoC) devices in smart cameras used in intelligent transportation systems (ITS). In a compact, energy-efficient system, the following exemplary algorithms were implemented: vehicle queue length estimation, vehicle detection, vehicle counting and speed estimation (using multiple virtual detection lines), as well as vehicle type recognition (local binary features and an SVM classifier) and colour recognition (k-means classifier and YCbCr colourspace analysis). The solution exploits a hardware-software architecture, i.e. the combination of reconfigurable resources and an efficient ARM processor. Most of the modules were implemented in hardware in Verilog HDL, taking full advantage of possible parallelization and pipelining, which enabled real-time image processing. The ARM processor is responsible for executing some parts of the algorithms, i.e. high-level image processing and analysis, as well as for communication with external systems (e.g. traffic light controllers). The demonstrated results indicate that modern SoC systems are a very interesting platform for advanced ITS systems and other advanced embedded image processing, analysis, and recognition applications.
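The abstract does not give the speed formula. A common approach with virtual detection lines converts the frame indices at which a vehicle reaches lines a known road distance apart into elapsed time; the minimal sketch below assumes that approach, with equally spaced lines whose spacing is known from camera calibration. The function name, frame rate and spacing are hypothetical.

```python
def speed_kmh(crossing_frames, fps, line_spacing_m):
    """Average vehicle speed over several equally spaced virtual detection lines.

    crossing_frames[i] is the frame index at which the vehicle reaches line i;
    consecutive lines are line_spacing_m metres apart on the road.
    """
    if len(crossing_frames) < 2:
        raise ValueError("need crossings of at least two lines")
    elapsed_s = (crossing_frames[-1] - crossing_frames[0]) / fps
    if elapsed_s <= 0:
        raise ValueError("lines must be crossed in increasing frame order")
    distance_m = line_spacing_m * (len(crossing_frames) - 1)
    return distance_m / elapsed_s * 3.6  # m/s -> km/h


# Three lines 5 m apart, 25 fps camera: 10 m in 18 frames (0.72 s), about 50 km/h
print(speed_kmh([100, 109, 118], 25.0, 5.0))
```

Using the first and last of several lines lengthens the measurement baseline, so a one-frame error in detecting a crossing has less effect on the estimated speed than with a single pair of closely spaced lines.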

