Similar literature
1.
In this paper, we present an optimized design method for a high-speed embedded image processing system using a 32-bit floating-point Digital Signal Processor (DSP) and a Complex Programmable Logic Device (CPLD). The DSP acts as the main processor of the system: it executes digital image processing algorithms and controls other devices such as the image sensor and the CPLD. The CPLD is used to acquire images and to implement the complex logic control of the whole system. Some key technologies are introduced to enhance the performance of the system. In particular, using the DSP/BIOS tool to develop DSP applications makes the program run much more efficiently. As a result, the system provides a computing platform not only for executing complex image processing algorithms, but also for other digital signal processing or multi-channel data collection tasks, by choosing different sensors or Analog-to-Digital (A/D) converters.

2.
A system design for performing low-level image processing tasks in real time is presented. The design is based on large processor-per-pixel arrays implemented using integrated circuit technology. Two integrated circuit architectures are summarized: an associative parallel processor and a parallel processor employing DRAM cells. In both architectures, the layout pitch of one-bit-wide logic is matched to the pitch of memory cells to form high-density processing element arrays. The system design features an efficient control path implementation, providing high processing element array utilization without demanding complex controller hardware. Sequences of array instructions are generated by a host computer before processing begins, then stored in a simple controller. Once processing begins, the host computer initiates stored sequences to perform pixel-parallel operations. A programming framework implemented in C++ supports application development. A prototype system employs associative parallel processor devices, a controller, and the programming framework. Three sample applications (smoothing and segmentation, median filtering, and optical flow) establish the suitability of the system for real-time image processing.

3.
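LS MPP is a massively parallel image processor for airborne embedded applications, developed in-house by the Xi'an Microelectronics Technology Institute. Its host is an in-house 32-bit floating-point RISC chip, and its image coprocessing system is an in-house MPP coprocessor. This paper discusses the system software design of the LS MPP computer, including the assembler, the monitor program, and the C compiler.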

4.
The author describes the parallelisation of three different versions of the CCITT H.261 encoder algorithm using a generalised parallel design methodology based upon pipelines of processor farms (PPFs). For each algorithm, a theoretical upper-bound scaling model was derived by analysing the execution time profile of the algorithm and its feedback structure. The performance predicted by the model was, in each case, in good agreement with that achieved by the corresponding practical implementation. Practical throughput scaling up to a factor of 11 was achieved, using PPFs containing up to 16 processors. The design examples illustrate the impact that feedback has on potential speedup for image coding algorithms, and the diagnostic role of the model in identifying those algorithm components which restrict scaling performance. It is believed that the techniques presented may be useful both in developing embedded image coders based upon multiple DSP devices, and for simulation work with large image sequences in application areas such as image coding for HDTV and SHDTV.

5.
A 50-ns digital image signal processor (DISP), an image/video application-specific VLSI chip, is discussed. This chip integrates 538 K transistors and dissipates 1.4 W at a 40-MHz clock. It is based on a 24-b fixed-point architecture with a five-stage pipeline. The DISP features a real-time processing capability realized by an enhanced parallel architecture, video-oriented data processing functions, and an instruction cycle time that is typically 35 ns, and 50 ns at worst. This 50-ns cycle time allows the DISP to execute more than 60 million operations per second (MOPS). High-density 1.0-μm CMOS technology allows numerous on-chip features, including specified resources optimized for image processing. This allows a flexible hardware implementation of various algorithms for picture coding. Several circuit design techniques that are intended to attain a fast instruction cycle are reviewed, including distributed instruction decoding and a hierarchical clocking circuit. The LSI has been designed with extensive use of a cell-based design method. The processor incorporates a sophisticated testing function compatible with a cell-based design environment.
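
As a rough check on the figures quoted in this abstract, the sketch below converts the 50-ns worst-case instruction cycle into an instruction rate and compares it with the 60 MOPS figure. It assumes one instruction is issued per instruction cycle, which the abstract does not state, and the function names are illustrative only.

```python
def instructions_per_second(cycle_time_ns: int) -> int:
    """Instruction rate for a given cycle time, ASSUMING one instruction per cycle."""
    return 1_000_000_000 // cycle_time_ns


def ops_per_instruction(mops: int, cycle_time_ns: int) -> float:
    """Average operations each instruction must perform to reach `mops` million ops/s."""
    return (mops * 1_000_000) / instructions_per_second(cycle_time_ns)


# Figures taken from the abstract: 50-ns worst-case cycle, 60 MOPS.
worst_case_rate = instructions_per_second(50)
parallel_ops = ops_per_instruction(60, 50)
print(worst_case_rate, parallel_ops)
```

Under that assumption, 60 MOPS at a 50-ns cycle corresponds to about three operations per instruction, which is consistent with the parallel architecture the abstract mentions.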

6.
A system chip targeting image and voice processing and recognition application domains is implemented as a representative of the potential of using programmable logic in system design. It features an embedded reconfigurable processor built by joining a configurable and extensible processor core and an SRAM-based embedded field-programmable gate array (FPGA). Application-specific bus-mapped coprocessors and flexible input/output peripherals and interfaces can also be added and dynamically modified by reconfiguring the embedded FPGA. The architecture of the system is discussed as well as the design flows for pre- and post-silicon design and customization. The silicon area required by the system is 20 mm² in a 0.18-μm CMOS technology. The embedded FPGA accounts for about 40% of the system area.

7.
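A new neighborhood function pipeline structure in the neighborhood image processor   Total citations: 5 (self-citations: 0, citations by others: 5)
Su Guangda (苏光大). Acta Electronica Sinica (《电子学报》), 2000, 28(8): 120-123
This paper introduces the principle of the neighborhood image processor and proposes new systolic and cascaded neighborhood function pipeline structures for neighborhood image processing. Pipelining in both structures is organized around independent image processing algorithms, so several independent algorithms can be completed in real time (or even faster than real time). The structures embody the principles of data parallelism and processing parallelism in parallel processors and integrate multiple algorithms coherently, which makes them particularly suitable for practical problems that require combined algorithms. The neighborhood function pipeline structure not only greatly increases image processing speed but also makes the system more flexible. The paper describes the systolic and cascaded neighborhood function pipeline structures and presents combinations of several image processing functions.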

8.
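Building on embedded and data acquisition technologies, this paper designs and implements a practical new intelligent controller based on an ARM+FPGA architecture for high-speed real-time data acquisition. It describes the hardware implementation, including the minimal system of the ARM main processor, the minimal system of the FPGA coprocessor, and the communication interface between the ARM and the FPGA, as well as the development of the Linux character device driver for the FPGA, the FPGA coprocessor control program, and the ARM main processor application. The controller exploits the FPGA's parallel processing structure to control the ADC for high-speed data acquisition. The FPGA can also be configured with a soft-core processor, the Nios II embedded processor, forming a dual-core system with the ARM. Through the ARM, the controller manages and controls the FPGA, acquires data in real time, and communicates over a rich set of peripheral interfaces.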

9.

The proposed work examines the application space of a versatile cryptosystem processor with dynamic reconfiguration capabilities. The processor suits a variety of signal processing application domains, namely telecommunications, image processing, video coding, and cryptographic processing. To differentiate between its application spaces, its performance is compared with that of state-of-the-art devices, taking programmability, energy efficiency, and computational capability as the key factors. The conventional method runs a Virtual Secure Circuit (VSC) on the Advanced Encryption Standard (AES), and the Field Programmable Gate Array (FPGA) implementation is analyzed in terms of delay and throughput. The conventional method has low area overhead and power consumption, whereas its architecture lags in performance and throughput. A fully parallel pipelined architecture of the VSC on AES overcomes this and outperforms the existing method in performance and throughput. Energy efficiency and performance matter considerably more here than in general-purpose processors, while a convenient programming approach that relies mainly on software-oriented languages is still preserved. The VSC-based AES is intended to harden the cryptographic processor against side-channel attacks, such as attacks based on power supply and electromagnetic signals. The experimental results show promising outcomes compared with previous methods.


10.
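To improve the performance of the LS MPP (Li-Shan MPP) system and incorporate it into a new embedded stream processor, a stream processing technique based on LS MPP is proposed. It builds on the LS MPP architecture, follows the conceptual model of an embedded stream processor, and targets the characteristics of image processing applications. The technique constructs a stream processing model by defining new stream data types and kernel functions, and analyzes how stream scheduling can be implemented on the LS MPP-based conceptual model of an embedded stream processor, providing system software support for comprehensively improving the performance of the LS MPP embedded stream processor.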

11.
Image registration is a ubiquitous task occurring in countless image analysis applications. A dedicated implementation of image registration algorithms is the best approach to meet the intensive computation requirements of implementing image registration schemes in real time. This paper presents an efficient VLSI architecture for real-time implementation of image registration algorithms using an exhaustive search method. Normalized cross correlation function, mean square error, and blue screen technique algorithms are implemented for image registration. The architecture is based on a data flow design that allows sequential inputs but performs parallel processing. Based on the architecture, a programmable chip can be designed for image registration. Chips can be cascaded to achieve better performance and to accommodate search and reference image sizes that can vary with time from very small to very large.
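
To make the exhaustive-search idea concrete, here is a minimal software sketch of registration by normalized cross correlation. It is not the paper's VLSI architecture: it runs sequentially, whereas the abstract describes parallel hardware, and the function names, the tiny test images, and the zero-variance convention are assumptions made for illustration.

```python
import math


def ncc(patch, template):
    """Normalized cross correlation of two equally sized 2-D lists, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # Assumed convention: a flat (zero-variance) patch scores 0.
    return num / den if den > 0 else 0.0


def exhaustive_search(search, template):
    """Score every placement of `template` inside `search`; return (score, row, col)."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)  # below any valid NCC score
    for r in range(len(search) - th + 1):
        for c in range(len(search[0]) - tw + 1):
            patch = [row[c:c + tw] for row in search[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best


# Hypothetical data: the reference pattern is embedded at row 2, column 2.
search_image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 5, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
reference = [[1, 5], [7, 9]]
score, row, col = exhaustive_search(search_image, reference)
print(row, col, round(score, 3))
```

The nested loops visit every candidate offset, which is why the workload grows with both image sizes and why a dedicated parallel implementation is attractive for real-time use.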

12.
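An embedded stereo camera system based on S3C6410 and WinCE6.0   Total citations: 1 (self-citations: 1, citations by others: 0)
To address the synchronous acquisition of stereo image pairs, a real-time synchronous acquisition scheme based on the S3C6410 processor and the WinCE6.0 system is proposed. First, the hardware of the stereo camera module is designed; second, the corresponding OV3640 camera driver is developed and the WinCE6.0 operating system is customized; finally, a stereo camera application based on DirectShow is developed. The application uses feedback from the camera data streams to acquire stereo image pairs in left-right format synchronously: it associates the left and right camera data streams and achieves frame synchronization of the two cameras based on a disparity constraint. By exploiting the data processing capability of the ARM11 processor and the configurability of the WinCE system, the system improves the reliability and portability of embedded stereo camera systems.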

13.
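A high-speed image acquisition and real-time processing system based on Zynq-7000   Total citations: 1 (self-citations: 0, citations by others: 1)
Yang Xiao'an, Luo Jie, Su Hao, Bao Wenbo (杨晓安, 罗杰, 苏豪, 包文博). Electronic Science and Technology (《电子科技》), 2014, 27(7): 151-154
Xilinx's Zynq-7000 family of All Programmable SoCs combines a microprocessor with programmable logic. In this project, an image acquisition system is built in the programmable logic of the Zynq-7000, a real-time Linux operating system for image processing runs on the dual-core Cortex-A9 processor, and WiFi is used to interact with the outside world. The paper also describes how to use the platform for high-performance image processing, providing a design option for high-performance image processing in small robots.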

14.
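A high-speed image processing system based on FPGA and DSP   Total citations: 2 (self-citations: 1, citations by others: 1)
To achieve high performance and low power consumption in an image processing system, a high-speed embedded image processing system based on a cooperating FPGA and DSP is proposed. The DSP is the main processor, responsible for image processing, and the FPGA is the coprocessor, responsible for all digital logic in the system. The work of the FPGA and the DSP is pipelined, and the two communicate through a single-chip dual-port RAM (CY7C025AV-15AI); performance is about 25% higher than that of a processing system built on a single DSP. The system is reconfigurable, which makes it easy to implement other algorithms on it.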

15.
Low power and high performance are the two most important criteria for many signal-processing system designs, particularly in real-time multimedia applications. There have been many approaches to achieve these two design goals at many different implementation levels, ranging from very-large-scale-integration fabrication technology to system design. We review the work that has been done at various levels and focus on algorithm-based approaches for low-power and high-performance design of signal processing systems. We present the concept of multirate computing, which originates from filterbank design, then show how to employ it along with other algorithmic methods to develop low-power and high-performance signal processing systems. The proposed multirate design methodology is systematic and applicable to many problems. We demonstrate that multirate computing is a powerful tool at the algorithmic level that enables designers to achieve either significant power reduction or high throughput, depending on their choice. Design examples on basic multimedia processing blocks such as filtering, source coding, and channel coding are given. A digital signal-processing engine that is an adaptive reconfigurable architecture is also derived from the common features of our approach. Such an architecture forms a new generation of high-performance embedded signal processors based on the adaptive computing model. The goal of this paper is to demonstrate the flexibility and effectiveness of algorithm-based approaches and to show that the multirate approach is an effective and systematic design methodology to achieve low-power and high-throughput signal processing at the algorithmic and architectural level.

16.
Since the number of processing cores in a General Purpose Processor (GPP) increases steadily, parallelization of algorithms is a well-known topic in computer science. Algorithms have to be adapted to this new processor architecture to fully exploit the available processing power. This development equally affects Software Defined Radio (SDR) technology because the GPP has become an important processor for SDR platforms. To make use of the entire processing power of a multi-core GPP and hence to avoid system inefficiency, this work provides an approach to parallelize C/C++ code using OpenMP. This application programming interface provides a rapid way to parallelize code using compiler directives inserted at appropriate positions in the code. The processing load can be shared between all available cores. We use Matlab Simulink as a framework for a model-based design and evaluate the processing gain of embedded handwritten C-code blocks with OpenMP support. We show that OpenMP increases core utilization. We present the increase in processing speed, relative to a single-core GPP, as a function of the number of cores, and we highlight the limitations of code parallelization. Our results show that a straightforward implementation of algorithms without multi-core consideration leads to an underutilized system.

17.
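This paper presents the hardware design of an embedded license plate recognition system based on the TMS320VC5416 digital signal processor (DSP) and a complex programmable logic device (CPLD). The SAA7111 video processing chip serves as the video A/D converter; under CPLD control, the captured image data is written to frame memory, and the DSP analyzes and processes the image data in real time. A "ping-pong" memory structure allows image acquisition and processing to run in parallel. Recognition results are sent to a host computer over a serial port or stored in E2PROM, so the system can work both offline and online. The design has broad engineering application prospects in real-time high-speed image processing systems.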
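
The "ping-pong" memory structure mentioned above can be illustrated with a small software simulation. This is only a sketch of the general double-buffering idea, not the paper's CPLD/DSP design: the function name, the use of Python lists as frame memories, and the `process` callback are all hypothetical, and the simulation interleaves the two activities that the hardware would perform at the same time.

```python
def ping_pong_pipeline(frames, process):
    """Simulate double buffering: one buffer is filled while the other is processed."""
    buffers = [None, None]  # the "ping" and "pong" frame memories
    write_index = 0         # buffer currently owned by the acquisition side
    results = []
    for frame in frames:
        # Acquisition side: write the new frame into its buffer.
        buffers[write_index] = frame
        # Processing side: work on the frame captured in the previous period.
        read_index = 1 - write_index
        if buffers[read_index] is not None:
            results.append(process(buffers[read_index]))
        # Swap roles for the next frame period.
        write_index = read_index
    # Drain: the most recently captured frame has not been processed yet.
    last = buffers[1 - write_index]
    if last is not None:
        results.append(process(last))
    return results


# Hypothetical usage: each "frame" is a short list of pixel values.
print(ping_pong_pipeline([[1, 2], [3, 4], [5, 6]], sum))
```

Because the writer and the reader never touch the same buffer in the same frame period, acquisition does not have to wait for processing, at the cost of one frame of latency.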

18.
Local processing, which is a dominant type of processing in image and video applications, requires huge computational power to be performed in real time. However, processing locality, in space and/or in time, makes it possible to exploit data parallelism and data reuse. Although these properties can be exploited to achieve high-performance image and video processing in multi-core processors, it is necessary to develop suitable models and parallel algorithms, in particular for non-shared memory architectures. This paper proposes an efficient and simple model for local image and video processing on non-shared memory multi-core architectures. This model adopts a single program multiple data approach, where data is distributed, processed and reused in an optimal way, regarding the data size, the number of cores and the local memory capacity. The model was experimentally evaluated by developing local video processing algorithms and programming the Cell Broadband Engine multi-core processor, namely for advanced video motion estimation and in-loop deblocking filtering. Furthermore, based on these experiences, the main challenges of vectorization and of reducing branch mispredictions and computational load imbalances are also addressed. The limits and advantages of the regular and adaptive algorithms are also discussed. Experimental results show the adequacy of the proposed model for local video processing, and that real-time operation is achieved even when processing the most demanding parts of advanced video coding. Full-pixel motion estimation is performed over high resolution video (720×576 pixels) at a rate of 30 frames per second, considering large search areas and five reference frames.

19.
One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. This paper presents an efficient memory allocation to minimize the number of memory modules and processing elements with a parallel access capability when multiple windows with arbitrary shapes are specified. This paper also presents an efficient search method based on regularity of window-type image processing. We give some practical examples including a stereo-matching processor for acquiring 3-D information, and an optical-flow processor for motion estimation. These examples show that the numbers of memory modules are reduced to 2.7% and 10%, respectively, in comparison with a basic approach. It is also shown that the search time is less than 1 ms for practical image sizes and window sizes.

20.