首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
《Real》1998,4(5):361-376
A modular multiple platform design environment is proposed for the real-time implementation of image analysis systems suited to tasks such as visual inspection and other similar applications involving the analysis of two-dimensional (2D) shapes. The design strategies proposed are particularly suited to the implementation of high performance image classifiers based on the multiple expert paradigm. The unified configuration includes an integrated environment incorporating different software and hardware platforms to maximize the overall efficiency of the complete image processing or recognition task. One of the major application areas of such systems is the recognition of handwritten characters. In recent years, a new generation of handwritten recognition systems has been explored which is based on a multiple expert paradigm. The decision combination required for these configurations is a specialized data fusion process. It has been found that these multiple expert decision combination configurations can easily outperform most of the individual experts working on their own, but successful integration of decisions taken by multiple experts depends not only on the access to different individual algorithms implemented and applied independently, but also on optimized implementations of these individual algorithms. It has been demonstrated that different image processing and recognition algorithms cannot be completely optimized on a single implementation platform. On the contrary, it has been found that different processes can be implemented with maximum efficiency on different platforms. In this paper, comparative performance analysis of the same algorithms on different platforms has been carried out to select the optimum implementation platform for different algorithms in terms of complexity and execution time constraints with the aim of implementing various multiple expert decision combination configurations, and very encouraging results have been achieved. This reasoning has been further explored to build a generalized, flexible and modular design environment to facilitate the incorporation of pipelined multiple platform implementations suitable for a range of image processing, image analysis and computer vision applications.  相似文献   

2.
Parallel processing is one approach to achieving the large computational processing capabilities required by many real-time computing tasks. One of the problems that must be addressed in the use of reconfigurable multiprocessor systems is matching the architecture configuration to the algorithms to be executed. This paper presents a conceptual model that explores the potential of artificial intelligence tools, specifically expert systems, to design an Intelligent Operating System for multiprocessor systems. The target task is the implementation of image understanding systems on multiprocessor architectures. PASM is used as an example multiprocessor. The Intelligent Operating System concepts developed here could also be used to address other problems requiring real-time processing. An example image understanding task is presented to illustrate the concept of intelligent scheduling by the Intelligent Operating System. Also considered is the use of the conceptual model when developing an image understanding system in order to test different strategies for choosing algorithms, imposing execution order constraints, and integrating results from various algorithms.  相似文献   

3.
Image segmentation is an important process that facilitates image analysis such as in object detection. Because of its importance, many different algorithms were proposed in the last decade to enhance image segmentation techniques. Clustering algorithms are among the most popular in image segmentation. The proposed algorithms differ in their accuracy and computational efficiency. This paper studies the most famous and new clustering algorithms and provides an analysis on their feasibility for parallel implementation. We have studied four algorithms which are: fuzzy C-mean, type-2 fuzzy C-mean, interval type-2 fuzzy C-mean, and modified interval type-2 fuzzy C-mean. We have implemented them in a sequential (CPU only) and a parallel hybrid CPU–GPU version. Speedup gains of 6\(\times \) to 20\(\times \) were achieved in the parallel implementation over the sequential implementation. We detail in this paper our discoveries on the portions of the algorithms that are highly parallel so as to help the image processing community, especially if these algorithms are to be used in real-time processing where efficient computation is critical.  相似文献   

4.
This paper presents a real-time image acquisition system with an improved image quality assessment module to acquire high-quality near infrared (NIR) images. Thermal imaging plays a vital role in a wide range of medical and military applications. The demand for high-throughput image acquisition and image processing has continuously increased especially for critical medical and military purposes where executions under real-time constraints are required. This work implements an NIR image quality assessment module, which utilizes improved two-dimensional entropy and mask-based edge detection algorithms. The effectiveness of the proposed image quality assessment algorithms is demonstrated through the implementation of a complete finger-vein biometric system. The proposed model is implemented as an embedded system on a field programmable gate array prototyping platform. By including the image quality assessment module, the proposed system is able to achieve a recognition accuracy of 0.87 % equal error rate, and can handle real-time processing at 15 frames/s (live video rate). This is achieved through hardware acceleration of the proposed image quality assessment algorithms via a novel streaming architecture.  相似文献   

5.
任务并行是并行程序设计的基础设计模式.但由于算法本身的复杂性及目标平台的特殊性,设计实现高效率的任务并行程序对程序员来说往往充满挑战.基于新兴的SW26010众核CPU,本文提出支持任务嵌套并行模式的通用运行时框架SWAN.SWAN对任务并行程序的实现提供了高层次的抽象,使程序员能够专注于算法逻辑本身而提高开发效率.在性能方面,SWAN框架对诸多共享资源进行了细粒度的划分,从而有效地避免了众多线程间对共享资源的高强度争用.本文还充分利用平台的高速访存机制,高速可控缓存和原子操作等特性,对SWAN框架的核心数据结构进行优化设计以降低其本身的性能开销.另外,SWAN还具备动态负载均衡能力使得各个处理器核心的资源得以充分利用.本文基于SWAN框架在目标平台上实现了若干典型的具有递归特性的嵌套并行算法,包括N-皇后问题,二叉树遍历,快速排序和凸包求解.实验表明,这些通过使用SWAN框架得以并行化的算法相对其串行版本取得了4.5至32倍的加速,充分说明了SWAN框架具有较高的实用性及性能.  相似文献   

6.
Adam is a high-level language for parallel processing. It is intended for programming resource scheduling applications, in particular supervisory packages for run-time scheduling of multiprocessing systems. An important design goal was to provide support for implementation of Ada and its run-time environment. Adam has been used to implement Ada task supervision and also as a high-level target language for compilation of Ada tasking. Adam provides facilities corresponding to the Ada sequential constructs (including subprograms, packages, exceptions, generics). In addition, it provides specialized module constructs for implementation of packages that may be shared between parallel processes, and new predefined types for scheduling. The parallel processing constructs of Adam are more primitive than Ada tasking. Strong restrictions are enforced on the ways in which parallel processes can interact. A compiler for Adam has been implemented in MacLisp on DEC PDP-10 computers. Runtime support packages in Adam for scheduling (on a single CPU) and I/O are also provided. The compiler contains a library manipulation facility for separate compilation. The Adam compiler has been used to build an Ada compiler for most of the July 1980 Ada, including task types and rendezvous constructs. This was achieved by implementing the translation of Ada tasking into Adam parallel processing as a preprocessor to the Adam compiler. This present Ada compiler, which has been operational since December 1980, uses a procedure call implementation of tasking. It can be easily modified to other implementations. Compilation of Ada tasking into a high-level target language such as Adam facilitates studying questions of correctness and efficiency of various compilation algorithms, and code optimizations specific to tasking, e.g. elimination of unnecessary threads of control. This paper gives an overview of Adam and examples of its use. Emphasis is placed on the differences from Ada. Experience using Adam to build the experimental Ada system is evaluated. Design of a run-time supervisor in Adam is discussed in detail.  相似文献   

7.
The paper focuses on the problem of partitioning and mapping parallel programs onto heterogeneous embedded multiprocessor architectures for real-time applications. Such applications present unique constraints and challenges. In addition to heterogeneity, the proposed partitioning and mapping algorithms satisfy memory, task throughput, task placement, intertask communication bandwidth, and co-location constraints. They do so for architectures that utilize circuit-switched (rather than packet-switched) interprocessor communication and optimize latency and throughput in addition to load-balancing. Finally, these mapping algorithms make use of knowledge of the local scheduling discipline to accommodate real-time scheduling constraints. Our focus is on unstructured parallel programs that fall into one of two classes: (i) the class of computations characteristic of control applications in a real-time environment where tasks execute concurrently, periodically exchanging information, and (ii) pipelined computation graphs found in sensor data processing applications. The algorithms are implemented in a set of tools that operate with commercial CASE tools at one end, and present an interface to multiprocessor simulators at the other end. Collectively, the algorithms form a significant component of an interactive design environment for the development and mapping of real-time embedded parallel programs. The paper describes the algorithms, the encapsulating toolset, and presents an example of their application to an existing embedded application—an Autonomous Underwater Vehicle application.  相似文献   

8.
GPGPU加速器是当前提高图像处理算法性能的主流加速平台,但是,在GPGPU平台上,同一个程序充分利用硬件体系结构特征和软件特征的优化版本与简单实现版本在性能上会有数量级的差异。GPGPU加速器具有多维多层的大量执行线程和层次化存储体系结构,后者的不同层次具有不同的容量、带宽、延迟和访问权限。同时,图像处理应用程序具有复杂的计算操作、边界处理规则和数据访问特性。因此,任务的并发执行模式、线程的组织方式和并发任务到设备的映射不仅影响到程序的并发度、调度、通信和同步等特性,而且也会影响到访存的带宽、延迟等。因此,GPGPU平台上的程序优化是一个困难、复杂且效率较低的过程。本文提出基于语言扩展的领域编程模型:ParaC。ParaC编程环境利用高层语言扩展描述的程序语义信息,自动分析获取应用程序的操作信息、并发任务间的数据重用信息和访存信息等程序特征,同时结合硬件平台特征,利用基于领域先验知识驱动的编译优化模型自动生成GPGPU平台上的优化代码,最后,利用源源变换编译器生成标准OpenCL程序。本文在测试用例上的实验结果表明,ParaC在GPGPU平台上自动生成的优化版本相对于手工优化版本的加速比最高达到3.22倍,但代码行数只是后者的1.2%到39.68%。  相似文献   

9.
孙乔  黎雷生  赵海涛  赵慧  吴长茂 《软件学报》2021,32(8):2352-2364
任务并行是并行程序设计的基础设计模式.但由于算法本身的复杂性及目标平台的特殊性,设计实现高效率的任务并行程序对程序员来说往往充满挑战.基于新兴的SW26010众核CPU,提出了支持任务嵌套并行模式的通用运行时框架SWAN.SWAN对任务并行程序的实现提供了高层次的抽象,使程序员能够专注于算法逻辑本身而提高开发效率.在性...  相似文献   

10.
针对海量遥感影像快速分类的应用需求,提出一种基于K-means算法的遥感影像并行分类方法.该方法结合CPU下进程级与线程级模式的并行特征,设计融合进程级与线程级并行的两阶段数据粒度划分方法和任务调度方法,在保证精度的基础上实现并行加速.利用大数据量的多尺度遥感影像进行实验,结果表明:所提并行方法可大大减少遥感影像的分类时间,取得了良好的加速比(13.83),并可达到负载均衡,从而解决了大区域遥感影像快速分类的问题.  相似文献   

11.
为了能够实时地采集、处理、显示视频,设计并实现了一种基于双PowerPC硬核架构的实时视频处理平台;用硬件实现视频的预处理算法,并以用户IP核的形式添加到硬件系统中,上层的视频处理软件程序则直接从存储器中调用预处理后的图像数据;重点介绍了在FPGA上构建双PowerPC硬核架构的硬件系统;采用乒乓控制算法缓存一行图像数据;用DMA的方式将图像数据保存在存储器中;以边缘检测作为视频预处理算法的一个实例,在平台上实现,实验结果表明,用本平台实现仅需40ms;本平台能够实时处理视频,具有较高的实用价值。  相似文献   

12.
The Hessian matrix-based edge detection algorithm of Dr. Carsten Steger has the advantages of high accuracy and versatility. However, this algorithm has a complex and time-consuming computation process. Large-scale Gaussian convolution also employs a large number of multipliers when implemented on a field programmable gate array (FPGA). To address these problems, an FPGA implementation for Steger’s edge detection algorithm is proposed. This implementation employs pipeline and parallel architectures at both task and data levels for data stream processing. The original kernels of Gaussian convolution are simplified with box-filter to convert the multiplication operation in the convolution into addition, subtraction, or shift operations with the concept of integral image, thereby minimizing the multiplier resources. The proposed FPGA implementation demonstrates a favorable accuracy and anti-noise capability when dealing with different degrees of blur and noise in an image. Therefore, the FPGA implementation can satisfy real-time edge detection requirements.  相似文献   

13.
RM及其扩展可调度性判定算法性能分析   总被引:4,自引:0,他引:4  
可调度性判定是实时调度算法的关键问题·单调速率算法RM(ratemonotonic)及其扩展是应用广泛的实时调度算法,大量文献讨论了实时任务在这些算法下的可调度性判定,给出了相应的判定算法·但迄今为止,对这些判定算法的性能分析都是理论上的定性分析或者只是少数几种判定算法之间的简单比较,这不利于实时系统的开发·归纳了RM及其扩展的可调度性判定算法,通过测试平台,系统地测试和分析了各算法的性能和适用场合,讨论了各种条件和实现方式对算法性能和可调度性的影响·  相似文献   

14.
PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM.  相似文献   

15.
Signal-processing algorithms often have to be strongly modified in case of hardware implementation. Especially, continuous real-time image processing at high speed is a challenging task. In this paper, a hardware/software codesign is applied to a stereophotogrammetric system. For calculations of a depth map, an optimized algorithm is implemented as a hierarchical parallel hardware solution. By dividing the measuring range into subranges and switching between them, real-time 3D measurements within a wide measuring range is possible. Object clustering and tracking are realized in a processor. The density distribution of the disparity in the depth map (disparity histogram) is used for object detection. A Kalman filter stabilizes the parameters of the results. The text was submitted by the authors in English.  相似文献   

16.
基于FPGA的动态目标跟踪系统设计   总被引:2,自引:0,他引:2  
为了解决基于PC机的视频动态目标跟踪实时性瓶颈问题,设计出一种基于FPGA的动态目标跟踪系统。设计遵循图像处理金字塔模型,针对低层和中层算法简单、数据量大且存在一定并行性等特点采用FPGA硬件实现,而高层较复杂算法使用Nios Ⅱ软核进行C语言编程。整个设计采用Verilog-HDL对算法完成建模与实现,并在QUARTUS Ⅱ上进行了综合、布线等工作,最后以Altera公司的DE2开发板为硬件平台实现了整个系统。  相似文献   

17.
Optimal online scheduling algorithms are known for sporadic task systems scheduled upon a single processor. Additionally, optimal online scheduling algorithms are also known for restricted subclasses of sporadic task systems upon an identical multiprocessor platform. The research reported in this article addresses the question of existence of optimal online multiprocessor scheduling algorithms for general sporadic task systems. Our main result is a proof of the impossibility of optimal online scheduling for sporadic task systems upon a system comprised of two or more processors. The result is shown by finding a sporadic task system that is feasible on a multiprocessor platform that cannot be correctly scheduled by any possible online, deterministic scheduling algorithm. Since the sporadic task model is a subclass of many more general real-time task models, the nonexistence of optimal scheduling algorithms for the sporadic task systems implies nonexistence for any model which generalizes the sporadic task model.  相似文献   

18.
陈泽玮  杨茂林  雷航  廖勇  谢玮 《计算机应用》2017,37(5):1270-1275
近年来,随着实时调度研究的快速发展,可调度性实验的复杂性随之增加,然而,由于缺乏标准化、模块化的可调度性实验工具,研究者往往需要耗费大量时间进行实验;此外,由于实验源码不能公开获得,使得实验结果难以验证,实验代码难以重用与扩展。针对可调度性实验重复工作量大、难以验证的问题,提出一种可调度性实验基础框架。该框架通过随机分布产生任务系统集合,并测试其可调度性,基于该框架设计并实现了一个新的可调度性实验开源平台——SET-MRTS。该平台采用模块化架构设计了任务模块、处理器模块、共享资源模块、算法库、配置解析模块以及输出模块。实验结果表明,SET-MRTS支持单/多处理器实时调度算法和实时同步协议分析,能够正确地进行可调度性对比实验,输出直观的实验结果,并且支持算法库的扩充,与算法库中已实现的算法进行对比实验,具有良好的兼容性与可扩展性。SET-MRTS是第一个支持完整实验流程,包括算法实现、参数配置、结果统计、图表绘制等的可调度性实验开源平台。  相似文献   

19.
Triggered by the ever increasing advancements in processor and networking technology, a cluster of PCs connected by a high-speed network has become a viable and cost-effective platform for the execution of computation intensive parallel multithreaded applications. However, there are two research issues to be tackled in the scheduling problem for PC cluster computing: (1) how to reduce the communication overhead of executing a multithreaded application on the cluster; (2) how to exploit the heterogeneity, which is unavoidable in an evolving PC cluster, for the application. In this paper, we propose to use a duplication based approach in scheduling tasks/threads to a heterogeneous cluster of PCs. In duplication based scheduling, critical tasks are redundantly scheduled to more than one machine, in order to reduce the number of inter-task communication operations. The start times of the succeeding tasks are also reduced. The task duplication process is guided given the system heterogeneity in that the critical tasks are scheduled or replicated in faster machines. The algorithm has been implemented in our experimental application parallelization system for generating multithreaded parallel code executable on a cluster of Pentium PCs. Our experiments, using three numerical applications and one protocol processing kernel (multithreading per request), have indicated that heterogeneity of PC cluster is indeed useful for optimizing the execution of parallel multithreaded programs.  相似文献   

20.
基于Hausdorff距离的图像匹配算法鲁棒性较好,但计算代价较大,软件实现方案很难满足实时性要求。为了解决这个问题,本文在基于局部Hausdorff距离的图像匹配算法基础上提出了一种鲁棒而实时的FPGA实现方案。为了充分有效利用FPGA的硬件资源,首先对传统串行算法进行并行性分析,提出了一个并行算法;然后以此为基础设计了一种三段式粗粒度流水体系结构,并将其映射到FPGA上进行实现。实验结果表明,该系统在性能上优于其它相关工作,与PC(Pentium42.8GHz)上的软件实现方案相比可以达到接近50倍的加速比。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号