
1.

Enhancing image processing architecture using deep learning for embedded vision systems

《Microprocessors and Microsystems》2020

In recent years, the success and capabilities of embedded vision have become evident in embedded applications. The embedding of vision into electronic devices, such as embedded medical applications, is being driven by the availability of high-performance processors integrated with deep learning algorithms, as well as by advances in image processing technology. However, including image processing in embedded vision systems requires a huge amount of computational capability even to process a single image to detect an object, and it is extremely challenging to implement in embedded systems. Implementing deep learning algorithms and testing them on a task-specific data set could provide enhanced results. In this paper, an approach for enhancing image processing architecture using deep learning for embedded vision systems is proposed and analyzed. Implementing deep learning algorithms and testing them on embedded vision yielded effective results.

2.

Brain Derived Vision Algorithm on High Performance Architectures

Jayram Moorkanikara Nageswaran Andrew Felch Ashok Chandrasekhar Nikil Dutt Richard Granger Alex Nicolau Alex Veidenbaum 《International journal of parallel programming》2009,37(4):345-369

Even though computing systems have increased the number of transistors, the switching speed, and the number of processors, most programs exhibit limited speedup due to the serial dependencies of existing algorithms. Analysis of intrinsically parallel systems such as brain circuitry has led to the identification of novel architecture designs, and also new algorithms that can exploit the features of modern multiprocessor systems. In this article we describe the details of a brain derived vision (BDV) algorithm that is derived from the anatomical structure and physiological operating principles of thalamo-cortical brain circuits. We show that many characteristics of the BDV algorithm lend themselves to implementation on IBM CELL architecture, and yield impressive speedups that equal or exceed the performance of specialized solutions such as FPGAs. Mapping this algorithm to the IBM CELL is non-trivial, and we suggest various approaches to deal with parallelism, task granularity, communication, and memory locality. We also show that a cluster of three PS3s (or more) containing IBM CELL processors provides a promising platform for brain derived algorithms, exhibiting speedup of more than 140× over a desktop PC implementation, and thus enabling real-time object recognition for robotic systems.

3.

Parallel image understanding algorithms on MIMD multicomputers

A. Petrosino E. Tarantino 《Computing》1998,60(2):91-107

The heterogeneous nature of data types and computational structures involved in Computer Vision algorithms makes the design and implementation of massively parallel image processing systems a not yet fully solved problem. It is a common belief that in the near future MIMD architectures, with their high degree of flexibility, will play a very important role in this research area, by using a limited number of identical but powerful processing elements. The aim of this paper is to show how a selected list of algorithms into which a single Image Understanding process can be decomposed could map onto a distributed-memory MIMD architecture. The operative modalities we adopt are the SPMD modality for the low level processing and the MIMD modality for the intermediate and high levels of processing. Both efficient parallel formulations of the algorithms with respect to the interconnection topology of processors and their optimized implementations on a target transputer-based architecture are reported.

4.

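Parallel hierarchical clustering algorithm based on data preprocessing (total citations: 3; self-citations: 0; citations by others: 3)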

李朝鹏 李肯立 成运 李朝健 《计算机应用研究》2010,27(1):71-73

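Hierarchical clustering has very important applications in image processing, intrusion detection, bioinformatics, and other areas, and is one of the research hotspots in data mining. To address the unsatisfactory performance of current SIMD-model-based parallel hierarchical clustering algorithms on massive data sets, an adaptive parallel hierarchical clustering algorithm based on data preprocessing is proposed, which clusters n input data points in O((λn)²/p) time, where 1 ≤ p ≤ n/log n and 0.1 ≤ λ ≤ 0.3. A performance comparison of the proposed algorithm with results in the existing literature shows that it clearly improves on those results.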

5.

Prediction of performance and processor requirements in real-time data flow architectures

Som S. Mielke R.R. Stoughton J.W. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(11):1205-1216

Presents a new data flow graph model for describing the real-time execution of iterative control and signal processing algorithms on multiprocessor data flow architectures. Identified by the acronym ATAMM, for Algorithm to Architecture Mapping Model, the model is important because it specifies criteria for a multiprocessor operating system to achieve predictable and reliable performance. Algorithm performance is characterized by execution time and iteration period. For a given data flow graph representation, the model facilitates calculation of greatest lower bounds for these performance measures. When sufficient processors are available, the system executes algorithms with minimum execution time and minimum iteration period, and the number of processors required is calculated. When only limited processors are available or when processors fail, performance is made to degrade gracefully and predictably. The user is able to specify, off-line, tradeoffs between increasing execution time or increasing iteration period. The approach to achieving predictable performance is to control the injection rate of input data and to modify the data flow graph precedence relations so that a processor is always available to execute an enabled graph node. An implementation of the ATAMM model in a four-processor architecture based on Westinghouse's VHSIC 1750A Instruction Set Processor is described.

6.

Performance of symbolic applications on a parallel architecture

Adolfo Guzman Edward J. Krall Patrick F. McGehearty Nader Bagherzadeh 《International journal of parallel programming》1987,16(3):183-214

The results of a study of a family of parallel symbolic architectures executing several parallel applications are presented. The class of architectures being simulated is characterized by a shared memory structure, by a hierarchical interconnect, and by clustered processors. Speedup measurements were obtained from six different application kernels. Measurements were also performed to assess the degradation of speedup as a function of the interconnection delays, and to study the effect of different scheduling algorithms. The results presented support the claim that the proposed architecture would be a powerful parallel symbolic computation system. The paper discusses processor starvation, fine grain parallelism, uneven loads, foreign reference, schedule and indeterminate computation with respect to the applications chosen. This work was completed within the Advanced Computer Architecture Program, Micro-electronics and Technology Computer Corporation, Austin, Texas.

7.

General-purpose vision chip architecture for real-time machine vision

《Advanced Robotics》2013,27(6):619-627

To solve the I/O bottleneck problem in existing vision systems and to realize versatile processing adaptive to various and changing environments, we propose a new vision chip architecture for applications such as robot vision. The chip has general-purpose processing elements (PEs) with each PE being directly connected to a photo detector (PD) and can implement various visual processing algorithms. We developed and simulated some sample programs for the chip and proved that they can be processed within 1 ms/frame, a rate that is high enough for high-speed visual feedback for robot control. Aiming to complete the chip, we are now developing test chips based on the architecture. The latest design has 8 × 8 PEs and PDs in an area of 3.3 mm × 3.0 mm using a 0.8 μm CMOS process.

8.

Agent-based computer vision in a dynamic, real-time environment

Qiang Zhou Matthew Gillen Lonnie Welch 《Pattern recognition》2004,37(4):691-705

For computer vision systems to operate in many real-world environments, processing must occur in real-time under dynamic conditions. An agent-based methodology offers an approach to increase flexibility and scalability to accommodate the demands of a real-time, dynamic environment. This paper presents an agent-based architecture that uses a utility optimization technique to guarantee that important vision tasks are fulfilled even under resource constraints. To ensure that the processing of vision tasks is both reliable and flexible, multiple behaviors are utilized to accomplish the vision application's requirements. A vision behavior consists of a grouping of vision algorithms and a set of service levels associated with these algorithms. Utility functions are adopted to evaluate the performance of all possible behaviors that can address the requirements of a vision application within resource constraints. The maximum overall utility corresponds to the optimal behavior. Two example systems using this model are presented to show the applicability of the architecture. Experimental results show that this agent-based architecture outperforms traditional non-agent-based approaches.
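The selection rule summarized in this abstract (evaluate every behavior that fits the resource constraints and keep the one with maximum utility) can be illustrated with a minimal Python sketch. The `Behavior` class, its `cost` and `utility` fields, and the single scalar `budget` are hypothetical simplifications introduced here for illustration; the abstract does not give the paper's actual utility functions or resource model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Behavior:
    """Hypothetical stand-in for a vision behavior (algorithms + service level)."""
    name: str
    cost: float      # assumed scalar resource demand, e.g. CPU fraction
    utility: float   # assumed precomputed utility score


def select_behavior(behaviors: Iterable[Behavior], budget: float) -> Optional[Behavior]:
    """Return the feasible behavior with maximum utility, or None if none fits."""
    feasible = [b for b in behaviors if b.cost <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda b: b.utility)


candidates = [
    Behavior("full-resolution tracking", cost=0.9, utility=10.0),
    Behavior("half-resolution tracking", cost=0.5, utility=7.0),
    Behavior("motion detection only", cost=0.2, utility=3.0),
]
best = select_behavior(candidates, budget=0.6)
```

With a budget of 0.6 the most useful behavior is infeasible, so the sketch falls back to the best behavior that still fits, which is the graceful degradation under resource constraints that the abstract describes.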

9.

Orthogonal multiprocessor sharing memory with an enhanced mesh for integrated image understanding

《CVGIP: Image Understanding》1991,53(1):31-45

This paper proposes a new parallel architecture, which has the potential to support low-level image processing as well as intermediate and high-level vision analysis tasks efficiently. The integrated architecture consists of an SIMD mesh of processors enhanced with multiple broadcast buses, an MIMD multiprocessor with orthogonal access buses, and a two-dimensional shared memory array. Low-level image processing is performed on the mesh processor, while intermediate and high-level vision analysis is performed on the orthogonal multiprocessor. The interaction between the two levels is supported by a common shared memory. Concurrent computations and I/O are made possible by partitioning the memory into disjoint spaces so that each processor system can access a different memory space. To illustrate the power of such a two-level system, we present efficient parallel algorithms for a variety of problems from low-level image processing to high-level vision. Representative problems include matrix based computations, histogramming and key counting operations, image component labeling, pyramid computations, Hough transform, pattern clustering, and scene labeling. Through computational complexity analysis, we show that the integrated architecture meets the processing requirements of most image understanding tasks.

10.

A digital image system for atmospheric research

Michael Andrews Robert Fitch 《Computers & Electrical Engineering》1978,5(4):345-364

This paper describes the architecture of a medium scale digital image processing system developed as a research tool for analysis of meteorological data. The system is also being used for research on efficient image processing systems. Four qualitative performance measures for any image processor are introduced with specific application to the present machine. Preliminary results with noise reduction algorithms in satellite data are presented. Lastly, the versatility of the machine as a test bed for architectural studies of the computational structure of image processors with a microprogrammable control unit is discussed.

11.

Vision-based mobile robots on highways (total citations: 1; self-citations: 0; citations by others: 1)

《Advanced Robotics》2013,27(4):417-427

Intelligent vehicles are mobile robots on highways. They are expected to improve the safety, efficiency and environmental impacts of the current highway traffic systems. Vision systems will play an important role as sensors for the intelligent vehicles. This paper first compares the vision sensors with other sensing methods from an application point of view and then describes two vision systems, one which we have developed and another which we are developing. Two important features are required for the vision systems applied to intelligent vehicles: three-dimensional (3D) measurement capability and real-time operation. We chose a trinocular stereo vision scheme among a number of 3D vision processing methods because it is suitable for real-time operations with dedicated processor architectures. The trinocular stereo algorithm requires a large number of operations, but all the operations are relatively straightforward and, therefore, they are suitable for custom architecture implementation. The system takes three images simultaneously by using three TV cameras installed on a single horizontal line at the front grill of the test car. Vertical edges are extracted from these images and the spatial offsets (or disparities) among the images are calculated for measuring the distances to the objects. The first version was developed and installed in a car for highway testing. Two custom digital processors were developed: one for edge detection and the other for stereo matching. The test results were encouraging and the architectures based on ASIC (Application Specific Integrated Circuits) are 800 and 550 times more efficient, respectively, compared with conventional microprocessors for edge detection and stereo matching. The second version is currently being developed in order to further reduce the silicon area size. It uses hybrid analog/digital circuit technology while the first version uses only digital circuits. 
We are developing a hybrid analog/digital array processor chip which includes a large number of processing elements. Each processing element includes a digital memory unit, a data flow control switch unit and an analog arithmetic/logic unit. The analog arithmetic/logic unit reduces the silicon area size significantly compared with the digital one. The data flows among multiple processing elements in the array chip in the form of analog voltages. The data flow is controlled by the data flow switches. The digital memory unit controls the set-up of the data flow control switch and arithmetic/logic units.
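The abstract states that disparities between the camera images are used to measure distances to objects. As a rough illustration only, the Python sketch below applies the standard pinhole relation for a rectified camera pair, Z = f·B/d. The function name, the parameters, and the assumption of rectified, horizontally aligned cameras are introduced here; the abstract does not specify the paper's own calibration or matching procedure.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance in metres to a point seen by two rectified cameras.

    focal_px     -- focal length in pixels (assumed known from calibration)
    baseline_m   -- distance between the two camera centres in metres
    disparity_px -- horizontal offset of the matched edge between images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 800 px focal length, 0.5 m baseline, 20 px disparity.
distance = depth_from_disparity(800.0, 0.5, 20.0)
```

Because depth is inversely proportional to disparity, distant objects produce small disparities, so a one-pixel matching error matters more at long range.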

12.

Parallel computation of symbolic robot models and control laws: Theory and application to transputer networks

N. Kirćanski T. Petrović M. Vukobratović 《Journal of Field Robotics》1993,10(3):345-368

New computer architectures based on large numbers of processors are now used in various application areas ranging from embedded systems to supercomputers. Efficient parallel processing algorithms are applied in a wide variety of applications such as simulation, robot control, and image synthesis. This article presents two novel parallel algorithms for computing robot inverse dynamics (as well as control laws) starting from customized symbolic robot models. To gain the most benefit from the concurrent processor architecture, the whole job is divided into a large number of simple tasks, each involving only a single floating-point operation. Although requiring sophisticated scheduling schemes, fine granularity of tasks was the key factor for achieving nearly maximum efficiency and speedup. The first algorithm resolves the scheduling problem for an array of pipelined processors. The second one is devoted to parallel processors connected by a complete crossbar interconnection network. The main feature of the proposed algorithms is that they take into account the communication delays between processors and minimize both the execution time and communication cost. To prove the theoretical results, the algorithms have been verified by experiments on an INMOS T800 transputer-based system. We used four transputers in serial and parallel configurations. The experimental results show that the most complicated dynamic control laws can be executed in a submillisecond time interval. © 1993 John Wiley & Sons, Inc.

13.

CEPROL: A cellular programming language

Friedhelm Seutter 《Parallel Computing》1985,2(4):327-333

Realized cellular automata may be operated by universal computer systems as programmable special-purpose processors for parallelizable problems. Because of their architecture (local neighbourhood, small storage size per cell), they are well suited for processing systolic algorithms. A cellular programming language, named CEPROL, is presented which offers means for programming and controlling cellular automata processing such algorithms.

14.

Efficient mapping algorithms for a class of hierarchical systems

Ziavras S.G. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(11):1230-1245

Proposes techniques for mapping application algorithms onto a class of hierarchically structured parallel computing systems. Multiprocessors of this type are capable of efficiently solving a variety of scientific problems because they can efficiently implement both local and global operations for data in a two-dimensional array format. Among the set of candidate application domains, low-level and intermediate-level image processing and computer vision (IPCV) are characterized by high-performance requirements. Emphasis is given to IPCV algorithms. The importance of the mapping techniques stems from the fact that the current technology cannot be used to build cost-effective and efficient systems composed of very large numbers of processors, so the performance of various systems of lower cost should be investigated. Both analytical and simulation results prove the effectiveness and efficiency of the proposed mapping techniques.

15.

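A parallel adaptive control algorithm and its dual-processor implementation (total citations: 4; self-citations: 0; citations by others: 4)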

张志勇 王诗宓 方崇智 康景利 《自动化学报》1995,21(1):110-115

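This paper proposes a new parallel-processing adaptive control algorithm. The algorithm divides the identification and control computations of adaptive control into two parts with comparable running times; within each control interval, the two parts exchange information and then compute simultaneously, achieving parallel processing at the task level. The paper discusses the flow of the parallel algorithm and its performance evaluation metrics, and gives an application example in a guidance system. Experiments show that dual-processor parallel processing significantly increases the processing speed of adaptive control.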

16.

Implementing fast Fourier transforms using the Am29500 family

《Microprocessors and Microsystems》1987,11(8):423-430

The paper discusses the implementation of fast Fourier transform (FFT) algorithms using members of the Am29500 family of microprocessors and peripherals. First the suitability of the Am29500 family for signal processing applications is discussed. The architectural requirements of FFT processors are then outlined. A parallel processing architecture using pipelining is developed and the microprogramming of the system is described. Timing and implementation details, together with some practical test results, are given. The paper concentrates mainly on radix-2 decimation-in-time (DIT) FFT computations, but the architecture described can be applied to variable-radix processors running DIT or DIF (decimation-in-frequency) algorithms.
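For readers unfamiliar with the radix-2 decimation-in-time scheme this abstract refers to, the following is a minimal software sketch in Python. It is a textbook recursive formulation, not the pipelined, microprogrammed Am29500 design the paper describes; the function name is chosen here for illustration.

```python
import cmath


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # Decimate in time: split into even-indexed and odd-indexed samples.
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


spectrum = fft_radix2_dit([1, 2, 3, 4])
```

Each level of recursion performs n/2 butterflies, and there are log2(n) levels, which is where the O(n log n) cost comes from. Hardware designs typically unroll this recursion into pipelined butterfly stages.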

17.

Research on a data flow programming model and compiler optimization methods for Storm

杨秋吉 于俊清 莫斌生 何云峰 《计算机工程与科学》2016,38(12):2409-2418

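The data flow programming model separates a program's computation from its communication, exposing the potential parallelism of applications and simplifying programming. Distributed computing frameworks solve large-scale parallel computing problems by building multi-core clusters from inexpensive PCs, but the hierarchical storage structure and processing units of multi-core clusters pose new challenges for the performance of data flow programs. To address the problems that data flow programs face on distributed architectures, a combination of the data flow programming model and a distributed computing framework is designed and implemented: a compiler optimization framework for Storm, built on COStream. The framework comprises two modules: Storm-oriented hierarchical task partitioning and scheduling, and Storm-oriented hierarchical software pipelining and code generation. Hierarchical task partitioning uses Storm's task scheduling mechanism to assign all subtasks of a program to the cores within the nodes of a Storm cluster. Hierarchical software pipelining and code generation organizes the subtasks into a software pipeline across cluster nodes and a software pipeline across the cores within each node, and generates the corresponding target code. The experiments use a multi-core cluster as the target platform, deploy the Storm distributed architecture on the cluster, select typical programs from the digital media processing domain as test programs, and analyze the programs after Storm-oriented compiler optimization. The experimental results demonstrate the effectiveness of the combined approach.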

18.

Image Compression and Video Segmentation Using Hierarchical Self-Organization

Esteban J. Palomo Enrique Domínguez Rafael M. Luque-Baena José Muñoz 《Neural Processing Letters》2013,37(1):69-87

Image compression based on color quantization and image segmentation are two typical tasks in the field of image processing. Several techniques based on splitting algorithms or cluster analyses have been proposed in the literature. Self-organizing maps have also been applied to these problems, although with some limitations due to the fixed network architecture and the lack of representation of hierarchical relations among data. In this paper, both problems are addressed using growing hierarchical self-organizing models. An advantage of these models is their hierarchical architecture, which is more flexible in the adaptation process to input data, reflecting inherent hierarchical relations among data. Comparative results are provided for image compression and image segmentation. Experimental results show that the proposed approach is promising for image processing, and demonstrate the power of the hierarchical information provided by the proposed model.

19.

The image understanding architecture

Charles C. Weems Steven P. Levitan Allen R. Hanson Edward M. Riseman David B. Shu J. Gregory Nash 《International Journal of Computer Vision》1989,2(3):251-282

This paper provides an overview of the Image Understanding Architecture (IUA), a massively parallel, multilevel system for supporting real-time image understanding applications and research in knowledge-based computer vision. The design of the IUA is motivated by considering the architectural requirements for integrated real-time vision in terms of the type of processing element, control of processing, and communication between processing elements. The IUA integrates parallel processors operating simultaneously at three levels of computational granularity in a tightly coupled architecture. Each level of the IUA is a parallel processor that is distinctly different from the other two levels, designed to best meet the processing needs at each of the corresponding levels of abstraction in the interpretation process. Communication between levels takes place via parallel data and control paths. The processing elements within each level can also communicate with each other in parallel, via a different mechanism at each level that is designed to meet the specific communication needs of each level of abstraction. An associative processing paradigm has been utilized as the principal control mechanism at the low and intermediate levels. It provides a simple yet general means of managing massive parallelism, through rapid responses to queries involving partial matches of processor memory to broadcast values. This has been enhanced with hardware operations that provide for global broadcast, local compare, Some/None response, responder count, and single responder select. To demonstrate how the IUA may be used for vision processing, several sample algorithms and a typical interpretation scenario on the IUA are presented. We believe that the IUA represents a major step toward the development of a proper combination of integrated processing power, communication, and control required for real-time computer vision.
A proof-of-concept prototype of 1/64th of the IUA is currently being constructed by the University of Massachusetts and Hughes Research Laboratories.
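The associative operations named in this abstract (global broadcast, local compare, Some/None response, responder count, single responder select) can be mimicked sequentially in a few lines of Python. This is only a software analogy under assumptions made here: each processing element is modelled as one integer word, a partial match is modelled with a bit mask, and all function names are hypothetical. It says nothing about how the IUA hardware implements these operations.

```python
def broadcast_compare(memory, value, mask):
    """Broadcast `value`; each element compares only the bits selected by `mask`."""
    return [(word & mask) == (value & mask) for word in memory]


def some_none(responders):
    """Some/None response: True if at least one element matched."""
    return any(responders)


def responder_count(responders):
    """Number of elements that matched the broadcast query."""
    return sum(responders)


def select_single(responders):
    """Index of the first responder, or None if there is none."""
    for index, matched in enumerate(responders):
        if matched:
            return index
    return None


# Three hypothetical processing-element words; query on bits 3 and 1 only.
pe_memory = [0b1010, 0b1110, 0b0011]
flags = broadcast_compare(pe_memory, value=0b1010, mask=0b1010)
```

In an associative processor every element performs the compare at the same time, so the query cost does not grow with the number of elements. The list comprehension here only reproduces the result, not that parallelism.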

20.

Scalability aspects of instruction distribution algorithms for clustered processors

Aneesh Aggarwal Franklin M. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(10):944-955

In the evolving submicron technology, it is particularly attractive to use decentralized designs. A common form of decentralization adopted in processors is to partition the execution core into multiple clusters. Each cluster has a small instruction window, and a set of functional units. A number of algorithms have been proposed for distributing instructions among the clusters. The first part of this paper analyzes (qualitatively as well as quantitatively) the effect of various hardware parameters such as the type of cluster interconnect, the fetch size, the cluster issue width, the cluster window size, and the number of clusters on the performance of different instruction distribution algorithms. The study shows that the relative performance of the algorithms is very sensitive to these hardware parameters and that the algorithms that perform relatively better with four or fewer clusters are generally not the best ones for a larger number of clusters. This is important, given that with an imminent increase in the transistor budget, more clusters are expected to be integrated on a single chip. The second part of the paper investigates alternate interconnects that provide scalable performance as the number of clusters is increased. In particular, it investigates two hierarchical interconnects, a single ring of crossbars and multiple rings of crossbars, as well as instruction distribution algorithms to take advantage of these interconnects. Our study shows that these new interconnects with the appropriate distribution techniques achieve an IPC (instructions per cycle) that is 15-20 percent better than the most scalable existing configuration, and is within 2 percent of that achieved by a hypothetical ideal processor having a 1-cycle latency crossbar interconnect. These results confirm the utility and applicability of hierarchical interconnects and hierarchical distribution algorithms in clustered processors.