1.

A new parallel ray-tracing system based on object decomposition

Hyun-Joon Kim Chong-Min Kyung 《The Visual computer》1996,12(5):244-253

We propose a new parallel ray-tracing hardware architecture in which processors are connected as a ring. Most parallel ray-tracing algorithms subdivide the whole object space into subregions; a processor handles only rays entering the subregion assigned to it. Here we assign each processor objects that are spread over the whole object space. The processors trace rays on their own objects. The respective partial results are combined to form the final image. This scheme is especially suitable for synthesizing animated sequences because objects need not be reallocated for every frame. Preliminary results show a speed-up factor almost linearly proportional to the number of processors.
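
The object-decomposition idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the sphere-only scene, the round-robin object assignment, the nearest-hit merge rule, and all function names (`hit_sphere`, `trace_partial`, `combine`, `render_on_ring`) are assumptions made for illustration. The ring is simulated by a loop that hands the running partial image from one processor to the next.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest sphere hit, or inf."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return math.inf
    t = -b - math.sqrt(disc)
    return t if t > 0 else math.inf

def trace_partial(rays, objects):
    """One processor: trace every ray against its own objects only."""
    result = []
    for origin, direction in rays:
        best = (math.inf, None)
        for obj_id, center, radius in objects:
            t = hit_sphere(origin, direction, center, radius)
            if t < best[0]:
                best = (t, obj_id)
        result.append(best)
    return result

def combine(a, b):
    """Merge two partial images by keeping the nearer hit for each ray (assumed rule)."""
    return [x if x[0] <= y[0] else y for x, y in zip(a, b)]

def render_on_ring(rays, objects, n_procs):
    # Assumed round-robin assignment: each share is spread over the whole scene.
    shares = [objects[i::n_procs] for i in range(n_procs)]
    image = [(math.inf, None)] * len(rays)
    for share in shares:  # the partial image travels once around the ring
        image = combine(image, trace_partial(rays, share))
    return image
```

Because every processor keeps the same objects regardless of where they sit in the scene, moving an object between frames changes only its coordinates, not its owner.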

2.

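Parallel acceleration of the HOG feature extraction algorithm on the Sunway many-core processor

赵美婷 刘轶 刘锐 宋凯达 钱德沛 《Computer Engineering & Science》2017,39(4):611-618

The HOG feature is a simple and efficient feature descriptor commonly used for object detection and widely applied in areas such as pedestrian detection; however, it faces severe performance challenges when processing massive numbers of images. One solution is to use the processor nodes of the "Sunway TaihuLight" supercomputer to accelerate the pedestrian detection algorithm on massive image sets. Two parallel schemes are adopted: in the first, one processor handles 4 images simultaneously; in the second, it handles 256 images simultaneously. Extensive serial and parallel experiments show that the first scheme suits parallel processing of multiple high-resolution images, with a speedup of up to 83 times, while the second scheme suits low-resolution images, with a speedup of up to 95. Both parallel designs scale well across multiple processor nodes of Sunway TaihuLight.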

3.

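Design and implementation of a high-performance parallel FFT processor

石长振 杨雪 王贞松 《Computer Engineering》2012,38(2):242-244

A design for a high-performance parallel fast Fourier transform (FFT) processor is proposed. It uses 4 butterfly units for parallel processing and an improved conflict-free operand address mapping scheme that guarantees 16 data items are read and written simultaneously in every cycle. An FPGA implementation of the processor is given. Performance evaluation shows that this parallel FFT processor outperforms the other FFT processors it was compared against and can meet the needs of practical applications.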

4.

Multiprocessing for ray tracing: a hierarchical self-balancing approach (total citations: 1; self-citations: 1; citations by others: 0)

Issac D. Scherson Elisha Caspary 《The Visual computer》1988,4(4):188-196

5.

Orthogonal multiprocessor sharing memory with an enhanced mesh for integrated image understanding

《CVGIP: Image Understanding》1991,53(1):31-45

This paper proposes a new parallel architecture, which has the potential to support low-level image processing as well as intermediate and high-level vision analysis tasks efficiently. The integrated architecture consists of an SIMD mesh of processors enhanced with multiple broadcast buses, an MIMD multiprocessor with orthogonal access buses, and a two-dimensional shared memory array. Low-level image processing is performed on the mesh processor, while intermediate and high-level vision analysis is performed on the orthogonal multiprocessor. The interaction between the two levels is supported by a common shared memory. Concurrent computations and I/O are made possible by partitioning the memory into disjoint spaces so that each processor system can access a different memory space. To illustrate the power of such a two-level system, we present efficient parallel algorithms for a variety of problems from low-level image processing to high-level vision. Representative problems include matrix based computations, histogramming and key counting operations, image component labeling, pyramid computations, Hough transform, pattern clustering, and scene labeling. Through computational complexity analysis, we show that the integrated architecture meets the processing requirements of most image understanding tasks.

6.

Flag-oriented parallel associative architectures and applications

Tavangarian D. 《Computer》1994,27(11):41-52

Flag transformation, a new design concept for parallel associative memory and processor architectures, maps word-oriented data into flag-oriented data. A flag vector represents each word in a set. The flag position corresponds to the value of the transformed word, and all flags in a vector are processed simultaneously to obtain parallel operations. The results of complex search operations performed by modular, cascadable hardware components are also represented by flags and retransformed into word-oriented data. This transformation method allows parallel processing of associative or content-addressable data in uniprocessor architectures, expedites IC design rule checks, and accelerates complex memory tests. It can also be used to develop associative processor architectures and to emulate very fast, modular, cascadable artificial neural networks.
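
As a rough software analogy for the flag transformation described here, the sketch below uses a Python integer as the flag vector, with bit position v set when value v is in the word set. The function names (`to_flags`, `from_flags`, `range_mask`) and the restriction to non-negative integer words are assumptions; the paper's hardware components are not modeled.

```python
def to_flags(words):
    """Flag transformation: set the flag at position v for every word value v."""
    flags = 0
    for w in words:
        flags |= 1 << w
    return flags

def from_flags(flags):
    """Retransformation: recover the sorted word values from a flag vector."""
    return [pos for pos in range(flags.bit_length()) if (flags >> pos) & 1]

def range_mask(lo, hi):
    """Flag vector with a flag set for every value in the closed range [lo, hi]."""
    return ((1 << (hi + 1)) - 1) & ~((1 << lo) - 1)
```

With this representation a set intersection or a range search is a single bitwise AND over the whole vector, for example `from_flags(to_flags([3, 7, 12]) & range_mask(4, 12))`, which mirrors the idea that all flags are processed simultaneously.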

7.

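A low-power optimization method for heterogeneous systems based on communication-aware task partitioning

王桂彬 《Journal of Chinese Computer Systems》2011,32(12)

For heterogeneous parallel systems composed of general-purpose microprocessors and dedicated accelerators, an energy optimization method is proposed that combines communication-aware parallel task partitioning with dynamic voltage and frequency scaling. The method partitions the parallel task graph and maps it onto the heterogeneous processing units so as to minimize system energy consumption while satisfying performance constraints. In typical current heterogeneous parallel systems, the host processor and the accelerators are mostly connected through the system bus, which inevitably introduces non-negligible communication overhead, so communication-aware task partitioning is the key to the problem. A static optimal energy optimization method based on integer linear programming and a dynamic energy optimization method based on a genetic algorithm are proposed. The effectiveness of the method is verified on a typical scientific computing application.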
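
To make the role of communication cost concrete, here is a small Python sketch of energy-minimal task placement under a deadline. It is not the paper's method: it replaces the integer linear program with exhaustive search, assumes the tasks form a simple chain, omits voltage and frequency scaling, and uses hypothetical names and cost numbers throughout.

```python
from itertools import product

def best_partition(tasks, deadline, comm_time, comm_energy):
    """Exhaustively place a chain of tasks on 'cpu' or 'acc'.

    tasks: list of dicts mapping unit name -> (time, energy).
    A transfer cost is charged whenever consecutive tasks run on different units.
    Returns (energy, placement) for the cheapest placement meeting the deadline,
    or None if no placement meets it.
    """
    best = None
    for placement in product(("cpu", "acc"), repeat=len(tasks)):
        time = sum(t[u][0] for t, u in zip(tasks, placement))
        energy = sum(t[u][1] for t, u in zip(tasks, placement))
        switches = sum(a != b for a, b in zip(placement, placement[1:]))
        time += switches * comm_time
        energy += switches * comm_energy
        if time <= deadline and (best is None or energy < best[0]):
            best = (energy, placement)
    return best
```

When the transfer costs are set to zero the search can prefer a placement that bounces between units; once bus transfers are charged, keeping neighboring tasks on the same unit can become cheaper, which is the effect the abstract attributes to communication-aware partitioning.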

8.

Message based cooperation between parallel depth and intensity matching algorithms

W. J. AUSTIN A. M. WALLACE 《Concurrency and Computation》1997,9(2):141-162

A parallel vision system for object recognition and location based on cooperative depth and intensity processing is described. The parallel algorithm for intensity data processing is based on generation of hypothesised matches between line junctions in the image and space curve intersections in the model. These hypotheses lead to back-projection of the model and verification of promising hypotheses. The parallel algorithm for depth data processing is based on a tree search algorithm constrained by pairwise geometry between primitives. As each algorithm proceeds, partial results are interchanged to direct the other concurrent process to a more promising or more viable solution. The architecture has been implemented and evaluated on a multi-transputer machine, and is illustrated by several examples of pose definition of a test object. © 1997 by John Wiley & Sons, Ltd.

9.

Dynamic routing of data stream tuples among parallel query plan running on multi-core processors

Ali A. Safaei Ali Sharifrazavian Mohsen Sharifi Mostafa S. Haghjoo 《Distributed and Parallel Databases》2012,30(2):145-176

In this paper, a method for fast processing of data stream tuples in parallel execution of continuous queries over a multiprocessing environment is proposed. A copy of the query plan is assigned to each of the processing units in the multiprocessing environment. Input data stream tuples are routed dynamically and continuously through the graph constructed from these copies (called the Query Mega Graph): after a tuple has been processed by one processing unit (e.g., a processor), the routing determines the next processor to which it should be forwarded. The next processor is selected so that, among all of the alternative processors, it imposes the minimum tuple latency on the tuple. The tuple latency is derived from the processing, buffering, and communication delays, which vary across practical parallel systems.
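
The minimum-latency selection rule can be sketched in a few lines of Python. The latency model below (queue length times service time for buffering, one service time for processing, tuple size times a per-unit link cost for communication) and the field names `queue_len`, `service_time`, and `link_cost` are assumptions for illustration; the abstract states only that latency combines these three delays.

```python
def next_processor(candidates, tuple_size):
    """Pick the candidate processor with the lowest estimated tuple latency."""
    def latency(p):
        buffering = p["queue_len"] * p["service_time"]   # wait behind queued tuples
        processing = p["service_time"]                   # time to run the operator
        communication = tuple_size * p["link_cost"]      # time to ship the tuple
        return buffering + processing + communication
    return min(candidates, key=latency)["name"]
```

Re-evaluating this choice for each tuple at each hop is what makes the routing dynamic: a processor with a short queue but a slow link can lose to a busier processor that is cheaper to reach.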

10.

Vision processor system for moving-object analysis

Hiroaki Kubota Yasukazu Okamoto Hiroshi Mizoguchi Yoshinori Kuno 《Machine Vision and Applications》1993,7(1):37-43

This paper proposes a vision processor for moving-object analysis in time-varying images. The process of motion analysis can be divided into three stages: moving-object candidate detection, object tracking, and final motion analysis. The processor consists of three components corresponding to these three stages. The first is an overall image processing unit with local parallel architecture. It locates candidate regions for moving objects. The second is a multimicroprocessor system consisting of 16 local modules. Each module tracks one candidate region. The third is the host workstation. In this paper, we describe both the architecture and the software of the vision processor.

11.

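An embedded real-time target tracking system based on DSP/FPGA (total citations: 1; self-citations: 1; citations by others: 1)

田茜 何鑫 《Computer Engineering》2005,31(15):219-221

A DSP/FPGA-based coprocessor architecture is proposed for an embedded vision system that performs real-time target tracking. The DSP acts as the main processor and exercises global control, while an FPGA with a pipelined parallel processing structure acts as a coprocessor that completes, in real time, the processing tasks assigned by the DSP. The FPGA quickly produces the initial motion estimation result; the DSP then analyzes and corrects it further and feeds the correction back to the FPGA, achieving fast and accurate tracking.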

12.

Parallel implementation of back-propagation algorithm in networks of workstations

Suresh S. Omkar S.N. Mani V. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(1):24-34

This work presents an efficient mapping scheme for the multilayer perceptron (MLP) network trained using the back-propagation (BP) algorithm on a network of workstations (NOWs). A hybrid partitioning (HP) scheme is used to partition the network, and each partition is mapped onto processors in NOWs. We derive the processing time and memory space required to implement the parallel BP algorithm in NOWs. Performance parameters such as speed-up and space reduction factor are evaluated for the HP scheme, which is compared with earlier work involving the vertical partitioning (VP) scheme for mapping the MLP on NOWs. The performance of the HP scheme is evaluated by solving an optical character recognition (OCR) problem in a network of ALPHA machines. The analytical and experimental performance shows that the proposed parallel algorithm has better speed-up, less communication time, and a better space reduction factor than the earlier algorithm. This work also presents a simple and efficient static mapping scheme on heterogeneous systems. Using divisible load scheduling theory, a closed-form expression for the number of neurons assigned to each processor in the NOW is obtained. Analytical and experimental results for the static mapping problem on NOWs are also presented.
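
The abstract does not reproduce the closed-form expression, so the following Python sketch shows only the simplest divisible-load idea under a stated assumption: communication time is ignored and every processor should finish at the same moment, which makes each share proportional to processor speed. The function name `neurons_per_processor` and the largest-remainder rounding are illustrative choices, not the paper's formula.

```python
def neurons_per_processor(total_neurons, speeds):
    """Split neurons across processors in proportion to their relative speeds."""
    total_speed = sum(speeds)
    exact = [total_neurons * s / total_speed for s in speeds]
    counts = [int(x) for x in exact]          # round every share down first
    leftover = total_neurons - sum(counts)    # neurons still unassigned
    # Hand the leftover neurons to the shares with the largest fractional parts.
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

For instance, with relative speeds 1, 1, and 2, a layer of 100 neurons splits as 25, 25, and 50. A model that also charged for communication would shift these shares, which is where a closed-form result such as the one the authors derive becomes necessary.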

13.

Parallel computers for region-level image processing

Azriel Rosenfeld Angela Y. Wu 《Pattern recognition》1982,15(1):41-50

It is well known that parallel computers can be used very effectively for image processing at the pixel level, by assigning a processor to each pixel or block of pixels, and passing information as necessary between processors whose blocks are adjacent. This paper discusses the use of parallel computers for processing images at the region level, assigning a processor to each region and passing information between processors whose regions are related. The basic difference between the pixel and region levels is that the regions (e.g. obtained by segmenting the given image) and relationships differ from image to image, and even for a given image, they do not remain fixed during processing. Thus, one cannot use the standard type of cellular parallelism, in which the set of processors and interprocessor connections remain fixed, for processing at the region level. Reconfigurable cellular computers, in which the set of processors that each processor can communicate with can change during a computation, are more appropriate. A class of such computers is described, and general examples are given illustrating how such a computer could initially configure itself to represent a given decomposition of an image into regions, and dynamically reconfigure itself, in parallel, as regions merge or split.
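
The reconfiguration step mentioned at the end of this abstract can be pictured as an update to a region adjacency graph. The Python sketch below is a sequential illustration under assumed names (`merge_regions`, a dict of neighbor sets); it models which processors can talk to each other after a merge, not the parallel mechanism by which the machine rewires itself.

```python
def merge_regions(adjacency, keep, absorb):
    """Merge region `absorb` into region `keep` and rewire neighbor links.

    adjacency: dict mapping region id -> set of neighboring region ids.
    The dict is updated in place and also returned.
    """
    neighbors = adjacency.pop(absorb)
    for n in neighbors:
        adjacency[n].discard(absorb)      # the absorbed region no longer exists
        if n != keep:
            adjacency[n].add(keep)        # its neighbors now border `keep`
            adjacency[keep].add(n)
    adjacency[keep].discard(absorb)
    return adjacency
```

Each key plays the role of a processor and each set its current communication partners, so a merge changes the partner sets of several processors at once. That is exactly what a fixed cellular interconnect cannot express.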

14.

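Research on system design methods and computational models for virtual environments (total citations: 11; self-citations: 0; citations by others: 11)

李红兵 张东摩 陈世福 《Chinese Journal of Computers》1999,22(3):313-318

Object-oriented and agent-oriented techniques are the basic design methods for virtual environment systems. This paper constructs agents using an object-oriented approach and provides a set of underlying computational models that support agents, such as neural networks, genetic algorithms, expert systems, and planning management.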

15.

A digital retina-like low-level vision processor

Mertoguno S. Bourbakis N.G. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2003,33(5):782-788

This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k × m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.

16.

Parallel Sequence Mining on Shared-Memory Machines

《Journal of Parallel and Distributed Computing》2001,61(3):401-426

We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup and excellent scaleup results.
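
A minimal Python sketch of the decomposition step follows. It groups candidate sequences by their last item as a stand-in for suffix-based classes and then hands whole classes to processors with a greedy static rule. Both the length-one suffix and the use of class size as a work estimate are assumptions, and the static assignment is deliberately simpler than the dynamic interclass and intraclass load balancing that the abstract says pSPADE relies on.

```python
from collections import defaultdict
import heapq

def suffix_classes(sequences):
    """Group sequences (tuples of items) by their final item."""
    classes = defaultdict(list)
    for seq in sequences:
        classes[seq[-1]].append(seq)
    return dict(classes)

def assign_classes(classes, n_procs):
    """Greedy static assignment: largest class first, to the least-loaded processor."""
    heap = [(0, p) for p in range(n_procs)]   # (current load, processor id)
    assignment = {p: [] for p in range(n_procs)}
    ordered = sorted(classes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for suffix, members in ordered:
        load, p = heapq.heappop(heap)
        assignment[p].append(suffix)
        heapq.heappush(heap, (load + len(members), p))
    return assignment
```

Since a class is processed entirely by one processor, no synchronization is needed while it is being mined. The weakness of a static split is that class size is a poor predictor of mining cost, which motivates the dynamic balancing described in the abstract.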

17.

A programmable state machine architecture for packet processing

Wangyang Lai Chin-Tau Lea 《Micro, IEEE》2003,23(4):32-42

The Internet is expanding rapidly and constantly adding new protocols and features. To shorten the design cycle, many companies have adopted a common hardware platform for a variety of products. In these products, specialized packet processors tailored for packet processing handle multiple protocols and feature changes. A packet processor usually incorporates multiple RISC engines that are configurable as several instances of parallel processors, working simultaneously or in a pipelined fashion. In either approach, packet processors are complex and expensive. Packet processing has many levels of programmability requirements. Some tasks require only mild programmability and can't justify the use of a full-fledged packet processor. A finite state machine (FSM), on the other hand, has high performance but cannot adapt to protocol changes. The solution is something in between: fast, programmable, but not as complicated as a packet processor. A programmable state machine (PSM) is such an idea.
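
The contrast between a fixed FSM and a programmable one can be shown with a table-driven sketch in Python. This is a software analogy only: the class name `ProgrammableStateMachine`, the `"drop"` default state, and the header symbols are hypothetical, and the article's actual PSM hardware is not described in the abstract.

```python
class ProgrammableStateMachine:
    """A fixed stepping engine whose behavior lives entirely in a loadable table."""

    def __init__(self, table, start):
        self.table = dict(table)   # (state, symbol) -> next state
        self.start = start

    def load(self, table):
        """Reprogram the machine by swapping the table; the engine is unchanged."""
        self.table = dict(table)

    def run(self, symbols):
        state = self.start
        for s in symbols:
            # Any transition missing from the table sends the packet to "drop".
            state = self.table.get((state, s), "drop")
        return state
```

Supporting a new protocol then amounts to loading a table with one more entry, for example adding `("l2", "ipv6"): "accept"`, while the stepping loop stays as simple as a hard-wired FSM.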

18.

A network of microprocessors to execute reduction languages, part I

Gyula A. Magó 《International journal of parallel programming》1979,8(5):349-385

This paper describes the architecture of a cellular processor capable of directly and efficiently executing reduction languages as defined by Backus. The processor consists of two interconnected networks of microprocessors, one of which is a linear array of identical cells, and the other a tree-structured network of identical cells. Both kinds of cells have modest processing and storage requirements. The processor directly interprets a high-level language, and its efficient operation is not restricted to any special class of problems. Memory space permitting, the processor accommodates the unbounded parallelism allowed by reduction languages in any single user program; it is also able to execute many user programs simultaneously.

19.

A network of microprocessors to execute reduction languages, part II

Gyula A. Magó 《International journal of parallel programming》1979,8(6):435-471

This paper describes the architecture of a cellular processor capable of directly and efficiently executing reduction languages as defined by Backus. The processor consists of two interconnected networks of microprocessors, one of which is a linear array of identical cells, and the other a tree-structured network of identical cells. Both kinds of cells have modest processing and storage requirements. The processor directly interprets a high-level language, and its efficient operation is not restricted to any special class of problems. Memory space permitting, the processor accommodates the unbounded parallelism allowed by reduction languages in any single user program; it is also able to execute many user programs simultaneously.

20.

The image understanding architecture

Charles C. Weems Steven P. Levitan Allen R. Hanson Edward M. Riseman David B. Shu J. Gregory Nash 《International Journal of Computer Vision》1989,2(3):251-282

This paper provides an overview of the Image Understanding Architecture (IUA), a massively parallel, multilevel system for supporting real-time image understanding applications and research in knowledge-based computer vision. The design of the IUA is motivated by considering the architectural requirements for integrated real-time vision in terms of the type of processing element, control of processing, and communication between processing elements. The IUA integrates parallel processors operating simultaneously at three levels of computational granularity in a tightly coupled architecture. Each level of the IUA is a parallel processor that is distinctly different from the other two levels, designed to best meet the processing needs at each of the corresponding levels of abstraction in the interpretation process. Communication between levels takes place via parallel data and control paths. The processing elements within each level can also communicate with each other in parallel, via a different mechanism at each level that is designed to meet the specific communication needs of each level of abstraction. An associative processing paradigm has been utilized as the principal control mechanism at the low and intermediate levels. It provides a simple yet general means of managing massive parallelism, through rapid responses to queries involving partial matches of processor memory to broadcast values. This has been enhanced with hardware operations that provide for global broadcast, local compare, Some/None response, responder count, and single responder select. To demonstrate how the IUA may be used for vision processing, several sample algorithms and a typical interpretation scenario on the IUA are presented. We believe that the IUA represents a major step toward the development of a proper combination of integrated processing power, communication, and control required for real-time computer vision.
A proof-of-concept prototype of 1/64th of the IUA is currently being constructed by the University of Massachusetts and Hughes Research Laboratories.
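
The associative operations listed in this abstract (global broadcast, local compare, Some/None response, responder count, single responder select) can be mimicked sequentially in Python. The sketch below is an analogy under assumed names (`AssociativeArray`, `broadcast_compare`, `select_first`) with one integer per processing element; it says nothing about how the IUA hardware implements these operations.

```python
class AssociativeArray:
    """One integer of local memory per processing element, plus a responder flag."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.responders = [False] * len(self.memory)

    def broadcast_compare(self, value, mask):
        """Global broadcast and local compare on the masked bits (a partial match)."""
        self.responders = [(m & mask) == (value & mask) for m in self.memory]

    def some(self):
        """Some/None response: did any element match?"""
        return any(self.responders)

    def count(self):
        """Responder count: how many elements matched?"""
        return sum(self.responders)

    def select_first(self):
        """Single responder select: index of the first responder, or None."""
        for i, r in enumerate(self.responders):
            if r:
                return i
        return None
```

A controller built on these primitives never addresses elements individually; it broadcasts a query, checks Some/None, and then either counts the responders or selects one of them to act on.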