Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
The design of specialized processing array architectures capable of executing an arbitrary algorithm is proposed. An approach is adopted in which the algorithm is first represented in the form of a dataflow graph and then mapped onto the specialized processor array. The processors in this array execute the operations included in the corresponding nodes (or subsets of nodes) of the dataflow graph, while regular interconnections of these elements serve as edges of the graph. To speed up the execution, the proposed array allows the generation of computation fronts and their cancellation at a later time, depending on the arriving data operands; thus it is called a data-driven array. The structure of the basic cell and its programming are examined. Some design details are presented for two selected blocks, the instruction memory and the flag array. A scheme for mapping a dataflow graph (program) onto a hexagonally connected array is described and analyzed. Two distinct performance measures (mapping efficiency and array utilization) and some performance results are discussed.

2.
Given the advances in communication technology, analyzing the features of reflectarrays, transmitarrays, and transmit-reflectarrays becomes essential for future adaptability. This article presents a thorough review of such high-gain antennas, presenting some of the most relevant solutions published by the scientific community in the field of antennas and wave propagation. Several examples of unit cells for array implementation and complete array designs discussed in the literature are analyzed. The analysis focuses on identifying the unit cell layouts, such as those developed using microstrip patches, frequency selective surfaces, or metamaterials. The analysis is extended to the ways of improving bandwidth, for example, true time delay elements, phase delay lines, and meander lines, and to the various methods used to enable reconfiguration, for example, p-i-n diodes, varactor diodes, or microelectromechanical systems. In addition, some antennas that produce bidirectional beams simultaneously are also discussed. Finally, all the models are compared against each other in order to highlight their benefits and limitations, summarizing their main characteristics, such as the frequency of operation, bandwidth, phase range, gain, aperture efficiency, sidelobe levels, cross polarization levels, and maximum beam-steering range.

3.
A parallel memory system for efficient parallel array access using perfect latin squares as skewing functions is discussed. Simple construction methods for building perfect latin squares are presented. The resulting skewing scheme provides conflict-free access to several important subsets of an array. The address generation can be performed in constant time with simple circuitry. The skewing scheme can provide constant-time access to rows, columns, diagonals, and $N^{1/2} \times N^{1/2}$ subarrays of an $N \times N$ array with maximum memory utilization. Self-routing Benes networks can be used to realize the permutations needed between the processing elements and the memory modules. Two skewing schemes that provide conflict-free access to three-dimensional arrays are also discussed. Combined with self-routing Benes networks, these schemes provide efficient access to frequently used subsets of three-dimensional arrays.
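To make the idea of a skewing function concrete, here is a minimal sketch in Python. It is not the perfect latin square construction from the abstract; it is a simpler skew (the function `bank` and the helper `conflict_free` are names chosen here for illustration) that maps element (i, j) of an N × N array, with N = n², to one of N memory modules. Under that assumption it gives conflict-free access to rows, columns, and aligned n × n subarrays, but not to the main diagonal, which is the gap a perfect latin square is meant to close.

```python
from itertools import product

def bank(i, j, n):
    """Memory module holding element (i, j) of an N x N array, N = n*n.

    Illustrative skew only, not the perfect-latin-square scheme.
    """
    N = n * n
    return (n * (i % n) + i // n + j) % N

def conflict_free(cells, n):
    """True if no two cells in the access pattern map to the same module."""
    banks = [bank(i, j, n) for i, j in cells]
    return len(set(banks)) == len(banks)

n = 4
N = n * n

# Every full row and every full column touches each module exactly once.
rows_ok = all(conflict_free([(i, j) for j in range(N)], n) for i in range(N))
cols_ok = all(conflict_free([(i, j) for i in range(N)], n) for j in range(N))

# Every aligned n x n subarray is conflict-free as well.
blocks_ok = all(
    conflict_free(
        [(bi * n + di, bj * n + dj) for di, dj in product(range(n), repeat=2)], n
    )
    for bi, bj in product(range(n), repeat=2)
)

# The main diagonal is NOT conflict-free under this simple skew.
diag_ok = conflict_free([(t, t) for t in range(N)], n)

print(rows_ok, cols_ok, blocks_ok, diag_ok)
```

The same `conflict_free` check could be pointed at any candidate skewing function to see which access patterns it supports.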

4.
5.
A systolic array is proposed which is specifically designed to solve a system of sparse linear equations. The array consists of a number of processing elements connected in a ring. Each processing element has its own content-addressable memory where the nonzero elements of the sparse matrix are stored. Matrix elements to which elementary operations are applied are extracted from the memory by content addressing. The system of equations is solved in a systolic fashion and the solution is obtained in NZ + 5n - 2 steps, where NZ is the number of nonzero elements along and below the diagonal and n is the number of equations.

6.
We are capable of drawing a variety of inferences effortlessly, spontaneously, and with remarkable efficiency, as though these inferences are a reflex response of our cognitive apparatus. This remarkable human ability poses a challenge for cognitive science and computational neuroscience: How can a network of slow neuron-like elements represent a large body of systematic knowledge and perform a wide range of inferences with such speed? The connectionist model SHRUTI attempts to address this challenge by demonstrating how a neurally plausible network can encode a large body of semantic and episodic facts, systematic rules, and knowledge about entities and types, and yet perform a wide range of explanatory and predictive inferences within a few hundred milliseconds. Relational structures (frames, schemas) are represented in SHRUTI by clusters of cells, and inference in SHRUTI corresponds to a transient propagation of rhythmic activity over such cell-clusters wherein dynamic bindings are represented by the synchronous firing of appropriate cells. SHRUTI encodes mappings across relational structures using high-efficacy links that enable the propagation of rhythmic activity, and it encodes items in long-term memory as coincidence and coincidence-error detector circuits that become active in response to the occurrence (or non-occurrence) of appropriate coincidences in the ongoing flux of rhythmic activity. Finally, “understanding” in SHRUTI corresponds to reverberant and coherent activity along closed loops of neural circuitry. Over the past several years, SHRUTI has undergone several enhancements that have augmented its expressiveness and inferential power. This paper describes some of these extensions that enable SHRUTI to (i) deal with negation and inconsistent beliefs, (ii) encode evidential rules and facts, (iii) perform inferences requiring the dynamic instantiation of entities, and (iv) seek coherent explanations of observations.

7.
In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. To reduce or even eliminate interprocessor communication, the array elements in programs must be carefully distributed to the local memories of the processors for parallel execution. We focus on techniques for allocating array elements of nested loops onto multicomputers in a communication-free fashion for parallelizing compilers. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks without interblock communication. The arrays can be partitioned under the communication-free criteria with nonduplicate or duplicate data. Finally, a heuristic method is proposed for mapping the partitioned array elements and iterations onto fixed-size multicomputers with load balancing taken into account. Based on these methods, the nested loops can execute without any communication overhead on the distributed memory multicomputers. Moreover, the performance of the strategies with nonduplicate and duplicate data for matrix multiplication is studied.

8.
In the ambition to go beyond a single-processor architecture, to enhance programmability, and to take advantage of the power brought by VLSI devices, data-flow systems and languages were devised. Indeed, due to their functional semantics, these languages offer promise in the area of multiprocessor systems design and will possibly enable the development of computers comprising large numbers of processors with a corresponding increase in performance. Several important design problems have to be surmounted and are described here. We thus present a “variable-resolution” scheme, where the level of primitives can be selected so that the overhead due to the data-flow mode of operation is reduced. A deterministic simulation of a data-flow machine with a variable number of processing elements was undertaken and is described here. The tests were performed using various program structures such as directed acyclic graphs, vector operations, and array handling. The performance results observed confirm the advantage of actors with variable size and indicate the presence of a trade-off between overhead control and the need to control parallelism in the program. We also look at some of the communication issues and examine the effect of several interconnection networks (dual counter-rotating rings, daisy chain, and optimal double loop network) on the performance. It is shown how increasing communication costs induce a performance degradation that can be masked when the size of the basic data-flow actor is increased. The associative memory cycle time is also changed with similar conclusions. Finally, the lower-resolution scheme is applied to the array handling case; the observations confirm the advantage of a more complex actor at the array level.

9.
The paper is devoted to the problem of formalization of software-hardware solutions in designing real-time computer vision systems. The main attention is paid to the methods of implementation of low-level operations that find features (simple elements) in the image input to the system. Algorithmic types of detectors of simple elements in images are analyzed from the point of view of hardware organization in computer vision systems. In this connection, the necessary performance and memory resources are estimated. The capabilities of parallel and pipeline execution of detector algorithms are investigated. A method of using a field programmable gate array and a digital signal processor in solving the problem of image processing in real-time computer vision systems is considered in detail.

10.
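The copy method introduced in this paper is a cache optimization technique proposed for large-scale linear algebra problems. The basic idea is to copy non-contiguous array elements into a contiguous array, which reduces cache self-conflicts and makes the memory accesses of blocked algorithms contiguous, thereby improving cache utilization. Large-scale matrix multiplication is used as a test case, and the experimental results agree with the theoretical analysis.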
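The structure of such a copy optimization can be sketched as follows. This is an illustration under assumptions, not the paper's code: matrices are stored as flat row-major Python lists, the block size `bs` is assumed to divide `n`, and the helper names `copy_block` and `matmul_blocked_copy` are invented here. Python lists do not expose cache behavior, so the sketch only shows where the copies into contiguous buffers sit relative to the blocked loops; a performance study would need a compiled language.

```python
def matmul_naive(A, B, n):
    """Reference n x n product on flat row-major lists."""
    C = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            a = A[i * n + k]
            for j in range(n):
                C[i * n + j] += a * B[k * n + j]
    return C

def copy_block(M, n, r0, c0, bs):
    """Copy the bs x bs block at (r0, c0) of M into a contiguous buffer."""
    buf = [0.0] * (bs * bs)
    for r in range(bs):
        src = (r0 + r) * n + c0          # block rows are n apart in M
        buf[r * bs:(r + 1) * bs] = M[src:src + bs]
    return buf

def matmul_blocked_copy(A, B, n, bs):
    """Blocked product that works on contiguous copies of each block."""
    assert n % bs == 0, "sketch assumes bs divides n"
    C = [0.0] * (n * n)
    for i0 in range(0, n, bs):
        for k0 in range(0, n, bs):
            a_blk = copy_block(A, n, i0, k0, bs)
            for j0 in range(0, n, bs):
                b_blk = copy_block(B, n, k0, j0, bs)
                for i in range(bs):
                    row = (i0 + i) * n + j0
                    for k in range(bs):
                        a = a_blk[i * bs + k]
                        for j in range(bs):
                            C[row + j] += a * b_blk[k * bs + j]
    return C

n, bs = 4, 2
A = [float((3 * i) % 7) for i in range(n * n)]
B = [float((5 * i + 1) % 7) for i in range(n * n)]
same = matmul_blocked_copy(A, B, n, bs) == matmul_naive(A, B, n)
print(same)
```

The entries are small integers stored as floats, so both loop orders produce exactly the same sums and the two results can be compared with `==`.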

11.
An associative memory circuit that may let designers expand neural networks around a matrix of analog synapses is described. The architecture of the chip and its basic cell are discussed, and some SPICE simulation results are presented and compared with measurements from the first prototype. In particular, the linearity and dynamic response of the complete chip, which includes an array of 25 synapses and two address decoders used for programming the weights, are examined.

12.
The problems in computer vision range from edge detection and segmentation at the lowest level to the problem of cognition at the highest level. This correspondence describes the organization and operation of a semantic network array processor (SNAP) as applicable to high-level computer vision problems. The architecture consists of an array of identical cells each containing a content addressable memory, microprogram control, and a communication unit. The applications discussed in this correspondence are two general techniques, discrete relaxation and dynamic programming. While discrete relaxation is discussed with reference to scene labeling and edge interpretation, dynamic programming is tuned for stereo.

13.
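Nano-crossbar structures have become the nanoscale memory devices of greatest interest to researchers because of their simple structure and mature fabrication process. Nano-crossbars are built from nanodevices with bistable properties; organic molecular-layer crossbars and carbon nanotube crossbars are both relatively mature nano-crossbar structures. A nano-crossbar memory generally consists of peripheral micro-to-nano multiplexers and a storage array, and achieving fast reads and writes on top of high-density storage requires parallel read/write methods. Parallel read/write rests on parallel addressing. One option for parallel addressing is an address-plus-mask mode; appending a filter vector to this mode greatly increases the flexibility of parallel addressing. A parallel write to a nano-crossbar memory can be split into two subprocesses, writing 1s and writing 0s, and scheduling the best parallel access pattern is a knapsack problem on a two-dimensional plane. A parallel read can fetch the contents of an entire row or column at once.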
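One plausible reading of address-plus-mask addressing with a filter vector is sketched below. The abstract does not define the encoding, so everything here is an assumption made for illustration: mask bits set to 1 are treated as "don't care" positions, the filter vector is a list of 0/1 flags applied to the matched lines in ascending address order, and the function names `select` and `apply_filter` are hypothetical.

```python
def select(base, mask, width):
    """Addresses matching `base` on every bit that is not masked out.

    Assumption: a 1 bit in `mask` marks a don't-care position.
    """
    care = ~mask & ((1 << width) - 1)
    return [a for a in range(1 << width) if (a & care) == (base & care)]

def apply_filter(addresses, keep):
    """Keep only the matched addresses whose filter flag is set."""
    return [a for a, k in zip(addresses, keep) if k]

# Base 0b0100 with the two low bits masked selects lines 4..7 at once.
matched = select(0b0100, 0b0011, width=4)

# The filter vector then drops line 5 from that group.
chosen = apply_filter(matched, [1, 0, 1, 1])

print(matched, chosen)
```

With the mask alone, only groups whose size is a power of two can be selected; the filter vector is what allows an arbitrary subset of such a group.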

14.
Systolic array architectures are favourable for special-purpose systems as they are simple and offer a high degree of concurrency. A programmable systolic device is designed to cater for all tasks of image processing based on mathematical morphology. The design consists of a systolic memory matrix accessible via a rotation operation by a linear systolic array of simple processing elements. The instruction set consists of 1-bit assignments, logical AND and OR, and shift operations on the memory. Thus extremely short clock cycles and a high degree of parallelism can be achieved.
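Why shifts plus logical AND and OR suffice for binary morphology can be seen in a small sketch. It is not the device's instruction set: it assumes one image row packed into a Python integer, a 3-pixel horizontal structuring element, and background (0) outside the row, and the function names `dilate` and `erode` are chosen here for illustration.

```python
def dilate(x, width):
    """Binary dilation of a row by a 3-pixel horizontal element (OR of shifts)."""
    mask = (1 << width) - 1
    return (x | (x << 1) | (x >> 1)) & mask

def erode(x, width):
    """Binary erosion of a row by the same element (AND of shifts)."""
    mask = (1 << width) - 1
    return x & (x << 1) & (x >> 1) & mask

row = 0b0111010                 # one 7-pixel row, 1 = foreground
dilated = dilate(row, 7)
eroded = erode(row, 7)

print(format(dilated, "07b"), format(eroded, "07b"))
```

Opening and closing follow by composing the two, for example `dilate(erode(row, 7), 7)` for an opening with this element.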

15.
The organization and operation of a semantic network array processor (SNAP) are described. The architecture consists of an array of identical cells each containing a content addressable memory, microprogram control, and communication unit. Each cell is dedicated to one node of the semantic network and its associated relations. The array can perform global associative functions under the supervision of an outside controller. In addition, each cell is equipped with the necessary logic to perform individual functions. A set of primitive instructions was carefully chosen. Some of the applications discussed include pattern search operations, production systems, and inferences. A LISP simulator was developed for this architecture, and some simulation results are presented.

16.
A memory allocation scheme for list structures (ral system) is proposed, which allows random access and search for the elements of the structure. A comparative study of classical list systems and ral systems is given for the basic operations of search, insertion, deletion, and sorting of the structure elements. It is shown that, in general, ral systems exhibit lower-order expected time complexities for such operations, possibly at the expense of a reasonable increase in memory occupation. Allocation and processing of linear ordered lists and trees are discussed in particular.

17.
In this paper an approach is presented to combine the design of background memory architectures and processor arrays for data-dominated real-time applications. The formalized data transfer and storage exploration (DTSE) approach of IMEC involves a methodology for the design of low-power, small-size background memory organizations that meet real-time constraints. The systematic space-time transformation and the subsequent co-partitioning approach of the Dresden University of Technology allow the design of realistic processor arrays adapted to a given memory architecture. However, neither methodology can derive on its own the complete solution of a fully optimized memory organization, combining background and foreground memory. Extensions that address this important problem will be presented here. First, both complementary methodologies will be summarized. Next, the main emphasis in this paper will be on the approach to design the processor array within the context of an already optimized and hence given memory architecture. The feasibility of the proposed combination is demonstrated on a representative test vehicle for an important class of applications, namely a full motion estimation kernel in MPEG.

18.
Many researchers approach the problem of programming distributed memory machines by assuming a global shared name space. Thus the user views the distributed memory of the machine as though it were shared. A major issue that arises at this point is how to manage the memory. When a processor accesses data stored in another processor's memory, data must be moved between the two processors. Once these data are retrieved from another processor's memory, several interesting issues are raised. Where should these data be stored locally? What transformations must be performed on the code to guarantee that the nonlocal accesses reference the correct memory location? What optimizations can be performed to reduce the time spent in accessing the nonlocal data? In this paper we examine various data migration mechanisms that allow an explicit and controlled mapping of data to memory. We describe, experimentally evaluate, and model a set of schemes for storing and retrieving off-processor array elements. The schemes are all based on using hash tables for efficient access of nonlocal data. The three different techniques evaluated are the basic hashed cache, partial enumeration, and full enumeration, the details of which are described in the paper. In all three schemes, nonlocal data are stored in hash tables; the schemes differ in the amount of memory they use and in the retrieval mechanisms for nonlocal data.
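The general idea of keeping off-processor array elements in a hash table can be sketched as below. This is only an illustration of that idea, not any of the three schemes evaluated in the paper, whose details the abstract does not give. The class `HashedCache`, the block distribution, and the use of a shared Python list to stand in for remote memories are all assumptions made here.

```python
class HashedCache:
    """Hypothetical per-processor view of a block-distributed global array."""

    def __init__(self, rank, block_size, global_array):
        self.rank = rank
        self.block_size = block_size
        self._remote = global_array          # stands in for other processors' memories
        lo = rank * block_size
        self.local = list(global_array[lo:lo + block_size])
        self.cache = {}                      # hash table: global index -> value
        self.remote_fetches = 0

    def owner(self, g):
        """Rank that owns global index g under a block distribution."""
        return g // self.block_size

    def read(self, g):
        if self.owner(g) == self.rank:       # local element: direct access
            return self.local[g - self.rank * self.block_size]
        if g not in self.cache:              # first nonlocal access: fetch once
            self.cache[g] = self._remote[g]  # simulated interprocessor message
            self.remote_fetches += 1
        return self.cache[g]                 # later accesses hit the hash table

data = [10 * i for i in range(12)]           # 3 processors x 4 elements
p1 = HashedCache(rank=1, block_size=4, global_array=data)
values = [p1.read(g) for g in (5, 9, 9, 0, 9)]

print(values, p1.remote_fetches)
```

Index 5 is local to rank 1, so only indices 9 and 0 trigger a simulated fetch; the repeated reads of index 9 are served from the hash table.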

19.
Multiprocessor system-on-chip (MP-SoC) platforms represent an emerging trend for embedded multimedia applications. To enable MP-SoC platforms, scalable communication-centric interconnect fabrics, such as networks-on-chip (NoCs), have been recently proposed. The shared memory represents one of the key elements in designing MP-SoCs to provide data exchange and synchronization support. This paper focuses on the energy/delay exploration of a distributed shared memory architecture, suitable for low-power on-chip multiprocessors based on NoC. A mechanism is proposed for the data allocation on the distributed shared memory space, dynamically managed by an on-chip hardware memory management unit (HwMMU). Moreover, the exploitation of the HwMMU primitives for the migration, replication, and compaction of shared data is discussed. Experimental results show the impact of different distributed shared memory configurations for a selected set of parallel benchmark applications from the power/performance perspective. Furthermore, a case study for a graph exploration algorithm is discussed, accounting for the effects of the core mapping and the network topology on energy and performance at the system level.

20.
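An Exploration of Teaching Pointers in the C Language   Total citations: 1, self-citations: 1, citations by others: 0
This paper discusses the various forms in which pointers are used in the C language and carefully analyzes the characteristics and advantages of pointers in arrays, functions, and dynamic memory allocation. It offers important guidance on how to learn and master C pointers.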
