Similar literature
Found 20 similar documents; search took 640 ms
1.
The ASP (Associative String Processor) architecture and its support software provide the base technology for developing versatile, replaceable, and highly compact building blocks for the simple construction of modular real-time DSP systems, offering step-function improvements in cost-performance, application flexibility, reliability, and ease of maintenance. Based on a fully programmable and fault-tolerant homogeneous computational architecture, emerging from research at Brunel University and being developed by Aspex Microsystems, ASP modules offer cost-effective support for a particularly wide range of DSP applications by mapping application data structures to a common string representation that supports content addressing, parallel processing, and a reconfigurable inter-processor communication network. Moreover, by exploiting state-of-the-art microelectronics and packaging technologies, the ASP modules achieve processor packing densities that are more usually associated with memory components. Indeed, the ASP has been designed to benefit from the inevitable VLSI-to-ULSI-to-WSI technological trend, with a fully integrated, simply scalable, and defect/fault-tolerant processor interconnection strategy. The architecture, software, and implementation of ASP modules are discussed, and the paper indicates that a peak performance of 1 TOPS (i.e., 1E12 operations, e.g., 12-bit adds, per second) with an input-output bandwidth of 3,200 Mbytes/second could be achieved with only 10 ASP modules, within less than a cubic foot, dissipating 1 kW, and for less than $1M.
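The system-level figures quoted in this abstract can be broken down per module with simple arithmetic. The sketch below is only a back-of-the-envelope check: it assumes the totals split evenly across the 10 modules, and the variable names are illustrative rather than taken from the paper.

```python
# Back-of-the-envelope split of the quoted system totals across modules.
# Assumption: the totals divide evenly over the 10 ASP modules.
PEAK_OPS_PER_S = 1e12      # 1 TOPS peak performance, as quoted
IO_MBYTES_PER_S = 3200     # aggregate input-output bandwidth, as quoted
POWER_W = 1000             # 1 kW total dissipation, as quoted
MODULES = 10

ops_per_module = PEAK_OPS_PER_S / MODULES      # operations/s per module
io_per_module = IO_MBYTES_PER_S / MODULES      # Mbytes/s per module
power_per_module = POWER_W / MODULES           # watts per module
ops_per_watt = PEAK_OPS_PER_S / POWER_W        # operations/s per watt

print(f"per module: {ops_per_module:.0e} ops/s, "
      f"{io_per_module:.0f} Mbytes/s, {power_per_module:.0f} W")
print(f"system efficiency: {ops_per_watt:.0e} ops/s per watt")
```

Under the even-split assumption, each module would contribute about 100 GOPS and 320 Mbytes/second at roughly 100 W.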

2.
This paper presents a VLSI architecture specifically designed as a video/communication controller to support emerging applications in the area of video/data communications. The controller is a parallel architecture consisting of three processing modules, a shared memory with four banks, and two input/output modules, operating at a transfer speed of 622 Mbits/sec. The processing modules and memory banks communicate through a low-cost interconnection scheme that is nevertheless able to sustain the system's required data transfer rate. The entire system constitutes a component that can serve a switching system as an intelligent buffer with real-time processing and multiplexing capabilities. The component operates on fixed- and/or variable-length packets of data on a stream basis. The architecture embeds both the processing and the memory modules, thus producing a system-on-a-chip solution.

3.
This paper presents an Application Specific Instruction Set Processor (ASIP) for implementing H.264/AVC, called the Video Specific Instruction-set Processor (VSIP). The proposed VSIP has novel instructions and optimized hardware architectures for specific functions such as intra prediction, the in-loop deblocking filter, and the integer transform. Moreover, VSIP has coprocessors for the computation-intensive parts of video signal processing, such as inter prediction and entropy coding. The proposed VSIP has a much smaller area than commercial DSP chips and dramatically reduces the number of memory accesses, which results in low power consumption. Moreover, the proposed hardware accelerators are small and consume little power, and thus they can support real-time video processing. VSIP has been thoroughly verified using an FPGA board with a Xilinx Virtex II. VSIP can implement a real-time H.264/AVC decoder. The proposed VSIP is one of the promising solutions for video signal processing.
Sung Dae Kim

4.
A self-pruning binary tree (SPBT) interconnection network architecture that tolerates faults in a wafer scale integration (WSI) environment is proposed. The goal of the SPBT network is to provide a reliable and quickly reconfigurable interconnection network architecture for linear WSI arrays. The proposed architecture uses a bottom-up approach to reconfigure a linear pipelined array on a potentially defective WSI array using a binary tree interconnection scheme. The binary tree is generated by successive formation of hierarchical modules. For N processing elements (PEs) on the wafer, the reconfiguration time is O(log N). The propagation delay is bounded by Θ(log N) and is independent of the number of faulty PEs. Faults in the switching network as well as faulty processing elements are tolerated.

5.
We explore the energy dissipation of the Linear Processor Array (LPA) as a function of the number of available resources (processor units, P) within the array. This number P is an important parameter, as it reflects performance, relates parallel processing to energy dissipation, and influences the scaling of the various parts of the LPA architecture (memory, address generator, communication network). To make it possible to compare the different design variants for a fixed datawidth, we propose a high-level energy dissipation model of the processor, which is based on a detailed analysis of a general convolution algorithm. It is shown that the energy dissipation of the LPA can roughly be described by the relationship E_total ∝ N/P, with N representing the datawidth in pixels. This relationship is derived from two observations: first, the largest contribution to E_total is the energy dissipated by the memories, and second, in our model of the LPA, the datawidth of the memories corresponds to the number of pixels N to be processed, which results in an increase of the access rate when P decreases. Furthermore, we have shown that the energy dissipation caused by communication within the LPA increases with an increasing number of resources: the trade-off between communication and computation in parallel computing. This contribution turns out to be negligible in the total energy dissipation, and we therefore conclude that the optimum solution is found when the full number of resources is applied within the LPA.
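The scaling described in this abstract can be illustrated numerically. The sketch below takes the relationship as a plain proportionality between total energy and N/P, which is an assumption: the abstract gives no constant, so energy is reported relative to the fully parallel array (P = N). The function name and the example datawidth are hypothetical.

```python
def relative_energy(n_pixels: int, n_processors: int) -> float:
    """Energy relative to the fully parallel array (P = N).

    Assumes total energy scales as N/P; the proportionality constant
    is not given in the abstract, so only ratios are meaningful here.
    """
    if not 1 <= n_processors <= n_pixels:
        raise ValueError("expected 1 <= P <= N")
    return n_pixels / n_processors


N = 512  # hypothetical datawidth in pixels, not a value from the paper
for p in (64, 128, 256, 512):
    print(f"P = {p:3d}: relative energy = {relative_energy(N, p):.1f}")
```

In this simplified model the energy falls monotonically as P grows, which matches the abstract's conclusion that applying the full number of resources is optimal once communication energy is treated as negligible.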

6.
A systematic and efficient fault diagnosis method for reconfigurable VLSI/WSI array architectures is presented. The basic idea is to utilize the output data path independence among a subset of processing elements (PEs) based on the topology of the array under test. The divide-and-conquer technique is applied to reduce the complexity of test application and enhance the controllability and observability of a processor array. The array under test is divided into nonoverlapping diagnosis blocks. The PEs in the same diagnosis block can be diagnosed concurrently. The problem of finding diagnosis blocks is shown to be equivalent to a generalized Eight Queens problem. Three types of PEs and one type of switch, which are designed to be easily testable and reconfigurable, are used to show how to apply this approach. The main contribution of this paper is an efficient switch and link testing procedure, and a novel PE fault diagnosis approach which can speed up the testing by at least O(V^(1/2)) for the processor arrays considered in this paper, where V is the number of PEs. The significance of our approach is the ability to detect as well as to locate multiple PE, switch, and link faults with little or no hardware overhead.

7.
For applications ranging from phase equilibria to the processing of second-generation high-Tc superconductor coated conductors, phase diagrams constructed under carbonate-free conditions are needed. Subsolidus phase equilibria of BaO-R2O3-CuOz (R = Ho) have been investigated at (810°C), 21 kPa (875°C) and 0.1 MPa (850 and 930°C) by applying controlled atmosphere methods to minimize the presence of carbonate and CO2 and H2O contamination. Under carbonate-free conditions, most of these phase diagrams are different from those reported in the literature. In this paper, we also review and compare the phase diagrams of ten BaO-R2O3-CuOz systems (R = Nd, Sm, Eu, Gd, Dy, Y, Ho, Er, Tm and Yb) that were previously determined in this laboratory under Among these diagrams, a distinct trend of phase formation and tie-line relationships is observed.

8.
Associative processing architectures as candidates for second-generation massively parallel computers (MPCs) are discussed. The authors outline the principle of operation of associative processors, and briefly review their development. They explore the associative string processor (ASP) architecture as a representative case study in the development of a second-generation MPC.

9.
We first show that, for some kinds of signals, a bandwidth and time duration reduction technique can be used to simulate waveform distortions caused by moving targets. That is, the waveform distortions at very large TB with relatively small v can correctly be measured by reducing TB and increasing v while keeping TBv unchanged, where T is the duration of the transmitted signal, B is the bandwidth, and v is the relative speed of the targets. We then study the waveform distortions in SAR signals caused by a moving antenna. The bandwidth and time duration reduction technique saves considerable time and memory in the simulations. We then confirm by simulation that waveform distortions do pose problems when very large bandwidth, long duration SAR data are processed with conventional SAR processing methods. Finally, we propose the concepts of wideband and narrowband processing of SAR data. Models are set up for wideband and narrowband SAR data processing, and new methods are presented for reconstructing targets using the proposed models. Simulations show that the methods can improve the quality of the simulated SAR images.

10.
From a system-level perspective, this paper presents a 128 × 128 flexible and reconfigurable Focal-Plane Analog Programmable Array Processor, which has been designed as a single chip in a 0.35 μm standard digital 1P-5M CMOS technology. The core processing array has been designed to achieve high-speed operation and sufficient accuracy (7 bits) with low power consumption. The chip includes on-chip program memory to allow for the execution of complex, sequential and/or bifurcation flow image processing algorithms. It also includes the structures and circuits needed to guarantee its embedding into conventional digital hosting systems: external data interchange and control are completely digital. The chip contains close to four million transistors, 90% of them working in analog mode. The chip features up to 330 GOPS (giga operations per second), uses the power supply (180 GOP/Joule) and the silicon area (3.8 GOPS/mm²) efficiently, and is able to maintain VGA processing throughputs of 100 frames/s with about 10–20 basic image processing tasks on each frame.
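The efficiency figures in this abstract imply a power and a processing area that can be derived by division. The sketch below is a rough consistency check only: it assumes that the peak throughput and both efficiency figures refer to the same operating point, which the abstract does not state, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 330.0        # peak throughput, giga operations per second
GOP_PER_JOULE = 180.0    # energy efficiency
GOPS_PER_MM2 = 3.8       # area efficiency

# Assumption: all three figures describe the same operating point.
# (GOP/s) / (GOP/J) = J/s = W
implied_power_w = PEAK_GOPS / GOP_PER_JOULE
# (GOP/s) / (GOP/s per mm^2) = mm^2
implied_area_mm2 = PEAK_GOPS / GOPS_PER_MM2

print(f"implied power: {implied_power_w:.2f} W")
print(f"implied area:  {implied_area_mm2:.1f} mm^2")
```

Under that assumption the numbers correspond to roughly 1.8 W and 87 mm²; these are derived values, not figures reported in the abstract.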

11.
Products motivated by performance-driven and/or density-driven goals often use Multi-Chip Module (MCM) technology, even though it still faces several challenging problems that need to be resolved before it becomes a widely adopted technology. Among its most challenging problems is achieving acceptable MCM assembly yields while meeting quality requirements. This problem can be significantly reduced by adopting adequate MCM test strategies: to guarantee the quality of incoming bare (unpackaged) dies prior to module assembly; to ensure the structural integrity and performance of assembled modules; and to help isolate the defective parts and apply the repair process. This paper describes today's MCM test problems and presents the corresponding test and design-for-testability (DFT) strategies used for bare dies, substrates, and assembled MCMs.

12.
Alternative methods are proposed for testing output feedback stabilizability and constructing a stable closed-loop polynomial for 2D systems. With the proposed methods, the problems can generally be reduced to the 1D case and solved using 1D algorithms or Gröbner basis approaches. Another feature of the methods is that their extension to certain special nD (n > 2) cases can be easily obtained. Moreover, the Rabinowitsch trick, a technique once used in proving the well-known Hilbert's Nullstellensatz, is generalized in some sense to the case of modules over a polynomial ring. These results eventually lead to a new solution algorithm for the 2D polynomial matrix equation D(z, w)X(z, w) + N(z, w)Y(z, w) = V(z, w) with V(z, w) stable, which arises in the 2D feedback design problem. This algorithm shows that the equation can be effectively solved by transforming it into an equivalent Bezout equation so that the Gröbner basis approach for polynomial modules can be directly applied.
Notation: R, the field of real numbers; C, the field of complex numbers; R[z, w], the commutative ring of 2D polynomials in z and w with coefficients in R; M(R[z, w]), the set of matrices of appropriate dimensions with entries in R[z, w]; R[z, w]^n, the module of ordered n-tuples in R[z, w]; R[z, w]^(n×m), the set of n × m matrices with entries in R[z, w]; the closed unit disc in C, i.e., {z ∈ C | |z| ≤ 1}; the closed unit bidisc, i.e., {(z, w) ∈ C² | |z| ≤ 1, |w| ≤ 1}; A^T, the transpose of matrix A.

13.
This paper presents an efficient architecture of an application specific processor (ASP) designed for the deblocking filter algorithm of the H.264 video compression standard. Several optimization techniques at different design levels, such as vector registers, pipeline processing, a very long instruction word (VLIW) processor, and predication, are utilized in this design. The proposed ASP can meet the real-time constraint of the deblocking filter algorithm for the 16:9 video format (4096 × 2304) at 30 frames per second (fps) at a 200-MHz clock rate.

14.
The thermoelectric properties of ErAs:InGaAlAs were characterized by variable-temperature measurements of thermal conductivity, electrical conductivity, and Seebeck coefficient from 300 K to 600 K. The measurements show that the figure of merit ZT (ZT = α²σT/κ, where α, σ, κ, and T are the Seebeck coefficient, electrical conductivity, thermal conductivity, and absolute temperature, respectively) of the material is greater than 1 at 600 K. Power generator modules of segmented elements of 300 μm thick Bi2Te3 and 50 μm thick ErAs:(InGaAs)1−x(InAlAs)x were fabricated and characterized. The segmented element is 1 mm × 1 mm in area, and each segment can work in a different temperature range. An output power of up to 5.5 W and an open-circuit voltage of over 10 V were measured. Theoretical calculations were carried out, and the results indicate that the performance of the thermoelectric generator modules can be improved further by improving the thermoelectric properties of the element material and reducing the electrical and thermal parasitic losses.

15.
Global motion estimation and compensation (GME/GMC) is an important video processing technique and has been applied in many applications, including video segmentation, sprite/mosaic generation, and video coding. In the MPEG-4 Advanced Simple Profile (ASP), GME/GMC is adopted to compensate for camera motion. Because GME is important, many GME algorithms have been proposed. These algorithms have two common characteristics: huge computational complexity and very large memory bandwidth. Hence, real-time applications require a hardware accelerator for GME. However, GME poses many hardware design challenges, such as irregular memory access and huge memory bandwidth, and only a few hardware architectures have been proposed. In this paper, we first analyze three typical GME algorithms and then propose a fast GME algorithm. By using temporal prediction and skipping redundant computation, 91% of the memory bandwidth and 80% of the iterations are saved, while the performance is maintained, compared to Gradient Descent in the MPEG-4 Verification Model. Based on our proposed algorithm, a hardware architecture for GME is also presented. A new scheduling scheme, Reference-Based Scheduling, is developed to solve the irregular memory access problem. An interleaved memory arrangement is applied to satisfy the memory access requirement of interpolation. The total gate count of the hardware implementation is 131 K with the Artisan 0.18 μm cell library, and the internal memory size is about 7.9 Kb. Its processing capability is MPEG-4 ASP@L3, which is 352×288 at 30 fps, at 30 MHz.
Liang-Gee Chen
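The operating point quoted for the GME hardware (352×288 at 30 fps with a 30 MHz clock) translates into a per-pixel cycle budget. The sketch below works this out; the 16×16 macroblock size is the standard MPEG-4 value, and the variable names are illustrative rather than taken from the paper.

```python
# Operating point quoted in the abstract.
WIDTH, HEIGHT = 352, 288     # CIF frame size in pixels
FPS = 30                     # frames per second
CLOCK_HZ = 30_000_000        # 30 MHz clock
MB_SIZE = 16                 # MPEG-4 macroblock edge length in pixels

pixels_per_second = WIDTH * HEIGHT * FPS
macroblocks_per_frame = (WIDTH // MB_SIZE) * (HEIGHT // MB_SIZE)
macroblocks_per_second = macroblocks_per_frame * FPS

cycles_per_pixel = CLOCK_HZ / pixels_per_second
cycles_per_macroblock = CLOCK_HZ / macroblocks_per_second

print(f"{pixels_per_second} pixels/s, {macroblocks_per_frame} macroblocks/frame")
print(f"{cycles_per_pixel:.2f} cycles/pixel, "
      f"{cycles_per_macroblock:.0f} cycles/macroblock")
```

The budget comes out at just under 10 clock cycles per pixel, which gives a sense of how tightly the memory accesses and iterations have to be scheduled.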

16.
A novel algorithm and architecture for computing the optimal decision feedback equalizer (DFE) coefficients from a channel state information (CSI) estimate is presented. The proposed algorithm maps well onto a linear chain of n highly pipelineable CORDIC-based processing elements. It is thus well suited for VLSI implementation. Due to the very regular data flow, the number of processing elements may be reduced without sacrificing computational latency by recycling the data through a chain of fewer than n processing elements. The proposed architecture computes the optimal DFE coefficients of a twelve-tap symbol-spaced DFE suitable for HIPERLAN I in 2.7 μs and requires only 0.7 mm² of area in a 0.35 μm CMOS process, assuming a clock frequency of 100 MHz.

17.
This paper investigates the properties of the two-variable polynomial u(λ, z) built on the first column of the adjoint matrix of λI − C, where C is a given Hermitian Toeplitz matrix. In particular, the stability properties of u(λ, z) are discussed and are shown to depend essentially on the location of λ with respect to the eigenvalues of C. The eigenvectors of C, which have recently found some applications in signal processing and estimation theory, are obtained from the polynomial u(λ, z) when λ tends to the eigenvalues of C. This allows one to derive several results concerning the eigenpolynomials, including those for the case of multiple eigenvalues.

18.
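This paper analyzes the shortcomings of the conventional buffer configuration method when it is used in a multi-dimensional packet switching fabric (MPSF) and proposes a new buffer configuration method. The new method simplifies the implementation of the scheduling policy at the switching nodes and raises the throughput of the switching fabric. Analysis and simulation results verify that, compared with the conventional buffer configuration method used in massively parallel processors (MPPs), the new method achieves better throughput at a lower implementation cost.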

19.
The equivalent impedance of the conventional ideal inductance implemented from two second-generation current conveyors is first calculated taking all the parasitic elements into account. Its equivalent electrical circuit, which comprises six components, is characterized. It is demonstrated that the most important deviation at high frequencies comes from the phase shifts of the transfers of the conveyors. These effects are compensated by the first-order compensation method using a single additional resistor. SPICE simulations using an industrial BiCMOS process demonstrate the validity of this approach. As an example, with the current conveyors DC biased at I0 = 100 μA and supplied with ±2.5 V, an inductance of 0.67 μH was found to be directly usable without compensation up to about 15 MHz. This frequency range is then extended up to about 100 MHz when the circuit is compensated with a single resistor of 75 Ω.

20.
A current feedback op-amp (CFOA) has the advantage that feedback structures in current-mode circuits are more easily devised, because the voltage buffer at the output of the CFOA does not load the output of the integral positive second-generation current conveyor (CCII+) that constitutes part of the CFOA. In this paper, the changes in the current-mode transfer function of a linear circuit composed of a subnetwork connected to a CCII+ are determined when the CCII+ is replaced by a CFOA and a feedback component is connected from the output of the CFOA to an independent node of the subnetwork. Two applications of the results are provided. A new theorem is then presented which generalizes the results. This theorem should be useful for the comparison, synthesis, and improvement of linear current-mode signal processing circuits.
