Similar literature
Found 20 similar documents (search time: 15 ms)
1.
2.
The authors describe a parallel processing architecture for real-time digital signal processing that has demonstrated virtually 100% data processing efficiency in a number of areas. The Teamed-Architecture Signal Processor (T-ASP) is a field-proven, commercially available optimal system solution to the extremely high computational and I/O rates encountered in modern digital-signal-processing environments. The design of T-ASP involves the consideration and implementation of many architectural concepts used to enhance the performance of a computer, including programmability, parallel processing, vector processing and pipelining, memory interleaving, double cache memories, multiple high-speed I/O interfaces, and segmentation of the processors for elimination of both CPU and data-handling overhead. The authors discuss hardware architecture design and implementation; hardware management; and software architecture design and implementation.

3.
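YHFT-D4 is a DSP with a clustered VLIW architecture. It has multiple functional units and can execute several instructions in parallel in a single clock cycle. Which functional unit executes an instruction, and which instructions execute in parallel, are decided statically by the compiler or the programmer. This paper presents the design and implementation of the YHFT-D4 assembler.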

4.
Variable block size (VBS) motion compensated prediction (MCP) provides substantial rate-distortion performance gain over conventional fixed-block-size MCP and is a key feature of the H.264/AVC video coding standard. VBS–MCP requires the encoder to perform VBS motion estimation (VBSME), a computationally complex operation. In this paper, we propose a high motion vector throughput full-search VBSME architecture. High performance is achieved by performing parallel computations for multiple pixels within a macroblock, as well as computing several candidate motion vector (MV) positions in parallel. Two implementations of the architecture are examined: a four pixel-parallel implementation and a higher-performance 16 pixel-parallel implementation. A high degree of scalability is achieved by allowing for a variable-length processing element array, where more processing elements yield a higher degree of candidate MV parallelism. The proposed architecture achieves a throughput exceeding that of current full-search VBSME architectures.
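As a rough illustration of the full-search motion estimation that this abstract refers to, the following sketch evaluates every candidate motion vector for a single fixed-size block using the sum of absolute differences (SAD). It is a serial software sketch under stated assumptions, not the proposed hardware architecture: the function names, the frame layout (lists of rows), and the use of SAD as the cost are assumptions, and neither the variable block sizes nor the pixel-level and MV-level parallelism are modelled.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Test every candidate motion vector within +/- search_range.

    Returns (best_sad, (dx, dy)), or None if no candidate lies inside the frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the frame.
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

In a hardware design of the kind described, the two inner loops of `sad` and the loop over candidates are the natural places to introduce parallelism; here they run sequentially for clarity.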

5.
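The cellular neural network (CNN) is a large-scale nonlinear analog circuit. Its two key characteristics are continuous-time operation and local connectivity, which allow CNNs to perform real-time, high-speed, parallel signal processing in the digital domain and make them particularly well suited to very-large-scale integration (VLSI) implementation. This paper describes the structure and characteristics of CNNs and introduces their applications in communication systems, mainly signal processing and its hardware implementation, chaotic communication, and optimization problems in communications.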

6.
The paper discusses the implementation of fast Fourier transform (FFT) algorithms using members of the Am29500 family of microprocessors and peripherals. First the suitability of the Am29500 family for signal processing applications is discussed. The architectural requirements of FFT processors are then outlined. A parallel processing architecture using pipelining is developed and the microprogramming of the system is described. Timing and implementation details, together with some practical test results, are given. The paper concentrates mainly on radix-2 decimation-in-time (DIT) FFT computations, but the architecture described can be applied to variable-radix processors running DIT or DIF (decimation-in-frequency) algorithms.
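For readers unfamiliar with the radix-2 decimation-in-time computation mentioned above, a minimal recursive sketch follows. It is a generic textbook formulation, not the Am29500 microprogram; the function name `fft_radix2_dit` is hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def fft_radix2_dit(x):
    """Radix-2 decimation-in-time FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # Decimation in time: transform even- and odd-indexed samples separately.
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-length transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each pass of the `for` loop is one butterfly; a pipelined processor of the kind the abstract describes would typically execute these butterflies stage by stage rather than recursively.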

7.
GAGAN AGRAWAL  JOEL SALTZ 《Software》1997,27(5):519-545
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture-independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routine and collective communication routines inserted for managing communication in such codes. We also present two new interprocedural optimizations: placement of scatter routines and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D compilation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes. ©1997 John Wiley & Sons, Ltd.

8.
Highly regular multi-processor architectures are suitable for inherently highly parallelizable applications such as most of the image processing domain. Systems embedded in a single programmable chip platform (SoPC) allow hardware designers to tailor every aspect of the architecture in order to match the specific application needs. These platforms are now large enough to embed an increasing number of cores, allowing implementation of a multi-processor architecture with an embedded communication network. In this paper we present the parallelization and the embedding of a real time image stabilization algorithm on a SoPC platform. Our overall hardware implementation method is based upon meeting algorithm processing power requirements and communication needs with refinement of a generic parallel architecture model. Actual implementation is done by the choice and parameterization of readily available reconfigurable hardware modules and customizable commercially available IPs (Intellectual Property). We present both software and hardware implementation with performance results on a Xilinx SoPC target.

9.
Finite-element neural networks for solving differential equations   Cited by: 1 (self-citations: 0, citations by others: 1)
The solution of partial differential equations (PDE) arises in a wide variety of engineering problems. Solutions to most practical problems use numerical analysis techniques such as finite-element or finite-difference methods. The drawbacks of these approaches include computational costs associated with the modeling of complex geometries. This paper proposes a finite-element neural network (FENN) obtained by embedding a finite-element model in a neural network architecture that enables fast and accurate solution of the forward problem. Results of applying the FENN to several simple electromagnetic forward and inverse problems are presented. Initial results indicate that the FENN performance as a forward model is comparable to that of the conventional finite-element method (FEM). The FENN can also be used in an iterative approach to solve inverse problems associated with the PDE. Results showing the ability of the FENN to solve the inverse problem given the measured signal are also presented. The parallel nature of the FENN also makes it an attractive solution for parallel implementation in hardware and software.

10.
《Real》2000,6(4):297-312
This paper presents a VLSI implementation of One Dimensional Direct Discrete Wavelet transform (1-D DWT). The DDWT can be viewed as a multi-resolution decomposition of a signal. This means that it decomposes a signal into its components in different frequency bands (octave bands). We propose a new architecture using parallel filters. We consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. It consists of three basic units: one register bank, four filters, and a control unit. The filters are of different lengths and use new coefficients derived from Daubechies filter coefficients. The designed processor architecture requires no interface circuitry for interconnection to a standard communication bus. The architecture can compute the DWT at a data rate of 12×10^6 samples/s, corresponding to a typical clock speed of 12 MHz. The architecture is simulated at the gate level in VLSI.
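The multi-resolution decomposition into octave bands described in this abstract can be sketched in software as a cascade of filter-and-downsample steps. The sketch below is only an illustration under stated assumptions: it uses the two-tap Haar filter pair as a stand-in (the paper uses filters derived from Daubechies coefficients, which are not given here), it assumes periodic extension at the signal boundary, and the function names are hypothetical.

```python
import math


def dwt_level(signal, lo, hi):
    """One analysis level: low-pass and high-pass filter, keeping every second output.

    The signal is extended periodically at its end (an assumption of this sketch).
    """
    n = len(signal)
    approx, detail = [], []
    for i in range(0, n, 2):
        approx.append(sum(lo[k] * signal[(i + k) % n] for k in range(len(lo))))
        detail.append(sum(hi[k] * signal[(i + k) % n] for k in range(len(hi))))
    return approx, detail


def dwt(signal, levels=3):
    """Multi-level 1-D DWT; returns the final approximation and one detail band per level."""
    s = 1 / math.sqrt(2)
    lo, hi = [s, s], [s, -s]  # Haar pair, used here only as a stand-in
    approx = list(signal)
    bands = []
    for _ in range(levels):
        approx, detail = dwt_level(approx, lo, hi)
        bands.append(detail)  # each level halves the band and the sample count
    return approx, bands
```

With three levels, an input of length 8 yields detail bands of lengths 4, 2, and 1 plus a single approximation coefficient, which mirrors the octave-band structure the abstract mentions.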

11.
12.
Music, a digital signal processor (DSP)-based system with a parallel distributed-memory architecture that provides enormous computing power yet retains the flexibility of a general-purpose computer, is discussed. It is shown that Music reaches a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers. The Music system hardware, programming, and backpropagation implementation are described.

13.
Parallel implementation of a three-dimensional direct simulation Monte Carlo (DSMC) code employing complex data structures and dynamic memory allocation is detailed for shared memory systems using Open Multi-Processing (OpenMP). Several techniques to optimize the serial implementation of the DSMC method are first discussed. Specifically for a 3-level Cartesian grid, a Cartesian-based movement technique including particle indexing is demonstrated to result in a modest decrease in overall simulation expense of 34% compared with a ray-tracing technique combined with stored cell-connectivity. Two strategies for data localization leading to optimal usage of cache memory are demonstrated to speed up certain cell-based functions (such as collision computations) by a factor of 3.38–4.36. The shared-memory parallel implementation using OpenMP is then described in detail. Synchronization points and related critical sections are identified as major factors that impact the OpenMP parallel performance. Techniques to remove all such synchronization points in the OpenMP implementation of the DSMC method are outlined. For dual-core and quad-core systems, speedups of 1.99 and 3.74, respectively, are obtained for a (free-stream flow) test simulation with low granularity. Finally, the parallel performance of identical source code employing OpenMP is shown to be strongly correlated to the underlying computer architecture. Both Symmetric Multiprocessor (SMP) and non-uniform memory access (NUMA) systems are studied in order to achieve a better understanding of their impacts on parallel scalability when using OpenMP.
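To put the quoted dual-core and quad-core speedups in context, they can be converted to parallel efficiency (speedup divided by core count) and to the Karp-Flatt experimentally determined serial fraction. The helper names below are hypothetical, and the calculation is a reader's aid rather than an analysis taken from the paper.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def serial_fraction(speedup, cores):
    """Karp-Flatt metric: e = (1/speedup - 1/cores) / (1 - 1/cores)."""
    return (1 / speedup - 1 / cores) / (1 - 1 / cores)


# Speedups quoted in the abstract: 1.99 on two cores, 3.74 on four cores.
for speedup, cores in [(1.99, 2), (3.74, 4)]:
    print(cores, round(parallel_efficiency(speedup, cores), 4),
          round(serial_fraction(speedup, cores), 4))
```

The quoted figures correspond to efficiencies of 99.5% on two cores and 93.5% on four cores.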

14.
This paper describes the programming language Actus II, which has evolved from the Pascal-based parallel language Actus, and has also been influenced by the architecture of array processors. This language facilitates the construction of parallel algorithms in a notation which is independent of the underlying architecture. Work on the implementation of a compiler for the ICL Distributed Array Processor (DAP) is currently under way and some aspects of this implementation are described.

15.
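Fault detection and identification for dual-redundant sensors   Cited by: 3 (self-citations: 0, citations by others: 3)
Sensors are indispensable components of any measurement and control system, and they are also the elements most prone to failure. Sensor fault detection, identification, and signal reconstruction have therefore long received considerable attention. This paper studies fault detection and identification using only the output signals of dual-redundant sensors. It proposes a fault signal discriminator (FSD) for dual-redundant sensors, establishes the basic principle of fault signal identification, derives the corresponding recursive algorithm for sensor fault detection and identification, and presents simulation results.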

16.
In this paper we present results from experimental studies investigating implementation strategies for explicit-state temporal-logic model checking on a virtual shared-memory high-performance parallel machine architecture. In particular, a parallel state exploration algorithm using a two-queue structure for load balancing is proposed and its performance is analysed by means of experimental studies. We then discuss implementation issues for parallel automata-theoretic model checking using this parallel state exploration algorithm.

17.
Matched filter techniques are a staple of modern signal and image processing. They provide a firm foundation (both theoretical and empirical) for detecting and classifying patterns in statistically described backgrounds. Application of these methods to databases has become increasingly common in certain fields (e.g. astronomy). This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters in large astronomical databases. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.
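As a generic illustration of the matched filter idea this abstract builds on, the sketch below correlates a 1-D data sequence with a known template and reports the offset with the strongest response. It assumes a white (uncorrelated, constant-variance) background, in which case the matched filter reduces to correlation with the template. It does not reproduce the paper's cluster-finding algorithm or its client/server parallelization, and the function names are hypothetical.

```python
def matched_filter(data, template):
    """Correlate `data` with `template`, normalised by the template's energy."""
    m = len(template)
    norm = sum(t * t for t in template) ** 0.5
    if m == 0 or norm == 0:
        raise ValueError("template must be non-empty and non-zero")
    return [
        sum(data[i + k] * template[k] for k in range(m)) / norm
        for i in range(len(data) - m + 1)
    ]


def detect(data, template):
    """Return (offset, response) of the strongest matched-filter output."""
    out = matched_filter(data, template)
    peak = max(range(len(out)), key=out.__getitem__)
    return peak, out[peak]
```

Because each output sample depends only on a local window of the input, the work splits naturally across workers by dividing the data into overlapping chunks, which is one reason this kind of computation is straightforward to run in parallel.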

18.
夏锐  肖明清 《计算机工程》2007,33(9):62-63,1
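Parallel test technology is one of the future directions for automatic test systems, yet no practical development model for parallel test system architectures currently exists in China or abroad. This paper presents the complete process of UML-based analysis, design, and implementation of a parallel test system, providing a reference for the research and development of such systems.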

19.
High chip-level integration enables the implementation of large-scale parallel processing architectures with 64 or more processing nodes on a single chip or on an FPGA device. These parallel systems require a cost-effective yet high-performance interconnection scheme to provide the needed communications between processors. The massively parallel Network on Chip (mpNoC) was proposed to address the demand for parallel irregular communications for massively parallel processing System on Chip (mppSoC). Targeting FPGA-based design, an efficient mpNoC low-level RTL implementation is proposed that takes design constraints into account. The proposed network is designed as an FPGA-based Intellectual Property (IP) block that can be configured in different communication modes. It can communicate between processors and also perform parallel I/O data transfer, which is clearly a key issue in an SIMD system. The mpNoC RTL implementation shows good performance in terms of area, throughput, and power consumption, which are important metrics for an on-chip implementation. mpNoC is a flexible architecture that is suitable for use in FPGA-based parallel systems. This paper introduces the basic mppSoC architecture. It mainly focuses on the flexible IP-based design of mpNoC and its implementation on FPGA. The integration of mpNoC in mppSoC is also described. Implementation results on a Stratix II FPGA device are given for three data-parallel applications run on mppSoC. The good performance obtained demonstrates the effectiveness of the proposed parallel network. It is shown that the mpNoC is a lightweight parallel network, making it suitable for both small and large FPGA-based parallel systems.

20.
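The multiple signal classification (MUSIC) algorithm has high computational complexity and is difficult to implement in real time. To address this, the paper presents a real-valued preprocessing algorithm suitable for uniform linear arrays and a practical definition of the spatial spectrum, selects an eigenvalue decomposition algorithm suited to FPGA hardware implementation, and gives the overall architecture of an FPGA implementation of the MUSIC algorithm. Simulation results show that the FPGA implementation computes the MUSIC algorithm accurately and quickly.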
