
1.

ARIAL: rapid prototyping for mixed and parallel platforms

《Parallel Computing》2002,28(7-8):1179-1202


2.

An optimum parallel architecture for high-speed real-time digital signal processing

Lang G.R. Dharssi M. Longstaff F.M. Longstaff P.S. Metford P.A.S. Rimmer M.T. 《Computer》1988,21(2):47-57

The authors describe a parallel processing architecture for real-time digital signal processing that has demonstrated virtually 100% data processing efficiency in a number of areas. The Teamed-Architecture Signal Processor (T-ASP) is a field-proven, commercially available optimal system solution to the extremely high computational and I/O rates encountered in modern digital-signal-processing environments. The design of T-ASP involves the consideration and implementation of many architectural concepts used to enhance the performance of a computer, including programmability, parallel processing, vector processing and pipelining, memory interleaving, double cache memories, multiple high-speed I/O interfaces, and segmentation of the processors for elimination of both CPU and data-handling overhead. The authors discuss hardware architecture design and implementation; hardware management; and software architecture design and implementation.

3.

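Design and implementation of the YHFT-D4 assembler

Chen Huibin, Liu Chunlin, Hu Dinglei, Chen Shuming 《电脑与信息技术》 (Computer and Information Technology) 2005,13(1):27-29,47

YHFT-D4 is a DSP with a clustered VLIW architecture. It has multiple functional units and can execute several instructions in parallel within a single clock cycle. Which functional unit executes each instruction, and which instructions execute in parallel, are decided statically by the compiler or the programmer. The paper presents the design and implementation of the YHFT-D4 assembler.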

4.

Scalable high-throughput variable block size motion estimation architecture

Stephen Warrington Wai-Yip Chan Subramania Sudharsanan 《Microprocessors and Microsystems》2009,33(4):319-325

Variable block size (VBS) motion compensated prediction (MCP) provides substantial rate-distortion performance gain over conventional fixed-block-size MCP and is a key feature of the H.264/AVC video coding standard. VBS–MCP requires the encoder to perform VBS motion estimation (VBSME), a computationally complex operation. In this paper, we propose a high motion vector throughput full-search VBSME architecture. High performance is achieved by performing parallel computations for multiple pixels within a macroblock, as well as computing several candidate motion vector (MV) positions in parallel. Two implementations of the architecture are examined, a four pixel-parallel implementation, and a higher performance 16 pixel-parallel implementation. A high degree of scalability is achieved by allowing for a variable length processing element array, where more processing elements yields a higher degree of candidate MV parallelism. The proposed architecture achieves a throughput exceeding current full-search VBSME architectures.

5.

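Applications of cellular neural networks in communication systems

Hu Shaopeng, Qiao Lei 《微计算机信息》 (Microcomputer Information) 2006,22(18):39-41

A cellular neural network (CNN) is a large-scale nonlinear analog circuit. Its two key characteristics are continuous-time operation and local connectivity, which allow a CNN to perform real-time, high-speed, parallel signal processing in the digital domain and make it particularly well suited to very-large-scale integration (VLSI) implementation. The paper describes the structure and characteristics of CNNs and introduces their applications in communication systems, mainly signal processing and its hardware implementation, chaotic communication, and optimization problems in communications.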

6.

Implementing fast Fourier transforms using the Am29500 family

《Microprocessors and Microsystems》1987,11(8):423-430

The paper discusses the implementation of fast Fourier transform (FFT) algorithms using members of the Am29500 family of microprocessors and peripherals. First the suitability of the Am29500 family for signal processing applications is discussed. The architectural requirements of FFT processors are then outlined. A parallel processing architecture using pipelining is developed and the microprogramming of the system is described. Timing and implementation details, together with some practical test results, are given. The paper concentrates mainly on radix-2 decimation-in-time (DIT) FFT computations, but the architecture described can be applied to variable-radix processors running DIT or DIF (decimation-in-frequency) algorithms.
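For readers unfamiliar with the radix-2 decimation-in-time scheme named in this abstract, the following is a minimal software sketch in Python. It is an illustration only, not the paper's microprogrammed Am29500 pipeline: the function name `fft_radix2_dit` and the recursive formulation are assumptions made here for clarity.

```python
import cmath


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    The input length must be a power of two. This is a plain software
    version of the textbook algorithm, not the hardware design in the paper.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]

    # Decimation in time: split into even- and odd-indexed samples.
    even = fft_radix2_dit(x[0::2])
    odd = fft_radix2_dit(x[1::2])

    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    # A constant signal has all its energy in bin 0.
    print(fft_radix2_dit([1, 1, 1, 1]))
```

Each level of the recursion corresponds to one stage of butterflies; a pipelined hardware design would typically unroll these stages rather than recurse.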

7.

Interprocedural Data Flow Based Optimizations for Distributed Memory Compilation

GAGAN AGRAWAL JOEL SALTZ 《Software》1997,27(5):519-545

Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routines and collective communication routines inserted for managing communication in such codes. We also present two new interprocedural optimizations: placement of scatter routines and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D compilation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes. ©1997 John Wiley & Sons, Ltd.

8.

Embedding of a real time image stabilization algorithm on a parameterizable SoPC architecture a chip multi-processor approach

Lionel Damez Loic Sieler Alexis Landrault Jean Pierre Dérutin 《Journal of Real-Time Image Processing》2011,6(1):47-58

Highly regular multi-processor architectures are suitable for inherently highly parallelizable applications such as most of the image processing domain. Systems embedded in a single programmable chip platform (SoPC) allow hardware designers to tailor every aspect of the architecture in order to match the specific application needs. These platforms are now large enough to embed an increasing number of cores, allowing implementation of a multi-processor architecture with an embedded communication network. In this paper we present the parallelization and the embedding of a real time image stabilization algorithm on a SoPC platform. Our overall hardware implementation method is based upon meeting algorithm processing power requirements and communication needs with refinement of a generic parallel architecture model. Actual implementation is done by the choice and parameterization of readily available reconfigurable hardware modules and customizable commercially available IPs (Intellectual Property). We present both software and hardware implementation with performance results on a Xilinx SoPC target.

9.

Finite-element neural networks for solving differential equations (total citations: 1, self-citations: 0, citations by others: 1)

Ramuhalli P. Udpa L. Udpa S.S. 《Neural Networks, IEEE Transactions on》2005,16(6):1381-1392

The solution of partial differential equations (PDE) arises in a wide variety of engineering problems. Solutions to most practical problems use numerical analysis techniques such as finite-element or finite-difference methods. The drawbacks of these approaches include computational costs associated with the modeling of complex geometries. This paper proposes a finite-element neural network (FENN) obtained by embedding a finite-element model in a neural network architecture that enables fast and accurate solution of the forward problem. Results of applying the FENN to several simple electromagnetic forward and inverse problems are presented. Initial results indicate that the FENN performance as a forward model is comparable to that of the conventional finite-element method (FEM). The FENN can also be used in an iterative approach to solve inverse problems associated with the PDE. Results showing the ability of the FENN to solve the inverse problem given the measured signal are also presented. The parallel nature of the FENN also makes it an attractive solution for parallel implementation in hardware and software.

10.

Design of New Optimized Architecture Processor for DWT

《Real》2000,6(4):297-312

This paper presents a VLSI implementation of the One Dimensional Direct Discrete Wavelet transform (1-D DWT). The DDWT can be viewed as a multi-resolution decomposition of a signal. This means that it decomposes a signal into its components in different frequency bands (octave bands). We propose a new architecture using parallel filters. We consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. It consists of three basic units: one register bank, four filters, and a control unit. The filters are of different lengths and use new coefficients derived from Daubechies filter coefficients. The designed processor architecture requires no interface circuitry for interconnection to a standard communication bus. The architecture can compute the DWT at a data rate of 12×10⁶ samples/s, corresponding to a typical clock speed of 12 MHz. The architecture is simulated at the gate level in VLSI.

11.

Designing a custom DSP circuit using VHDL

Kumar K.A. Petrasko B. 《Micro, IEEE》1990,10(5):46-53


12.

Achieving supercomputer performance for neural net simulation with an array of digital signal processors

Muller U.A. Baumle B. Kohler P. Gunzinger A. Guggenbuhl W. 《Micro, IEEE》1992,12(5):55-65

Music, a digital signal processor (DSP)-based system with a parallel distributed-memory architecture that provides enormous computing power yet retains the flexibility of a general-purpose computer, is discussed. It is shown that Music reaches a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers. The Music system hardware, programming, and backpropagation implementation are described.

13.

Optimizations and OpenMP implementation for the direct simulation Monte Carlo method

Da Gao Thomas E. Schwartzentruber 《Computers & Fluids》2011,42(1):73-81

Parallel implementation of a three-dimensional direct simulation Monte Carlo (DSMC) code employing complex data structures and dynamic memory allocation is detailed for shared memory systems using Open Multi-Processing (OpenMP). Several techniques to optimize the serial implementation of the DSMC method are first discussed. Specifically for a 3-level Cartesian grid, a Cartesian-based movement technique including particle indexing is demonstrated to result in a modest decrease in overall simulation expense of 34% compared with a ray-tracing technique combined with stored cell-connectivity. Two strategies for data localization leading to optimal usage of cache memory are demonstrated to speed up certain cell-based functions (such as collision computations) by a factor of 3.38–4.36. The shared-memory parallel implementation using OpenMP is then described in detail. Synchronization points and related critical sections are identified as major factors that impact the OpenMP parallel performance. Techniques to remove all such synchronization points in the OpenMP implementation of the DSMC method are outlined. For dual-core and quad-core systems, speedups of 1.99 and 3.74, respectively, are obtained for a (free-stream flow) test simulation with low granularity. Finally, the parallel performance of identical source code employing OpenMP is shown to be strongly correlated to the underlying computer architecture. Both Symmetric Multiprocessor (SMP) and non-uniform memory access (NUMA) systems are studied in order to achieve a better understanding of their impacts on parallel scalability when using OpenMP.
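To put the reported speedups in context, the short Python sketch below converts them into parallel efficiency (speedup divided by core count) and into the Karp-Flatt experimentally determined serial fraction. Only the speedup figures 1.99 and 3.74 come from the abstract; the function names and the choice of these two metrics are assumptions made here for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt_serial_fraction(speedup, cores):
    """Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p), defined for p > 1."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    # (cores, speedup) pairs quoted in the abstract above.
    for cores, speedup in [(2, 1.99), (4, 3.74)]:
        eff = parallel_efficiency(speedup, cores)
        frac = karp_flatt_serial_fraction(speedup, cores)
        print(f"{cores} cores: efficiency {eff:.1%}, serial fraction {frac:.4f}")
```

On these figures the efficiency is 99.5% on two cores and 93.5% on four, which is consistent with the abstract's claim that synchronization points were largely removed.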

14.

The design and implementation of a pascal-based language for array processor architectures

《Journal of Parallel and Distributed Computing》1987,4(3):266-287

This paper describes the programming language Actus II, which has evolved from the Pascal-based parallel language Actus, and has also been influenced by the architecture of array processors. This language facilitates the construction of parallel algorithms in a notation which is independent of the underlying architecture. Work on the implementation of a compiler for the ICL Distributed Array Processor (DAP) is currently under way and some aspects of this implementation are described.

15.

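Fault detection and identification for dual-redundant sensors (total citations: 3, self-citations: 0, citations by others: 3)

Meng Xiaofeng, Wang Xingren 《自动化学报》 (Acta Automatica Sinica) 1996,22(4):393-400

Sensors are indispensable components of any measurement and control system and are also the components most prone to failure, so sensor fault detection, identification, and signal reconstruction have long received great attention. This paper studies fault detection and identification using only the output signals of dual-redundant sensors. It proposes a fault-signal identifier (FSD) for dual-redundant sensors, establishes the basic principle of fault-signal identification, derives the corresponding recursive algorithm for sensor fault detection and identification, and gives simulation results.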

16.

Effective State Exploration for Model Checking on a Shared Memory Architecture

Cornelia P. Inggs Howard Barringer 《Electronic Notes in Theoretical Computer Science》2002,68(4)

In this paper we present results from experimental studies investigating implementation strategies for explicit-state temporal-logic model checking on a virtual shared-memory high-performance parallel machine architecture. In particular, a parallel state exploration algorithm using a two-queue structure for load balancing is proposed and its performance analysed by means of experimental studies. We then discuss implementation issues for parallel automata-theoretic model checking using this parallel state exploration algorithm.

17.

Cluster Detection in Databases: The Adaptive Matched Filter Algorithm and Implementation (total citations: 1, self-citations: 0, citations by others: 1)

Jeremy Kepner Rita Kim 《Data mining and knowledge discovery》2003,7(1):57-79

Matched filter techniques are a staple of modern signal and image processing. They provide a firm foundation (both theoretical and empirical) for detecting and classifying patterns in statistically described backgrounds. Application of these methods to databases has become increasingly common in certain fields (e.g. astronomy). This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters in large astronomical databases. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.

18.

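Design and implementation of a UML-based parallel automatic test system

Xia Rui, Xiao Mingqing 《计算机工程》 (Computer Engineering) 2007,33(9):62-63,1

Parallel testing is one of the future directions for automatic test systems, yet no practical development model for parallel test system architectures currently exists, in China or abroad. The paper describes the full process of UML-based analysis, design, and implementation of a parallel test system, providing a reference for the development of such systems.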

19.

Scalable mpNoC for massively parallel systems – Design and implementation on FPGA

M. Baklouti Y. Aydi Ph. Marquet J.L. Dekeyser M. Abid 《Journal of Systems Architecture》2010,56(7):278-292

The high chip-level integration enables the implementation of large-scale parallel processing architectures with 64 and more processing nodes on a single chip or on an FPGA device. These parallel systems require a cost-effective yet high-performance interconnection scheme to provide the needed communications between processors. The massively parallel Network on Chip (mpNoC) was proposed to address the demand for parallel irregular communications for massively parallel processing System on Chip (mppSoC). Targeting FPGA-based design, an efficient mpNoC low level RTL implementation is proposed taking into account design constraints. The proposed network is designed as an FPGA based Intellectual Property (IP) able to be configured in different communication modes. It can communicate between processors and also perform parallel I/O data transfer which is clearly a key issue in an SIMD system. The mpNoC RTL implementation presents good performance in terms of area, throughput and power consumption, which are important metrics for an on-chip implementation. mpNoC is a flexible architecture that is suitable for use in FPGA-based parallel systems. This paper introduces the basic mppSoC architecture. It mainly focuses on the mpNoC flexible IP based design and its implementation on FPGA. The integration of mpNoC in mppSoC is also described. Implementation results on a Stratix II FPGA device are given for three data-parallel applications run on mppSoC. The good performance obtained justifies the effectiveness of the proposed parallel network. It is shown that the mpNoC is a lightweight parallel network, making it suitable for both small and large FPGA-based parallel systems.

20.

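FPGA implementation of the MUSIC algorithm

Bai Yinshan, Li Hong 《计算机系统应用》 (Computer Systems & Applications) 2014,23(7):185-189

Because the multiple signal classification (MUSIC) algorithm has high computational complexity and is difficult to run in real time, the paper presents a real-valued preprocessing algorithm for uniform linear arrays and a practical definition of the spatial spectrum, selects an eigenvalue decomposition algorithm suited to FPGA hardware implementation, and gives the overall architecture of an FPGA implementation of MUSIC. Simulation results show that the FPGA implementation computes the MUSIC algorithm accurately and quickly.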